    A General Framework for Sample-Efficient Function Approximation in Reinforcement Learning. (arXiv:2209.15634v1 [cs.LG])
    With the increasing need for handling large state and action spaces, general function approximation has become a key technique in reinforcement learning (RL). In this paper, we propose a general framework that unifies model-based and model-free RL, and an Admissible Bellman Characterization (ABC) class that subsumes nearly all Markov Decision Process (MDP) models in the literature for tractable RL. We propose a novel estimation function with decomposable structural properties for optimization-based exploration and the functional eluder dimension as a complexity measure of the ABC class. Under our framework, we propose a new sample-efficient algorithm, OPtimization-based ExploRation with Approximation (OPERA), achieving regret bounds that match or improve over the best-known results for a variety of MDP models. In particular, for MDPs with low Witness rank, under a slightly stronger assumption, OPERA improves the state-of-the-art sample complexity results by a factor of $dH$. Our framework provides a generic interface to design and analyze new RL models and algorithms.
    Mixture of experts models for multilevel data: modelling framework and approximation theory. (arXiv:2209.15207v1 [math.ST])
    Multilevel data are prevalent in many real-world applications. However, it remains an open research problem to identify and justify a class of models that flexibly captures a wide range of multilevel data. Motivated by the versatility of the mixture of experts (MoE) models in fitting regression data, in this article we extend the MoE and study a class of mixed MoE (MMoE) models for multilevel data. Under some regularity conditions, we prove that the MMoE is dense, in the sense of weak convergence, in the space of continuous mixed effects models. As a result, the MMoE has the potential to accurately resemble almost all characteristics inherent in multilevel data, including the marginal distributions, dependence structures, regression links, random intercepts and random slopes. In the particular case where the multilevel data are hierarchical, we further show that a nested version of the MMoE universally approximates a broad range of dependence structures of the random effects among different factor levels.
    Improve learning combining crowdsourced labels by weighting Areas Under the Margin. (arXiv:2209.15380v1 [cs.LG])
    In supervised learning -- for instance in image classification -- modern massive datasets are commonly labeled by a crowd of workers. The labels obtained in this crowdsourcing setting are then aggregated for training. The aggregation step generally leverages a per-worker trust score. Yet, such worker-centric approaches discard each task's ambiguity. Some intrinsically ambiguous tasks might even fool expert workers, which could eventually be harmful to the learning step. In a standard supervised learning setting -- with one label per task and balanced classes -- the Area Under the Margin (AUM) statistic is tailored to identify mislabeled data. We adapt the AUM to identify ambiguous tasks in crowdsourced learning scenarios, introducing the Weighted AUM (WAUM). The WAUM is an average of AUMs weighted by worker- and task-dependent scores. We show that the WAUM can help discard ambiguous tasks from the training set, leading to better generalization or calibration performance. We report improvements over feature-blind aggregation strategies both for simulated settings and for the CIFAR-10H crowdsourced dataset.
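    A hedged sketch of the statistic being adapted (an illustration, not the authors' implementation): the AUM of a sample is the margin between the logit of its assigned label and the largest other logit, averaged over training epochs; the WAUM then averages per-worker AUMs under worker- and task-dependent scores, which are left as placeholders below.

        import numpy as np

        def aum(logits_per_epoch, label):
            """Area Under the Margin: average gap between the assigned-label
            logit and the largest other logit, over training epochs."""
            margins = []
            for logits in logits_per_epoch:          # one logit vector per epoch
                others = np.delete(logits, label)
                margins.append(logits[label] - others.max())
            return float(np.mean(margins))

        def waum(aums_per_worker, scores_per_worker):
            """Score-weighted average of per-worker AUMs; the scores stand in
            for the paper's worker- and task-dependent trust weights."""
            a, s = np.asarray(aums_per_worker), np.asarray(scores_per_worker)
            return float((s * a).sum() / s.sum())
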
    Evaluation of importance estimators in deep learning classifiers for Computed Tomography. (arXiv:2209.15398v1 [cs.CV])
    Deep learning has shown superb performance in detecting objects and classifying images, showing great promise for analyzing medical imaging. Translating the success of deep learning to medical imaging, in which doctors need to understand the underlying process, requires the capability to interpret and explain the predictions of neural networks. Interpretability of deep neural networks often relies on estimating the importance of input features (e.g., pixels) with respect to the outcome (e.g., class probability). However, a number of importance estimators (also known as saliency maps) have been developed, and it is unclear which ones are more relevant for medical imaging applications. In the present work, we investigated the performance of several importance estimators in explaining the classification of computed tomography (CT) images by a convolutional deep network, using three distinct evaluation metrics. First, the model-centric fidelity measures a decrease in the model accuracy when certain inputs are perturbed. Second, concordance between importance scores and the expert-defined segmentation masks is measured on a pixel level by receiver operating characteristic (ROC) curves. Third, we measure a region-wise overlap between an XRAI-based map and the segmentation mask by Dice Similarity Coefficients (DSC). Overall, two versions of SmoothGrad topped the fidelity and ROC rankings, whereas both Integrated Gradients and SmoothGrad excelled in the DSC evaluation. Interestingly, there was a critical discrepancy between model-centric (fidelity) and human-centric (ROC and DSC) evaluation. Expert expectation and intuition embedded in segmentation maps do not necessarily align with how the model arrived at its prediction. Understanding this difference in interpretability would help harness the power of deep learning in medicine.
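    For reference, the region-wise overlap metric here is the standard Dice Similarity Coefficient; a minimal sketch on boolean masks (assuming the XRAI-based map has already been thresholded into a region):

        import numpy as np

        def dice(saliency_region, segmentation_mask):
            """DSC = 2|A & B| / (|A| + |B|) for boolean arrays A and B."""
            a = np.asarray(saliency_region, dtype=bool)
            b = np.asarray(segmentation_mask, dtype=bool)
            denom = a.sum() + b.sum()
            return 2.0 * np.logical_and(a, b).sum() / denom if denom else 1.0
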
    Fault Prognosis in Particle Accelerator Power Electronics Using Ensemble Learning. (arXiv:2209.15570v1 [physics.acc-ph])
    Early fault detection and fault prognosis are crucial to ensure efficient and safe operations of complex engineering systems such as the Spallation Neutron Source (SNS) and its power electronics (high voltage converter modulators). Following an advanced experimental facility setup that mimics SNS operating conditions, the authors successfully conducted 21 fault prognosis experiments, in which fault precursors were introduced in the system to a degree sufficient to cause degradation in the waveform signals, but not enough to reach a real fault. Nine different machine learning techniques based on ensemble trees, convolutional neural networks, support vector machines, and hierarchical voting ensembles are proposed to detect the fault precursors. Although all 9 models showed perfect and identical performance during the training and testing phase, the performance of most models decreased in the prognosis phase once they were exposed to real-world data from the 21 experiments. The hierarchical voting ensemble, which features multiple layers of diverse models, maintained a distinguished performance in early detection of the fault precursors with a 95% success rate (20/21 tests), followed by AdaBoost and extremely randomized trees with 52% and 48% success rates, respectively. The support vector machine models were the worst, with only a 24% success rate (5/21 tests). The study concluded that a successful implementation of machine learning in the SNS or particle accelerator power systems would require a major upgrade in the controller and the data acquisition system to facilitate streaming and handling big data for the machine learning models. In addition, this study shows that the best performing models were diverse and based on the ensemble concept, reducing the bias and hyperparameter sensitivity of individual models.
    Using Knowledge Distillation to improve interpretable models in a retail banking context. (arXiv:2209.15496v1 [cs.LG])
    This article sets forth a review of knowledge distillation techniques with a focus on their applicability to retail banking contexts. Predictive machine learning algorithms used in banking environments, especially in risk and control functions, are generally subject to regulatory and technical constraints limiting their complexity. Knowledge distillation gives the opportunity to improve the performance of simple models without burdening their application, using the results of other - generally more complex and better-performing - models. Parsing recent advances in this field, we highlight three main approaches: Soft Targets, Sample Selection and Data Augmentation. We assess the relevance of a subset of such techniques by applying them to open source datasets, before putting them to the test on the use cases of BPCE, a major French institution in the retail banking sector. As such, we demonstrate the potential of knowledge distillation to improve the performance of these models without altering their form and simplicity.
    Spikformer: When Spiking Neural Network Meets Transformer. (arXiv:2209.15425v1 [cs.NE])
    We consider two biologically plausible structures, the Spiking Neural Network (SNN) and the self-attention mechanism. The former offers an energy-efficient and event-driven paradigm for deep learning, while the latter has the ability to capture feature dependencies, enabling Transformer to achieve good performance. It is intuitively promising to explore the marriage between them. In this paper, we consider leveraging both the self-attention capability and the biological properties of SNNs, and propose a novel Spiking Self Attention (SSA) as well as a powerful framework, named Spiking Transformer (Spikformer). The SSA mechanism in Spikformer models sparse visual features using spike-form Query, Key, and Value without softmax. Since its computation is sparse and avoids multiplication, SSA is efficient and has low computational energy consumption. We show that Spikformer with SSA can outperform state-of-the-art SNN frameworks in image classification on both neuromorphic and static datasets. Spikformer (66.3M parameters), comparable in size to SEW-ResNet-152 (60.2M parameters, 69.26% accuracy), achieves 74.81% top-1 accuracy on ImageNet using 4 time steps, which is the state of the art among directly trained SNN models.
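    A schematic of the softmax-free attention (a hedged sketch: the real SSA operates on binary spike tensors over several time steps, and the scaling and spiking-neuron details below are simplified stand-ins):

        import torch

        def spiking_self_attention(q, k, v, scale=0.125):
            """q, k, v: spike tensors in {0, 1} of shape (batch, tokens, dim).
            The attention map is integer-valued and needs no softmax."""
            attn = q @ k.transpose(-2, -1)      # sparse; multiplication-free in
                                                # hardware since spikes are 0/1
            out = (attn @ v) * scale
            return (out >= 1.0).float()         # Heaviside step as a stand-in
                                                # for the spiking neuron
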
    End-to-End Label Uncertainty Modeling in Speech Emotion Recognition using Bayesian Neural Networks and Label Distribution Learning. (arXiv:2209.15449v1 [eess.AS])
    To train machine learning algorithms to predict emotional expressions in terms of arousal and valence, annotated datasets are needed. However, as different people perceive others' emotional expressions differently, their annotations are inherently subjective. For this reason, annotations are typically collected from multiple annotators and averaged to obtain ground-truth labels. However, when exclusively trained on this averaged ground truth, the trained network is agnostic to the inherent subjectivity in emotional expressions. In this work, we therefore propose an end-to-end Bayesian neural network capable of being trained on a distribution of labels, so as to also capture the subjectivity-based label uncertainty. Instead of a Gaussian, we model the label distribution using a Student's t-distribution, which also accounts for the number of annotations. We derive the corresponding Kullback-Leibler divergence loss and use it to train an estimator for the distribution of labels, from which the mean and uncertainty can be inferred. We validate the proposed method on two in-the-wild datasets. We show that the proposed t-distribution based approach achieves state-of-the-art uncertainty modeling results in speech emotion recognition, as well as consistent results in cross-corpora evaluations. Furthermore, analyses reveal that the advantage of a t-distribution over a Gaussian grows with increasing inter-annotator correlation and a decreasing number of annotators.
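    To make the idea concrete, a simplified stand-in (not the paper's derived KL loss): fit a Student's t whose degrees of freedom grow with the number of annotators, by maximising the likelihood of the averaged annotation.

        import torch
        from torch.distributions import StudentT

        def t_nll(pred_mean, pred_scale, mean_label, n_annotators):
            """Negative log-likelihood under Student's t; the df term ties the
            predicted uncertainty to the number of annotations (a simplified
            proxy for the paper's KL-divergence loss)."""
            df = torch.tensor(float(max(n_annotators - 1, 1)))
            dist = StudentT(df, loc=pred_mean, scale=pred_scale)
            return -dist.log_prob(mean_label).mean()
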
    Energy Efficient Hardware Acceleration of Neural Networks with Power-of-Two Quantisation. (arXiv:2209.15257v1 [cs.CV])
    Deep neural networks virtually dominate the domain of most modern vision systems, providing high performance at a cost of increased computational complexity. Since such systems are often required to operate both in real-time and with minimal energy consumption (e.g., for wearable devices or autonomous vehicles, edge Internet of Things (IoT), sensor networks), various network optimisation techniques are used, e.g., quantisation, pruning, or dedicated lightweight architectures. Due to the logarithmic distribution of weights in neural network layers, Power-of-Two (PoT) quantisation (which likewise follows a logarithmic distribution) provides high performance with a significant reduction in computational precision (4-bit weights and less). This method makes it possible to replace the Multiply and ACcumulate (MAC) units typical of neural networks (performing, e.g., convolution operations) with more energy-efficient Bitshift and ACcumulate (BAC) units. In this paper, we show that a hardware neural network accelerator with PoT weights implemented on the Zynq UltraScale + MPSoC ZCU104 SoC FPGA can be at least $1.4\times$ more energy efficient than the uniform quantisation version. To further reduce the actual power requirement by omitting part of the computation for zero weights, we also propose a new pruning method adapted to logarithmic quantisation.
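    The arithmetic substitution is simple to state: with a PoT weight $w = \pm 2^{e}$, every multiplication $x \cdot w$ reduces to a sign flip plus a bit shift. A minimal sketch with fixed-point (integer) activations (illustrative; actual BAC units are hardware blocks):

        def bac(acc, x, sign, exponent):
            """Bitshift-and-ACcumulate: acc += sign * (x << exponent),
            replacing the multiply of a MAC unit for w = sign * 2**exponent."""
            shifted = x << exponent if exponent >= 0 else x >> -exponent
            return acc + (shifted if sign > 0 else -shifted)

        assert bac(0, 13, -1, 3) == 13 * -8    # w = -2**3
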
    PACE: A Parallelizable Computation Encoder for Directed Acyclic Graphs. (arXiv:2203.10304v2 [cs.LG] UPDATED)
    Optimization of directed acyclic graph (DAG) structures has many applications, such as neural architecture search (NAS) and probabilistic graphical model learning. Encoding DAGs into real vectors is a dominant component in most neural-network-based DAG optimization frameworks. Currently, most DAG encoders use an asynchronous message passing scheme which sequentially processes nodes according to the dependency between nodes in a DAG; that is, a node must not be processed until all its predecessors are processed. As a result, they are inherently not parallelizable. In this work, we propose a Parallelizable Attention-based Computation structure Encoder (PACE) that processes nodes simultaneously and encodes DAGs in parallel. We demonstrate the superiority of PACE through encoder-dependent optimization subroutines that search for the optimal DAG structure based on the learned DAG embeddings. Experiments show that PACE not only improves effectiveness over previous sequential DAG encoders while significantly boosting training and inference speed, but also generates smooth latent (DAG encoding) spaces that are beneficial to downstream optimization subroutines. Our source code is available at \url{https://github.com/zehao-dong/PACE}
    Diffusion-based Image Translation using Disentangled Style and Content Representation. (arXiv:2209.15264v1 [cs.CV])
    Diffusion-based image translation guided by semantic texts or a single target image has enabled flexible style transfer that is not limited to specific domains. Unfortunately, due to the stochastic nature of diffusion models, it is often difficult to maintain the original content of the image during the reverse diffusion. To address this, here we present a novel diffusion-based unsupervised image translation method using disentangled style and content representation. Specifically, inspired by the splicing Vision Transformer, we extract the intermediate keys of the multi-head self-attention layer of a ViT model and use them for the content preservation loss. Image-guided style transfer is then performed by matching the [CLS] classification token between the denoised samples and the target image, whereas an additional CLIP loss is used for text-driven style transfer. To further accelerate the semantic change during the reverse diffusion, we also propose a novel semantic divergence loss and resampling strategy. Our experimental results show that the proposed method outperforms state-of-the-art baseline models in both text-guided and image-guided translation tasks.
    Transfer Learning with Pre-trained Conditional Generative Models. (arXiv:2204.12833v2 [cs.LG] UPDATED)
    Transfer learning is crucial in training deep neural networks on new target tasks. Current transfer learning methods typically assume at least one of (i) source and target task label spaces overlap, (ii) source datasets are available, and (iii) target network architectures are consistent with source ones. However, these assumptions are difficult to hold in practical settings because the target task rarely has the same labels as the source task, source dataset access is often restricted due to storage costs and privacy, and the target architecture is often specialized to each task. To transfer source knowledge without these assumptions, we propose a transfer learning method that uses deep generative models and is composed of the following two stages: pseudo pre-training (PP) and pseudo semi-supervised learning (P-SSL). PP trains a target architecture with an artificial dataset synthesized using conditional source generative models. P-SSL applies SSL algorithms to labeled target data and unlabeled pseudo samples, which are generated by cascading the source classifier and generative models to condition them on target samples. Our experimental results indicate that our method can outperform the baselines of scratch training and knowledge distillation.
    A transformer-based model for default prediction in mid-cap corporate markets. (arXiv:2111.09902v2 [q-fin.GN] UPDATED)
    In this paper, we study mid-cap companies, i.e. publicly traded companies with less than US $10 billion in market capitalisation. Using a large dataset of US mid-cap companies observed over 30 years, we look to predict the default probability term structure over the medium term and understand which data sources (i.e. fundamental, market or pricing data) contribute most to the default risk. Whereas existing methods typically require that data from different time periods are first aggregated and turned into cross-sectional features, we frame the problem as a multi-label time-series classification problem. We adapt transformer models, a state-of-the-art deep learning model emanating from the natural language processing domain, to the credit risk modelling setting. We also interpret the predictions of these models using attention heat maps. To optimise the model further, we present a custom loss function for multi-label classification and a novel multi-channel architecture with differential training that gives the model the ability to use all input data efficiently. Our results show the proposed deep learning architecture's superior performance, resulting in a 13% improvement in AUC (Area Under the receiver operating characteristic Curve) over traditional models. We also demonstrate how to produce an importance ranking for the different data sources and the temporal relationships using a Shapley approach specific to these models.
    Finding NEEMo: Geometric Fitting using Neural Estimation of the Energy Mover's Distance. (arXiv:2209.15624v1 [stat.ML])
    A novel neural architecture was recently developed that enforces an exact upper bound on the Lipschitz constant of the model by constraining the norm of its weights in a minimal way, resulting in higher expressiveness compared to other techniques. We present a new and interesting direction for this architecture: estimation of the Wasserstein metric (Earth Mover's Distance) in optimal transport by employing the Kantorovich-Rubinstein duality to enable its use in geometric fitting applications. Specifically, we focus on the field of high-energy particle physics, where it has been shown that a metric for the space of particle-collider events can be defined based on the Wasserstein metric, referred to as the Energy Mover's Distance (EMD). This metrization has the potential to revolutionize data-driven collider phenomenology. The work presented here represents a major step towards realizing this goal by providing a differentiable way of directly calculating the EMD. We show how the flexibility that our approach enables can be used to develop novel clustering algorithms.
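    A hedged sketch of the dual objective (the Lipschitz-constrained architecture itself is taken as given): maximise $E_P[f] - E_Q[f]$ over 1-Lipschitz critics $f$; at the optimum the gap equals the EMD between $P$ and $Q$.

        import torch

        def kr_dual_loss(critic, x_p, x_q):
            """Negated Kantorovich-Rubinstein objective. `critic` is assumed
            1-Lipschitz by construction (via its weight-norm constraints), so
            no gradient penalty is needed; minimising this loss drives the
            critic towards the optimal potential, whose expectation gap is a
            differentiable estimate of EMD(P, Q)."""
            return -(critic(x_p).mean() - critic(x_q).mean())
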
    The Modality Focusing Hypothesis: Towards Understanding Crossmodal Knowledge Distillation. (arXiv:2206.06487v2 [cs.CV] UPDATED)
    Crossmodal knowledge distillation (KD) extends traditional knowledge distillation to the area of multimodal learning and demonstrates great success in various applications. To achieve knowledge transfer across modalities, a pretrained network from one modality is adopted as the teacher to provide supervision signals to a student network learning from another modality. In contrast to the empirical success reported in prior works, the working mechanism of crossmodal KD remains a mystery. In this paper, we present a thorough understanding of crossmodal KD. We begin with two case studies and demonstrate that KD is not a universal cure in crossmodal knowledge transfer. We then present the modality Venn diagram to understand modality relationships and the modality focusing hypothesis revealing the decisive factor in the efficacy of crossmodal KD. Experimental results on 6 multimodal datasets help justify our hypothesis, diagnose failure cases, and point out directions for improving crossmodal knowledge transfer in the future.
    Holographic-(V)AE: an end-to-end SO(3)-Equivariant (Variational) Autoencoder in Fourier Space. (arXiv:2209.15567v1 [cs.LG])
    Group-equivariant neural networks have emerged as a data-efficient approach to solve classification and regression tasks, while respecting the relevant symmetries of the data. However, little work has been done to extend this paradigm to the unsupervised and generative domains. Here, we present Holographic-(V)AE (H-(V)AE), a fully end-to-end SO(3)-equivariant (variational) autoencoder in Fourier space, suitable for unsupervised learning and generation of data distributed around a specified origin. H-(V)AE is trained to reconstruct the spherical Fourier encoding of data, learning in the process a latent space with a maximally informative invariant embedding alongside an equivariant frame describing the orientation of the data. We extensively test the performance of H-(V)AE on diverse datasets and show that its latent space efficiently encodes the categorical features of spherical images and structural features of protein atomic environments. Our work can further be seen as a case study for equivariant modeling of a data distribution by reconstructing its Fourier encoding.
    Identifying Weight-Variant Latent Causal Models. (arXiv:2208.14153v2 [cs.LG] UPDATED)
    The task of causal representation learning aims to uncover latent higher-level causal representations that affect lower-level observations. Identifying true latent causal representations from observed data, while allowing instantaneous causal relations among latent variables, remains a challenge, however. To this end, we start from the analysis of three intrinsic properties in identifying latent space from observations: transitivity, permutation indeterminacy, and scaling indeterminacy. We find that transitivity plays a key role in impeding the identifiability of latent causal representations. To address the unidentifiability issue caused by transitivity, we introduce a novel identifiability condition where the underlying latent causal model satisfies a linear-Gaussian model, in which the causal coefficients and the distribution of Gaussian noise are modulated by an additional observed variable. Under some mild assumptions, we show that the latent causal representations can be identified up to trivial permutation and scaling. Furthermore, based on this theoretical result, we propose a novel method, termed Structural caUsAl Variational autoEncoder, which directly learns latent causal representations and causal relationships among them, together with the mapping from the latent causal variables to the observed ones. We show that the proposed method learns the true parameters asymptotically. Experimental results on synthetic and real data demonstrate the identifiability and consistency results and the efficacy of the proposed method in learning latent causal representations.
    TinyTurbo: Efficient Turbo Decoders on Edge. (arXiv:2209.15614v1 [cs.IT])
    In this paper, we introduce a neural-augmented decoder for Turbo codes called TINYTURBO. TINYTURBO has complexity comparable to the classical max-log-MAP algorithm but has much better reliability than the max-log-MAP baseline and performs close to the MAP algorithm. We show that TINYTURBO exhibits strong robustness on a variety of practical channels of interest, such as EPA and EVA channels, which are included in the LTE standards. We also show that TINYTURBO generalizes strongly across different rates, blocklengths, and trellises. We verify the reliability and efficiency of TINYTURBO via over-the-air experiments.
    On the Subspace Structure of Gradient-Based Meta-Learning. (arXiv:2207.03804v2 [cs.LG] UPDATED)
    In this work we provide an analysis of the distribution of the post-adaptation parameters of Gradient-Based Meta-Learning (GBML) methods. Previous work has noticed how, for the case of image-classification, this adaptation only takes place on the last layers of the network. We propose the more general notion that parameters are updated over a low-dimensional \emph{subspace} of the same dimensionality as the task-space and show that this holds for regression as well. Furthermore, the induced subspace structure provides a method to estimate the intrinsic dimension of the space of tasks of common few-shot learning datasets.
    Zeus: Understanding and Optimizing GPU Energy Consumption of DNN Training. (arXiv:2208.06102v2 [cs.LG] UPDATED)
    Training deep neural networks (DNNs) is becoming increasingly resource- and energy-intensive every year. Unfortunately, existing works primarily focus on optimizing DNN training for faster completion, often without considering the impact on energy efficiency. In this paper, we observe that common practices to improve training performance can often lead to inefficient energy usage. More importantly, we demonstrate that there is a tradeoff between energy consumption and performance optimization. To this end, we propose Zeus, an optimization framework that navigates this tradeoff by automatically finding optimal job- and GPU-level configurations for recurring DNN training jobs. Zeus uses an online exploration-exploitation approach in conjunction with just-in-time energy profiling, averting the need for expensive offline measurements, while adapting to data drifts over time. Our evaluation shows that Zeus can improve the energy efficiency of DNN training by 15.3%-75.8% for diverse workloads.
    Learning to Estimate Shapley Values with Vision Transformers. (arXiv:2206.05282v2 [cs.CV] UPDATED)
    Transformers have become a default architecture in computer vision, but understanding what drives their predictions remains a challenging problem. Current explanation approaches rely on attention values or input gradients, but these provide a limited understanding of a model's dependencies. Shapley values offer a theoretically sound alternative, but their computational cost makes them impractical for large, high-dimensional models. In this work, we aim to make Shapley values practical for vision transformers (ViTs). To do so, we first leverage an attention masking approach to evaluate ViTs with partial information, and we then develop a procedure for generating Shapley value explanations via a separate, learned explainer model. Our experiments compare Shapley values to many baseline methods (e.g., attention rollout, GradCAM, LRP), and we find that our approach provides more accurate explanations than existing methods for ViTs.
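    For intuition, a generic Monte-Carlo estimator of the target quantity (a sketch of what the learned explainer approximates, not the paper's amortised model; `model` is assumed to accept a binary patch mask via attention masking and return class scores):

        import numpy as np

        def shapley_sample(model, x, n_patches, target, n_samples=128, seed=0):
            """Permutation-sampling Shapley estimate over image patches."""
            rng = np.random.default_rng(seed)
            phi = np.zeros(n_patches)
            for _ in range(n_samples):
                mask = np.zeros(n_patches)
                prev = model(x, mask)[target]        # all patches masked out
                for i in rng.permutation(n_patches):
                    mask[i] = 1                      # reveal one more patch
                    cur = model(x, mask)[target]
                    phi[i] += cur - prev
                    prev = cur
            return phi / n_samples
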
    Detecting Small Query Graphs in A Large Graph via Neural Subgraph Search. (arXiv:2207.10305v2 [cs.LG] UPDATED)
    Recent advances have shown the success of using reinforcement learning and search to solve NP-hard graph-related tasks, such as Traveling Salesman Optimization, Graph Edit Distance computation, etc. However, it remains unclear how one can efficiently and accurately detect the occurrences of a small query graph in a large target graph, which is a core operation in graph database search, biomedical analysis, social group finding, etc. This task is called Subgraph Matching, which essentially performs a subgraph isomorphism check between a query graph and a large target graph. One promising approach to this classical problem is the "learning-to-search" paradigm, where a reinforcement learning (RL) agent is designed with a learned policy to guide a search algorithm to quickly find the solution without any solved instances for supervision. However, for the specific task of Subgraph Matching, though the query graph, given by the user as input, is usually small, the target graph is often orders of magnitude larger. This poses challenges to the neural network design and can lead to solution and reward sparsity. In this paper, we propose NSUBS with two innovations to tackle the challenges: (1) a novel encoder-decoder neural network architecture to dynamically compute the matching information between the query and the target graphs at each search state; (2) a novel look-ahead loss function for training the policy network. Experiments on six large real-world target graphs show that NSUBS can significantly improve subgraph matching performance.
    Scale-invariant Bayesian Neural Networks with Connectivity Tangent Kernel. (arXiv:2209.15208v1 [cs.LG])
    Explaining generalization and preventing over-confident predictions are central goals of studies on the loss landscape of neural networks. Flatness, defined as loss invariability under perturbations of a pre-trained solution, is widely accepted as a predictor of generalization in this context. However, it has been pointed out that flatness and generalization bounds can be changed arbitrarily according to the scale of a parameter, and previous studies have only partially solved this problem, with restrictions: counter-intuitively, their generalization bounds remained variant under function-preserving parameter scaling transformations, or were limited to impractical network structures. As a more fundamental solution, we propose new prior and posterior distributions invariant to scaling transformations by \textit{decomposing} the scale and connectivity of parameters, thereby allowing the resulting generalization bound to describe the generalizability of a broad class of networks under the more practical class of transformations such as weight decay with batch normalization. We also show that the above issue adversely affects the uncertainty calibration of Laplace approximation, and we propose a solution using our invariant posterior. We empirically demonstrate that our posterior provides effective flatness and calibration measures with low complexity in such a practical parameter transformation case, supporting its practical effectiveness in line with our rationale.
    Minimalistic Unsupervised Learning with the Sparse Manifold Transform. (arXiv:2209.15261v1 [cs.LG])
    We describe a minimalistic and interpretable method for unsupervised learning that achieves performance close to SOTA SSL methods without resorting to data augmentation, hyperparameter tuning, or other engineering designs. Our approach leverages the sparse manifold transform, which unifies sparse coding, manifold learning, and slow feature analysis. With a one-layer deterministic sparse manifold transform, one can achieve 99.3% KNN top-1 accuracy on MNIST, 81.1% KNN top-1 accuracy on CIFAR-10 and 53.2% on CIFAR-100. With a simple gray-scale augmentation, the model gets 83.2% KNN top-1 accuracy on CIFAR-10 and 57% on CIFAR-100. These results significantly close the gap between simplistic ``white-box'' methods and SOTA methods. Additionally, we provide visualization to explain how an unsupervised representation transform is formed. The proposed method is closely connected to latent-embedding self-supervised methods and can be treated as the simplest form of VICReg. Though a small performance gap remains between our simple constructive model and SOTA methods, the evidence points to this as a promising direction for achieving a principled and white-box approach to unsupervised learning.
    Local Distance Preserving Auto-encoders using Continuous k-Nearest Neighbours Graphs. (arXiv:2206.05909v2 [cs.LG] UPDATED)
    Auto-encoder models that preserve similarities in the data are a popular tool in representation learning. In this paper we introduce several auto-encoder models that preserve local distances when mapping from the data space to the latent space. We use a local distance preserving loss that is based on the continuous k-nearest neighbours graph which is known to capture topological features at all scales simultaneously. To improve training performance, we formulate learning as a constraint optimisation problem with local distance preservation as the main objective and reconstruction accuracy as a constraint. We generalise this approach to hierarchical variational auto-encoders thus learning generative models with geometrically consistent latent and data spaces. Our method provides state-of-the-art performance across several standard datasets and evaluation metrics.
    PL-kNN: A Parameterless Nearest Neighbors Classifier. (arXiv:2209.12647v2 [cs.LG] UPDATED)
    Machine learning models that require minimal parameter setup are desirable for avoiding time-consuming optimization processes. The $k$-Nearest Neighbors is one of the most effective and straightforward models employed in numerous problems. Despite its well-known performance, it requires a value of $k$ suited to the specific data distribution, thus demanding expensive computational effort to tune. This paper proposes a $k$-Nearest Neighbors classifier that bypasses the need to define the value of $k$. The model computes the $k$ value adaptively, considering the data distribution of the training set. We compared the proposed model against the standard $k$-Nearest Neighbors classifier and two parameterless versions from the literature. Experiments over 11 public datasets confirm the robustness of the proposed approach, as the obtained results were similar to or even better than those of its counterpart versions.
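    As a flavour of parameterless neighbour selection, a deliberately simple illustrative rule (not the paper's exact mechanism): let each query adaptively keep all training points closer than its mean distance to the training set.

        import numpy as np
        from collections import Counter

        def adaptive_knn_predict(X_train, y_train, x):
            """Majority vote over an adaptively sized neighbourhood."""
            d = np.linalg.norm(X_train - x, axis=1)
            idx = np.flatnonzero(d < d.mean())       # adaptive "k"
            if idx.size == 0:
                idx = np.array([d.argmin()])
            return Counter(y_train[idx].tolist()).most_common(1)[0][0]
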
    Approximate Conditional Coverage via Neural Model Approximations. (arXiv:2205.14310v2 [cs.LG] UPDATED)
    We propose a new approach for constructing prediction sets for Transformer networks via the strong signals for prediction reliability from KNN-based approximations. This enables a data-driven partitioning of the high-dimensional feature space and a new Inductive Venn Predictor for calibration, the Venn-ADMIT Predictor. Our approach more closely obtains approximate conditional coverage than recent work proposing adaptive and localized conformal score functions for deep networks. We analyze coverage on several representative natural language processing classification tasks, including class-imbalanced and distribution-shifted settings.
    Anomaly localization for copy detection patterns through print estimations. (arXiv:2209.15625v1 [cs.CV])
    Copy detection patterns (CDP) are recent technologies for protecting products from counterfeiting. However, in contrast to traditional copy fakes, deep learning-based fakes have been shown to be hardly distinguishable from originals by traditional authentication systems. Systems based on classical supervised learning and digital templates assume knowledge of fake CDP at training time and cannot generalize to unseen types of fakes. Authentication based on printed copies of originals is an alternative that yields better results even for unseen fakes and simple authentication metrics, but comes at the impractical cost of acquisition and storage of printed copies. In this work, to overcome these shortcomings, we design a machine learning (ML) based authentication system that only requires digital templates and printed original CDP for training, whereas authentication is based solely on digital templates, which are used to estimate the original printed codes. The obtained results show that the proposed system can efficiently authenticate original CDP and detect fakes by accurately locating the anomalies in them. The empirical evaluation of the authentication system under investigation is performed on original CDP and ML-based fakes printed on two industrial printers.
    Language Models Can Teach Themselves to Program Better. (arXiv:2207.14502v2 [cs.LG] UPDATED)
    Recent Language Models (LMs) achieve breakthrough performance in code generation when trained on human-authored problems, even solving some competitive-programming problems. Self-play has proven useful in games such as Go, and thus it is natural to ask whether LMs can generate their own instructive programming problems to improve their performance. We show that it is possible for an LM to synthesize programming problems and solutions, which are filtered for correctness by a Python interpreter. The LM's performance is then seen to improve when it is fine-tuned on its own synthetic problems and verified solutions; thus the model 'improves itself' using the Python interpreter. Problems are specified formally as programming puzzles [Schuster et al., 2021], a code-based problem format where solutions can easily be verified for correctness by execution. In experiments on publicly-available LMs, test accuracy more than doubles. This work demonstrates the potential for code LMs, with an interpreter, to generate instructive problems and improve their own performance.
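    The puzzle format makes the correctness filter a one-liner: a puzzle is a function f, and a candidate solution g is accepted iff f(g()) evaluates to True. A hedged sketch of the filtering step (sandboxing, which is essential for untrusted generated code, is omitted):

        def verify(puzzle_src, solution_src):
            """Keep a generated (puzzle, solution) pair only if executing
            them yields f(g()) is True. Real systems must sandbox exec."""
            env = {}
            try:
                exec(puzzle_src, env)
                exec(solution_src, env)
                return env["f"](env["g"]()) is True
            except Exception:
                return False

        assert verify("def f(x): return x * x == 49", "def g(): return 7")
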
    POETREE: Interpretable Policy Learning with Adaptive Decision Trees. (arXiv:2203.08057v2 [cs.LG] UPDATED)
    Building models of human decision-making from observed behaviour is critical to better understand, diagnose and support real-world policies such as clinical care. As established policy learning approaches remain focused on imitation performance, they fall short of explaining the demonstrated decision-making process. Policy Extraction through decision Trees (POETREE) is a novel framework for interpretable policy learning, compatible with fully-offline and partially-observable clinical decision environments -- and builds probabilistic tree policies determining physician actions based on patients' observations and medical history. Fully-differentiable tree architectures are grown incrementally during optimization to adapt their complexity to the modelling task, and learn a representation of patient history through recurrence, resulting in decision tree policies that adapt over time with patient information. This policy learning method outperforms the state-of-the-art on real and synthetic medical datasets, both in terms of understanding, quantifying and evaluating observed behaviour as well as in accurately replicating it -- with potential to improve future decision support systems.
    Static Hand Gesture Recognition for American Sign Language using Neuromorphic Hardware. (arXiv:2207.12559v2 [cs.LG] UPDATED)
    In this paper, we develop four spiking neural network (SNN) models for two static American Sign Language (ASL) hand gesture classification tasks, i.e., the ASL Alphabet and ASL Digits. The SNN models are deployed on Intel's neuromorphic platform, Loihi, and then compared against equivalent deep neural network (DNN) models deployed on an edge computing device, the Intel Neural Compute Stick 2 (NCS2). We perform a comprehensive comparison between the two systems in terms of accuracy, latency, power consumption, and energy. The best DNN model achieves an accuracy of 99.93% on the ASL Alphabet dataset, whereas the best performing SNN model has an accuracy of 99.30%. For the ASL-Digits dataset, the best DNN model achieves an accuracy of 99.76% while the SNN achieves 99.03%. Moreover, our obtained experimental results show that the Loihi neuromorphic hardware implementations achieve up to 20.64x and 4.10x reduction in power consumption and energy, respectively, when compared to NCS2.
    Evolutionary Deep Reinforcement Learning for Dynamic Slice Management in O-RAN. (arXiv:2208.14394v2 [eess.SY] UPDATED)
    Next-generation wireless networks are required to satisfy a variety of services and criteria concurrently. To address upcoming strict criteria, a new open radio access network (O-RAN) with distinguishing features such as flexible design, disaggregated virtual and programmable components, and intelligent closed-loop control was developed. O-RAN slicing is being investigated as a critical strategy for ensuring network quality of service (QoS) in the face of changing circumstances. However, distinct network slices must be dynamically controlled to avoid service level agreement (SLA) variation caused by rapid changes in the environment. Therefore, this paper introduces a novel framework able to manage the network slices through intelligently provisioned resources. Due to diverse heterogeneous environments, intelligent machine learning approaches require sufficient exploration to handle the harshest situations in a wireless network and accelerate convergence. To solve this problem, a new solution is proposed based on evolutionary-based deep reinforcement learning (EDRL) to accelerate and optimize the slice management learning process in the radio access network's (RAN) intelligent controller (RIC) modules. To this end, the O-RAN slicing is represented as a Markov decision process (MDP) which is then solved optimally for resource allocation to meet service demand using the EDRL approach. In terms of reaching service demands, simulation results show that the proposed approach outperforms the DRL baseline by 62.2%.
    Bayesian Neural Networks for Geothermal Resource Assessment: Prediction with Uncertainty. (arXiv:2209.15543v1 [physics.geo-ph])
    We consider the application of machine learning to the evaluation of geothermal resource potential. A supervised learning problem is defined where maps of 10 geological and geophysical features within the state of Nevada, USA are used to define geothermal potential across a broad region. We have available a relatively small set of positive training sites (known resources or active power plants) and negative training sites (known drill sites with unsuitable geothermal conditions) and use these to constrain and optimize artificial neural networks for this classification task. The main objective is to predict the geothermal resource potential at unknown sites within a large geographic area where the defining features are known. These predictions could be used to target promising areas for further detailed investigations. We describe the evolution of our work from defining a specific neural network architecture to training and optimization trials. Upon analysis we expose the inevitable problems of model variability and resulting prediction uncertainty. Finally, to address these problems we apply the concept of Bayesian neural networks, a heuristic approach to regularization in network training, and make use of the practical interpretation of the formal uncertainty measures they provide.
    Reward Shaping for User Satisfaction in a REINFORCE Recommender. (arXiv:2209.15166v1 [cs.IR])
    How might we design Reinforcement Learning (RL)-based recommenders that encourage aligning user trajectories with the underlying user satisfaction? Three research questions are key: (1) measuring user satisfaction, (2) combatting sparsity of satisfaction signals, and (3) adapting the training of the recommender agent to maximize satisfaction. For measurement, it has been found that surveys explicitly asking users to rate their experience with consumed items can provide valuable orthogonal information to the engagement/interaction data, acting as a proxy to the underlying user satisfaction. For sparsity, i.e., only being able to observe how satisfied users are with a tiny fraction of user-item interactions, imputation models can be useful in predicting the satisfaction level for all items users have consumed. For learning satisfying recommender policies, we postulate that reward shaping in RL recommender agents is powerful for driving satisfying user experiences. Putting everything together, we propose to jointly learn a policy network and a satisfaction imputation network: the role of the imputation network is to learn which actions are satisfying to the user, while the policy network, built on top of REINFORCE, decides which items to recommend, with the reward utilizing the imputed satisfaction. We use both offline analysis and live experiments in an industrial large-scale recommendation platform to demonstrate the promise of our approach for satisfying user experiences.
    DecisioNet: A Binary-Tree Structured Neural Network. (arXiv:2207.01127v4 [cs.CV] UPDATED)
    Deep neural networks (DNNs) and decision trees (DTs) are both state-of-the-art classifiers. DNNs perform well due to their representational learning capabilities, while DTs are computationally efficient as they perform inference along one route (root-to-leaf) that is dependent on the input data. In this paper, we present DecisioNet (DN), a binary-tree structured neural network. We propose a systematic way to convert an existing DNN into a DN to create a lightweight version of the original model. DecisioNet takes the best of both worlds - it uses neural modules to perform representational learning and utilizes its tree structure to perform only a portion of the computations. We evaluate various DN architectures, along with their corresponding baseline models on the FashionMNIST, CIFAR10, and CIFAR100 datasets. We show that the DN variants achieve similar accuracy while significantly reducing the computational cost of the original network.
    Amplitude Scintillation Forecasting Using Bagged Trees. (arXiv:2207.08745v2 [cs.LG] UPDATED)
    Electron density irregularities present within the ionosphere induce significant fluctuations in global navigation satellite system (GNSS) signals. Fluctuations in signal power are referred to as amplitude scintillation and can be monitored through the S4 index. Forecasting the severity of amplitude scintillation based on historical S4 index data is beneficial when real-time data is unavailable. In this work, we study the possibility of using historical data from a single GPS scintillation monitoring receiver to train a machine learning (ML) model to forecast the severity of amplitude scintillation, either weak, moderate, or severe, with respect to temporal and spatial parameters. Six different ML models were evaluated and the bagged trees model was the most accurate among them, achieving a forecasting accuracy of $81\%$ using a balanced dataset, and $97\%$ using an imbalanced dataset.
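    A minimal sketch of the winning model family with scikit-learn (the feature matrix standing in for historical S4, temporal, and spatial inputs is synthetic here):

        import numpy as np
        from sklearn.ensemble import BaggingClassifier
        from sklearn.tree import DecisionTreeClassifier

        rng = np.random.default_rng(0)
        X = rng.normal(size=(1000, 6))              # placeholder features
        y = rng.integers(0, 3, size=1000)           # weak / moderate / severe

        model = BaggingClassifier(DecisionTreeClassifier(), n_estimators=100)
        model.fit(X[:800], y[:800])
        print(model.score(X[800:], y[800:]))
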
    Switching One-Versus-the-Rest Loss to Increase the Margin of Logits for Adversarial Robustness. (arXiv:2207.10283v2 [cs.LG] UPDATED)
    Adversarial training is a promising method to improve robustness against adversarial attacks. To enhance its performance, recent methods impose high weights on the cross-entropy loss for important data points near the decision boundary. However, these importance-aware methods are vulnerable to sophisticated attacks, e.g., Auto-Attack. In this paper, we experimentally investigate the cause of their vulnerability via margins between the logits for the true label and the other labels, because these margins should be large enough to prevent the largest logit from being flipped by the attacks. Our experiments reveal that the histogram of the logit margins of na\"ive adversarial training has two peaks. Thus, the levels of difficulty in increasing logit margins are roughly divided into two: difficult samples (small logit margins) and easy samples (large logit margins). On the other hand, only one peak near zero appears in the histogram of importance-aware methods, i.e., they reduce the logit margins of easy samples. To increase the logit margins of difficult samples without reducing those of easy samples, we propose the switching one-versus-the-rest loss (SOVR), which switches from cross-entropy to one-versus-the-rest loss (OVR) for difficult samples. We derive trajectories of logit margins for a simple problem and prove that OVR increases logit margins twice as much as the weighted cross-entropy loss. Thus, SOVR increases the logit margins of difficult samples, unlike existing methods. We experimentally show that SOVR achieves better robustness against Auto-Attack than importance-aware methods.
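    A schematic of the switching rule (hedged: the margin threshold and the exact OVR form below are simplifications of the paper's loss):

        import torch
        import torch.nn.functional as F

        def sovr_loss(logits, targets, threshold=0.0):
            """Use one-versus-the-rest loss for difficult samples (small
            logit margin) and cross-entropy for easy ones."""
            true = logits.gather(1, targets[:, None]).squeeze(1)
            rest = logits.clone()
            rest.scatter_(1, targets[:, None], float("-inf"))
            margin = true - rest.max(dim=1).values

            onehot = F.one_hot(targets, logits.size(1)).float()
            ovr = F.binary_cross_entropy_with_logits(
                logits, onehot, reduction="none").sum(dim=1)
            ce = F.cross_entropy(logits, targets, reduction="none")
            return torch.where(margin < threshold, ovr, ce).mean()
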
    A Comprehensive Review of Digital Twin -- Part 1: Modeling and Twinning Enabling Technologies. (arXiv:2208.14197v2 [cs.CE] UPDATED)
    As an emerging technology in the era of Industry 4.0, digital twin is gaining unprecedented attention because of its promise to further optimize process design, quality control, health monitoring, decision and policy making, and more, by comprehensively modeling the physical world as a group of interconnected digital models. In a two-part series of papers, we examine the fundamental role of different modeling techniques, twinning enabling technologies, and uncertainty quantification and optimization methods commonly used in digital twins. This first paper presents a thorough literature review of digital twin trends across many disciplines currently pursuing this area of research. Then, digital twin modeling and twinning enabling technologies are further analyzed by classifying them into two main categories: physical-to-virtual, and virtual-to-physical, based on the direction in which data flows. Finally, this paper provides perspectives on the trajectory of digital twin technology over the next decade, and introduces a few emerging areas of research which will likely be of great use in future digital twin research. In part two of this review, the role of uncertainty quantification and optimization are discussed, a battery digital twin is demonstrated, and more perspectives on the future of digital twin are shared.
    FixEval: Execution-based Evaluation of Program Fixes for Programming Problems. (arXiv:2206.07796v3 [cs.SE] UPDATED)
    The increasing complexity of software has led to a drastic rise in the time and costs of identifying and fixing bugs. Various approaches are explored in the literature to automatically generate fixes for buggy code. However, due to the large combinatorial space of possible fixes for a particular bug, few tools and datasets are available to evaluate model-generated fixes effectively. In this work, we introduce FIXEVAL, a benchmark comprising buggy code submissions to competitive programming problems and their respective fixes. FIXEVAL offers a rich test suite to evaluate and assess the correctness of model-generated program fixes, together with further information on time and memory constraints and verdict-based acceptance. We consider two Transformer language models pretrained on programming languages as our baselines and compare them using match-based and execution-based evaluation metrics. Our experiments show that match-based metrics do not reflect model-generated program fixes accurately, whereas execution-based methods evaluate programs through all cases and scenarios designed explicitly for that solution. Therefore, we believe FIXEVAL provides a step towards real-world automatic bug fixing and model-generated code evaluation. The dataset and models are open-sourced.\footnote{\url{https://github.com/mahimanzum/FixEval}}
    Efficient Graph based Recommender System with Weighted Averaging of Messages. (arXiv:2209.15238v1 [cs.LG])
    We showcase a novel solution to a recommendation system problem in which we face a perpetual soft item cold start issue. Our system aims to recommend demanded products to prospective sellers for listing in Amazon stores. These products always have only a few interactions, thereby giving rise to a perpetual soft item cold start situation. Modern collaborative filtering methods address cold start using content attributes and exploit the existing implicit signals from warm start items. This approach fails in our use case, since our entire item set always faces the cold start issue. Our Product Graph has over 500 Million nodes and over 5 Billion edges, which makes training and inference using modern graph algorithms very compute intensive. To overcome these challenges, we propose a system which reduces the dataset size and employs an improved modelling technique to reduce storage and compute without loss in performance. Particularly, we reduce our graph size using a filtering technique and then exploit this reduced product graph using the Weighted Averaging of Messages over Layers (WAML) algorithm. WAML simplifies training on large graphs and improves over previous methods by reducing compute time to 1/7 that of LightGCN and 1/26 that of the Graph Attention Network (GAT), while increasing recall$@100$ by 66% over LightGCN and 2.3x over GAT.
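    A schematic of the layer-averaging idea as suggested by the abstract (the exact weighting scheme is the paper's; here the layer weights are simply learned scalars over parameter-free propagation steps):

        import torch

        def waml_embed(x, adj_norm, layer_weights):
            """Propagate features through parameter-free neighbourhood
            averages and combine the layer outputs by learned weights."""
            h, out = x, layer_weights[0] * x
            for w in layer_weights[1:]:
                h = adj_norm @ h          # one round of message averaging
                out = out + w * h
            return out
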
    AudioGen: Textually Guided Audio Generation. (arXiv:2209.15352v1 [cs.SD])
    We tackle the problem of generating audio samples conditioned on descriptive text captions. In this work, we propose AudioGen, an auto-regressive generative model that generates audio samples conditioned on text inputs. AudioGen operates on a learnt discrete audio representation. The task of text-to-audio generation poses multiple challenges. Due to the way audio travels through a medium, differentiating ``objects'' can be a difficult task (e.g., separating multiple people simultaneously speaking). This is further complicated by real-world recording conditions (e.g., background noise, reverberation, etc.). Scarce text annotations impose another constraint, limiting the ability to scale models. Finally, modeling high-fidelity audio requires encoding audio at a high sampling rate, leading to extremely long sequences. To alleviate the aforementioned challenges, we propose an augmentation technique that mixes different audio samples, driving the model to internally learn to separate multiple sources. We curated 10 datasets containing different types of audio and text annotations to handle the scarcity of text-audio data points. For faster inference, we explore the use of multi-stream modeling, allowing the use of shorter sequences while maintaining a similar bitrate and perceptual quality. We apply classifier-free guidance to improve adherence to text. Compared to the evaluated baselines, AudioGen outperforms them on both objective and subjective metrics. Finally, we explore the ability of the proposed method to generate audio continuations conditionally and unconditionally. Samples: https://tinyurl.com/audiogen-text2audio
    Vertical Semi-Federated Learning for Efficient Online Advertising. (arXiv:2209.15635v1 [cs.LG])
    As an emerging secure learning paradigm for leveraging cross-silo private data, vertical federated learning (VFL) is expected to improve advertising models by enabling the joint learning of complementary user attributes privately owned by the advertiser and the publisher. However, 1) the restricted applicable scope to overlapped samples and 2) the high system challenge of real-time federated serving have limited its application to advertising systems. In this paper, we advocate a new learning setting, Semi-VFL (Vertical Semi-Federated Learning), as a lightweight solution that utilizes all available data (both the overlapped and non-overlapped data) and is free from federated serving. Semi-VFL is expected to perform better than single-party models while maintaining a low inference cost. To implement a good solution for Semi-VFL, it is notably important to i) alleviate the absence of the passive party's features and ii) adapt to the whole sample space. Thus, we propose a carefully designed joint privileged learning framework (JPL) as an efficient implementation of Semi-VFL. Specifically, we build an inference-efficient single-party student model applicable to the whole sample space while maintaining the advantage of the federated feature extension. Novel feature imitation and ranking consistency restriction methods are proposed to extract cross-party feature correlations and maintain cross-sample-space consistency for both the overlapped and non-overlapped data. We conducted extensive experiments on real-world advertising datasets. The results show that our method achieves the best performance over baseline methods, validating its effectiveness in maintaining cross-view feature correlation.
    Identifying Latent Causal Content for Multi-Source Domain Adaptation. (arXiv:2208.14161v2 [cs.LG] UPDATED)
    Multi-source domain adaptation (MSDA) learns to predict the labels in target domain data, under the setting that data from multiple source domains are labelled and data from the target domain are unlabelled. Most methods for this task focus on learning invariant representations across domains. However, their success relies heavily on the assumption that the label distribution remains consistent across domains, which may not hold in general real-world problems. In this paper, we propose a new and more flexible assumption, termed \textit{latent covariate shift}, where a latent content variable $\mathbf{z}_c$ and a latent style variable $\mathbf{z}_s$ are introduced in the generative process, with the marginal distribution of $\mathbf{z}_c$ changing across domains and the conditional distribution of the label given $\mathbf{z}_c$ remaining invariant across domains. We show that although (completely) identifying the proposed latent causal model is challenging, the latent content variable can be identified up to scaling by using its dependence with labels from source domains, together with the identifiability conditions of nonlinear ICA. This motivates us to propose a novel method for MSDA, which learns the invariant label distribution conditional on the latent content variable, instead of learning invariant representations. Empirical evaluation on simulation and real data demonstrates the effectiveness of the proposed method.
    Hermes: Accelerating Long-Latency Load Requests via Perceptron-Based Off-Chip Load Prediction. (arXiv:2209.00188v3 [cs.AR] UPDATED)
    Long-latency load requests continue to limit the performance of high-performance processors. To increase the latency tolerance of a processor, architects have primarily relied on two key techniques: sophisticated data prefetchers and large on-chip caches. In this work, we show that: 1) even a sophisticated state-of-the-art prefetcher can only predict half of the off-chip load requests on average across a wide range of workloads, and 2) due to the increasing size and complexity of on-chip caches, a large fraction of the latency of an off-chip load request is spent accessing the on-chip cache hierarchy. The goal of this work is to accelerate off-chip load requests by removing the on-chip cache access latency from their critical path. To this end, we propose a new technique called Hermes, whose key idea is to: 1) accurately predict which load requests might go off-chip, and 2) speculatively fetch the data required by the predicted off-chip loads directly from the main memory, while also concurrently accessing the cache hierarchy for such loads. To enable Hermes, we develop a new lightweight, perceptron-based off-chip load prediction technique that learns to identify off-chip load requests using multiple program features (e.g., sequence of program counters). For every load request, the predictor observes a set of program features to predict whether or not the load would go off-chip. If the load is predicted to go off-chip, Hermes issues a speculative request directly to the memory controller once the load's physical address is generated. If the prediction is correct, the load eventually misses the cache hierarchy and waits for the ongoing speculative request to finish, thus hiding the on-chip cache hierarchy access latency from the critical path of the off-chip load. Our evaluation shows that Hermes significantly improves the performance of a state-of-the-art baseline. We open-source Hermes.
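    The prediction mechanism lends itself to a compact sketch in the style of perceptron branch predictors: each program feature indexes a small table of saturating weights, and the thresholded sum of the selected weights drives the off-chip prediction. The features, table sizes, and threshold below are illustrative assumptions, not Hermes' exact configuration.

        # Schematic perceptron-based off-chip load predictor.
        TABLE_SIZE = 1024
        THRESHOLD = 0            # predict off-chip when the weight sum exceeds this
        W_MAX, W_MIN = 31, -32   # saturating weight range

        tables = {f: [0] * TABLE_SIZE for f in ("pc", "pc_seq_hash", "cl_offset")}

        def predict(features):
            # features: dict mapping feature name -> hashed feature value
            s = sum(tables[f][h % TABLE_SIZE] for f, h in features.items())
            return s > THRESHOLD   # True -> issue a speculative memory request

        def train(features, went_off_chip):
            # Strengthen weights when the load actually went off-chip, else weaken.
            delta = 1 if went_off_chip else -1
            for f, h in features.items():
                w = tables[f][h % TABLE_SIZE] + delta
                tables[f][h % TABLE_SIZE] = max(W_MIN, min(W_MAX, w))

    On a correct prediction, the speculative memory request overlaps with the cache lookup, hiding the hierarchy's access latency exactly as described above.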
    FP8 Formats for Deep Learning. (arXiv:2209.05433v2 [cs.LG] UPDATED)
    FP8 is a natural progression for accelerating deep learning training and inference beyond the 16-bit formats common in modern processors. In this paper we propose an 8-bit floating point (FP8) binary interchange format consisting of two encodings - E4M3 (4-bit exponent and 3-bit mantissa) and E5M2 (5-bit exponent and 2-bit mantissa). While E5M2 follows IEEE 754 conventions for the representation of special values, E4M3's dynamic range is extended by not representing infinities and having only one mantissa bit-pattern for NaNs. We demonstrate the efficacy of the FP8 format on a variety of image and language tasks, effectively matching the result quality achieved by 16-bit training sessions. Our study covers the main modern neural network architectures - CNNs, RNNs, and Transformer-based models, leaving all the hyperparameters unchanged from the 16-bit baseline training sessions. Our training experiments include large, up to 175B-parameter, language models. We also examine FP8 post-training quantization of language models trained using 16-bit formats that resisted fixed-point int8 quantization.
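    The trade-off between the two encodings is easy to check numerically: E5M2 trades precision for range, E4M3 the reverse. A quick sketch of the largest finite magnitudes implied by the bit layouts described above:

        # Largest finite values of the two FP8 encodings.
        def max_e5m2():
            # IEEE-style: max exponent field 11110 -> e = 30 - 15 = 15, mantissa 1.11
            return (1 + 3 / 4) * 2 ** 15      # 57344.0

        def max_e4m3():
            # The all-ones exponent is reclaimed for normals; only mantissa 111
            # with exponent 1111 encodes NaN, so the max is 1.110 * 2^8.
            return (1 + 6 / 8) * 2 ** 8       # 448.0

        print(max_e5m2(), max_e4m3())         # 57344.0 448.0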
    Match to Win: Analysing Sequences Lengths for Efficient Self-supervised Learning in Speech and Audio. (arXiv:2209.15575v1 [cs.SD])
    Self-supervised learning (SSL) has proven vital in speech and audio-related applications. The paradigm trains a general model on unlabeled data that can later be used to solve specific downstream tasks. This type of model is costly to train as it requires manipulating long input sequences that can only be handled by powerful centralised servers. Surprisingly, despite many attempts to increase training efficiency through model compression, the effects of truncating input sequence lengths to reduce computation have not been studied. In this paper, we provide the first empirical study of SSL pre-training for different specified sequence lengths and link this to various downstream tasks. We find that training on short sequences can dramatically reduce resource costs while retaining a satisfactory performance for all tasks. This simple one-line change would promote the migration of SSL training from data centres to user-end edge devices for more realistic and personalised applications.
    Provable Defense Against Geometric Transformations. (arXiv:2207.11177v2 [cs.LG] UPDATED)
    Geometric image transformations that arise in the real world, such as scaling and rotation, have been shown to easily deceive deep neural networks (DNNs). Hence, training DNNs to be certifiably robust to these perturbations is critical. However, no prior work has been able to incorporate the objective of deterministic certified robustness against geometric transformations into the training procedure, as existing verifiers are exceedingly slow. To address these challenges, we propose the first provable defense for deterministic certified geometric robustness. Our framework leverages a novel GPU-optimized verifier that can certify images between 60$\times$ to 42,600$\times$ faster than existing geometric robustness verifiers, and thus unlike existing works, is fast enough for use in training. Our results across multiple datasets show that networks trained via our framework consistently achieve state-of-the-art deterministic certified geometric robustness and clean accuracy. Furthermore, for the first time, we verify the geometric robustness of a neural network for the challenging, real-world setting of autonomous driving.
    Hierarchical Label-wise Attention Transformer Model for Explainable ICD Coding. (arXiv:2204.10716v2 [cs.LG] UPDATED)
    International Classification of Diseases (ICD) coding plays an important role in systematically classifying morbidity and mortality data. In this study, we propose a hierarchical label-wise attention Transformer model (HiLAT) for the explainable prediction of ICD codes from clinical documents. HiLAT first fine-tunes a pretrained Transformer model to represent the tokens of clinical documents. We subsequently employ a two-level hierarchical label-wise attention mechanism that creates label-specific document representations. These representations are in turn used by a feed-forward neural network to predict whether a specific ICD code is assigned to the input clinical document of interest. We evaluate HiLAT using hospital discharge summaries and their corresponding ICD-9 codes from the MIMIC-III database. To investigate the performance of different types of Transformer models, we develop ClinicalplusXLNet, which conducts continual pretraining from XLNet-Base using all the MIMIC-III clinical notes. The experimental results show that the F1 scores of HiLAT+ClinicalplusXLNet surpass those of the previous state-of-the-art models for the top-50 most frequent ICD-9 codes from MIMIC-III. Visualisations of attention weights present a potential explainability tool for checking the face validity of ICD code predictions.
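    The core of label-wise attention can be sketched in a few lines: each ICD code owns a query vector that attends over token representations, yielding one document vector per code. This is a generic single-level sketch; HiLAT applies the mechanism at two levels (chunk, then document), and the dimensions here are placeholders.

        import torch
        import torch.nn.functional as F

        def label_wise_attention(H, U):
            # H: (seq_len, hidden) token representations from the Transformer
            # U: (num_labels, hidden) learnable per-label query vectors
            scores = F.softmax(U @ H.T, dim=-1)   # (num_labels, seq_len)
            return scores @ H                     # (num_labels, hidden)

        H = torch.randn(512, 768)                 # e.g., one chunk of a clinical note
        U = torch.randn(50, 768)                  # e.g., top-50 ICD-9 codes
        print(label_wise_attention(H, U).shape)   # torch.Size([50, 768])

    A per-label feed-forward head then maps each of the label-specific vectors to a binary assignment probability.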
    Look Back When Surprised: Stabilizing Reverse Experience Replay for Neural Approximation. (arXiv:2206.03171v2 [cs.LG] UPDATED)
    Experience replay-based sampling techniques are essential to several reinforcement learning (RL) algorithms since they aid in convergence by breaking spurious correlations. The most popular techniques, such as uniform experience replay (UER) and prioritized experience replay (PER), seem to suffer from sub-optimal convergence and significant bias error, respectively. To alleviate this, we introduce a new experience replay method for reinforcement learning, called Introspective Experience Replay (IER). IER forms batches from the data points that immediately precede 'surprising' points. Our proposed approach is based on the theoretically rigorous reverse experience replay (RER), which can be shown to remove bias in the linear approximation setting but can be sub-optimal with neural approximation. We show empirically that IER is stable with neural function approximation and performs better than state-of-the-art techniques such as uniform experience replay (UER), prioritized experience replay (PER), and hindsight experience replay (HER) on the majority of tasks.
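    The batch-selection rule admits a simple sketch. Here 'surprise' is taken to be the magnitude of the TD error, an assumption consistent with the description above but not necessarily the paper's exact criterion.

        import numpy as np

        def ier_batches(td_errors, batch_size, num_batches):
            # Pick the most 'surprising' time indices, then form each batch from
            # the points immediately preceding that index, in reverse order.
            surprising = np.argsort(np.abs(td_errors))[::-1][:num_batches]
            batches = []
            for t in surprising:
                start = max(0, t - batch_size + 1)
                batches.append(np.arange(t, start - 1, -1))  # t, t-1, ..., start
            return batches

    Replaying backwards is what IER inherits from RER; restricting the replay to the neighborhoods of surprising points is what keeps it stable with neural approximation.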
    Building Normalizing Flows with Stochastic Interpolants. (arXiv:2209.15571v1 [cs.LG])
    A simple generative model based on a continuous-time normalizing flow between any pair of base and target distributions is proposed. The velocity field of this flow is inferred from the probability current of a time-dependent distribution that interpolates between the base and the target in finite time. Unlike conventional normalizing flow inference methods based on the maximum likelihood principle, which require costly backpropagation through ODE solvers, our interpolant approach leads to a simple quadratic loss for the velocity itself, expressed in terms of expectations that are readily amenable to empirical estimation. The flow can be used to generate samples from either the base or the target, and to estimate the likelihood at any time along the interpolant. The approach is contextualized in its relation to diffusions. In particular, in situations where the base is a Gaussian distribution, we show that the velocity of our normalizing flow can also be used to construct a diffusion model to sample the target as well as to estimate its score. This allows one to map methods based on stochastic differential equations to those of ordinary differential equations, simplifying the mechanics of the model while capturing equivalent dynamics. Benchmarking on density estimation tasks illustrates that the learned flow can match and surpass maximum likelihood continuous flows at a fraction of the conventional ODE training costs.
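    For the simplest linear interpolant $x_t = (1-t)\,x_0 + t\,x_1$, whose time derivative is $x_1 - x_0$, the quadratic velocity loss reduces to a few lines; the framework admits more general interpolants, so this sketch covers only that special case.

        import torch

        def interpolant_loss(v_net, x0, x1):
            # x0: (batch, dim) samples from the base; x1: samples from the target.
            # v_net(x, t) maps (batch, dim) x (batch, 1) -> (batch, dim).
            t = torch.rand(x0.shape[0], 1)
            xt = (1 - t) * x0 + t * x1           # linear interpolant
            target = x1 - x0                     # its time derivative
            return ((v_net(xt, t) - target) ** 2).sum(dim=-1).mean()

    No ODE solve appears in the loss; the learned velocity is only integrated at sampling time.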
    Adaptive Discretization in Online Reinforcement Learning. (arXiv:2110.15843v2 [stat.ML] UPDATED)
    Discretization-based approaches to solving online reinforcement learning problems have been studied extensively in practice on applications ranging from resource allocation to cache management. Two major questions in designing discretization-based algorithms are how to create the discretization and when to refine it. While there have been several experimental results investigating heuristic solutions to these questions, there has been little theoretical treatment. In this paper we provide a unified theoretical analysis of tree-based hierarchical partitioning methods for online reinforcement learning, providing model-free and model-based algorithms. We show how our algorithms are able to take advantage of inherent structure of the problem by providing guarantees that scale with respect to the 'zooming dimension' instead of the ambient dimension, an instance-dependent quantity measuring the benignness of the optimal $Q_h^\star$ function. Many applications in computing systems and operations research require algorithms that compete on three facets: low sample complexity, mild storage requirements, and low computational burden. Our algorithms are easily adapted to operating constraints, and our theory provides explicit bounds across each of the three facets. This motivates their use in practical applications, as our approach automatically adapts to underlying problem structure even when very little is known a priori about the system.
    A Deep Reinforcement Learning Approach for Finding Non-Exploitable Strategies in Two-Player Atari Games. (arXiv:2207.08894v2 [cs.LG] UPDATED)
    This paper proposes novel, end-to-end deep reinforcement learning algorithms for learning two-player zero-sum Markov games. Different from prior efforts on training agents to beat a fixed set of opponents, our objective is to find Nash equilibrium policies that are free from exploitation by even adversarial opponents. We propose (1) the Nash DQN algorithm, which integrates DQN with a Nash-finding subroutine for the joint value functions; and (2) the Nash DQN Exploiter algorithm, which additionally adopts an exploiter for guiding the agent's exploration. Our algorithms are practical variants of theoretical algorithms that are guaranteed to converge to Nash equilibria in the basic tabular setting. Experimental evaluation on both tabular examples and two-player Atari games demonstrates the robustness of the proposed algorithms against adversarial opponents, as well as their advantageous performance over existing methods.
    Retrieval-based Controllable Molecule Generation. (arXiv:2208.11126v2 [q-bio.QM] UPDATED)
    Generating new molecules with specified chemical and biological properties via generative models has emerged as a promising direction for drug discovery. However, existing methods require extensive training/fine-tuning with a large dataset, often unavailable in real-world generation tasks. In this work, we propose a new retrieval-based framework for controllable molecule generation. We use a small set of exemplar molecules, i.e., those that (partially) satisfy the design criteria, to steer the pre-trained generative model towards synthesizing molecules that satisfy the given design criteria. We design a retrieval mechanism that retrieves and fuses the exemplar molecules with the input molecule, which is trained by a new self-supervised objective that predicts the nearest neighbor of the input molecule. We also propose an iterative refinement process to dynamically update the generated molecules and retrieval database for better generalization. Our approach is agnostic to the choice of generative models and requires no task-specific fine-tuning. On various tasks ranging from simple design criteria to a challenging real-world scenario for designing lead compounds that bind to the SARS-CoV-2 main protease, we demonstrate our approach extrapolates well beyond the retrieval database, and achieves better performance and wider applicability than previous methods.
    Parea: multi-view ensemble clustering for cancer subtype discovery. (arXiv:2209.15399v1 [cs.LG])
    Multi-view clustering methods are essential for the stratification of patients into sub-groups of similar molecular characteristics. In recent years, a wide range of methods has been developed for this purpose. However, due to the high diversity of cancer-related data, a single method may not perform sufficiently well in all cases. We present Parea, a multi-view hierarchical ensemble clustering approach for disease subtype discovery. We demonstrate its performance on several machine learning benchmark datasets. We apply and validate our methodology on real-world multi-view cancer patient data. Parea outperforms the current state-of-the-art on six out of seven analysed cancer types. We have integrated the Parea method into our developed Python package Pyrea (https://github.com/mdbloice/Pyrea), which enables the effortless and flexible design of ensemble workflows while incorporating a wide range of fusion and clustering algorithms.
    Towards Multi-spatiotemporal-scale Generalized PDE Modeling. (arXiv:2209.15616v1 [cs.LG])
    Partial differential equations (PDEs) are central to the simulation of complex physical systems. The expense of classical solution techniques has led to increased interest in deep neural network based surrogates. However, the practical utility of training such surrogates is contingent on their ability to model complex multi-scale spatio-temporal phenomena. Various neural network architectures have been proposed to target such phenomena, most notably Fourier Neural Operators (FNOs), which give a natural handle over local and global spatial information via parameterization of different Fourier modes, and U-Nets, which treat local and global information via downsampling and upsampling paths. However, generalizing across different equation parameters or different time-scales still remains a challenge. In this work, we make a comprehensive comparison between various FNO and U-Net like approaches on fluid mechanics problems in both vorticity-stream and velocity function form. For U-Nets, we transfer recent architectural improvements from computer vision, most notably from object segmentation and generative modeling. We further analyze the design considerations for using FNO layers to improve the performance of U-Net architectures without major degradation of computational performance. Finally, we show promising results on generalization to different PDE parameters and time-scales with a single surrogate model.
    Linear Convergence for Natural Policy Gradient with Log-linear Policy Parametrization. (arXiv:2209.15382v1 [cs.LG])
    We analyze the convergence rate of the unregularized natural policy gradient algorithm with log-linear policy parametrizations in infinite-horizon discounted Markov decision processes. In the deterministic case, when the Q-value is known and can be approximated by a linear combination of a known feature function up to a bias error, we show that a geometrically-increasing step size yields a linear convergence rate towards an optimal policy. We then consider the sample-based case, when the best representation of the Q-value function among linear combinations of a known feature function is known up to an estimation error. In this setting, we show that the algorithm enjoys the same linear guarantees as in the deterministic case, up to an error term that depends on the estimation error, the bias error, and the condition number of the feature covariance matrix. Our results build upon the general framework of policy mirror descent and extend previous findings for the softmax tabular parametrization to the log-linear policy class.
    Higher-order Neural Additive Models: An Interpretable Machine Learning Model with Feature Interactions. (arXiv:2209.15409v1 [cs.LG])
    Black-box models, such as deep neural networks, exhibit superior predictive performance, but understanding their behavior is notoriously difficult. Many explainable artificial intelligence methods have been proposed to reveal the decision-making processes of black-box models. However, their applications in high-stakes domains remain limited. Recently proposed neural additive models (NAM) have achieved state-of-the-art interpretable machine learning. NAM can provide straightforward interpretations with a slight performance sacrifice compared with a multi-layer perceptron. However, NAM can only model 1$^{\text{st}}$-order feature effects; thus, it cannot capture interactions between input features. To overcome this problem, we propose a novel interpretable machine learning method called higher-order neural additive models (HONAM) and a feature interaction method for high interpretability. HONAM can model arbitrary orders of feature interactions. Therefore, it can provide the high predictive performance and interpretability that high-stakes domains need. In addition, we propose a novel hidden unit to effectively learn sharp shape functions. We conducted experiments using various real-world datasets to examine the effectiveness of HONAM. Furthermore, we demonstrate that HONAM can achieve fair AI with a slight performance sacrifice. The source code for HONAM is publicly available.
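    A minimal order-2 sketch conveys the idea: each feature keeps its own shape network as in NAM, and interaction terms are formed from products of the per-feature outputs. HONAM's actual interaction parameterization and its sharpness-friendly hidden unit are richer than this illustration.

        import torch
        import torch.nn as nn

        class HONAMSketch(nn.Module):
            def __init__(self, num_features, hidden=32):
                super().__init__()
                self.nets = nn.ModuleList([
                    nn.Sequential(nn.Linear(1, hidden), nn.ReLU(), nn.Linear(hidden, 1))
                    for _ in range(num_features)
                ])
                self.bias = nn.Parameter(torch.zeros(1))

            def forward(self, x):  # x: (batch, num_features)
                # Per-feature contributions f_i(x_i), as in a standard NAM.
                f = torch.cat([net(x[:, i:i + 1]) for i, net in enumerate(self.nets)], dim=1)
                first_order = f.sum(dim=1)
                # Order-2 terms: all pairwise products f_i * f_j with i < j.
                pair = f.unsqueeze(2) * f.unsqueeze(1)        # (batch, F, F)
                second_order = torch.triu(pair, diagonal=1).sum(dim=(1, 2))
                return self.bias + first_order + second_order

    Because every term is an explicit function of one or two named features, the fitted model remains decomposable and hence interpretable.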
    GPNet: Simplifying Graph Neural Networks via Multi-channel Geometric Polynomials. (arXiv:2209.15454v1 [cs.LG])
    Graph Neural Networks (GNNs) are a promising deep learning approach for circumventing many real-world problems on graph-structured data. However, these models usually have at least one of four fundamental limitations: over-smoothing, over-fitting, difficulty of training, and a strong homophily assumption. For example, Simple Graph Convolution (SGC) is known to suffer from the first and fourth limitations. To tackle these limitations, we identify a set of key designs including (D1) dilated convolution, (D2) multi-channel learning, (D3) self-attention score, and (D4) sign factor to boost learning from different types (i.e., homophily and heterophily) and scales (i.e., small, medium, and large) of networks, and combine them into a graph neural network, GPNet, a simple and efficient one-layer model. We theoretically analyze the model and show that it can approximate various graph filters by adjusting the self-attention score and sign factor. Experiments show that GPNet consistently outperforms baselines in terms of average rank, average accuracy, complexity, and parameters on semi-supervised and fully-supervised tasks, and achieves competitive performance compared to state-of-the-art models on the inductive learning task.
    End-to-end P300 BCI using Bayesian accumulation of Riemannian probabilities. (arXiv:2203.07807v2 [cs.LG] UPDATED)
    In brain-computer interfaces (BCI), most approaches based on event-related potentials (ERP) focus on the detection of P300, aiming for single-trial classification in a speller task. While this is an important objective, existing P300 BCIs still require several repetitions to achieve a correct classification accuracy. Signal processing and machine learning advances in P300 BCI mostly revolve around the P300 detection part, leaving the character classification out of scope. To reduce the number of repetitions while maintaining good character classification, it is critical to embrace the full classification problem. We introduce an end-to-end pipeline that starts from feature extraction and is composed of ERP-level classification using a probabilistic Riemannian MDM, which feeds character-level classification using Bayesian accumulation of confidence across trials. Whereas existing approaches only increase the confidence of a character when it is flashed, our new pipeline, called Bayesian accumulation of Riemannian probabilities (ASAP), updates the confidence of each character after each flash. We provide the proper derivation and theoretical reformulation of this Bayesian approach for seamless processing of information from signal to BCI characters. We demonstrate that our approach performs significantly better than standard methods on public P300 datasets.
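    The accumulation rule sketched below captures the distinguishing feature described above: every character is updated at every flash, flashed characters by the target likelihood and non-flashed ones by the non-target likelihood. In ASAP the likelihoods come from the Riemannian MDM classifier; the scalar probabilities here stand in for them, and the grid size is illustrative.

        import numpy as np

        def asap_update(log_post, flashed, p_target, p_nontarget):
            # log_post: (36,) log-probabilities over speller characters
            # flashed:  (36,) boolean mask of characters lit in this flash
            # p_target / p_nontarget: classifier probabilities for the observed ERP
            log_post = log_post + np.where(flashed, np.log(p_target), np.log(p_nontarget))
            return log_post - np.log(np.exp(log_post).sum())  # renormalize

    Accumulating log-likelihoods this way means a confident decision can be reached in fewer repetitions, which is the stated goal.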
    Contextual Bandits with Knapsacks for a Conversion Model. (arXiv:2206.00314v2 [cs.LG] UPDATED)
    We consider contextual bandits with knapsacks, with an underlying structure between the rewards generated and the cost vectors suffered. We do so motivated by sales with commercial discounts. At each round, given the stochastic i.i.d. context $\mathbf{x}_t$ and the arm picked $a_t$ (corresponding, e.g., to a discount level), a customer conversion may be obtained, in which case a reward $r(a_t,\mathbf{x}_t)$ is gained and vector costs $c(a_t,\mathbf{x}_t)$ are suffered (corresponding, e.g., to losses of earnings). Otherwise, in the absence of a conversion, the reward and costs are null. The rewards and costs achieved are thus coupled through the binary variable measuring conversion or the absence thereof. This underlying structure between rewards and costs is different from the linear structures considered by Agrawal and Devanur [2016] (but we show that the techniques introduced in the present article may also be applied to the case of these linear structures). The adaptive policies exhibited solve at each round a linear program based on upper-confidence estimates of the probabilities of conversion given $a$ and $\mathbf{x}$. This kind of policy is most natural and achieves a regret bound of the typical order $(\mathrm{OPT}/B)\,\sqrt{T}$, where $B$ is the total budget allowed, OPT is the optimal expected reward achievable by a static policy, and $T$ is the number of rounds.
    The Final Ascent: When Bigger Models Generalize Worse on Noisy-Labeled Data. (arXiv:2208.08003v2 [cs.LG] UPDATED)
    Increasing the size of overparameterized neural networks has been shown to improve their generalization performance. However, real-world datasets often contain a significant fraction of noisy labels, which can drastically harm the performance of the models trained on them. In this work, we study how neural networks' test loss changes with model size when the training set contains noisy labels. We show that under a sufficiently large noise-to-sample-size ratio, generalization error eventually increases with model size. First, we provide a theoretical analysis on random feature regression and show that this phenomenon occurs as the variance of the generalization loss experiences a second ascent under a large noise-to-sample-size ratio. Then, we present extensive empirical evidence confirming that our theoretical results hold for neural networks. Furthermore, we empirically observe that the adverse effect of network size is more pronounced when robust training methods are employed to learn from noisy-labeled data. Our results have important practical implications: first, larger models should be employed with extra care, particularly when trained on smaller datasets or with robust learning methods. Second, a large sample size can alleviate the effect of noisy labels and allow larger models to achieve superior performance even under noise.
    Learning Accurate Decision Trees with Bandit Feedback via Quantized Gradient Descent. (arXiv:2102.07567v3 [cs.LG] UPDATED)
    Decision trees provide a rich family of highly non-linear but efficient models, due to which they continue to be the go-to family of predictive models for practitioners across domains. But learning trees is challenging due to their discrete decision boundaries. The state-of-the-art (SOTA) techniques resort to (a) learning \textit{soft} trees, thereby losing logarithmic inference time; or (b) using methods tailored to specific supervised learning settings, requiring access to labeled examples and a loss function. In this work, by leveraging techniques like overparameterization and straight-through estimators, we propose a unified method that enables accurate end-to-end gradient-based tree training and can be deployed in a variety of settings like offline supervised learning and online learning with bandit feedback. Using extensive validation on standard benchmarks, we demonstrate that our method provides the best of both worlds, i.e., it is competitive with, and in some cases more accurate than, methods designed \textit{specifically} for supervised settings; and in bandit settings, where most existing tree learning techniques are not applicable, our models are still accurate and significantly outperform the applicable SOTA methods.
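    The straight-through trick at a single tree node can be sketched as follows: the forward pass takes a hard left/right decision, while the backward pass substitutes an identity gradient so the split parameters remain trainable. This is only the STE building block; the full method also relies on overparameterization and quantized updates.

        import torch

        class STEStep(torch.autograd.Function):
            @staticmethod
            def forward(ctx, logits):
                return (logits > 0).float()    # hard 0/1 routing decision

            @staticmethod
            def backward(ctx, grad_output):
                return grad_output             # identity surrogate gradient

        def route(x, w, b):
            # One internal node: hard split, yet differentiable end to end.
            return STEStep.apply(x @ w + b)    # 1.0 -> right child, 0.0 -> left

    Hard routing is what preserves the logarithmic inference time that soft trees give up.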
    Deep Generative Modeling on Limited Data with Regularization by Nontransferable Pre-trained Models. (arXiv:2208.14133v2 [cs.LG] UPDATED)
    Deep generative models (DGMs) are data-hungry, because learning a complex model on limited data suffers from large variance and easily overfits. Inspired by the classical perspective of the bias-variance tradeoff, we propose the regularized deep generative model (Reg-DGM), which leverages a nontransferable pre-trained model to reduce the variance of generative modeling with limited data. Formally, Reg-DGM optimizes a weighted sum of a certain divergence and the expectation of an energy function, where the divergence is between the data and the model distributions, and the energy function is defined by the pre-trained model w.r.t. the model distribution. We analyze a simple yet representative Gaussian-fitting case to demonstrate how the weighting hyperparameter trades off the bias and the variance. Theoretically, we characterize the existence and uniqueness of the global minimum of Reg-DGM in a non-parametric setting and prove its convergence with neural networks trained by gradient-based methods. Empirically, with various pre-trained feature extractors and a data-dependent energy function, Reg-DGM consistently improves the generation performance of strong DGMs with limited data and achieves competitive results with state-of-the-art methods.
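    Written out, the objective described above takes the following form, where $\lambda$ denotes the weighting hyperparameter and $E_f$ the energy function defined via the pre-trained model $f$ (the symbols are our notation, not necessarily the paper's):

        % Reg-DGM: divergence-to-data plus a weighted energy term under the model
        \min_{\theta} \; \mathcal{D}\big(p_{\mathrm{data}},\, p_{\theta}\big)
          \;+\; \lambda \, \mathbb{E}_{x \sim p_{\theta}}\big[ E_{f}(x) \big]

    Setting $\lambda = 0$ recovers the base DGM, while larger $\lambda$ leans harder on the pre-trained model, trading bias for variance as in the Gaussian-fitting analysis.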
    DeLag: Using Multi-Objective Optimization to Enhance the Detection of Latency Degradation Patterns in Service-based Systems. (arXiv:2110.11155v2 [cs.SE] UPDATED)
    Performance debugging in production is a fundamental activity in modern service-based systems. The diagnosis of performance issues is often time-consuming, since it requires thorough inspection of large volumes of traces and performance indices. In this paper we present DeLag, a novel automated search-based approach for diagnosing performance issues in service-based systems. DeLag identifies subsets of requests that show, in the combination of their Remote Procedure Call execution times, symptoms of potentially relevant performance issues. We call such symptoms Latency Degradation Patterns. DeLag simultaneously searches for multiple latency degradation patterns while optimizing precision, recall and latency dissimilarity. Experimentation on 700 datasets of requests generated from two microservice-based systems shows that our approach provides better and more stable effectiveness than three state-of-the-art approaches and general purpose machine learning clustering algorithms. Moreover, DeLag outperforms in terms of efficiency the second and the third most effective baseline techniques on the largest datasets used in our evaluation.
    An information-theoretic approach to unsupervised keypoint representation learning. (arXiv:2209.15404v1 [cs.CV])
    Extracting informative representations from videos is fundamental for the effective learning of various downstream tasks. Inspired by classical works on saliency, we present a novel information-theoretic approach to discover meaningful representations from videos in an unsupervised fashion. We argue that the local entropy of pixel neighborhoods and its evolution in a video stream is a valuable intrinsic supervisory signal for learning to attend to salient features. We thus abstract visual features into a concise representation of keypoints that serve as dynamic information transporters. Thanks to two original information-theoretic losses, we discover, in an unsupervised fashion, spatio-temporally consistent keypoint representations that carry the prominent information across video frames. The first loss maximizes the information covered by the keypoints in a frame. The second encourages optimized keypoint transportation over time, thus imposing consistency of the information flow. We evaluate our keypoint-based representation against state-of-the-art baselines in different downstream tasks such as learning object dynamics. To evaluate the expressivity and consistency of the keypoints, we propose a new set of metrics. Our empirical results showcase the superior performance of our information-driven keypoints, which resolve challenges like attending to both static and dynamic objects, and to objects abruptly entering and leaving the scene.
    Sparse Mixture-of-Experts are Domain Generalizable Learners. (arXiv:2206.04046v4 [cs.CV] UPDATED)
    Human visual perception can easily generalize to out-of-distribution visual data, which is far beyond the capability of modern machine learning models. Domain generalization (DG) aims to close this gap, with existing DG methods mainly focusing on the loss function design. In this paper, we propose to explore an orthogonal direction, i.e., the design of the backbone architecture. It is motivated by an empirical finding that transformer-based models trained with empirical risk minimization (ERM) outperform CNN-based models employing state-of-the-art (SOTA) DG algorithms on multiple DG datasets. We develop a formal framework to characterize a network's robustness to distribution shifts by studying its architecture's alignment with the correlations in the dataset. This analysis guides us to propose a novel DG model built upon vision transformers, namely Generalizable Mixture-of-Experts (GMoE). Extensive experiments on DomainBed demonstrate that GMoE trained with ERM outperforms SOTA DG baselines by a large margin. Moreover, GMoE is complementary to existing DG methods and its performance is substantially improved when trained with DG algorithms.
    Momentum Tracking: Momentum Acceleration for Decentralized Deep Learning on Heterogeneous Data. (arXiv:2209.15505v1 [cs.LG])
    SGD with momentum acceleration is one of the key components for improving the performance of neural networks. For decentralized learning, a straightforward approach using momentum acceleration is Distributed SGD (DSGD) with momentum acceleration (DSGDm). However, DSGDm performs worse than DSGD when the data distributions are statistically heterogeneous. Recently, several studies have addressed this issue and proposed methods with momentum acceleration that are more robust to data heterogeneity than DSGDm, although their convergence rates remain dependent on data heterogeneity and degrade when the data distributions are heterogeneous. In this study, we propose Momentum Tracking, a method with momentum acceleration whose convergence rate is provably independent of data heterogeneity. More specifically, we analyze the convergence rate of Momentum Tracking in the standard deep learning setting, where the objective function is non-convex and the stochastic gradient is used, and show that it is independent of data heterogeneity for any momentum coefficient $\beta \in [0, 1)$. Through image classification tasks, we demonstrate that Momentum Tracking is more robust to data heterogeneity than existing decentralized learning methods with momentum acceleration and can consistently outperform these methods when the data distributions are heterogeneous.
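    As a rough schematic, one decentralized step of gradient tracking combined with momentum looks as follows. This is a generic sketch of the gradient-tracking family of updates that such methods build on, not the paper's exact rule or ordering.

        import numpy as np

        def momentum_tracking_step(x, y, m, g_old, g_new, W, lr=0.05, beta=0.9):
            # x: (n, d) node parameters      y: (n, d) gradient trackers
            # m: (n, d) momentum buffers     g_old/g_new: (n, d) local gradients
            # W: (n, n) doubly-stochastic mixing matrix of the communication graph
            m = beta * m + y               # momentum on the tracked global gradient
            x = W @ x - lr * m             # gossip averaging plus local step
            y = W @ y + g_new - g_old      # track the network-average gradient
            return x, y, m

    The tracker y estimates the average gradient across all nodes, which is the mechanism that removes the dependence on data heterogeneity that plain DSGDm suffers from.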
    Relative representations enable zero-shot latent space communication. (arXiv:2209.15430v1 [cs.LG])
    Neural networks embed the geometric structure of a data manifold lying in a high-dimensional space into latent representations. Ideally, the distribution of the data points in the latent space should depend only on the task, the data, the loss, and other architecture-specific constraints. However, factors such as the random weights initialization, training hyperparameters, or other sources of randomness in the training phase may induce incoherent latent spaces that hinder any form of reuse. Nevertheless, we empirically observe that, under the same data and modeling choices, distinct latent spaces typically differ by an unknown quasi-isometric transformation: that is, in each space, the distances between the encodings do not change. In this work, we propose to adopt pairwise similarities as an alternative data representation, that can be used to enforce the desired invariance without any additional training. We show how neural architectures can leverage these relative representations to guarantee, in practice, latent isometry invariance, effectively enabling latent space communication: from zero-shot model stitching to latent space comparison between diverse settings. We extensively validate the generalization capability of our approach on different datasets, spanning various modalities (images, text, graphs), tasks (e.g., classification, reconstruction) and architectures (e.g., CNNs, GCNs, transformers).
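    The construction is simple enough to state in code: fix a small set of anchor samples, encode them, and represent every embedding by its cosine similarities to the anchor encodings. Since cosine similarities are unchanged by rotations and rescalings of the latent space, two independently trained encoders produce comparable relative representations. Shapes below are placeholders.

        import torch
        import torch.nn.functional as F

        def relative_representation(z, anchors):
            # z: (batch, dim) absolute latent codes from any encoder
            # anchors: (num_anchors, dim) encodings of a shared, fixed anchor set
            return F.normalize(z, dim=-1) @ F.normalize(anchors, dim=-1).T

        z = torch.randn(8, 256)
        anchors = torch.randn(300, 256)
        print(relative_representation(z, anchors).shape)  # torch.Size([8, 300])

    Zero-shot stitching then amounts to training a decoder on one encoder's relative representations and feeding it another encoder's.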
    Scheduling for Urban Air Mobility using Safe Learning. (arXiv:2209.15457v1 [cs.LG])
    This work considers the scheduling problem for Urban Air Mobility (UAM) vehicles travelling between origin-destination pairs with both hard and soft trip deadlines. Each route is described by a discrete probability distribution over trip completion times (or delay) and over inter-arrival times of requests (or demand) for the route, along with a fixed hard or soft deadline. Soft deadlines carry a cost that is incurred when the deadline is missed. An online, safe scheduler is developed that ensures that hard deadlines are never missed and that the average cost of missing soft deadlines is minimized. The system is modelled as a Markov Decision Process (MDP), and safe model-based learning is used to find the probabilistic distributions over route delays and demand. Monte Carlo Tree Search (MCTS) Earliest Deadline First (EDF) is used to safely explore the learned models in an online fashion and develop a near-optimal non-preemptive scheduling policy. These results are compared with Value Iteration (VI) and MCTS (Random) scheduling solutions.
    Sparsity-Constrained Optimal Transport. (arXiv:2209.15466v1 [stat.ML])
    Regularized optimal transport (OT) is now increasingly used as a loss or as a matching layer in neural networks. Entropy-regularized OT can be computed using the Sinkhorn algorithm, but it leads to fully-dense transportation plans, meaning that all sources are (fractionally) matched with all targets. To address this issue, several works have investigated quadratic regularization instead. This regularization preserves sparsity and leads to unconstrained and smooth (semi) dual objectives, which can be solved with off-the-shelf gradient methods. Unfortunately, quadratic regularization does not give direct control over the cardinality (number of nonzeros) of the transportation plan. We propose in this paper a new approach for OT with explicit cardinality constraints on the transportation plan. Our work is motivated by an application to sparse mixtures of experts, where OT can be used to match input tokens such as image patches with expert models such as neural networks. Cardinality constraints ensure that at most $k$ tokens are matched with an expert, which is crucial for computational performance reasons. Despite the nonconvexity of cardinality constraints, we show that the corresponding (semi) dual problems are tractable and can be solved with first-order gradient methods. Our method can be thought of as a middle ground between unregularized OT (recovered in the limit case $k=1$) and quadratically-regularized OT (recovered when $k$ is large enough). The smoothness of the objectives increases as $k$ increases, giving rise to a trade-off between convergence speed and sparsity of the optimal plan.
    An Evolutionary Approach to Dynamic Introduction of Tasks in Large-scale Multitask Learning Systems. (arXiv:2205.12755v3 [cs.LG] UPDATED)
    Multitask learning assumes that models capable of learning from multiple tasks can achieve better quality and efficiency via knowledge transfer, a key feature of human learning. However, state-of-the-art ML models rely on high customization for each task and leverage model size and data scale rather than scaling the number of tasks. Continual learning, which adds a temporal aspect to multitask learning, is often focused on the study of common pitfalls such as catastrophic forgetting, instead of being studied at a large scale as a critical component for building the next generation of artificial intelligence. We propose an evolutionary method that can generate a large-scale multitask model and support the dynamic and continuous addition of new tasks. The generated multitask model is sparsely activated and integrates task-based routing that guarantees bounded compute cost and fewer added parameters per task as the model expands. The proposed method relies on a knowledge compartmentalization technique to achieve immunity against catastrophic forgetting and other common pitfalls such as gradient interference and negative transfer. We empirically show that the proposed method can jointly solve and achieve competitive results on 69 image classification tasks, for example achieving the best test accuracy reported for a model trained only on public data for competitive tasks such as CIFAR-10: 99.43%.
    Neuro-Symbolic Causal Language Planning with Commonsense Prompting. (arXiv:2206.02928v3 [cs.CL] UPDATED)
    Language planning aims to implement complex high-level goals by decomposing them into sequential, simpler low-level steps. Such procedural reasoning ability is essential for applications such as household robots and virtual assistants. Although language planning is a basic skill for humans in daily life, it remains a challenge for large language models (LLMs) that lack deep-level commonsense knowledge about the real world. Previous methods require either manual exemplars or annotated programs to acquire such ability from LLMs. In contrast, this paper proposes the Neuro-Symbolic Causal Language Planner (CLAP), which elicits procedural knowledge from LLMs with commonsense-infused prompting. Pre-trained knowledge in LLMs is essentially an unobserved confounder that causes spurious correlations between tasks and action plans. Through the lens of a Structural Causal Model (SCM), we propose an effective strategy in CLAP to construct prompts as a causal intervention toward our SCM. Using graph sampling techniques and symbolic program executors, our strategy formalizes structured causal prompts from commonsense knowledge bases. CLAP obtains state-of-the-art performance on WikiHow and RobotHow, achieving a relative improvement of 5.28% in human evaluations under the counterfactual setting. This indicates the superiority of CLAP in causal language planning, both semantically and sequentially.
    Neural Unbalanced Optimal Transport via Cycle-Consistent Semi-Couplings. (arXiv:2209.15621v1 [cs.LG])
    Comparing unpaired samples of a distribution or population taken at different points in time is a fundamental task in many application domains where measuring populations is destructive and cannot be done repeatedly on the same sample, such as in single-cell biology. Optimal transport (OT) can solve this challenge by learning an optimal coupling of samples across distributions from unpaired data. However, the usual formulation of OT assumes conservation of mass, which is violated in unbalanced scenarios in which the population size changes (e.g., cell proliferation or death) between measurements. In this work, we introduce NubOT, a neural unbalanced OT formulation that relies on the formalism of semi-couplings to account for creation and destruction of mass. To estimate such semi-couplings and generalize out-of-sample, we derive an efficient parameterization based on neural optimal transport maps and propose a novel algorithmic scheme through a cycle-consistent training procedure. We apply our method to the challenging task of forecasting heterogeneous responses of multiple cancer cell lines to various drugs, where we observe that by accurately modeling cell proliferation and death, our method yields notable improvements over previous neural optimal transport methods.
    Risk Control for Online Learning Models. (arXiv:2205.09095v6 [cs.LG] UPDATED)
    To provide rigorous uncertainty quantification for online learning models, we develop a framework for constructing uncertainty sets that provably control risk -- such as coverage of confidence intervals, false negative rate, or F1 score -- in the online setting. This extends conformal prediction to apply to a larger class of online learning problems. Our method guarantees risk control at any user-specified level even when the underlying data distribution shifts drastically, even adversarially, over time in an unknown fashion. The technique we propose is highly flexible as it can be applied with any base online learning algorithm (e.g., a deep neural network trained online), requiring minimal implementation effort and essentially zero additional computational cost. We further extend our approach to control multiple risks simultaneously, so the prediction sets we generate are valid for all given risks. To demonstrate the utility of our method, we conduct experiments on real-world tabular time-series data sets showing that the proposed method rigorously controls various natural risks. Furthermore, we show how to construct valid intervals for an online image-depth estimation problem that previous sequential calibration schemes cannot handle.
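    A one-line stochastic-approximation update conveys the flavor of online risk calibration; the actual update in the paper may differ, and theta, loss_t, and the learning rate below are our notation, not the paper's.

        def online_threshold_update(theta, loss_t, alpha, lr=0.05):
            # theta: controls how conservative the current uncertainty set is
            # loss_t: realized risk indicator at time t (e.g., 1 if the set
            #         missed the target, 0 otherwise)
            # alpha: user-specified risk level
            return theta + lr * (loss_t - alpha)

    Whenever the realized risk runs above alpha, theta grows and the sets widen; when it runs below, they shrink, which is how the long-run average risk can be pinned to alpha even under distribution shift.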
    CodeGen: An Open Large Language Model for Code with Multi-Turn Program Synthesis. (arXiv:2203.13474v4 [cs.LG] UPDATED)
    Program synthesis strives to generate a computer program as a solution to a given problem specification, expressed with input-output examples or natural language descriptions. The prevalence of large language models advances the state-of-the-art for program synthesis, though limited training resources and data impede open access to such models. To democratize this, we train and release a family of large language models up to 16.1B parameters, called CODEGEN, on natural language and programming language data, and open source the training library JAXFORMER. We show the utility of the trained model by demonstrating that it is competitive with the previous state-of-the-art on zero-shot Python code generation on HumanEval. We further investigate the multi-step paradigm for program synthesis, where a single program is factorized into multiple prompts specifying subproblems. To this end, we construct an open benchmark, Multi-Turn Programming Benchmark (MTPB), consisting of 115 diverse problem sets that are factorized into multi-turn prompts. Our analysis on MTPB shows that the same intent provided to CODEGEN in multi-turn fashion significantly improves program synthesis over that provided as a single turn. We make the training library JAXFORMER and model checkpoints available as open source contribution: https://github.com/salesforce/CodeGen.
    $\Phi$-DVAE: Learning Physically Interpretable Representations with Nonlinear Filtering. (arXiv:2209.15609v1 [stat.ML])
    Incorporating unstructured data into physical models is a challenging problem that is emerging in data assimilation. Traditional approaches focus on well-defined observation operators whose functional forms are typically assumed to be known. This prevents these methods from achieving a consistent model-data synthesis in configurations where the mapping from data-space to model-space is unknown. To address these shortcomings, in this paper we develop a physics-informed dynamical variational autoencoder ($\Phi$-DVAE) for embedding diverse data streams into time-evolving physical systems described by differential equations. Our approach combines a standard (possibly nonlinear) filter for the latent state-space model and a VAE, to embed the unstructured data stream into the latent dynamical system. A variational Bayesian framework is used for the joint estimation of the embedding, latent states, and unknown system parameters. To demonstrate the method, we look at three examples: video datasets generated by the advection and Korteweg-de Vries partial differential equations, and a velocity field generated by the Lorenz-63 system. Comparisons with relevant baselines show that the $\Phi$-DVAE provides a data efficient dynamics encoding methodology that is competitive with standard approaches, with the added benefit of incorporating a physically interpretable latent space.
    Deep Recurrent Encoder: A scalable end-to-end network to model brain signals. (arXiv:2103.02339v3 [q-bio.NC] UPDATED)
    Understanding how the brain responds to sensory inputs is challenging: brain recordings are partial, noisy, and high dimensional; they vary across sessions and subjects and they capture highly nonlinear dynamics. These challenges have led the community to develop a variety of preprocessing and analytical (almost exclusively linear) methods, each designed to tackle one of these issues. Instead, we propose to address these challenges through a specific end-to-end deep learning architecture, trained to predict the brain responses of multiple subjects at once. We successfully test this approach on a large cohort of magnetoencephalography (MEG) recordings acquired during a one-hour reading task. Our Deep Recurrent Encoding (DRE) architecture reliably predicts MEG responses to words with a three-fold improvement over classic linear methods. To overcome the notorious issue of interpretability of deep learning, we describe a simple variable importance analysis. When applied to DRE, this method recovers the expected evoked responses to word length and word frequency. The quantitative improvement of the present deep learning approach paves the way to better understand the nonlinear dynamics of brain activity from large datasets.
    Fusion of complementary 2D and 3D mesostructural datasets using generative adversarial networks. (arXiv:2110.11281v3 [cs.CV] UPDATED)
    Modelling the impact of a material's mesostructure on device-level performance typically requires access to 3D image data containing all the relevant information to define the geometry of the simulation domain. This image data must include sufficient contrast between phases to distinguish each material, be of high enough resolution to capture the key details, but also have a large enough field-of-view to be representative of the material in general. It is rarely possible to obtain data with all of these properties from a single imaging technique. In this paper, we present a method for combining information from pairs of distinct but complementary imaging techniques in order to accurately reconstruct the desired multi-phase, high-resolution, representative 3D images. Specifically, we use deep convolutional generative adversarial networks to implement super-resolution, style transfer and dimensionality expansion. To demonstrate the widespread applicability of this tool, two pairs of datasets are used to validate the quality of the volumes generated by fusing the information from paired imaging techniques. Three key mesostructural metrics are calculated in each case to show the accuracy of this method. Having confidence in the accuracy of our method, we then demonstrate its power by applying it to a real data pair from a lithium-ion battery electrode, where the required 3D high-resolution image data is not available anywhere in the literature. We believe this approach is superior to previously reported statistical material reconstruction methods both in terms of its fidelity and ease of use. Furthermore, much of the data required to train this algorithm already exists in the literature, waiting to be combined. As such, our open-access code could precipitate a step change by generating the hard-to-obtain, high-quality image volumes necessary to simulate behaviour at the mesoscale.
    Flexible risk design using bi-directional dispersion. (arXiv:2203.14434v2 [stat.ML] UPDATED)
    Many novel notions of "risk" (e.g., CVaR, tilted risk, DRO risk) have been proposed and studied, but these risks are all at least as sensitive as the mean to loss tails on the upside, and tend to ignore deviations on the downside. We study a complementary new risk class that penalizes loss deviations in a bi-directional manner, while having more flexibility in terms of tail sensitivity than is offered by mean-variance. This class lets us derive high-probability learning guarantees without explicit gradient clipping, and empirical tests using both simulated and real data illustrate a high degree of control over key properties of the test loss distribution incurred by gradient-based learners.
    B2RL: An open-source Dataset for Building Batch Reinforcement Learning. (arXiv:2209.15626v1 [cs.LG])
    Batch reinforcement learning (BRL) is an emerging research area in the RL community. It learns exclusively from static datasets (i.e., replay buffers) without interaction with the environment. In the offline setting, existing replay experiences are used as prior knowledge for BRL models to find the optimal policy. Thus, generating replay buffers is crucial for benchmarking BRL models. In our B2RL (Building Batch RL) dataset, we collected real-world data from our building management systems, as well as buffers generated by several behavioral policies in simulation environments. We believe it could help building-domain experts pursue BRL research. To the best of our knowledge, we are the first to open-source building datasets for the purpose of BRL learning.
    Inverse Online Learning: Understanding Non-Stationary and Reactionary Policies. (arXiv:2203.07338v2 [cs.LG] UPDATED)
    Human decision making is well known to be imperfect and the ability to analyse such processes individually is crucial when attempting to aid or improve a decision-maker's ability to perform a task, e.g. to alert them to potential biases or oversights on their part. To do so, it is necessary to develop interpretable representations of how agents make decisions and how this process changes over time as the agent learns online in reaction to the accrued experience. To then understand the decision-making processes underlying a set of observed trajectories, we cast the policy inference problem as the inverse to this online learning problem. By interpreting actions within a potential outcomes framework, we introduce a meaningful mapping based on agents choosing an action they believe to have the greatest treatment effect. We introduce a practical algorithm for retrospectively estimating such perceived effects, alongside the process through which agents update them, using a novel architecture built upon an expressive family of deep state-space models. Through application to the analysis of UNOS organ donation acceptance decisions, we demonstrate that our approach can bring valuable insights into the factors that govern decision processes and how they change over time.
    Towards General-Purpose Representation Learning of Polygonal Geometries. (arXiv:2209.15458v1 [cs.CV])
    Neural network representation learning for spatial data is a common need for geographic artificial intelligence (GeoAI) problems. In recent years, many advancements have been made in representation learning for points, polylines, and networks, whereas little progress has been made for polygons, especially complex polygonal geometries. In this work, we focus on developing a general-purpose polygon encoding model, which can encode a polygonal geometry (with or without holes, single or multipolygons) into an embedding space. The resulting embeddings can be leveraged directly (or fine-tuned) for downstream tasks such as shape classification, spatial relation prediction, and so on. To achieve model generalizability guarantees, we identify a few desirable properties: loop origin invariance, trivial vertex invariance, part permutation invariance, and topology awareness. We explore two different designs for the encoder: one derives all representations in the spatial domain; the other leverages spectral domain representations. For the spatial domain approach, we propose ResNet1D, a 1D CNN-based polygon encoder, which uses circular padding to achieve loop origin invariance on simple polygons. For the spectral domain approach, we develop NUFTspec based on the Non-Uniform Fourier Transformation (NUFT), which naturally satisfies all the desired properties. We conduct experiments on two tasks: 1) shape classification based on MNIST; 2) spatial relation prediction based on two new datasets - DBSR-46K and DBSR-cplx46K. Our results show that NUFTspec and ResNet1D outperform multiple existing baselines with significant margins. While ResNet1D suffers from model performance degradation after shape-preserving geometry modifications, NUFTspec is very robust to these modifications due to the nature of the NUFT.
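    Loop origin invariance in ResNet1D comes from treating the vertex sequence as circular: a polygon has no distinguished first vertex, so the 1D convolution wraps around. In PyTorch this is a one-argument change; channel and kernel sizes below are illustrative, not the paper's configuration.

        import torch
        import torch.nn as nn

        # Circular padding makes a cyclic shift of the input (a change of loop
        # origin) produce a cyclic shift of the output.
        conv = nn.Conv1d(in_channels=2, out_channels=64, kernel_size=3,
                         padding=1, padding_mode="circular")

        polygon = torch.randn(1, 2, 100)   # (batch, xy-coordinates, num_vertices)
        features = conv(polygon)           # (1, 64, 100)

    A global pooling over the vertex dimension then turns this shift equivariance into the desired loop origin invariance.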
    MEIM: Multi-partition Embedding Interaction Beyond Block Term Format for Efficient and Expressive Link Prediction. (arXiv:2209.15597v1 [cs.AI])
    Knowledge graph embedding aims to predict the missing relations between entities in knowledge graphs. Tensor-decomposition-based models, such as ComplEx, provide a good trade-off between efficiency and expressiveness, which is crucial because of the large size of real-world knowledge graphs. The recent multi-partition embedding interaction (MEI) model subsumes these models by using the block term tensor format and provides a systematic solution for the trade-off. However, MEI has several drawbacks, some of them carried over from the tensor-decomposition-based models it subsumes. In this paper, we address these drawbacks and introduce the Multi-partition Embedding Interaction iMproved beyond block term format (MEIM) model, with an independent core tensor for ensemble effects and soft orthogonality for max-rank mapping, in addition to multi-partition embedding. MEIM improves expressiveness while still being highly efficient, helping it to outperform strong baselines and achieve state-of-the-art results on difficult link prediction benchmarks using fairly small embedding sizes. The source code is released at https://github.com/tranhungnghiep/MEIM-KGE.
    On The Robustness of Self-Supervised Representations for Spoken Language Modeling. (arXiv:2209.15483v1 [cs.CL])
    Self-supervised representations have been extensively studied for discriminative and generative tasks. However, their robustness capabilities have not been extensively investigated. This work focuses on self-supervised representations for spoken generative language models. First, we empirically demonstrate how current state-of-the-art speech representation models lack robustness to basic signal variations that do not alter the spoken information. To overcome this, we propose an effective and efficient method to learn robust self-supervised speech representation for generative spoken language modeling. The proposed approach is based on applying a set of signal transformations to the speech signal and optimizing the model using an iterative pseudo-labeling scheme. Our method significantly improves over the evaluated baselines when considering encoding metrics. We additionally evaluate our method on the speech-to-speech translation task. We consider Spanish-English and French-English conversions and empirically demonstrate the benefits of following the proposed approach.
    Physically Meaningful Uncertainty Quantification in Probabilistic Wind Turbine Power Curve Models as a Damage Sensitive Feature. (arXiv:2209.15579v1 [cs.LG])
A wind turbine's power curve is easily accessible, damage-sensitive data, and as such is a key part of structural health monitoring in wind turbines. Power curve models can be constructed in a number of ways, but the authors argue that probabilistic methods carry inherent benefits in this use case, such as uncertainty quantification and allowing uncertainty propagation analysis. Many probabilistic power curve models have a key limitation in that they are not physically meaningful - they return mean and uncertainty predictions outside of what is physically possible (the maximum and minimum power outputs of the wind turbine). This paper investigates the use of two bounded Gaussian Processes in order to produce physically meaningful probabilistic power curve models. The first model investigated was a warped heteroscedastic Gaussian process, and was found to be ineffective due to specific shortcomings of the Gaussian Process in relation to the warping function. The second model - an approximated Gaussian Process with a Beta likelihood - was highly successful and demonstrated that a working bounded probabilistic model results in better predictive uncertainty than a corresponding unbounded one without meaningful loss in predictive accuracy. Such a bounded model thus offers increased accuracy for performance monitoring and increased operator confidence in the model due to guaranteed physical plausibility.
    Riemannian Metric Learning via Optimal Transport. (arXiv:2205.09244v2 [cs.LG] UPDATED)
    We introduce an optimal transport-based model for learning a metric tensor from cross-sectional samples of evolving probability measures on a common Riemannian manifold. We neurally parametrize the metric as a spatially-varying matrix field and efficiently optimize our model's objective using a simple alternating scheme. Using this learned metric, we can nonlinearly interpolate between probability measures and compute geodesics on the manifold. We show that metrics learned using our method improve the quality of trajectory inference on scRNA and bird migration data at the cost of little additional cross-sectional data.
    An efficient encoder-decoder architecture with top-down attention for speech separation. (arXiv:2209.15200v1 [cs.SD])
Deep neural networks have shown excellent prospects in speech separation tasks. However, obtaining good results while keeping a low model complexity remains challenging in real-world applications. In this paper, we provide a bio-inspired efficient encoder-decoder architecture, called TDANet, that mimics the brain's top-down attention, with decreased model complexity and without sacrificing performance. The top-down attention in TDANet is extracted by the global attention (GA) module and the cascaded local attention (LA) layers. The GA module takes multi-scale acoustic features as input to extract a global attention signal, which then modulates features of different scales by direct top-down connections. The LA layers use features of adjacent layers as input to extract the local attention signal, which is used to modulate the lateral input in a top-down manner. On three benchmark datasets, TDANet consistently achieved separation performance competitive with previous state-of-the-art (SOTA) methods at higher efficiency. Specifically, TDANet's multiply-accumulate operations (MACs) are only 5\% of those of Sepformer, one of the previous SOTA models, and its CPU inference time is only 10\% of Sepformer's. In addition, a large-size version of TDANet obtained SOTA results on the three datasets, with MACs still only 10\% of Sepformer's and CPU inference time only 24\% of Sepformer's. Our study suggests that top-down attention can be a more efficient strategy for speech separation.
    Blessing from Experts: Super Reinforcement Learning in Confounded Environments. (arXiv:2209.15448v1 [cs.LG])
    We introduce super reinforcement learning in the batch setting, which takes the observed action as input for enhanced policy learning. In the presence of unmeasured confounders, the recommendations from human experts recorded in the observed data allow us to recover certain unobserved information. Including this information in the policy search, the proposed super reinforcement learning will yield a super-policy that is guaranteed to outperform both the standard optimal policy and the behavior one (e.g., the expert's recommendation). Furthermore, to address the issue of unmeasured confounding in finding super-policies, a number of non-parametric identification results are established. Finally, we develop two super-policy learning algorithms and derive their corresponding finite-sample regret guarantees.
    Downlink Compression Improves TopK Sparsification. (arXiv:2209.15203v1 [cs.LG])
Training large neural networks is time-consuming. To speed up the process, distributed training is often used. One of the largest bottlenecks in distributed training is communicating gradients across different nodes. Different gradient compression techniques have been proposed to alleviate the communication bottleneck, including topK gradient sparsification, which truncates the gradient to the largest K components before sending it to other nodes. While some authors have investigated topK gradient sparsification in the parameter-server framework by applying topK compression in both the worker-to-server (uplink) and server-to-worker (downlink) directions, the prevailing belief is that adding extra compression degrades the convergence of the model. We demonstrate, on the contrary, that adding downlink compression can potentially improve the performance of topK sparsification: not only does it reduce the amount of communication per step, but also, counter-intuitively, it can improve the upper bound in the convergence analysis. To show this, we revisit the non-convex convergence analysis of topK stochastic gradient descent (SGD) and extend it from the unidirectional to the bidirectional setting. We also remove a restriction of the previous analysis that requires unrealistically large values of K. We experimentally evaluate bidirectional topK SGD against unidirectional topK SGD and show that models trained with bidirectional topK SGD perform as well as models trained with unidirectional topK SGD while yielding significant communication benefits for large numbers of workers.
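For readers unfamiliar with the primitive being compressed, the sketch below shows generic topK gradient sparsification with an error-feedback residual. The residual bookkeeping is a common companion technique and not necessarily the paper's exact variant; the paper's contribution is applying such compression in both the uplink and downlink directions.

```python
import torch

def topk_sparsify(grad, k):
    """Keep only the k largest-magnitude entries of a gradient tensor,
    returning the sparse gradient (the part communicated) and the residual
    (commonly carried over to the next step as error feedback)."""
    flat = grad.flatten()
    _, idx = torch.topk(flat.abs(), k)
    sparse = torch.zeros_like(flat)
    sparse[idx] = flat[idx]
    residual = flat - sparse
    return sparse.view_as(grad), residual.view_as(grad)

g = torch.randn(1024)
g_sparse, err = topk_sparsify(g, k=32)  # only 32 values need to be communicated
```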
    Optimal Query Complexities for Dynamic Trace Estimation. (arXiv:2209.15219v1 [cs.DS])
    We consider the problem of minimizing the number of matrix-vector queries needed for accurate trace estimation in the dynamic setting where our underlying matrix is changing slowly, such as during an optimization process. Specifically, for any $m$ matrices $A_1,...,A_m$ with consecutive differences bounded in Schatten-$1$ norm by $\alpha$, we provide a novel binary tree summation procedure that simultaneously estimates all $m$ traces up to $\epsilon$ error with $\delta$ failure probability with an optimal query complexity of $\widetilde{O}\left(m \alpha\sqrt{\log(1/\delta)}/\epsilon + m\log(1/\delta)\right)$, improving the dependence on both $\alpha$ and $\delta$ from Dharangutte and Musco (NeurIPS, 2021). Our procedure works without additional norm bounds on $A_i$ and can be generalized to a bound for the $p$-th Schatten norm for $p \in [1,2]$, giving a complexity of $\widetilde{O}\left(m \alpha\left(\sqrt{\log(1/\delta)}/\epsilon\right)^p +m \log(1/\delta)\right)$. By using novel reductions to communication complexity and information-theoretic analyses of Gaussian matrices, we provide matching lower bounds for static and dynamic trace estimation in all relevant parameters, including the failure probability. Our lower bounds (1) give the first tight bounds for Hutchinson's estimator in the matrix-vector product model with Frobenius norm error even in the static setting, and (2) are the first unconditional lower bounds for dynamic trace estimation, resolving open questions of prior work.
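For context, the classical static Hutchinson estimator, which the lower bounds here make tight in the matrix-vector product model, can be sketched in a few lines; the paper's dynamic procedure organizes such queries over the differences $A_{i+1}-A_i$ via a binary tree summation, which is not reproduced here.

```python
import numpy as np

def hutchinson_trace(matvec, n, num_queries, seed=0):
    """Estimate tr(A) using only matrix-vector products: for Gaussian z,
    E[z^T A z] = tr(A), so averaging quadratic forms over random probes
    converges to the trace."""
    rng = np.random.default_rng(seed)
    total = 0.0
    for _ in range(num_queries):
        z = rng.standard_normal(n)
        total += z @ matvec(z)
    return total / num_queries

A = np.diag(np.arange(1.0, 101.0))  # true trace = 5050
est = hutchinson_trace(lambda v: A @ v, n=100, num_queries=500)
```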
    Graph Neural Networks for Link Prediction with Subgraph Sketching. (arXiv:2209.15486v1 [cs.LG])
Many Graph Neural Networks (GNNs) perform poorly compared to simple heuristics on Link Prediction (LP) tasks. This is due to limitations in expressive power such as the inability to count triangles (the backbone of most LP heuristics) and the inability to distinguish automorphic nodes (those having identical structural roles). Both expressiveness issues can be alleviated by learning link (rather than node) representations and incorporating structural features such as triangle counts. Since explicit link representations are often prohibitively expensive, recent works have resorted to subgraph-based methods, which have achieved state-of-the-art performance for LP but suffer from poor efficiency due to high levels of redundancy between subgraphs. We analyze the components of subgraph GNN (SGNN) methods for link prediction. Based on our analysis, we propose a novel full-graph GNN called ELPH (Efficient Link Prediction with Hashing) that passes subgraph sketches as messages to approximate the key components of SGNNs without explicit subgraph construction. ELPH is provably more expressive than Message Passing GNNs (MPNNs). It outperforms existing SGNN models on many standard LP benchmarks while being orders of magnitude faster. However, it shares the common GNN limitation that it is only efficient when the dataset fits in GPU memory. Accordingly, we develop a highly scalable model, called BUDDY, which uses feature precomputation to circumvent this limitation without sacrificing predictive performance. Our experiments show that BUDDY also outperforms SGNNs on standard LP benchmarks while being highly scalable and faster than ELPH.
    AAU-net: An Adaptive Attention U-net for Breast Lesions Segmentation in Ultrasound Images. (arXiv:2204.12077v2 [eess.IV] UPDATED)
Various deep learning methods have been proposed to segment breast lesions from ultrasound images. However, similar intensity distributions, variable tumor morphology and blurred boundaries present challenges for breast lesion segmentation, especially for malignant tumors with irregular shapes. Considering the complexity of ultrasound images, we develop an adaptive attention U-net (AAU-net) to segment breast lesions automatically and stably from ultrasound images. Specifically, we introduce a hybrid adaptive attention module, which mainly consists of a channel self-attention block and a spatial self-attention block, to replace the traditional convolution operation. Compared with the conventional convolution operation, the design of the hybrid adaptive attention module can help us capture more features under different receptive fields. Different from existing attention mechanisms, the hybrid adaptive attention module can guide the network to adaptively select more robust representations in the channel and space dimensions to cope with more complex breast lesion segmentation. Extensive experiments with several state-of-the-art deep learning segmentation methods on three public breast ultrasound datasets show that our method has better performance on breast lesion segmentation. Furthermore, robustness analysis and external experiments demonstrate that our proposed AAU-net has better generalization performance on the segmentation of breast lesions. Moreover, the hybrid adaptive attention module can be flexibly applied to existing network frameworks.
    Neural Integral Equations. (arXiv:2209.15190v1 [cs.LG])
Integral equations (IEs) are functional equations defined through integral operators, where the unknown function is integrated over a possibly multidimensional space. Important applications of IEs have been found throughout theoretical and applied sciences, including in physics, chemistry, biology, and engineering, often in the form of inverse problems. IEs are especially useful since differential equations, e.g., ordinary differential equations (ODEs) and partial differential equations (PDEs), can be formulated in an integral version which is often more convenient to solve. Moreover, unlike ODEs and PDEs, IEs can model inherently non-local dynamical systems, such as ones with long distance spatiotemporal relations. While efficient algorithms exist for solving given IEs, no method exists that can learn an integral equation and its associated dynamics from data alone. In this article, we introduce Neural Integral Equations (NIE), a method that learns an unknown integral operator from data through a solver. We also introduce an attentional version of NIE, called Attentional Neural Integral Equations (ANIE), where the integral is replaced by self-attention, which improves scalability and provides interpretability. We show that learning dynamics via integral equations is faster than doing so via other continuous methods, such as Neural ODEs. Finally, we show that ANIE outperforms other methods on several benchmark tasks in ODE, PDE, and IE systems of synthetic and real-world data.
    Designing and Training of Lightweight Neural Networks on Edge Devices using Early Halting in Knowledge Distillation. (arXiv:2209.15560v1 [cs.LG])
Automated feature extraction capability and the significant performance of Deep Neural Networks (DNNs) make them suitable for Internet of Things (IoT) applications. However, deploying DNNs on edge devices becomes prohibitive due to the colossal computation, energy, and storage requirements. This paper presents a novel approach for designing and training lightweight DNNs using large-size DNNs. The approach considers the available storage, processing speed, and maximum allowable processing time to execute the task on edge devices. We present a knowledge distillation based training procedure to train the lightweight DNN to achieve adequate accuracy. During the training of the lightweight DNN, we introduce a novel early halting technique, which preserves network resources and thus speeds up the training procedure. Finally, we present empirical and real-world evaluations to verify the effectiveness of the proposed approach under different constraints using various edge devices.
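The knowledge distillation objective underlying the training procedure is standard; a minimal sketch follows. The temperature and mixing weight are illustrative defaults, and the paper's early-halting criterion is layered on top of training with a loss of this kind rather than shown here.

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, targets, T=4.0, alpha=0.5):
    """Standard knowledge-distillation objective: a temperature-softened KL
    term against the teacher plus a cross-entropy term against the labels."""
    soft = F.kl_div(
        F.log_softmax(student_logits / T, dim=1),
        F.softmax(teacher_logits / T, dim=1),
        reduction="batchmean",
    ) * (T * T)          # rescale gradients for the softened targets
    hard = F.cross_entropy(student_logits, targets)
    return alpha * soft + (1 - alpha) * hard
```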
    Self-Stabilization: The Implicit Bias of Gradient Descent at the Edge of Stability. (arXiv:2209.15594v1 [cs.LG])
    Traditional analyses of gradient descent show that when the largest eigenvalue of the Hessian, also known as the sharpness $S(\theta)$, is bounded by $2/\eta$, training is "stable" and the training loss decreases monotonically. Recent works, however, have observed that this assumption does not hold when training modern neural networks with full batch or large batch gradient descent. Most recently, Cohen et al. (2021) observed two important phenomena. The first, dubbed progressive sharpening, is that the sharpness steadily increases throughout training until it reaches the instability cutoff $2/\eta$. The second, dubbed edge of stability, is that the sharpness hovers at $2/\eta$ for the remainder of training while the loss continues decreasing, albeit non-monotonically. We demonstrate that, far from being chaotic, the dynamics of gradient descent at the edge of stability can be captured by a cubic Taylor expansion: as the iterates diverge in direction of the top eigenvector of the Hessian due to instability, the cubic term in the local Taylor expansion of the loss function causes the curvature to decrease until stability is restored. This property, which we call self-stabilization, is a general property of gradient descent and explains its behavior at the edge of stability. A key consequence of self-stabilization is that gradient descent at the edge of stability implicitly follows projected gradient descent (PGD) under the constraint $S(\theta) \le 2/\eta$. Our analysis provides precise predictions for the loss, sharpness, and deviation from the PGD trajectory throughout training, which we verify both empirically in a number of standard settings and theoretically under mild conditions. Our analysis uncovers the mechanism for gradient descent's implicit bias towards stability.
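The sharpness $S(\theta)$ can be estimated with power iteration on Hessian-vector products and compared against the $2/\eta$ cutoff; a minimal sketch on a toy loss follows, assuming a single flat parameter vector, which is a simplification of a real training setup.

```python
import torch

def sharpness(loss_fn, params, iters=20):
    """Estimate the largest Hessian eigenvalue S(theta) by power iteration
    on Hessian-vector products computed via double backprop."""
    v = torch.randn_like(params)
    v /= v.norm()
    eig = 0.0
    for _ in range(iters):
        loss = loss_fn(params)
        (g,) = torch.autograd.grad(loss, params, create_graph=True)
        (hv,) = torch.autograd.grad(g @ v, params)  # Hessian-vector product
        eig = (v @ hv).item()                       # Rayleigh quotient estimate
        v = hv / hv.norm()
    return eig

theta = torch.randn(10, requires_grad=True)
S = sharpness(lambda p: (p ** 4).sum(), theta)  # compare against 2 / learning_rate
```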
    Learning with MISELBO: The Mixture Cookbook. (arXiv:2209.15514v1 [cs.LG])
Mixture models in variational inference (VI) are an active field of research. Recent works have established their connection to multiple importance sampling (MIS) through the MISELBO and advanced the use of ensemble approximations for large-scale problems. However, as we show here, independent learning of the ensemble components can lead to suboptimal diversity. Hence, we study the effect of instead using MISELBO as an objective function for learning mixtures, and we propose the first ever mixture of variational approximations for a normalizing flow-based hierarchical variational autoencoder (VAE) with VampPrior and a PixelCNN decoder network. Two major insights led to the construction of this novel composite model. First, mixture models have the potential to be off-the-shelf tools for practitioners to obtain more flexible posterior approximations in VAEs. Therefore, we make them more accessible by demonstrating how to apply them to four popular architectures. Second, the mixture components cooperate in order to cover the target distribution while trying to maximize their diversity when MISELBO is the objective function. We explain this cooperative behavior by drawing a novel connection between VI and adaptive importance sampling. Finally, we demonstrate the superiority of the Mixture VAEs' learned feature representations on both image and single-cell transcriptome data, and obtain state-of-the-art results among VAE architectures in terms of negative log-likelihood on the MNIST and FashionMNIST datasets. Code available here: \url{https://github.com/Lagergren-Lab/MixtureVAEs}.
    Improved Group Robustness via Classifier Retraining on Independent Splits. (arXiv:2204.09583v2 [cs.LG] UPDATED)
    Deep neural networks learned by minimizing the average risk can achieve strong average performance, but their performance for a subgroup may degrade, if the subgroup is underrepresented in the overall data population. Group distributionally robust optimization (Sagawa et al., 2020a, GDRO) is a standard baseline for learning models with strong worst-group performance. However, GDRO requires group labels for every example during training and can be prone to overfitting, often requiring careful model capacity control via regularization or early stopping. When only a limited amount of group labels is available, Just Train Twice (Liu et al., 2021, JTT) is a popular approach which infers a pseudo-group-label for every unlabeled example. The process of inferring pseudo labels can be highly sensitive during model selection. To alleviate overfitting for GDRO and the pseudo labeling process for JTT, we propose a new method via classifier retraining on independent splits (of the training data). We find that using a novel sample splitting procedure achieves robust worst-group performance in the fine-tuning step. When evaluated on benchmark image and text classification tasks, our approach consistently reduces the requirement of group labels and hyperparameter search during training. Experimental results confirm that our approach performs favorably compared with existing methods (including GDRO and JTT) when either group labels are available during training or are only available during validation.
    Optimal Efficiency-Envy Trade-Off via Optimal Transport. (arXiv:2209.15416v1 [cs.GT])
    We consider the problem of allocating a distribution of items to $n$ recipients where each recipient has to be allocated a fixed, prespecified fraction of all items, while ensuring that each recipient does not experience too much envy. We show that this problem can be formulated as a variant of the semi-discrete optimal transport (OT) problem, whose solution structure in this case has a concise representation and a simple geometric interpretation. Unlike existing literature that treats envy-freeness as a hard constraint, our formulation allows us to \emph{optimally} trade off efficiency and envy continuously. Additionally, we study the statistical properties of the space of our OT based allocation policies by showing a polynomial bound on the number of samples needed to approximate the optimal solution from samples. Our approach is suitable for large-scale fair allocation problems such as the blood donation matching problem, and we show numerically that it performs well on a prior realistic data simulator.
    Effective Early Stopping of Point Cloud Neural Networks. (arXiv:2209.15308v1 [cs.CV])
Early stopping techniques can be utilized to decrease training time; however, the ultimate goal of current early stopping techniques is closely tied to accuracy gains, or to the network's ability to generalize better to unseen data without being large or complex in structure, and not directly to its efficiency. Time efficiency is a critical factor in neural networks, especially when dealing with the segmentation of 3D point cloud data, not only because a neural network itself is computationally expensive, but also because point clouds are large and noisy data, making learning processes even more costly. In this paper, we propose a new early stopping technique, grounded in fundamental mathematics, that aims to improve the trade-off between the learning efficiency and accuracy of neural networks dealing with 3D point clouds. Our results show that by employing our early stopping technique in four distinct and highly utilized neural networks for segmenting 3D point clouds, the training time efficiency of the models is greatly improved, with efficiency gains reaching up to 94\%, while the models achieve, in just a few epochs, segmentation accuracy close to that obtained by training the networks for 200 epochs. Our proposal also outperforms four conventional early stopping approaches in segmentation accuracy, suggesting a promising innovative early stopping technique for point cloud segmentation.
    Tuning of Mixture-of-Experts Mixed-Precision Neural Networks. (arXiv:2209.15427v1 [cs.LG])
Deep learning has become a useful data analysis method; however, mainstream adoption in distributed computer software and embedded devices has been low so far. Often, adding deep learning inference to mainstream applications and devices requires new hardware with signal processors suited for convolutional neural networks. This work adds new data types (quantized 16-bit and 8-bit integer, 16-bit floating point) to Caffe in order to save memory and increase inference speed on existing commodity graphics processors with OpenCL, common in everyday devices. Existing models can be executed effortlessly in mixed-precision mode. Additionally, we propose a variation of mixture-of-experts to increase inference speed on AlexNet for image classification. We managed to decrease memory usage up to 3.29x while increasing inference speed up to 3.01x on certain devices. We demonstrate with five simple examples how the presented techniques can easily be applied to different machine learning problems. The whole pipeline, consisting of models, example python scripts and modified Caffe library, is available as Open Source software.
    Efficient LSTM Training with Eligibility Traces. (arXiv:2209.15502v1 [cs.LG])
Training recurrent neural networks is predominantly achieved via backpropagation through time (BPTT). However, this algorithm is not an optimal solution from either a biological or a computational perspective. A more efficient and biologically plausible alternative to BPTT is e-prop. We investigate the applicability of e-prop to long short-term memory networks (LSTMs), for both supervised and reinforcement learning (RL) tasks. We show that e-prop is a suitable optimization algorithm for LSTMs by comparing it to BPTT on two benchmarks for supervised learning. This demonstrates that e-prop can achieve learning even for problems with long sequences of several hundred timesteps. We introduce extensions that improve the performance of e-prop, which can partially be applied to other network architectures. With the help of these extensions we show that, under certain conditions, e-prop can outperform BPTT on one of the two supervised learning benchmarks. Finally, we deliver a proof of concept for the integration of e-prop into RL in the domain of deep recurrent Q-learning.
    Learn then Test: Calibrating Predictive Algorithms to Achieve Risk Control. (arXiv:2110.01052v5 [cs.LG] UPDATED)
    We introduce a framework for calibrating machine learning models so that their predictions satisfy explicit, finite-sample statistical guarantees. Our calibration algorithms work with any underlying model and (unknown) data-generating distribution and do not require model refitting. The framework addresses, among other examples, false discovery rate control in multi-label classification, intersection-over-union control in instance segmentation, and the simultaneous control of the type-1 error of outlier detection and confidence set coverage in classification or regression. Our main insight is to reframe the risk-control problem as multiple hypothesis testing, enabling techniques and mathematical arguments different from those in the previous literature. We use the framework to provide new calibration methods for several core machine learning tasks, with detailed worked examples in computer vision and tabular medical data.
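A stripped-down sketch of the risk-control-as-testing recipe follows, under strong simplifying assumptions: a single risk, bounded i.i.d. calibration losses, Hoeffding p-values, and fixed-sequence testing over an ordered grid of thresholds. The framework itself supports many more risks and multiple-testing procedures, and the variable names here are illustrative.

```python
import numpy as np

def learn_then_test(lambdas, losses_per_lambda, alpha=0.1, delta=0.05):
    """For each candidate threshold lambda (ordered from most to least
    conservative), test the null 'risk(lambda) > alpha' using a Hoeffding
    p-value on calibration losses in [0, 1]; fixed-sequence testing stops
    at the first non-rejection, and all earlier lambdas are certified."""
    n = losses_per_lambda.shape[1]
    valid = []
    for lam, losses in zip(lambdas, losses_per_lambda):
        r_hat = losses.mean()
        p_value = np.exp(-2 * n * max(alpha - r_hat, 0) ** 2)  # Hoeffding bound
        if p_value > delta:
            break                                              # stop the sequence
        valid.append(lam)
    return valid

lams = np.linspace(0.0, 1.0, 21)  # risk grows along this illustrative grid
losses = np.random.default_rng(0).uniform(size=(21, 500)) * lams[:, None]
safe_lams = learn_then_test(lams, losses)  # thresholds with certified risk <= alpha
```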
    Double Graphs Regularized Multi-view Subspace Clustering. (arXiv:2209.15143v1 [cs.LG])
    Recent years have witnessed a growing academic interest in multi-view subspace clustering. In this paper, we propose a novel Double Graphs Regularized Multi-view Subspace Clustering (DGRMSC) method, which aims to harness both global and local structural information of multi-view data in a unified framework. Specifically, DGRMSC firstly learns a latent representation to exploit the global complementary information of multiple views. Based on the learned latent representation, we learn a self-representation to explore its global cluster structure. Further, Double Graphs Regularization (DGR) is performed on both latent representation and self-representation to take advantage of their local manifold structures simultaneously. Then, we design an iterative algorithm to solve the optimization problem effectively. Extensive experimental results on real-world datasets demonstrate the effectiveness of the proposed method.
    Dynamic-Backbone Protein-Ligand Structure Prediction with Multiscale Generative Diffusion Models. (arXiv:2209.15171v1 [q-bio.QM])
    Molecular complexes formed by proteins and small-molecule ligands are ubiquitous, and predicting their 3D structures can facilitate both biological discoveries and the design of novel enzymes or drug molecules. Here we propose NeuralPLexer, a deep generative model framework to rapidly predict protein-ligand complex structures and their fluctuations using protein backbone template and molecular graph inputs. NeuralPLexer jointly samples protein and small-molecule 3D coordinates at an atomistic resolution through a generative model that incorporates biophysical constraints and inferred proximity information into a time-truncated diffusion process. The reverse-time generative diffusion process is learned by a novel stereochemistry-aware equivariant graph transformer that enables efficient, concurrent gradient field prediction for all heavy atoms in the protein-ligand complex. NeuralPLexer outperforms existing physics-based and learning-based methods on benchmarking problems including fixed-backbone blind protein-ligand docking and ligand-coupled binding site repacking. Moreover, we identify preliminary evidence that NeuralPLexer enriches bound-state-like protein structures when applied to systems where protein folding landscapes are significantly altered by the presence of ligands. Our results reveal that a data-driven approach can capture the structural cooperativity among protein and small-molecule entities, showing promise for the computational identification of novel drug targets and the end-to-end differentiable design of functional small-molecules and ligand-binding proteins.
    Accurate Long-term Air Temperature Prediction with a Fusion of Artificial Intelligence and Data Reduction Techniques. (arXiv:2209.15424v1 [physics.ao-ph])
In this paper, three customised Artificial Intelligence (AI) frameworks, combining Deep Learning (convolutional neural networks), Machine Learning algorithms and data reduction techniques, are proposed for a problem of long-term summer air temperature prediction. Specifically, the prediction of average air temperature in the first and second August fortnights, using input data from previous months, at two different locations, Paris (France) and C\'ordoba (Spain), is considered. The target variable, mainly in the first August fortnight, can contain signals of extreme events such as heatwaves, like the mega-heatwave of 2003, which affected France and the Iberian Peninsula. Thus, an accurate prediction of long-term air temperature may also be valuable for different problems related to climate change, such as the attribution of extreme events, and for other problems related to renewable energy. The analysis carried out in this work is based on Reanalysis data, which are first processed by a correlation analysis between different prediction variables and the target (average air temperature in the August first and second fortnights). An area with the largest correlation is located, and the variables within it, after a feature selection process, are the input to different deep learning and ML algorithms. The experiments carried out show very good prediction skill in the three proposed AI frameworks, in both the Paris and C\'ordoba regions.
    Ensemble-based gradient inference for particle methods in optimization and sampling. (arXiv:2209.15420v1 [stat.ML])
We propose an approach based on function evaluations and Bayesian inference to extract higher-order differential information of objective functions from a given ensemble of particles. Pointwise evaluation $\{V(x^i)\}_i$ of some potential $V$ in an ensemble $\{x^i\}_i$ contains implicit information about first or higher order derivatives, which can be made explicit with little computational effort (ensemble-based gradient inference -- EGI). We suggest using this information to improve established ensemble-based numerical methods for optimization and sampling such as Consensus-based optimization and Langevin-based samplers. Numerical studies indicate that the augmented algorithms are often superior to their gradient-free variants; in particular, the augmented methods help the ensembles to escape their initial domain, to explore multimodal, non-Gaussian settings and to speed up the collapse at the end of optimization dynamics. The code for the numerical examples in this manuscript can be found in the paper's Github repository (https://github.com/MercuryBench/ensemble-based-gradient.git).
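The core idea of inferring derivative information from pointwise ensemble evaluations can be illustrated with a local least-squares fit; the sketch below recovers a gradient exactly for a linear potential. The paper phrases the inference in Bayesian terms and also extracts higher-order information, neither of which is reproduced here.

```python
import numpy as np

def ensemble_gradient_inference(X, V, x0):
    """Infer a gradient of V at x0 from the pointwise evaluations {V(x^i)}
    over an ensemble {x^i} by fitting the local linear model
    V(x^i) ~ c + g . (x^i - x0) in least squares."""
    D = X - x0                                   # (ensemble_size, dim) displacements
    A = np.hstack([np.ones((len(X), 1)), D])     # intercept + linear terms
    coef, *_ = np.linalg.lstsq(A, V, rcond=None)
    return coef[1:]                              # inferred gradient g

rng = np.random.default_rng(0)
X = rng.normal(size=(30, 2))
V = 3 * X[:, 0] - 2 * X[:, 1]                    # potential with constant gradient (3, -2)
g = ensemble_gradient_inference(X, V, x0=np.zeros(2))  # ~ [3., -2.]
```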
    Rethinking skip connection model as a learnable Markov chain. (arXiv:2209.15278v1 [cs.LG])
In the years since the birth of ResNet, skip connections have become the de facto standard for the design of modern architectures due to their widespread adoption, easy optimization and proven performance. Prior work has explained the effectiveness of the skip connection mechanism from different perspectives. In this work, we take a deep dive into the behavior of models with skip connections, which can be formulated as a learnable Markov chain. An efficient Markov chain is preferred as it always maps the input data to the target domain in a better way. However, while a model can be explained as a Markov chain, it is not guaranteed to be optimized into an efficient one by existing SGD-based optimizers, which are prone to getting trapped in local optima. To move towards a more efficient Markov chain, we propose a simple routine of penal connection to make any residual-like model become a learnable Markov chain. Aside from that, the penal connection can also be viewed as a particular model regularization and can be easily implemented with one line of code in the most popular deep learning frameworks~\footnote{Source code: \url{https://github.com/densechen/penal-connection}}. The encouraging experimental results in multi-modal translation and image recognition empirically confirm our conjecture of the learnable Markov chain view and demonstrate the superiority of the proposed penal connection.
    Many-Body Approximation for Tensors. (arXiv:2209.15338v1 [stat.ML])
We propose a nonnegative tensor decomposition focusing on the relationships between the modes of tensors. Traditional decomposition methods assume low-rankness in the representation, resulting in difficulties in global optimization and target rank selection. To address these problems, we present an alternative way to decompose tensors, a many-body approximation for tensors, based on an information geometric formulation. A tensor is treated via an energy-based model, where the tensor and its modes correspond to a probability distribution and random variables, respectively, and the many-body approximation is performed on it by taking the interaction between variables into account. Our model can be globally optimized in polynomial time in terms of KL divergence minimization, and is empirically faster than low-rank approximations while keeping comparable reconstruction error. Furthermore, we visualize interactions between modes as tensor networks and reveal a nontrivial relationship between many-body approximation and low-rank approximation.
    Ensemble Machine Learning Model Trained on a New Synthesized Dataset Generalizes Well for Stress Prediction Using Wearable Devices. (arXiv:2209.15146v1 [cs.LG])
Introduction. We investigate the generalization ability of models built on datasets containing a small number of subjects, recorded in single study protocols. Next, we propose and evaluate methods combining these datasets into a single, large dataset. Finally, we propose and evaluate the use of ensemble techniques by combining gradient boosting with an artificial neural network to measure predictive power on new, unseen data. Methods. Sensor biomarker data from six public datasets were utilized in this study. To test model generalization, we developed a gradient boosting model trained on one dataset (SWELL), and tested its predictive power on two datasets previously used in other studies (WESAD, NEURO). Next, we merged four small datasets, i.e. (SWELL, NEURO, WESAD, UBFC-Phys), to provide a combined total of 99 subjects. In addition, we utilized random sampling combined with another dataset (EXAM) to build a larger training dataset consisting of 200 synthesized subjects. Finally, we developed an ensemble model that combines our gradient boosting model with an artificial neural network, and tested it on two additional, unseen publicly available stress datasets (WESAD and Toadstool). Results. Our method delivers a robust stress measurement system capable of achieving 85% predictive accuracy on new, unseen validation data, achieving a 25% performance improvement over single models trained on small datasets. Conclusion. Models trained on small, single study protocol datasets do not generalize well for use on new, unseen data and lack statistical power. Machine learning models trained on a dataset containing a larger number of varied study subjects capture physiological variance better, resulting in more robust stress detection.
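A hedged illustration of the ensemble construction follows, combining gradient boosting with a neural network via soft voting in scikit-learn; `X_train` and `y_train` are placeholders for the merged dataset, and the paper's actual architectures and dataset synthesis are not reproduced here.

```python
from sklearn.ensemble import GradientBoostingClassifier, VotingClassifier
from sklearn.neural_network import MLPClassifier

# Combine a gradient boosting model with a neural network by averaging
# their predicted class probabilities (soft voting).
ensemble = VotingClassifier(
    estimators=[
        ("gb", GradientBoostingClassifier()),
        ("nn", MLPClassifier(hidden_layer_sizes=(64, 32), max_iter=500)),
    ],
    voting="soft",
)
# ensemble.fit(X_train, y_train)      # X_train/y_train: merged training data
# ensemble.predict(X_unseen)          # evaluate on an unseen dataset
```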
    TabDDPM: Modelling Tabular Data with Diffusion Models. (arXiv:2209.15421v1 [cs.LG])
Denoising diffusion probabilistic models are currently becoming the leading paradigm of generative modeling for many important data modalities. Being the most prevalent in the computer vision community, diffusion models have also recently gained some attention in other domains, including speech, NLP, and graph-like data. In this work, we investigate if the framework of diffusion models can be advantageous for general tabular problems, where datapoints are typically represented by vectors of heterogeneous features. The inherent heterogeneity of tabular data makes accurate modeling quite challenging, since the individual features can be of completely different nature, i.e., some of them can be continuous and some of them can be discrete. To address such data types, we introduce TabDDPM -- a diffusion model that can be universally applied to any tabular dataset and handles any type of feature. We extensively evaluate TabDDPM on a wide set of benchmarks and demonstrate its superiority over existing GAN/VAE alternatives, which is consistent with the advantage of diffusion models in other fields. Additionally, we show that TabDDPM is eligible for privacy-oriented setups, where the original datapoints cannot be publicly shared.
    Convergence of weak-SINDy Surrogate Models. (arXiv:2209.15573v1 [math.NA])
    In this paper, we give an in-depth error analysis for surrogate models generated by a variant of the Sparse Identification of Nonlinear Dynamics (SINDy) method. We start with an overview of a variety of non-linear system identification techniques, namely, SINDy, weak-SINDy, and the occupation kernel method. Under the assumption that the dynamics are a finite linear combination of a set of basis functions, these methods establish a matrix equation to recover coefficients. We illuminate the structural similarities between these techniques and establish a projection property for the weak-SINDy technique. Following the overview, we analyze the error of surrogate models generated by a simplified version of weak-SINDy. In particular, under the assumption of boundedness of a composition operator given by the solution, we show that (i) the surrogate dynamics converges towards the true dynamics and (ii) the solution of the surrogate model is reasonably close to the true solution. Finally, as an application, we discuss the use of a combination of weak-SINDy surrogate modeling and proper orthogonal decomposition (POD) to build a surrogate model for partial differential equations (PDEs).
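For readers new to the SINDy family, the sketch below shows the classical sequentially thresholded least squares solver on a toy system; weak-SINDy, whose surrogate models the paper analyzes, replaces the pointwise derivative estimates used here with integrals against test functions.

```python
import numpy as np

def stlsq(Theta, dXdt, threshold=0.1, iters=10):
    """Sequentially thresholded least squares, the core solver in SINDy:
    regress derivatives on a library of candidate basis functions, then
    repeatedly zero out small coefficients and refit on the survivors."""
    Xi = np.linalg.lstsq(Theta, dXdt, rcond=None)[0]
    for _ in range(iters):
        small = np.abs(Xi) < threshold
        Xi[small] = 0.0
        for k in range(dXdt.shape[1]):
            big = ~small[:, k]
            if big.any():
                Xi[big, k] = np.linalg.lstsq(Theta[:, big], dXdt[:, k], rcond=None)[0]
    return Xi

# Example: recover dx/dt = -2x from samples with the library [1, x, x^2].
t = np.linspace(0, 2, 200)
x = np.exp(-2 * t)[:, None]
dxdt = np.gradient(x[:, 0], t)[:, None]
Theta = np.hstack([np.ones_like(x), x, x ** 2])
Xi = stlsq(Theta, dxdt)  # coefficient on x should be close to -2
```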
    Experts in the Loop: Conditional Variable Selection for Accelerating Post-Silicon Analysis Based on Deep Learning. (arXiv:2209.15249v1 [cs.LG])
Post-silicon validation is one of the most critical processes in modern semiconductor manufacturing. Specifically, a correct and deep understanding of the test cases of manufactured devices is key to enabling post-silicon tuning and debugging. This analysis is typically performed by experienced human experts. However, with the fast development of the semiconductor industry, test cases can contain hundreds of variables. The resulting high dimensionality poses enormous challenges to experts. Thereby, some recent prior works have introduced data-driven variable selection algorithms to tackle these problems and achieved notable success. Nevertheless, for these methods, experts are not involved in the training and inference phases, which may lead to bias and inaccuracy due to the lack of prior knowledge. Hence, this work for the first time aims to design a novel conditional variable selection approach while keeping experts in the loop. In this way, we expect that our algorithm can be more efficiently and effectively trained to identify the most critical variables under certain expert knowledge. Extensive experiments on both synthetic and real-world datasets from industry have been conducted and show the effectiveness of our method.
    Online Weighted Q-Ensembles for Reduced Hyperparameter Tuning in Reinforcement Learning. (arXiv:2209.15078v1 [cs.LG])
Reinforcement learning is a promising paradigm for learning robot control, allowing complex control policies to be learned without requiring a dynamics model. However, even state-of-the-art algorithms can be difficult to tune for optimum performance. We propose employing an ensemble of multiple reinforcement learning agents, each with a different set of hyperparameters, along with a mechanism for choosing the best performing set(s) on-line. In the literature, the ensemble technique is used to improve performance in general, but the current work specifically addresses decreasing the hyperparameter tuning effort. Furthermore, our approach targets on-line learning on a single robotic system, and does not require running multiple simulators in parallel. Although the idea is generic, the Deep Deterministic Policy Gradient was the model chosen, being a representative deep learning actor-critic method with good performance in continuous action settings but known high variance. We compare our online weighted q-ensemble approach to q-average ensemble strategies addressed in the literature using alternate policy training, as well as online training, demonstrating the advantage of the new approach in eliminating hyperparameter tuning. The applicability to real-world systems was validated in common robotic benchmark environments: the bipedal robot half cheetah and the swimmer. Online Weighted Q-Ensemble presented overall lower variance and superior results when compared with q-average ensembles using randomized parameterizations.  ( 3 min )
    The Minority Matters: A Diversity-Promoting Collaborative Metric Learning Algorithm. (arXiv:2209.15292v1 [cs.IR])
Collaborative Metric Learning (CML) has recently emerged as a popular method in recommendation systems (RS), closing the gap between metric learning and Collaborative Filtering. Following the convention of RS, existing methods exploit a unique user representation in their model design. This paper focuses on a challenging scenario where a user has multiple categories of interests. Under this setting, we argue that the unique user representation might induce preference bias, especially when the item category distribution is imbalanced. To address this issue, we propose a novel method called \textit{Diversity-Promoting Collaborative Metric Learning} (DPCML), with the hope of considering the commonly ignored minority interests of the user. The key idea behind DPCML is to maintain multiple representations for each user in the system. Based on this embedding paradigm, user preference toward an item is aggregated from different embeddings by taking the minimum item-user distance among the user embedding set. Furthermore, we observe that the diversity of the embeddings for the same user also plays an essential role in the model. To this end, we propose a \textit{diversity control regularization} term to better accommodate the multi-vector representation strategy. Theoretically, we show that DPCML could generalize well to unseen test data by tackling the challenge posed by the non-smooth minimum operation. Experiments over a range of benchmark datasets speak to the efficacy of DPCML.
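The multi-vector scoring rule described above is simple to state in code; a minimal sketch follows, with illustrative shapes and Euclidean distance assumed (the diversity control regularizer and the training loop are omitted).

```python
import torch

def user_item_distance(user_embs, item_emb):
    """DPCML-style preference score: each user holds several embedding
    vectors, and affinity to an item is the minimum distance over that set,
    letting different vectors capture different interests."""
    # user_embs: (num_user_vectors, dim), item_emb: (dim,)
    dists = torch.cdist(user_embs, item_emb.unsqueeze(0)).squeeze(-1)
    return dists.min()

user = torch.randn(4, 16)   # four interest vectors for one user
item = torch.randn(16)
score = user_item_distance(user, item)  # smaller distance = stronger preference
```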
    Machine Unlearning Method Based On Projection Residual. (arXiv:2209.15276v1 [cs.LG])
Machine learning models (mainly neural networks) are used more and more in real life. Users feed their data to the model for training, but these processes are often one-way: once trained, the model remembers the data, and even when data is removed from the dataset, its effects persist in the model. With more and more laws and regulations around the world protecting data privacy, it becomes ever more important to make models forget specific data completely through machine unlearning. This paper adopts a projection residual method based on the Newton iteration method. The main purpose is to implement machine unlearning tasks in the context of linear regression models and neural network models. The method uses iterative weighting to completely forget the data and its corresponding influence, and its computational cost is linear in the feature dimension of the data. It improves on current machine unlearning methods and is independent of the size of the training set. Results were evaluated by feature injection testing (FIT). Experiments show that this method is more thorough in deleting data, with results close to those of model retraining.
    New Metric Formulas that Include Measurement Errors in Machine Learning for Natural Sciences. (arXiv:2209.15588v1 [cs.LG])
The application of machine learning to physics problems is widely found in the scientific literature. Both regression and classification problems are addressed by a large array of techniques that involve learning algorithms. Unfortunately, the measurement errors of the data used to train machine learning models are almost always neglected. This leads to estimations of the performance of the models (and thus their generalisation power) that are too optimistic, since it is always assumed that the target variables (what one wants to predict) are correct. In physics, this is a dramatic deficiency as it can lead to the belief that theories or patterns exist where, in reality, they do not. This paper addresses this deficiency by deriving formulas for commonly used metrics (both for regression and classification problems) that take into account measurement errors of target variables. The new formulas give an estimation of the metrics which is always more pessimistic than what is obtained with the classical ones, which do not take measurement errors into account. The formulas given here are of general validity, completely model-independent, and can be applied without limitations. Thus, with statistical confidence, one can analyze the existence of relationships when dealing with measurements with errors of any kind. The formulas have wide applicability outside physics and can be used in all problems where measurement errors are relevant to the conclusions of studies.
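To make the role of target noise concrete, the sketch below uses the standard decomposition $\mathbb{E}[(\hat y - y_{\mathrm{obs}})^2] = \mathbb{E}[(\hat y - y_{\mathrm{true}})^2] + \overline{\sigma^2}$ for independent additive errors with known variances; this is an illustration of how measurement error enters a metric, not the paper's derived formulas.

```python
import numpy as np

def noise_adjusted_mse(y_obs, y_pred, sigma):
    """With independent target noise of known standard deviations sigma_i,
    the expected squared error against the *true* targets equals the
    observed MSE minus the mean error variance (a simple illustration of
    the theme; the paper's formulas cover many more metrics)."""
    return np.mean((y_pred - y_obs) ** 2) - np.mean(sigma ** 2)

rng = np.random.default_rng(0)
y_true = rng.normal(size=1000)
sigma = 0.3 * np.ones(1000)
y_obs = y_true + rng.normal(scale=sigma)             # noisy measured targets
y_pred = y_true + rng.normal(scale=0.1, size=1000)   # a reasonably good model
print(noise_adjusted_mse(y_obs, y_pred, sigma))      # ~0.01, the true MSE
```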
    Truncated Diffusion Probabilistic Models and Diffusion-based Adversarial Auto-Encoders. (arXiv:2202.09671v3 [stat.ML] UPDATED)
    Employing a forward diffusion chain to gradually map the data to a noise distribution, diffusion-based generative models learn how to generate the data by inferring a reverse diffusion chain. However, this approach is slow and costly because it needs many forward and reverse steps. We propose a faster and cheaper approach that adds noise not until the data become pure random noise, but until they reach a hidden noisy-data distribution that we can confidently learn. Then, we use fewer reverse steps to generate data by starting from this hidden distribution that is made similar to the noisy data. We reveal that the proposed model can be cast as an adversarial auto-encoder empowered by both the diffusion process and a learnable implicit prior. Experimental results show even with a significantly smaller number of reverse diffusion steps, the proposed truncated diffusion probabilistic models can provide consistent improvements over the non-truncated ones in terms of performance in both unconditional and text-guided image generations.
    Two-headed eye-segmentation approach for biometric identification. (arXiv:2209.15471v1 [cs.CV])
Iris-based identification systems are among the most popular approaches for person identification. Such systems require good-quality segmentation modules that ideally identify the regions of the different eye components. This paper introduces a new two-headed architecture, where the eye components and eyelashes are segmented by two separate decoding modules. Moreover, we investigate various training scenarios by adopting different training losses. Thanks to the two-headed approach, we were also able to examine the quality of the model with a convex prior, which enforces the convexity of the segmented shapes. We conducted an extensive evaluation of various learning scenarios on high-resolution near-infrared iris images captured under real-life conditions.
    Explainable Censored Learning: Finding Critical Features with Long Term Prognostic Values for Survival Prediction. (arXiv:2209.15450v1 [cs.LG])
Interpreting the critical variables involved in complex biological processes related to survival time can help understand predictions from survival models, evaluate treatment efficacy, and develop new therapies for patients. Although the predictive results of deep learning (DL)-based models are better than or as good as those of standard survival methods, such models are often disregarded because of their lack of transparency and limited interpretability, which is crucial to their adoption in clinical applications. In this paper, we introduce a novel, easily deployable approach, called EXplainable CEnsored Learning (EXCEL), to iteratively exploit critical variables and simultaneously train the DL model on these variables. First, on a toy dataset, we illustrate the principle of EXCEL; then, we mathematically analyze our proposed method, and we derive and prove tight generalization error bounds; next, on two semi-synthetic datasets, we show that EXCEL has good anti-noise ability and stability; finally, we apply EXCEL to a variety of real-world survival datasets including clinical data and genetic data, demonstrating that EXCEL can effectively identify critical features and achieve performance on par with or better than the original models. It is worth pointing out that EXCEL can be flexibly deployed in existing or emerging models for explainable survival analysis in the presence of right censoring.
    Safe Exploration Method for Reinforcement Learning under Existence of Disturbance. (arXiv:2209.15452v1 [cs.LG])
Recent rapid developments in reinforcement learning algorithms have been giving us novel possibilities in many fields. However, due to their exploring property, we have to take risk into consideration when we apply those algorithms to safety-critical problems, especially in real environments. In this study, we deal with a safe exploration problem in reinforcement learning under the existence of disturbance. We define safety during learning as satisfaction of the constraint conditions explicitly defined in terms of the state, and propose a safe exploration method that uses partial prior knowledge of a controlled object and disturbance. The proposed method assures the satisfaction of the explicit state constraints with a pre-specified probability even if the controlled object is exposed to a stochastic disturbance following a normal distribution. As theoretical results, we introduce sufficient conditions for constructing the conservative, non-exploring inputs used in the proposed method, and prove that safety in the above sense is guaranteed with the proposed method. Furthermore, we illustrate the validity and effectiveness of the proposed method through numerical simulations of an inverted pendulum and a four-bar parallel link robot manipulator.
    Plan Your Target and Learn Your Skills: Transferable State-Only Imitation Learning via Decoupled Policy Optimization. (arXiv:2203.02214v5 [cs.LG] UPDATED)
Recent progress in state-only imitation learning extends the scope of applicability of imitation learning to real-world settings by relieving the need for observing expert actions. However, existing solutions only learn to extract a state-to-action mapping policy from the data, without considering how the expert plans toward the target. This hinders the ability to leverage demonstrations and limits the flexibility of the policy. In this paper, we introduce Decoupled Policy Optimization (DePO), which explicitly decouples the policy into a high-level state planner and an inverse dynamics model. With embedded decoupled policy gradient and generative adversarial training, DePO enables knowledge transfer to different action spaces or state transition dynamics, and can generalize the planner to out-of-demonstration state regions. Our in-depth experimental analysis shows the effectiveness of DePO in learning a generalized target state planner while achieving the best imitation performance. We demonstrate the appealing usage of DePO for transferring across different tasks by pre-training, and the potential for co-training agents with various skills.
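The decoupled structure described above (a planner over states composed with an inverse dynamics model) can be sketched directly; the module sizes below are arbitrary assumptions, and the training machinery (decoupled policy gradient, generative adversarial training) is omitted.

```python
import torch
import torch.nn as nn

state_dim, action_dim = 8, 2

# High-level planner: predicts the next desired state from the current one.
planner = nn.Sequential(nn.Linear(state_dim, 64), nn.ReLU(), nn.Linear(64, state_dim))

# Inverse dynamics model: maps (state, target state) to an action.
inverse_dynamics = nn.Sequential(
    nn.Linear(2 * state_dim, 64), nn.ReLU(), nn.Linear(64, action_dim)
)

def act(state):
    target = planner(state)                                   # plan the target state
    return inverse_dynamics(torch.cat([state, target], -1))   # realize it as an action

a = act(torch.randn(state_dim))
```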
    Data Poisoning Attacks Against Multimodal Encoders. (arXiv:2209.15266v1 [cs.CR])
    Traditional machine learning (ML) models usually rely on large-scale labeled datasets to achieve strong performance. However, such labeled datasets are often challenging and expensive to obtain. Also, the predefined categories limit the model's ability to generalize to other visual concepts as additional labeled data is required. On the contrary, the newly emerged multimodal model, which contains both visual and linguistic modalities, learns the concept of images from the raw text. It is a promising way to solve the above problems as it can use easy-to-collect image-text pairs to construct the training dataset and the raw texts contain almost unlimited categories according to their semantics. However, learning from a large-scale unlabeled dataset also exposes the model to the risk of potential poisoning attacks, whereby the adversary aims to perturb the model's training dataset to trigger malicious behaviors in it. Previous work mainly focuses on the visual modality. In this paper, we instead focus on answering two questions: (1) Is the linguistic modality also vulnerable to poisoning attacks? and (2) Which modality is most vulnerable? To answer the two questions, we conduct three types of poisoning attacks against CLIP, the most representative multimodal contrastive learning framework. Extensive evaluations on different datasets and model architectures show that all three attacks can perform well on the linguistic modality with only a relatively low poisoning rate and limited epochs. Also, we observe that the poisoning effect differs between different modalities, i.e., with lower MinRank in the visual modality and with higher Hit@K when K is small in the linguistic modality. To mitigate the attacks, we propose both pre-training and post-training defenses. We empirically show that both defenses can significantly reduce the attack performance while preserving the model's utility.
    Domain Generalization -- A Causal Perspective. (arXiv:2209.15177v1 [cs.LG])
Machine learning models have gained widespread success, from healthcare to personalized recommendations. One of the preliminary assumptions of these models is that of independent and identical distribution: under this assumption, the train and test data are sampled from the same distribution. However, this assumption seldom holds in the real world due to distribution shifts. Since the models rely heavily on this assumption, they exhibit poor generalization capabilities. Over recent years, dedicated efforts have been made to improve the generalization capabilities of these models. The primary idea behind these methods is to identify stable features or mechanisms that remain invariant across different distributions. Many generalization approaches employ causal theories to describe invariance, since causality and invariance are inextricably intertwined. However, current surveys deal with causality-aware domain generalization methods at a very high level. Furthermore, none of the existing surveys categorize the causal domain generalization methods based on the problem and causal theories these methods leverage. To this end, we present a comprehensive survey on causal domain generalization models from the aspects of the problem and causal theories. This survey also includes in-depth insights into publicly accessible datasets and benchmarks for domain generalization in various domains. Finally, we conclude the survey with insights and discussions on future research directions.
    GM-VAE: Representation Learning with VAE on Gaussian Manifold. (arXiv:2209.15217v1 [cs.LG])
We propose a Gaussian manifold variational auto-encoder (GM-VAE) whose latent space consists of a set of diagonal Gaussian distributions. It is known that the set of diagonal Gaussian distributions with the Fisher information metric forms a product hyperbolic space, which we call a Gaussian manifold. To learn the VAE endowed with the Gaussian manifold, we first propose a pseudo Gaussian manifold normal distribution based on the Kullback-Leibler divergence, a local approximation of the squared Fisher-Rao distance, to define a density over the latent space. With the newly proposed distribution, we introduce geometric transformations at the end of the encoder and the beginning of the decoder of the VAE, respectively, to help the transition between the Euclidean and Gaussian manifolds. Through empirical experiments, we show the competitive generalization performance of GM-VAE against other variants of hyperbolic- and Euclidean-VAEs. Our model achieves strong numerical stability, which is a common limitation reported with previous hyperbolic-VAEs.
    Toward Discovering Options that Achieve Faster Planning. (arXiv:2205.12515v2 [cs.LG] UPDATED)
    We propose a new objective for option discovery that emphasizes the computational advantage of using options in planning. In a sequential machine, the speed of planning is proportional to the number of elementary operations used to achieve a good policy. For episodic tasks, the number of elementary operations depends on the number of options composed by the policy in an episode and the number of options considered at each decision point. To reduce the amount of computation in planning, for a given set of episodic tasks and a given number of options, our objective prefers options with which a high return can be achieved by composing few options, and also prefers a smaller set of options to choose from at each decision point. We develop an algorithm that optimizes the proposed objective. In a variant of the classic four-room domain, we show that 1) a higher objective value is typically associated with a smaller number of elementary planning operations used by the option-value iteration algorithm to obtain a near-optimal value function, 2) our algorithm achieves an objective value that matches the one achieved by two human-designed options, 3) the amount of computation used by option-value iteration with options discovered by our algorithm matches that of the human-designed options, and 4) the options produced by our algorithm also make intuitive sense--they seem to move to and terminate at the entrances of rooms.
    Understanding Interventional TreeSHAP: How and Why it Works. (arXiv:2209.15123v1 [cs.LG])
    Shapley values are ubiquitous in interpretable Machine Learning due to their strong theoretical background and efficient implementation in the SHAP library. Computing these values used to induce an exponential cost with respect to the number of input features of an opaque model. Now, with efficient implementations such as Interventional TreeSHAP, this exponential burden is alleviated, assuming one is explaining ensembles of decision trees. Although Interventional TreeSHAP has risen in popularity, it still lacks a formal proof of how and why it works. We provide such a proof with the aim not only of increasing the transparency of the algorithm but also of encouraging further development of these ideas. Notably, our proof for Interventional TreeSHAP is easily adapted to Shapley-Taylor indices.
    A Survey: Credit Sentiment Score Prediction. (arXiv:2209.15293v1 [cs.CE])
    Manual approval is still used by banks and NGOs to approve loans. It takes time and is prone to mistakes because it is controlled by a bank employee. Several machine learning and data mining techniques have been utilized to enhance various aspects of credit rating prediction. A major goal of this survey is to review current sentiment analysis techniques that are being used to assess creditworthiness.
    Metro: Memory-Enhanced Transformer for Retrosynthetic Planning via Reaction Tree. (arXiv:2209.15315v1 [cs.LG])
    Retrosynthetic planning plays a critical role in drug discovery and organic chemistry. Starting from a target molecule as the root node, it aims to find a complete reaction tree subject to the constraint that all leaf nodes belong to a set of starting materials. Multi-step reactions are crucial because they determine the production flow charts of the organic chemical industry. However, existing datasets lack curation of tree-structured multi-step reactions and fail to provide such reaction trees, limiting models' understanding of organic molecule transformations. In this work, we first develop a benchmark curated for the retrosynthetic planning task, which consists of 124,869 reaction trees retrieved from the public USPTO-full dataset. On top of that, we propose Metro: Memory-Enhanced Transformer for RetrOsynthetic planning. Specifically, the dependency among molecules in the reaction tree is captured as context information for multi-step retrosynthesis predictions through transformers with a memory module. Extensive experiments show that Metro dramatically outperforms existing single-step retrosynthesis models by at least 10.7% in top-1 accuracy. The experiments demonstrate the superiority of exploiting context information in the retrosynthetic planning task. Moreover, the proposed model can be directly used for synthetic accessibility analysis, as it is trained on reaction trees with the shortest depths. Our work is the first step towards a brand new formulation for retrosynthetic planning in the aspects of data construction, model design, and evaluation. Code is available at https://github.com/SongtaoLiu0823/metro.
    Observational Robustness and Invariances in Reinforcement Learning via Lexicographic Objectives. (arXiv:2209.15320v1 [cs.LG])
    Policy robustness in Reinforcement Learning (RL) may not be desirable at any price; the alterations caused by robustness requirements from otherwise optimal policies should be explainable and quantifiable. Policy gradient algorithms that have strong convergence guarantees are usually modified to obtain robust policies in ways that do not preserve algorithm guarantees, which defeats the purpose of formal robustness requirements. In this work we study a notion of robustness in partially observable MDPs where state observations are perturbed by a noise-induced stochastic kernel. We characterise the set of policies that are maximally robust by analysing how the policies are altered by this kernel. We then establish a connection between such robust policies and certain properties of the noise kernel, as well as with structural properties of the underlying MDPs, constructing sufficient conditions for policy robustness. We use these notions to propose a robustness-inducing scheme, applicable to any policy gradient algorithm, to formally trade off the reward achieved by a policy with its robustness level through lexicographic optimisation, which preserves convergence properties of the original algorithm. We test the proposed approach through numerical experiments on safety-critical RL environments, and show how the proposed method helps achieve high robustness when state errors are introduced in the policy roll-out.
    RL-MD: A Novel Reinforcement Learning Approach for DNA Motif Discovery. (arXiv:2209.15181v1 [cs.LG])
    The extraction of sequence patterns from a collection of functionally linked unlabeled DNA sequences is known as DNA motif discovery, and it is a key task in computational biology. Several deep learning-based techniques have recently been introduced to address this issue. However, these algorithms cannot be used in real-world situations because of their need for labeled data. Here, we present RL-MD, a novel reinforcement learning-based approach for the DNA motif discovery task. RL-MD takes unlabeled data as input, employs a relative-information-based method to evaluate each proposed motif, and utilizes these continuous evaluation results as the reward. The experiments show that RL-MD can identify high-quality motifs in real-world data.  ( 2 min )
    Leveraging variational autoencoders for multiple data imputation. (arXiv:2209.15321v1 [stat.ML])
    Missing data persists as a major barrier to data analysis across numerous applications. Recently, deep generative models have been used for imputation of missing data, motivated by their ability to capture highly non-linear and complex relationships in the data. In this work, we investigate the ability of deep models, namely variational autoencoders (VAEs), to account for uncertainty in missing data through multiple imputation strategies. We find that VAEs provide poor empirical coverage of missing data, with underestimation and overconfident imputations, particularly for more extreme missing data values. To overcome this, we employ $\beta$-VAEs, which, viewed from a generalized Bayes framework, provide robustness to model misspecification. Assigning a good value of $\beta$ is critical for uncertainty calibration and we demonstrate how this can be achieved using cross-validation. In downstream tasks, we show how multiple imputation with $\beta$-VAEs can avoid false discoveries that arise as artefacts of imputation.
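    The role of $\beta$ can be made concrete with a minimal sketch of the tempered objective and of how multiple imputations would be drawn (illustrative PyTorch, assuming a Gaussian decoder and standard Gaussian prior; not the authors' implementation):

        import torch
        import torch.nn.functional as F

        def beta_vae_loss(x, x_recon, mu, logvar, beta=1.0):
            # Reconstruction term (a Gaussian decoder is assumed here)
            recon = F.mse_loss(x_recon, x, reduction="sum")
            # KL between q(z|x) = N(mu, diag(exp(logvar))) and the prior N(0, I)
            kl = -0.5 * torch.sum(1 + logvar - mu.pow(2) - logvar.exp())
            # beta != 1 tempers the likelihood; from a generalized Bayes view
            # this buys robustness to model misspecification (beta chosen by CV)
            return recon + beta * kl

        # Multiple imputation: draw M posterior samples z ~ q(z|x_obs) and keep
        # each decoded completion of the missing entries as one imputed dataset.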
    ReLU Neural Networks Learn the Simplest Models: Neural Isometry and Exact Recovery. (arXiv:2209.15265v1 [cs.LG])
    The practice of deep learning has shown that neural networks generalize remarkably well even with an extreme number of learned parameters. This appears to contradict traditional statistical wisdom, in which a trade-off between model complexity and fit to the data is essential. We set out to resolve this discrepancy from a convex optimization and sparse recovery perspective. We consider the training and generalization properties of two-layer ReLU networks with standard weight decay regularization. Under certain regularity assumptions on the data, we show that ReLU networks with an arbitrary number of parameters learn only simple models that explain the data. This is analogous to the recovery of the sparsest linear model in compressed sensing. For ReLU networks and their variants with skip connections or normalization layers, we present isometry conditions that ensure the exact recovery of planted neurons. For randomly generated data, we show the existence of a phase transition in recovering planted neural network models. The situation is simple: whenever the ratio between the number of samples and the dimension exceeds a numerical threshold, the recovery succeeds with high probability; otherwise, it fails with high probability. Surprisingly, ReLU networks learn simple and sparse models even when the labels are noisy. The phase transition phenomenon is confirmed through numerical experiments.
    Sparse Random Networks for Communication-Efficient Federated Learning. (arXiv:2209.15328v1 [cs.LG])
    One main challenge in federated learning is the large communication cost of exchanging weight updates from clients to the server at each round. While prior work has made great progress in compressing the weight updates through gradient compression methods, we propose a radically different approach that does not update the weights at all. Instead, our method freezes the weights at their initial random values and learns how to sparsify the random network for the best performance. To this end, the clients collaborate in training a stochastic binary mask to find the optimal sparse random network within the original one. At the end of the training, the final model is a sparse network with random weights -- or a subnetwork inside the dense random network. We show improvements in accuracy, communication (less than $1$ bit per parameter (bpp)), convergence speed, and final model size (less than $1$ bpp) over relevant baselines on MNIST, EMNIST, CIFAR-10, and CIFAR-100 datasets, in the low bitrate regime under various system configurations.
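    The core idea -- freeze the random weights and learn only a stochastic binary mask -- fits in a few lines (a simplified layer with a straight-through gradient estimator; names and the initialization scale are illustrative, not the paper's code):

        import torch

        class MaskedLinear(torch.nn.Module):
            def __init__(self, in_f, out_f):
                super().__init__()
                w = torch.randn(out_f, in_f) / in_f ** 0.5
                self.register_buffer("weight", w)  # frozen at its random init
                self.scores = torch.nn.Parameter(torch.zeros(out_f, in_f))

            def forward(self, x):
                probs = torch.sigmoid(self.scores)    # mask probabilities
                mask = torch.bernoulli(probs)         # stochastic binary mask
                mask = mask + probs - probs.detach()  # straight-through gradients
                return torch.nn.functional.linear(x, self.weight * mask)

        # Clients exchange (and the server aggregates) only the mask scores;
        # the final model is the frozen random network times the learned mask.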
    Equitable Marketplace Mechanism Design. (arXiv:2209.15418v1 [cs.GT])
    We consider a trading marketplace that is populated by traders with diverse trading strategies and objectives. The marketplace allows suppliers to list their goods and facilitates matching between buyers and sellers. In return, such a marketplace typically charges fees for facilitating trade. The goal of this work is to design a dynamic fee schedule for the marketplace that is equitable and profitable for all traders while, at the same time, being profitable for the marketplace itself (through the fees it charges). Since the traders adapt their strategies to the fee schedule, we present a reinforcement learning framework for simultaneously learning a marketplace fee schedule and trading strategies that adapt to this fee schedule, using a weighted optimization objective of profits and equitability. We illustrate the use of the proposed approach in detail on a simulated stock exchange with different types of investors, specifically market makers and consumer investors. As we vary the equitability weights across different investor classes, we see that the learnt exchange fee schedule starts favoring the class of investors with the highest weight. We further discuss the observed insights from the simulated stock exchange in light of the general framework of equitable marketplace mechanism design.
    ASPiRe: Adaptive Skill Priors for Reinforcement Learning. (arXiv:2209.15205v1 [cs.LG])
    We introduce ASPiRe (Adaptive Skill Prior for RL), a new approach that leverages prior experience to accelerate reinforcement learning. Unlike existing methods that learn a single skill prior from a large and diverse dataset, our framework learns a library of distinct skill priors (i.e., behavior priors) from a collection of specialized datasets, and learns how to combine them to solve a new task. This formulation allows the algorithm to acquire a set of specialized skill priors that are more reusable for downstream tasks; however, it also raises the additional challenge of how to effectively combine these unstructured sets of skill priors to form a new prior for new tasks. Specifically, it requires the agent not only to identify which skill prior(s) to use but also how to combine them (either sequentially or concurrently) to form a new prior. To achieve this goal, ASPiRe includes an Adaptive Weight Module (AWM) that learns to infer an adaptive weight assignment between different skill priors and uses it to guide policy learning for downstream tasks via weighted Kullback-Leibler divergences. Our experiments demonstrate that ASPiRe can significantly accelerate the learning of new downstream tasks in the presence of multiple priors and improves over competitive baselines.  ( 2 min )
    Smooth Bilevel Programming for Sparse Regularization. (arXiv:2106.01429v2 [stat.ML] UPDATED)
    Iteratively reweighted least squares (IRLS) is a popular approach to solve sparsity-enforcing regression problems in machine learning. State-of-the-art approaches are more efficient but typically rely on specific coordinate pruning schemes. In this work, we show how a surprisingly simple reparametrization of IRLS, coupled with a bilevel resolution (instead of an alternating scheme), is able to achieve top performance on a wide range of sparsity penalties (such as Lasso, group Lasso and trace norm regularizations), regularization strengths (including hard constraints), and design matrices (ranging from correlated designs to differential operators). Similarly to IRLS, our method only involves the resolution of linear systems, but in sharp contrast, corresponds to the minimization of a smooth function. Despite being non-convex, we show that there are no spurious minima and that saddle points are "ridable", so that there always exists a descent direction. We thus advocate for the use of a BFGS quasi-Newton solver, which makes our approach simple, robust and efficient. We perform a numerical benchmark of the convergence speed of our algorithm against state-of-the-art solvers for Lasso, group Lasso, trace norm and linearly constrained problems. These results highlight the versatility of our approach, removing the need to use different solvers depending on the specificity of the ML problem under study.  ( 3 min )
    Probabilistic Metamodels for an Efficient Characterization of Complex Driving Scenarios. (arXiv:2110.02892v3 [cs.LG] UPDATED)
    To validate the safety of automated vehicles (AV), scenario-based testing aims to systematically describe driving scenarios an AV might encounter. In this process, continuous inputs such as velocities result in an infinite number of possible variations of a scenario. Thus, metamodels are used to perform analyses or to select specific variations for examination. However, despite the safety criticality of AV testing, metamodels are usually seen as a part of an overall approach, and their predictions are not questioned. This paper analyzes the predictive performance of Gaussian processes (GP), deep Gaussian processes, extra-trees, and Bayesian neural networks (BNN), considering four scenarios with 5 to 20 inputs. Building on this, an iterative approach is introduced and evaluated, which allows test cases for common analysis tasks to be selected efficiently. The results show that, regarding predictive performance, the appropriate selection of test cases is more important than the choice of metamodel. However, the choice of metamodel remains crucial: their great flexibility allows BNNs to benefit from large amounts of data and to model even the most complex scenarios, whereas less flexible models like GPs offer higher reliability. Hence, relevant test cases are best explored using scalable virtual test setups and flexible models. Subsequently, more realistic test setups and more reliable models can be used for targeted testing and validation.  ( 3 min )
    Sequential Importance Sampling for Hybrid Model Bayesian Inference to Support Bioprocess Mechanism Learning and Robust Control. (arXiv:2205.02410v4 [stat.ML] UPDATED)
    Driven by the critical needs of biomanufacturing 4.0, we introduce a probabilistic knowledge graph hybrid model characterizing the risk- and science-based understanding of bioprocess mechanisms. It can faithfully capture the important properties, including nonlinear reactions, partially observed state, and nonstationary dynamics. Given very limited real process observations, we derive a posterior distribution quantifying model estimation uncertainty. To avoid the evaluation of intractable likelihoods, Approximate Bayesian Computation sampling with Sequential Monte Carlo (ABC-SMC) is utilized to approximate the posterior distribution. Under high stochastic and model uncertainties, it is computationally expensive to match output trajectories. Therefore, we create a linear Gaussian dynamic Bayesian network (LG-DBN) auxiliary likelihood-based ABC-SMC approach. Through matching the summary statistics driven through LG-DBN likelihood that can capture critical interactions and variations, the proposed algorithm can accelerate hybrid model inference, support process monitoring, and facilitate mechanism learning and robust control.  ( 2 min )
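    For orientation, the generic ABC-SMC loop that the proposed auxiliary-likelihood variant builds on can be sketched as follows (importance weights and adaptive kernels are omitted for brevity; the Gaussian perturbation kernel and tolerance schedule are placeholders, not the LG-DBN construction):

        import numpy as np

        def abc_smc(simulate, summarize, y_obs, prior_sample, eps_schedule,
                    n_particles=500, step=0.1):
            s_obs = np.asarray(summarize(y_obs))

            def dist(theta):  # discrepancy between summary statistics
                return np.linalg.norm(np.asarray(summarize(simulate(theta))) - s_obs)

            # Population 0: plain rejection sampling from the prior
            particles = []
            while len(particles) < n_particles:
                theta = np.atleast_1d(prior_sample())
                if dist(theta) < eps_schedule[0]:
                    particles.append(theta)
            particles = np.stack(particles)

            # Later populations: resample, perturb, accept under a tighter tolerance
            for eps in eps_schedule[1:]:
                new = []
                while len(new) < n_particles:
                    theta = particles[np.random.randint(n_particles)]
                    theta = theta + step * np.random.randn(*theta.shape)
                    if dist(theta) < eps:
                        new.append(theta)
                particles = np.stack(new)
            return particles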
    Family-Based Fingerprint Analysis: A Position Paper. (arXiv:2209.15620v1 [cs.CR])
    Thousands of vulnerabilities are reported on a monthly basis to security repositories, such as the National Vulnerability Database. Among these vulnerabilities, software misconfiguration is one of the top 10 security risks for web applications. With this large influx of vulnerability reports, software fingerprinting has become a highly desired capability to discover distinctive and efficient signatures and recognize reportedly vulnerable software implementations. Due to the exponential worst-case complexity of fingerprint matching, designing more efficient methods for fingerprinting becomes highly desirable, especially for variability-intensive systems where optional features add another exponential factor to its analysis. This position paper presents our vision of a framework that lifts model learning and family-based analysis principles to software fingerprinting. In this framework, we propose unifying databases of signatures into a featured finite state machine and using presence conditions to specify whether and in which circumstances a given input-output trace is observed. We believe feature-based signatures can aid performance improvements by reducing the size of fingerprints under analysis.
    One-Shot Adaptation of GAN in Just One CLIP. (arXiv:2203.09301v3 [cs.CV] UPDATED)
    There are many recent research efforts to fine-tune a pre-trained generator with a few target images to generate images of a novel domain. Unfortunately, these methods often suffer from overfitting or under-fitting when fine-tuned with a single target image. To address this, here we present a novel single-shot GAN adaptation method through unified CLIP space manipulations. Specifically, our model employs a two-step training strategy: reference image search in the source generator using a CLIP-guided latent optimization, followed by generator fine-tuning with a novel loss function that imposes CLIP space consistency between the source and adapted generators. To further improve the adapted model to produce spatially consistent samples with respect to the source generator, we also propose contrastive regularization for patchwise relationships in the CLIP space. Experimental results show that our model generates diverse outputs with the target texture and outperforms the baseline models both qualitatively and quantitatively. Furthermore, we show that our CLIP space manipulation strategy allows more effective attribute editing.  ( 2 min )
    Predicting the power grid frequency of European islands. (arXiv:2209.15414v1 [stat.AP])
    Modelling, forecasting and overall understanding of the dynamics of the power grid and its frequency is essential for the safe operation of existing and future power grids. Much previous research has focused on large continental areas, while small systems, such as islands, are less well studied. These natural island systems are ideal testing environments for microgrid proposals and artificially islanded grid operation. In the present paper, we utilize measurements of the power grid frequency obtained on European islands: the Faroe Islands, Ireland, the Balearic Islands and Iceland, and investigate how their frequency can be predicted, with the Nordic power system acting as a reference. The Balearic Islands are found to be particularly deterministic and easy to predict, in contrast to hard-to-predict Iceland. Furthermore, we show that typically 2-4 weeks of data are needed to improve prediction performance beyond simple benchmarks.
    Efficient computation of the Knowledge Gradient for Bayesian Optimization. (arXiv:2209.15367v1 [cs.LG])
    Bayesian optimization is a powerful collection of methods for optimizing stochastic expensive black-box functions. One key component of a Bayesian optimization algorithm is the acquisition function, which determines which solution should be evaluated in every iteration. A popular and very effective choice is the Knowledge Gradient acquisition function; however, there is no analytical way to compute it, and the different existing implementations make different approximations. In this paper, we review and compare the spectrum of Knowledge Gradient implementations and propose One-shot Hybrid KG, a new approach that combines several of the previously proposed ideas and is cheap to compute as well as powerful and efficient. We prove that the new method preserves the theoretical properties of previous methods and empirically show the drastically reduced computational overhead with equal or improved performance. All experiments are implemented in BOTorch and code is available on GitHub.  ( 2 min )
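    For reference, the Knowledge Gradient value of a candidate $x$ after $n$ evaluations is $\mathrm{KG}_n(x) = \mathbb{E}_n[\max_{x'} \mu_{n+1}(x') \mid x_{n+1} = x] - \max_{x'} \mu_n(x')$, where $\mu_n$ denotes the posterior mean of the surrogate model after $n$ observations; the inner expectation over the not-yet-observed outcome at $x$ is precisely the quantity with no closed form that the various implementations approximate differently.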
    Your Out-of-Distribution Detection Method is Not Robust!. (arXiv:2209.15246v1 [cs.CV])
    Out-of-distribution (OOD) detection has recently gained substantial attention due to the importance of identifying out-of-domain samples in reliability and safety. Although OOD detection methods have advanced by a great deal, they are still susceptible to adversarial examples, which is a violation of their purpose. To mitigate this issue, several defenses have recently been proposed. Nevertheless, these efforts have remained ineffective, as their evaluations are based on either small perturbation sizes or weak attacks. In this work, we re-examine these defenses against an end-to-end PGD attack on in/out data with larger perturbation sizes, e.g. up to the commonly used $\epsilon=8/255$ for the CIFAR-10 dataset. Surprisingly, almost all of these defenses perform worse than random detection under the adversarial setting. Next, we aim to provide a robust OOD detection method. In an ideal defense, the training should expose the model to almost all possible adversarial perturbations, which can be achieved through adversarial training. That is, such training perturbations should be based on both in- and out-of-distribution samples. Therefore, unlike OOD detection in the standard setting, access to OOD samples, as well as in-distribution ones, appears necessary in the adversarial training setup. These insights lead us to adopt generative OOD detection methods, such as OpenGAN, as a baseline. We subsequently propose the Adversarially Trained Discriminator (ATD), which utilizes a pre-trained robust model to extract robust features and a generator model to create OOD samples. Using ATD with CIFAR-10 and CIFAR-100 as the in-distribution data, we significantly outperform all previous methods in robust AUROC while maintaining high standard AUROC and classification accuracy. The code repository is available at https://github.com/rohban-lab/ATD .
    Re-parameterizing Your Optimizers rather than Architectures. (arXiv:2205.15242v2 [cs.LG] UPDATED)
    The well-designed structures in neural networks reflect the prior knowledge incorporated into the models. However, though different models have various priors, we are used to training them with model-agnostic optimizers such as SGD. In this paper, we propose to incorporate model-specific prior knowledge into optimizers by modifying the gradients according to a set of model-specific hyper-parameters. Such a methodology is referred to as Gradient Re-parameterization, and the optimizers are named RepOptimizers. For extreme simplicity of model structure, we focus on a VGG-style plain model and showcase that such a simple model trained with a RepOptimizer, which is referred to as RepOpt-VGG, performs on par with or better than recent well-designed models. From a practical perspective, RepOpt-VGG is a favorable base model because of its simple structure, high inference speed and training efficiency. Compared to Structural Re-parameterization, which adds priors into models by constructing extra training-time structures, RepOptimizers require no extra forward/backward computations and solve the problem of quantization. We hope to spark further research beyond the realms of model structure design. The code and models are publicly available at https://github.com/DingXiaoH/RepOptimizers.  ( 2 min )
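    In spirit, a RepOptimizer trades structural priors for gradient rules; a toy version of such a modified SGD step might look as follows (the per-parameter scales stand in for the paper's model-specific hyper-parameters and are purely illustrative, not the released RepOptimizers code):

        import torch

        def repopt_sgd_step(params, grad_scales, lr=0.1):
            # Rescale each parameter's gradient by a model-specific multiplier
            # before a plain SGD update (no extra training-time structures).
            with torch.no_grad():
                for p, s in zip(params, grad_scales):
                    if p.grad is not None:
                        p -= lr * s * p.grad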
    Unsupervised Multi-task and Transfer Learning on Gaussian Mixture Models. (arXiv:2209.15224v1 [stat.ML])
    Unsupervised learning has been widely used in many real-world applications. One of the simplest and most important unsupervised learning models is the Gaussian mixture model (GMM). In this work, we study the multi-task learning problem on GMMs, which aims to leverage potentially similar GMM parameter structures among tasks to obtain improved learning performance compared to single-task learning. We propose a multi-task GMM learning procedure based on the EM algorithm that not only can effectively utilize unknown similarity between related tasks but is also robust against a fraction of outlier tasks from arbitrary sources. The proposed procedure is shown to achieve minimax optimal rate of convergence for both parameter estimation error and the excess mis-clustering error, in a wide range of regimes. Moreover, we generalize our approach to tackle the problem of transfer learning for GMMs, where similar theoretical results are derived. Finally, we demonstrate the effectiveness of our methods through simulations and a real data analysis. To the best of our knowledge, this is the first work studying multi-task and transfer learning on GMMs with theoretical guarantees.  ( 2 min )
    Depth-Wise Attention (DWAtt): A Layer Fusion Method for Data-Efficient Classification. (arXiv:2209.15168v1 [cs.CL])
    Language Models pretrained on large textual data have been shown to encode different types of knowledge simultaneously. Traditionally, only the features from the last layer are used when adapting to new tasks or data. We put forward that, when using or finetuning deep pretrained models, intermediate layer features that may be relevant to the downstream task are buried too deep to be used efficiently in terms of needed samples or steps. To test this, we propose a new layer fusion method: Depth-Wise Attention (DWAtt), to help re-surface signals from non-final layers. We compare DWAtt to a basic concatenation-based layer fusion method (Concat), and compare both to a deeper model baseline -- all kept within a similar parameter budget. Our findings show that DWAtt and Concat are more step- and sample-efficient than the baseline, especially in the few-shot setting. DWAtt outperforms Concat on larger data sizes. On CoNLL-03 NER, layer fusion shows 3.68-9.73% F1 gain at different few-shot sizes. The layer fusion models presented significantly outperform the baseline in various training scenarios with different data sizes, architectures, and training constraints.
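    The idea of attending over depth can be sketched in a few lines (a bare-bones fusion module with a single learnable query; the actual DWAtt module differs in detail):

        import torch

        class DepthWiseAttention(torch.nn.Module):
            def __init__(self, hidden):
                super().__init__()
                self.query = torch.nn.Parameter(torch.randn(hidden))

            def forward(self, layer_states):
                # layer_states: (num_layers, batch, seq, hidden) stacked outputs
                scores = torch.einsum("lbsh,h->lbs", layer_states, self.query)
                weights = torch.softmax(scores, dim=0)  # attention over depth
                return torch.einsum("lbs,lbsh->bsh", weights, layer_states)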
    Linearly Mapping from Image to Text Space. (arXiv:2209.15162v1 [cs.CL])
    The extent to which text-only language models (LMs) learn to represent the physical, non-linguistic world is an open question. Prior work has shown that pretrained LMs can be taught to "understand" visual inputs when the models' parameters are updated on image captioning tasks. We test a stronger hypothesis: that the conceptual representations learned by text-only models are functionally equivalent (up to a linear transformation) to those learned by models trained on vision tasks. Specifically, we show that the image representations from vision models can be transferred as continuous prompts to frozen LMs by training only a single linear projection. Using these to prompt the LM achieves competitive performance on captioning and visual question answering tasks compared to models that tune both the image encoder and text decoder (such as the MAGMA model). We compare three image encoders with increasing amounts of linguistic supervision seen during pretraining: BEIT (no linguistic information), NF-ResNET (lexical category information), and CLIP (full natural language descriptions). We find that all three encoders perform equally well at transferring visual property information to the language model (e.g., whether an animal is large or small), but that image encoders pretrained with linguistic supervision more saliently encode category information (e.g., distinguishing hippo vs. elephant) and thus perform significantly better on benchmark language-and-vision tasks. Our results indicate that LMs encode conceptual information structurally similarly to vision-based models, even those that are solely trained on images.
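    The recipe is small enough to sketch: freeze both the image encoder and the LM, and train only one linear map from image features to a handful of soft prompt tokens in the LM's embedding space (the dimensions and token count below are illustrative assumptions):

        import torch

        img_dim, lm_dim, k_tokens = 1024, 768, 4
        proj = torch.nn.Linear(img_dim, lm_dim * k_tokens)  # the only trained module

        def image_prompt(image_features):        # (batch, img_dim), encoder frozen
            p = proj(image_features)             # (batch, lm_dim * k_tokens)
            return p.view(-1, k_tokens, lm_dim)  # k continuous prompt tokens

        # These tokens are prepended to the frozen LM's input embeddings and the
        # projection is trained with the usual captioning loss.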
    Provable Guarantees against Data Poisoning Using Self-Expansion and Compatibility. (arXiv:2105.03692v2 [cs.LG] UPDATED)
    As deep learning datasets grow larger and less curated, backdoor data poisoning attacks, which inject malicious poisoned data into the training dataset, have drawn increasing attention in both academia and industry. We identify an incompatibility property of the interaction of clean and poisoned data with the training algorithm, specifically that including poisoned data in the training dataset does not improve model accuracy on clean data and vice-versa. Leveraging this property, we develop an algorithm that iteratively refines subsets of the poisoned dataset to obtain subsets that concentrate around either clean or poisoned data. The result is a partition of the original dataset into disjoint subsets, for each of which we train a corresponding model. A voting algorithm over these models identifies the clean data within the larger poisoned dataset. We empirically evaluate our approach and technique for image classification tasks over the GTSRB and CIFAR-10 datasets. The experimental results show that prior dirty-label and clean-label backdoor attacks in the literature produce poisoned datasets that exhibit behavior consistent with the incompatibility property. The results also show that our defense reduces the attack success rate below 1% on 134 out of 165 scenarios in this setting, with only a 2% drop in clean accuracy on CIFAR-10 (and negligible impact on GTSRB).
    Cloud Classification with Unsupervised Deep Learning. (arXiv:2209.15585v1 [physics.ao-ph])
    We present a framework for cloud characterization that leverages modern unsupervised deep learning technologies. While previous neural network-based cloud classification models have used supervised learning methods, unsupervised learning allows us to avoid restricting the model to artificial categories based on historical cloud classification schemes and enables the discovery of novel, more detailed classifications. Our framework learns cloud features directly from radiance data produced by NASA's Moderate Resolution Imaging Spectroradiometer (MODIS) satellite instrument, deriving cloud characteristics from millions of images without relying on pre-defined cloud types during the training process. We present preliminary results showing that our method extracts physically relevant information from radiance data and produces meaningful cloud classes.  ( 2 min )
    Beyond Bayes-optimality: meta-learning what you know you don't know. (arXiv:2209.15618v1 [cs.AI])
    Meta-training agents with memory has been shown to culminate in Bayes-optimal agents, which casts Bayes-optimality as the implicit solution to a numerical optimization problem rather than an explicit modeling assumption. Bayes-optimal agents are risk-neutral, since they solely attune to the expected return, and ambiguity-neutral, since they act in new situations as if the uncertainty were known. This is in contrast to risk-sensitive agents, which additionally exploit the higher-order moments of the return, and ambiguity-sensitive agents, which act differently when recognizing situations in which they lack knowledge. Humans are also known to be averse to ambiguity and sensitive to risk in ways that aren't Bayes-optimal, indicating that such sensitivity can confer advantages, especially in safety-critical situations. How can we extend the meta-learning protocol to generate risk- and ambiguity-sensitive agents? The goal of this work is to fill this gap in the literature by showing that risk- and ambiguity-sensitivity also emerge as the result of an optimization problem using modified meta-training algorithms, which manipulate the experience-generation process of the learner. We empirically test our proposed meta-training algorithms on agents exposed to foundational classes of decision-making experiments and demonstrate that they become sensitive to risk and ambiguity.  ( 3 min )
    Restricted Strong Convexity of Deep Learning Models with Smooth Activations. (arXiv:2209.15106v1 [cs.LG])
    We consider the problem of optimization of deep learning models with smooth activation functions. While there exist influential results on the problem from the "near initialization" perspective, we shed considerable new light on the problem. In particular, we make two key technical contributions for such models with $L$ layers, $m$ width, and $\sigma_0^2$ initialization variance. First, for suitable $\sigma_0^2$, we establish a $O(\frac{\text{poly}(L)}{\sqrt{m}})$ upper bound on the spectral norm of the Hessian of such models, considerably sharpening prior results. Second, we introduce a new analysis of optimization based on Restricted Strong Convexity (RSC) which holds as long as the squared norm of the average gradient of predictors is $\Omega(\frac{\text{poly}(L)}{\sqrt{m}})$ for the square loss. We also present results for more general losses. The RSC based analysis does not need the "near initialization" perspective and guarantees geometric convergence for gradient descent (GD). To the best of our knowledge, ours is the first result establishing geometric convergence of GD based on RSC for deep learning models, thus becoming an alternative sufficient condition for convergence that does not depend on the widely-used Neural Tangent Kernel (NTK). We share preliminary experimental results supporting our theoretical advances.
    End-to-end multi-particle reconstruction in high occupancy imaging calorimeters with graph neural networks. (arXiv:2204.01681v3 [physics.ins-det] UPDATED)
    We present an end-to-end reconstruction algorithm to build particle candidates from detector hits in next-generation granular calorimeters similar to that foreseen for the high-luminosity upgrade of the CMS detector. The algorithm exploits a distance-weighted graph neural network, trained with object condensation, a graph segmentation technique. Through a single-shot approach, the reconstruction task is paired with energy regression. We describe the reconstruction performance in terms of efficiency as well as in terms of energy resolution. In addition, we show the jet reconstruction performance of our method and discuss its inference computational cost. To our knowledge, this work is the first-ever example of single-shot calorimetric reconstruction of ${\cal O}(1000)$ particles in high-luminosity conditions with 200 pileup.  ( 2 min )
    Rethinking Data Heterogeneity in Federated Learning: Introducing a New Notion and Standard Benchmarks. (arXiv:2209.15595v1 [cs.LG])
    Though successful, federated learning presents new challenges for machine learning, especially when the issue of data heterogeneity, also known as Non-IID data, arises. To cope with statistical heterogeneity, previous works incorporated a proximal term in local optimization, modified the model aggregation scheme at the server side, or advocated clustered federated learning approaches in which the central server groups the agent population into clusters with jointly trainable data distributions to take advantage of a certain level of personalization. While effective, these works lack a deep elaboration on what kinds of data heterogeneity exist and how they impact the accuracy of the participating clients. In contrast to many prior federated learning approaches, we demonstrate that data heterogeneity in current setups is not necessarily a problem and can in fact be beneficial for the FL participants. Our observations are intuitive: (1) dissimilar labels across clients (label skew) do not necessarily constitute data heterogeneity, and (2) the principal angle between the agents' data subspaces, spanned by the principal vectors of their data, is a better estimate of data heterogeneity. Our code is available at https://github.com/MMorafah/FL-SC-NIID.  ( 2 min )
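    Observation (2) is simple to compute: the principal angles between two clients' top-$k$ data subspaces follow from an SVD (generic linear algebra for illustration, not the authors' exact code):

        import numpy as np

        def principal_angles(X1, X2, k=5):
            # Columns of U1, U2 span each client's data subspace (X: samples x dim)
            U1, _, _ = np.linalg.svd(X1.T, full_matrices=False)
            U2, _, _ = np.linalg.svd(X2.T, full_matrices=False)
            cosines = np.linalg.svd(U1[:, :k].T @ U2[:, :k], compute_uv=False)
            return np.arccos(np.clip(cosines, -1.0, 1.0))  # angles in radians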
    Improving Molecular Pretraining with Complementary Featurizations. (arXiv:2209.15101v1 [cs.LG])
    Molecular pretraining, which learns molecular representations over massive unlabeled data, has become a prominent paradigm to solve a variety of tasks in computational chemistry and drug discovery. Recently, prosperous progress has been made in molecular pretraining with different molecular featurizations, including 1D SMILES strings, 2D graphs, and 3D geometries. However, the role of molecular featurizations with their corresponding neural architectures in molecular pretraining remains largely unexamined. In this paper, through two case studies -- chirality classification and aromatic ring counting -- we first demonstrate that different featurization techniques convey chemical information differently. In light of this observation, we propose a simple and effective MOlecular pretraining framework with COmplementary featurizations (MOCO). MOCO comprehensively leverages multiple featurizations that complement each other and outperforms existing state-of-the-art models that rely solely on one or two featurizations, on a wide range of molecular property prediction tasks.
    SoK: On the Impossible Security of Very Large Foundation Models. (arXiv:2209.15259v1 [cs.LG])
    Large machine learning models, or so-called foundation models, aim to serve as base models for application-oriented machine learning. Although these models showcase impressive performance, they have been empirically found to pose serious security and privacy issues. We may however wonder whether this is a limitation of the current models, or if these issues stem from a fundamental intrinsic impossibility of the foundation model learning problem itself. This paper aims to systematize our knowledge supporting the latter. More precisely, we identify several key features of today's foundation model learning problem which, given the current understanding in adversarial machine learning, suggest incompatibility of high accuracy with both security and privacy. We begin by observing that high accuracy seems to require (1) very high-dimensional models and (2) huge amounts of data that can only be procured through user-generated datasets. Moreover, such data is fundamentally heterogeneous, as users generally have very specific (easily identifiable) data-generating habits. More importantly, users' data is filled with highly sensitive information, and may be heavily polluted by fake users. We then survey lower bounds on accuracy in privacy-preserving and Byzantine-resilient heterogeneous learning that, we argue, constitute a compelling case against the possibility of designing a secure and privacy-preserving high-accuracy foundation model. We further stress that our analysis also applies to other high-stakes machine learning applications, including content recommendation. We conclude by calling for measures to prioritize security and privacy, and to slow down the race for ever larger models.  ( 3 min )
    Augmentation Backdoors. (arXiv:2209.15139v1 [cs.LG])
    Data augmentation is used extensively to improve model generalisation. However, reliance on external libraries to implement augmentation methods introduces a vulnerability into the machine learning pipeline. It is well known that backdoors can be inserted into machine learning models through serving a modified dataset to train on. Augmentation therefore presents a perfect opportunity to perform this modification without requiring an initially backdoored dataset. In this paper we present three backdoor attacks that can be covertly inserted into data augmentation. Our attacks each insert a backdoor using a different type of computer vision augmentation transform, covering simple image transforms, GAN-based augmentation, and composition-based augmentation. By inserting the backdoor using these augmentation transforms, we make our backdoors difficult to detect, while still supporting arbitrary backdoor functionality. We evaluate our attacks on a range of computer vision benchmarks and demonstrate that an attacker is able to introduce backdoors through just a malicious augmentation routine.
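    To illustrate how little an attacker needs, a toy backdoored augmentation routine is sketched below (a deliberately simple dirty-label attack with a corner-patch trigger; the rate, patch and target label are arbitrary illustrations, not the paper's attacks):

        import numpy as np

        def malicious_augment(image, label, rate=0.05, target=0):
            # Benign part: an ordinary random horizontal flip
            if np.random.rand() < 0.5:
                image = np.fliplr(image)
            # Malicious part: occasionally stamp a trigger and flip the label
            if np.random.rand() < rate:
                image = image.copy()
                image[-4:, -4:] = 1.0  # white square trigger in the corner
                label = target
            return image, label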
    Improving Generative Flow Networks with Path Regularization. (arXiv:2209.15092v1 [cs.LG])
    Generative Flow Networks (GFlowNets) are recently proposed models for learning stochastic policies that generate compositional objects by sequences of actions with the probability proportional to a given reward function. The central problem of GFlowNets is to improve their exploration and generalization. In this work, we propose a novel path regularization method based on optimal transport theory that places prior constraints on the underlying structure of the GFlowNets. The prior is designed to help the GFlowNets better discover the latent structure of the target distribution or enhance its ability to explore the environment in the context of active learning. The path regularization controls the flow in GFlowNets to generate more diverse and novel candidates via maximizing the optimal transport distances between two forward policies or to improve the generalization via minimizing the optimal transport distances. In addition, we derive an efficient implementation of the regularization by finding its closed form solutions in specific cases and a meaningful upper bound that can be used as an approximation to minimize the regularization term. We empirically demonstrate the advantage of our path regularization on a wide range of tasks, including synthetic hypergrid environment modeling, discrete probabilistic modeling, and biological sequence design.
    Start Small: Training Game Level Generators from Nothing by Learning at Multiple Sizes. (arXiv:2209.15052v1 [cs.LG])
    A procedural level generator is a tool that generates levels from noise. One approach to building generators is machine learning but, given the rarity of training data, multiple methods have been proposed to train generators from nothing. However, level generation tasks tend to have sparse feedback, which is commonly mitigated using game-specific supplemental rewards. This paper proposes a novel approach to train generators from nothing by learning at multiple level sizes, starting from a small size up to the desired sizes. This approach exploits the observed phenomenon that feedback is denser at smaller sizes, avoiding the need for supplemental rewards. It also presents the benefit of training generators to output levels at various sizes. We apply this approach to train controllable generators using generative flow networks. We also modify diversity sampling to be compatible with generative flow networks and to expand the expressive range. The results show that our methods can generate high-quality, diverse levels for Sokoban, Zelda and Danger Dave at a variety of sizes, after only 3h 29min to 6h 11min of training (depending on the game) on a single commodity machine. The results also show that our generators can output levels at sizes that were unavailable during training.
    3D UX-Net: A Large Kernel Volumetric ConvNet Modernizing Hierarchical Transformer for Medical Image Segmentation. (arXiv:2209.15076v1 [cs.CV])
    Vision transformers (ViTs) have quickly superseded convolutional networks (ConvNets) as the current state-of-the-art (SOTA) models for medical image segmentation. Hierarchical transformers (e.g., Swin Transformers) reintroduced several ConvNet priors and further enhanced the practical viability of adapting volumetric segmentation in 3D medical datasets. The effectiveness of hybrid approaches is largely credited to the large receptive field for non-local self-attention and the large number of model parameters. In this work, we propose a lightweight volumetric ConvNet, termed 3D UX-Net, which adapts the hierarchical transformer using ConvNet modules for robust volumetric segmentation. Specifically, we revisit volumetric depth-wise convolutions with large kernel size (e.g. starting from $7\times7\times7$) to enable larger global receptive fields, inspired by Swin Transformer. We further substitute the multi-layer perceptron (MLP) in Swin Transformer blocks with pointwise depth convolutions and enhance model performance with fewer normalization and activation layers, thus reducing the number of model parameters. 3D UX-Net competes favorably with current SOTA transformers (e.g. SwinUNETR) using three challenging public datasets on volumetric brain and abdominal imaging: 1) MICCAI Challenge 2021 FLARE, 2) MICCAI Challenge 2021 FeTA, and 3) MICCAI Challenge 2022 AMOS. 3D UX-Net consistently outperforms SwinUNETR with improvement from 0.929 to 0.938 Dice (FLARE2021) and 0.867 to 0.874 Dice (FeTA2021). We further evaluate the transfer learning capability of 3D UX-Net with AMOS2022 and demonstrate another improvement of $2.27\%$ Dice (from 0.880 to 0.900). The source code and our proposed model are available at https://github.com/MASILab/3DUX-Net.
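    The basic building block -- a large-kernel depth-wise 3D convolution followed by a pointwise projection -- can be sketched in a few lines (the channel count and normalization choice are illustrative, not the exact 3D UX-Net block):

        import torch

        block = torch.nn.Sequential(
            torch.nn.Conv3d(64, 64, kernel_size=7, padding=3, groups=64),  # depth-wise
            torch.nn.InstanceNorm3d(64),
            torch.nn.Conv3d(64, 64, kernel_size=1),                        # pointwise
            torch.nn.GELU(),
        )
        x = torch.randn(1, 64, 32, 32, 32)  # (batch, channels, D, H, W)
        y = block(x)                        # spatial size preserved by padding=3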
    Implicit Bias of Large Depth Networks: a Notion of Rank for Nonlinear Functions. (arXiv:2209.15055v1 [stat.ML])
    We show that the representation cost of fully connected neural networks with homogeneous nonlinearities - which describes the implicit bias in function space of networks with $L_2$-regularization or with losses such as the cross-entropy - converges as the depth of the network goes to infinity to a notion of rank over nonlinear functions. We then inquire under which conditions the global minima of the loss recover the "true" rank of the data: we show that for too large depths the global minimum will be approximately rank 1 (underestimating the rank); we then argue that there is a range of depths which grows with the number of datapoints where the true rank is recovered. Finally, we discuss the effect of the rank of a classifier on the topology of the resulting class boundaries and show that autoencoders with optimal nonlinear rank are naturally denoising.
    Provably expressive temporal graph networks. (arXiv:2209.15059v1 [cs.LG])
    Temporal graph networks (TGNs) have gained prominence as models for embedding dynamic interactions, but little is known about their theoretical underpinnings. We establish fundamental results about the representational power and limits of the two main categories of TGNs: those that aggregate temporal walks (WA-TGNs), and those that augment local message passing with recurrent memory modules (MP-TGNs). Specifically, novel constructions reveal the inadequacy of MP-TGNs and WA-TGNs, proving that neither category subsumes the other. We extend the 1-WL (Weisfeiler-Leman) test to temporal graphs, and show that the most powerful MP-TGNs should use injective updates, as in this case they become as expressive as the temporal WL. Also, we show that sufficiently deep MP-TGNs cannot benefit from memory, and MP/WA-TGNs fail to compute graph properties such as girth. These theoretical insights lead us to PINT -- a novel architecture that leverages injective temporal message passing and relative positional features. Importantly, PINT is provably more expressive than both MP-TGNs and WA-TGNs. PINT significantly outperforms existing TGNs on several real-world benchmarks.
    How to tackle an emerging topic? Combining strong and weak labels for Covid news NER. (arXiv:2209.15108v1 [cs.CL])
    Being able to train Named Entity Recognition (NER) models for emerging topics is crucial for many real-world applications, especially in the medical domain, where new topics continuously evolve out of the scope of existing models and datasets. For a realistic evaluation setup, we introduce a novel COVID-19 news NER dataset (COVIDNEWS-NER) and release 3000 hand-annotated, strongly labelled sentences and 13000 auto-generated, weakly labelled sentences. Besides the dataset, we propose CONTROSTER, a recipe to strategically combine weak and strong labels to improve NER on an emerging topic through transfer learning. We show the effectiveness of CONTROSTER on COVIDNEWS-NER while providing an analysis of combining weak and strong labels for training. Our key findings are: (1) using weak data to form an initial backbone before tuning on strong data outperforms methods trained on only strong or weak data, and (2) a combination of out-of-domain and in-domain weak-label training is crucial and can overcome saturation when training only on weak labels from a single source.
    Sparse tree-based initialization for neural networks. (arXiv:2209.15283v1 [stat.ML])
    Dedicated neural network (NN) architectures have been designed to handle specific data types (such as CNNs for images or RNNs for text), which ranks them among the state-of-the-art methods for dealing with these data. Unfortunately, no architecture has yet been found for dealing with tabular data, for which tree ensemble methods (tree boosting, random forests) usually show the best predictive performance. In this work, we propose a new sparse initialization technique for (potentially deep) multilayer perceptrons (MLPs): we first train a tree-based procedure to detect feature interactions and use the resulting information to initialize the network, which is subsequently trained via standard stochastic gradient strategies. Numerical experiments on several tabular datasets show that this new, simple and easy-to-use method is a solid competitor, both in terms of generalization capacity and computation time, to default MLP initialization and even to existing complex deep learning solutions. In fact, this wise MLP initialization raises the resulting NN methods to the level of a valid competitor to gradient boosting when dealing with tabular data. Besides, such initializations are able to preserve the sparsity of weights introduced in the first layers of the network throughout training. This fact suggests that the new initializer operates an implicit regularization during NN training, and emphasizes that the first layers act as a sparse feature extractor (as convolutional layers do in CNNs).
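    One simplified way to realize such an initializer is to derive a sparse connectivity mask for the first MLP layer from the features used by a fitted tree ensemble (a hypothetical variant for illustration; the paper's procedure differs in detail):

        import numpy as np
        from sklearn.ensemble import GradientBoostingRegressor

        def tree_based_mask(X, y, hidden_units=64):
            gbt = GradientBoostingRegressor(n_estimators=hidden_units, max_depth=3)
            gbt.fit(X, y)
            mask = np.zeros((hidden_units, X.shape[1]))
            for j, tree in enumerate(gbt.estimators_.ravel()[:hidden_units]):
                used = np.unique(tree.tree_.feature[tree.tree_.feature >= 0])
                mask[j, used] = 1.0  # connect hidden unit j to that tree's features
            return mask  # used to zero out first-layer weights at initialization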
    Transformers for Object Detection in Large Point Clouds. (arXiv:2209.15258v1 [cs.CV])
    We present TransLPC, a novel detection model for large point clouds that is based on a transformer architecture. While object detection with transformers has been an active field of research, it has proved difficult to apply such models to point clouds that span a large area, e.g. those that are common in autonomous driving, with lidar or radar data. TransLPC is able to remedy these issues: The structure of the transformer model is modified to allow for larger input sequence lengths, which are sufficient for large point clouds. Besides this, we propose a novel query refinement technique to improve detection accuracy, while retaining a memory-friendly number of transformer decoder queries. The queries are repositioned between layers, moving them closer to the bounding box they are estimating, in an efficient manner. This simple technique has a significant effect on detection accuracy, which is evaluated on the challenging nuScenes dataset on real-world lidar data. Besides this, the proposed method is compatible with existing transformer-based solutions that require object detection, e.g. for joint multi-object tracking and detection, and enables them to be used in conjunction with large point clouds.  ( 3 min )
    Generalizability of Adversarial Robustness Under Distribution Shifts. (arXiv:2209.15042v1 [cs.LG])
    Recent progress in empirical and certified robustness promises to deliver reliable and deployable Deep Neural Networks (DNNs). Despite that success, most existing evaluations of DNN robustness have been done on images sampled from the same distribution that the model was trained on. Yet, in the real world, DNNs may be deployed in dynamic environments that exhibit significant distribution shifts. In this work, we take a first step towards thoroughly investigating the interplay between empirical and certified adversarial robustness on one hand and domain generalization on another. To do so, we train robust models on multiple domains and evaluate their accuracy and robustness on an unseen domain. We observe that: (1) both empirical and certified robustness generalize to unseen domains, and (2) the level of generalizability does not correlate well with input visual similarity, measured by the FID between source and target domains. We also extend our study to cover a real-world medical application, in which adversarial augmentation enhances both the robustness and generalization accuracy in unseen domains.
    Understanding Pure CLIP Guidance for Voxel Grid NeRF Models. (arXiv:2209.15172v1 [cs.CV])
    We explore the task of text to 3D object generation using CLIP. Specifically, we use CLIP for guidance without access to any datasets, a setting we refer to as pure CLIP guidance. While prior work has adopted this setting, there is no systematic study of mechanics for preventing adversarial generations within CLIP. We illustrate how different image-based augmentations prevent the adversarial generation problem, and how the generated results are impacted. We test different CLIP model architectures and show that ensembling different models for guidance can prevent adversarial generations within bigger models and generate sharper results. Furthermore, we implement an implicit voxel grid model to show how neural networks provide an additional layer of regularization, resulting in better geometrical structure and coherency of generated objects. Compared to prior work, we achieve more coherent results with higher memory efficiency and faster training speeds.
    Nonconvex Matrix Factorization is Geodesically Convex: Global Landscape Analysis for Fixed-rank Matrix Optimization From a Riemannian Perspective. (arXiv:2209.15130v1 [math.OC])
    We study a general matrix optimization problem with a fixed-rank positive semidefinite (PSD) constraint. We perform the Burer-Monteiro factorization and consider a particular Riemannian quotient geometry in a search space that has a total space equipped with the Euclidean metric. When the original objective f satisfies standard restricted strong convexity and smoothness properties, we characterize the global landscape of the factorized objective under the Riemannian quotient geometry. We show the entire search space can be divided into three regions: (R1) the region near the target parameter of interest, where the factorized objective is geodesically strongly convex and smooth; (R2) the region containing neighborhoods of all strict saddle points; (R3) the remaining regions, where the factorized objective has a large gradient. To our best knowledge, this is the first global landscape analysis of the Burer-Monteiro factorized objective under the Riemannian quotient geometry. Our results provide a fully geometric explanation for the superior performance of vanilla gradient descent under the Burer-Monteiro factorization. When f satisfies a weaker restricted strict convexity property, we show there exists a neighborhood near local minimizers such that the factorized objective is geodesically convex. To prove our results we provide a comprehensive landscape analysis of a matrix factorization problem with a least squares objective, which serves as a critical bridge. Our conclusions are also based on a result of independent interest stating that the geodesic ball centered at Y with a radius 1/3 of the least singular value of Y is a geodesically convex set under the Riemannian quotient geometry, which as a corollary, also implies a quantitative bound of the convexity radius in the Bures-Wasserstein space. The convexity radius obtained is sharp up to constants.
    Enforcing Hard Constraints with Soft Barriers: Safe Reinforcement Learning in Unknown Stochastic Environments. (arXiv:2209.15090v1 [eess.SY])
    It is quite challenging to ensure the safety of reinforcement learning (RL) agents in an unknown and stochastic environment under hard constraints that require the system state not to reach certain specified unsafe regions. Many popular safe RL methods, such as those based on the Constrained Markov Decision Process (CMDP) paradigm, formulate safety violations in a cost function and try to constrain the expectation of cumulative cost under a threshold. However, it is often difficult to effectively capture and enforce hard reachability-based safety constraints indirectly with such constraints on safety violation costs. In this work, we leverage the notion of a barrier function to explicitly encode the hard safety constraints, and, given that the environment is unknown, relax them to our design of \emph{generative-model-based soft barrier functions}. Based on such soft barriers, we propose a safe RL approach that can jointly learn the environment and optimize the control policy, while effectively avoiding unsafe regions with safety probability optimization. Experiments on a set of examples demonstrate that our approach can effectively enforce hard safety constraints and significantly outperform CMDP-based baseline methods in terms of the system safe rate measured via simulations.
    Graph Attention Network for Camera Relocalization on Dynamic Scenes. (arXiv:2209.15056v1 [cs.CV])
    We devise a graph attention network-based approach for learning a scene triangle mesh representation in order to estimate the camera position of an image in a dynamic environment. Previous approaches built scene-dependent models that explicitly or implicitly embed the structure of the scene. They use convolutional neural networks or decision trees to establish 2D/3D-3D correspondences. Such a mapping overfits the target scene and does not generalize well to dynamic changes in the environment. Our work introduces a novel approach to the camera relocalization problem that uses the available triangle mesh. Our 3D-3D matching framework consists of three blocks: (1) a graph neural network to compute the embedding of mesh vertices, (2) a convolutional neural network to compute the embedding of grid cells defined on the RGB-D image, and (3) a neural network model to establish the correspondence between the two embeddings. These three components are trained end-to-end. To predict the final pose, we run the RANSAC algorithm to generate camera pose hypotheses, and we refine the prediction using the point-cloud representation. Our approach significantly improves the camera pose accuracy of the state-of-the-art method from $0.358$ to $0.506$ on the RIO10 benchmark for dynamic indoor camera relocalization.
    Music Source Separation with Band-split RNN. (arXiv:2209.15174v1 [eess.AS])
    The performance of music source separation (MSS) models has greatly improved in recent years thanks to the development of novel neural network architectures and training pipelines. However, recent model designs for MSS were mainly motivated by other audio processing tasks or other research fields, while the intrinsic characteristics and patterns of music signals were not fully explored. In this paper, we propose the band-split RNN (BSRNN), a frequency-domain model that explicitly splits the spectrogram of the mixture into subbands and performs interleaved band-level and sequence-level modeling. The bandwidths of the subbands can be chosen based on prior or expert knowledge of the characteristics of the target source, in order to optimize performance for a given type of target instrument. To better make use of unlabeled data, we also describe a semi-supervised model finetuning pipeline that can further improve the performance of the model. Experimental results show that BSRNN trained only on the MUSDB18-HQ dataset significantly outperforms several top-ranking models from the Music Demixing (MDX) Challenge 2021, and that the semi-supervised finetuning stage further improves performance on all four instrument tracks.
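    To illustrate the band-split step, here is a minimal PyTorch sketch; the band edges, feature size, and module layout are illustrative assumptions (the interleaved band- and sequence-level RNNs are omitted), not the authors' released code:

        # Illustrative sketch of a band-split front end; not the paper's code.
        import torch
        import torch.nn as nn

        class BandSplit(nn.Module):
            """Split a complex spectrogram into subbands, project each to a shared size."""
            def __init__(self, band_edges, feature_dim=128):
                super().__init__()
                self.band_edges = band_edges  # e.g. [(0, 64), (64, 192), (192, 513)]
                self.proj = nn.ModuleList(
                    nn.Linear(2 * (hi - lo), feature_dim) for lo, hi in band_edges
                )

            def forward(self, spec):  # spec: (batch, freq, time), complex-valued
                feats = []
                for (lo, hi), proj in zip(self.band_edges, self.proj):
                    band = torch.view_as_real(spec[:, lo:hi, :])  # (B, F_k, T, 2)
                    band = band.permute(0, 2, 1, 3).flatten(2)    # (B, T, 2*F_k)
                    feats.append(proj(band))                      # (B, T, D)
                # (B, n_bands, T, D): ready for interleaved band/sequence RNNs
                return torch.stack(feats, dim=1)

        spec = torch.randn(1, 513, 100, dtype=torch.complex64)
        print(BandSplit([(0, 64), (64, 192), (192, 513)])(spec).shape)  # (1, 3, 100, 128)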
    Automatic Data Augmentation via Invariance-Constrained Learning. (arXiv:2209.15031v1 [cs.LG])
    Underlying data structures, such as symmetries or invariances to transformations, are often exploited to improve the solution of learning tasks. However, embedding these properties in models or learning algorithms can be challenging and computationally intensive. Data augmentation, on the other hand, induces these symmetries during training by applying multiple transformations to the input data. Despite its ubiquity, its effectiveness depends on the choices of which transformations to apply, when to do so, and how often. In fact, there is both empirical and theoretical evidence that the indiscriminate use of data augmentation can introduce biases that outweigh its benefits. This work tackles these issues by automatically adapting the data augmentation while solving the learning task. To do so, it formulates data augmentation as an invariance-constrained learning problem and leverages Markov chain Monte Carlo (MCMC) sampling to solve it. The result is a practical algorithm that not only does away with a priori searches for augmentation distributions, but also dynamically controls if and when data augmentation is applied. Our experiments illustrate the performance of this method, which achieves state-of-the-art results in automatic data augmentation benchmarks for CIFAR datasets. Furthermore, this approach can be used to gather insights on the actual symmetries underlying a learning task.
    Variable-Based Calibration for Machine Learning Classifiers. (arXiv:2209.15154v1 [cs.LG])
    The deployment of machine learning classifiers in high-stakes domains requires well-calibrated confidence scores for model predictions. In this paper we introduce the notion of variable-based calibration to characterize calibration properties of a model with respect to a variable of interest, generalizing traditional score-based calibration and metrics such as expected calibration error (ECE). In particular, we find that models with near-perfect ECE can exhibit significant variable-based calibration error as a function of features of the data. We demonstrate this phenomenon both theoretically and in practice on multiple well-known datasets, and show that it can persist after the application of existing recalibration methods. To mitigate this issue, we propose strategies for detection, visualization, and quantification of variable-based calibration error. We then examine the limitations of current score-based recalibration methods and explore potential modifications. Finally, we discuss the implications of these findings, emphasizing that an understanding of calibration beyond simple aggregate measures is crucial for endeavors such as fairness and model interpretability.
    The Replicator Dynamic, Chain Components and the Response Graph. (arXiv:2209.15230v1 [cs.GT])
    In this paper we examine the relationship between the flow of the replicator dynamic, the continuum limit of Multiplicative Weights Update, and a game's response graph. We settle an open problem establishing that under the replicator, sink chain components -- a topological notion of long-run outcome of a dynamical system -- always exist and are approximated by the sink connected components of the game's response graph. More specifically, each sink chain component contains a sink connected component of the response graph, as well as all mixed strategy profiles whose support consists of pure profiles in the same connected component, a set we call the content of the connected component. As a corollary, all profiles are chain recurrent in games with strongly connected response graphs. In any two-player game sharing a response graph with a zero-sum game, the sink chain component is unique. In two-player zero-sum and potential games the sink chain components and sink connected components are in a one-to-one correspondence, and we conjecture that this holds in all games.
    Few-shot Text Classification with Dual Contrastive Consistency. (arXiv:2209.15069v1 [cs.CL])
    In this paper, we explore how to utilize a pre-trained language model to perform few-shot text classification, where only a few annotated examples are given for each class. Since using the traditional cross-entropy loss to fine-tune the language model in this scenario causes serious overfitting and leads to sub-optimal generalization, we adopt supervised contrastive learning on the few labeled data and consistency regularization on vast unlabeled data. Moreover, we propose a novel contrastive consistency to further boost model performance and refine sentence representations. After conducting extensive experiments on four datasets, we demonstrate that our model (FTCC) outperforms state-of-the-art methods and has better robustness.
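    For background, a minimal sketch of a supervised contrastive loss of the kind adopted here (the temperature, batch handling, and masking are illustrative assumptions, not the paper's exact objective):

        # Illustrative supervised contrastive loss; not the paper's exact formulation.
        import torch
        import torch.nn.functional as F

        def supcon_loss(z, labels, tau=0.1):
            """Pull together embeddings that share a label, push apart the rest."""
            z = F.normalize(z, dim=1)
            sim = z @ z.T / tau
            eye = torch.eye(len(z), dtype=torch.bool)
            sim = sim.masked_fill(eye, float('-inf'))          # drop self-similarity
            log_prob = sim - torch.logsumexp(sim, dim=1, keepdim=True)
            pos = (labels[:, None] == labels[None, :]) & ~eye  # same-label pairs
            return -((log_prob * pos).sum(1) / pos.sum(1).clamp(min=1)).mean()

        z = torch.randn(16, 64, requires_grad=True)  # embeddings from the encoder
        labels = torch.randint(0, 4, (16,))
        supcon_loss(z, labels).backward()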
    A deep learning approach to the probabilistic numerical solution of path-dependent partial differential equations. (arXiv:2209.15010v1 [cs.LG])
    Recent work on Path-Dependent Partial Differential Equations (PPDEs) has shown that PPDE solutions can be approximated by a probabilistic representation, implemented in the literature by the estimation of conditional expectations using regression. However, a limitation of this approach is that it requires the selection of a basis in a function space. In this paper, we overcome this limitation through the use of deep learning methods, and we show that this setting allows for the derivation of error bounds on the approximation of conditional expectations. Numerical examples based on a two-person zero-sum game, as well as on Asian and barrier option pricing, are presented. In comparison with other deep learning approaches, our algorithm appears to be more accurate, especially in large dimensions.
    On the optimization and generalization of overparameterized implicit neural networks. (arXiv:2209.15562v1 [cs.LG])
    Implicit neural networks have become increasingly attractive in the machine learning community since they can achieve competitive performance while using much less computational resources. Recently, a line of theoretical works established global convergence for first-order methods such as gradient descent if the implicit networks are over-parameterized. However, as they train all layers together, their analyses are equivalent to only studying the evolution of the output layer. It is unclear how the implicit layer contributes to the training. Thus, in this paper, we restrict ourselves to only training the implicit layer. We show that global convergence is guaranteed, even if only the implicit layer is trained. On the other hand, the theoretical understanding of when and how the training performance of an implicit neural network generalizes to unseen data is still under-explored. Although this problem has been studied in standard feed-forward networks, the case of implicit neural networks is still intriguing since implicit networks theoretically have infinitely many layers. Therefore, this paper investigates the generalization error of implicit neural networks. Specifically, we study the generalization of an implicit network activated by the ReLU function with random initialization. We provide a generalization bound that is initialization sensitive. As a result, we show that gradient flow with proper random initialization can train a sufficiently over-parameterized implicit network to achieve arbitrarily small generalization errors.
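    For readers unfamiliar with implicit networks, a minimal sketch of an implicit layer whose output is a fixed point of the layer map (the weight scaling that makes the iteration contract is an illustrative choice):

        # Illustrative fixed-point ("infinite-depth") layer; details are assumptions.
        import torch

        def implicit_layer(x, W, U, n_iter=50):
            """Return z solving z = relu(z @ W.T + x @ U.T) by fixed-point iteration."""
            z = torch.zeros(x.shape[0], W.shape[0])
            for _ in range(n_iter):
                z = torch.relu(z @ W.T + x @ U.T)
            return z

        torch.manual_seed(0)
        W = 0.05 * torch.randn(32, 32)  # small spectral norm, so the map contracts
        U = torch.randn(32, 16)
        x = torch.randn(4, 16)
        z = implicit_layer(x, W, U)
        # ~0: z is (numerically) a fixed point, i.e. the implicit layer's output
        print(torch.norm(z - torch.relu(z @ W.T + x @ U.T)))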
    Batch Multivalid Conformal Prediction. (arXiv:2209.15145v1 [cs.LG])
    We develop fast distribution-free conformal prediction algorithms for obtaining multivalid coverage on exchangeable data in the batch setting. Multivalid coverage guarantees are stronger than marginal coverage guarantees in two ways: (1) They hold even conditional on group membership -- that is, the target coverage level $1-\alpha$ holds conditionally on membership in each group of an arbitrary (potentially intersecting) finite collection $\mathcal{G}$ of regions in the feature space. (2) They hold even conditional on the value of the threshold used to produce the prediction set on a given example. In fact, multivalid coverage guarantees hold even when conditioning on group membership and threshold value simultaneously. We give two algorithms: both take as input an arbitrary non-conformity score and an arbitrary collection of possibly intersecting groups $\mathcal{G}$, and can then equip arbitrary black-box predictors with prediction sets. Our first algorithm (BatchGCP) is a direct extension of quantile regression, needs to solve only a single convex minimization problem, and produces an estimator that has group-conditional guarantees for each group in $\mathcal{G}$. Our second algorithm (BatchMVP) is iterative, and gives the full guarantees of multivalid conformal prediction: prediction sets that are valid conditionally both on group membership and non-conformity threshold. We evaluate the performance of both of our algorithms in an extensive set of experiments. Code to replicate all of our experiments can be found at https://github.com/ProgBelarus/BatchMultivalidConformal
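    A toy numerical sketch of the BatchGCP idea, fitting group-dependent thresholds by minimizing the pinball loss so that coverage holds within each group (the group structure, score distribution, and subgradient solver are illustrative assumptions, not the paper's implementation):

        # Illustrative group-conditional quantile fit; not the paper's algorithm.
        import numpy as np

        def pinball_subgrad(theta, scores, G, alpha):
            """Subgradient of the level-(1-alpha) pinball loss of scores vs. G @ theta."""
            t = G @ theta
            g = np.where(scores > t, -(1 - alpha), alpha)  # d/dt of the pinball loss
            return G.T @ g / len(scores)

        rng = np.random.default_rng(0)
        n, alpha = 5000, 0.1
        G = np.column_stack([np.ones(n), rng.integers(0, 2, n)])  # base + one subgroup
        scores = np.abs(rng.normal(0, 1 + 2 * G[:, 1]))           # harder subgroup
        theta = np.zeros(2)
        for k in range(5000):
            theta -= (2.0 / np.sqrt(k + 1)) * pinball_subgrad(theta, scores, G, alpha)
        t = G @ theta
        for g in (0, 1):
            m = G[:, 1] == g  # coverage is approximately 0.9 within each subgroup
            print(f"subgroup {g}: coverage {(scores[m] <= t[m]).mean():.3f}")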
    Likelihood adjusted semidefinite programs for clustering heterogeneous data. (arXiv:2209.15097v1 [stat.ML])
    Clustering is a widely deployed unsupervised learning tool. Model-based clustering is a flexible framework for tackling data heterogeneity when the clusters have different shapes. Likelihood-based inference for mixture distributions often involves non-convex and high-dimensional objective functions, imposing difficult computational and statistical challenges. The classic expectation-maximization (EM) algorithm is a computationally thrifty iterative method that maximizes a surrogate function minorizing the log-likelihood of observed data in each iteration, but it suffers from bad local maxima even in the special case of the standard Gaussian mixture model with common isotropic covariance matrices. On the other hand, recent studies reveal that the unique global solution of a semidefinite programming (SDP) relaxation of $K$-means achieves the information-theoretically sharp threshold for perfectly recovering the cluster labels under the standard Gaussian mixture model. In this paper, we extend the SDP approach to a general setting by integrating cluster labels as model parameters, and we propose an iterative likelihood adjusted SDP (iLA-SDP) method that directly maximizes the \emph{exact} observed likelihood in the presence of data heterogeneity. By lifting the cluster assignment to group-specific membership matrices, iLA-SDP avoids centroid estimation -- a key feature that allows exact recovery under well-separatedness of centroids without being trapped by their adversarial configurations. Thus iLA-SDP is less sensitive than EM to initialization and more stable on high-dimensional data. Our numerical experiments demonstrate that iLA-SDP achieves lower mis-clustering errors than several widely used clustering methods, including $K$-means, SDP and EM algorithms.
    Individual Privacy Accounting with Gaussian Differential Privacy. (arXiv:2209.15596v1 [cs.CR])
    Individual privacy accounting enables bounding the differential privacy (DP) loss individually for each participant involved in an analysis. This can be informative, as individual privacy losses are often considerably smaller than those indicated by DP bounds based on worst-case assessments at each data access. In order to account for the individual privacy losses in a principled manner, we need a privacy accountant for adaptive compositions of randomised mechanisms, where the loss incurred at a given data access is allowed to be smaller than the worst-case loss. This kind of analysis has been carried out for R\'enyi differential privacy (RDP) by Feldman and Zrnic (2021), but not yet for the so-called optimal privacy accountants. We take first steps in this direction by providing a careful analysis using Gaussian differential privacy, which gives optimal bounds for the Gaussian mechanism, one of the most versatile DP mechanisms. This approach is based on determining a certain supermartingale for the hockey-stick divergence and on extending the R\'enyi divergence-based fully adaptive composition results of Feldman and Zrnic (2021). We also consider measuring the individual $(\varepsilon,\delta)$-privacy losses using the so-called privacy loss distributions. With the help of the Blackwell theorem, we can then make use of the RDP analysis to construct an approximate individual $(\varepsilon,\delta)$-accountant.
    Shuffled linear regression through graduated convex relaxation. (arXiv:2209.15608v1 [stat.CO])
    The shuffled linear regression problem aims to recover linear relationships in datasets where the correspondence between input and output is unknown. This problem arises in a wide range of applications, including survey data, in which one needs to decide whether the anonymity of the responses can be preserved while uncovering significant statistical connections. In this work, we propose a novel optimization algorithm for shuffled linear regression based on a posterior-maximizing objective function assuming a Gaussian noise prior. We compare and contrast our approach with existing methods on synthetic and real data. We show that our approach performs competitively while achieving empirical running-time improvements. Furthermore, we demonstrate that our algorithm is able to utilize side information in the form of seeds, which recently came to prominence in related problems.
    S2P: State-conditioned Image Synthesis for Data Augmentation in Offline Reinforcement Learning. (arXiv:2209.15256v1 [cs.LG])
    Offline reinforcement learning (offline RL) suffers from an innate distributional shift, as it cannot interact with the physical environment during training. To alleviate this limitation, state-based offline RL leverages a dynamics model learned from the logged experience and augments the predicted state transitions to extend the data distribution. To extend this benefit to image-based RL, we propose a generative model, S2P (State2Pixel), which synthesizes the raw pixels of the agent from its corresponding state. It bridges the gap between the state and the image domain in RL algorithms, and enables virtually exploring unseen image distributions via model-based transitions in the state space. Through experiments, we confirm that our S2P-based image synthesis not only improves image-based offline RL performance but also shows powerful generalization capability on unseen tasks.
    Low-Dose CT Using Denoising Diffusion Probabilistic Model for 20$\times$ Speedup. (arXiv:2209.15136v1 [eess.IV])
    Low-dose computed tomography (LDCT) has been an important topic in radiology over the past decades. LDCT reduces ionizing-radiation-induced patient health risks, but it also results in a low signal-to-noise ratio (SNR) and a potential compromise in diagnostic performance. In this paper, to improve LDCT denoising performance, we introduce the conditional denoising diffusion probabilistic model (DDPM) and show encouraging results with high computational efficiency. Specifically, given the high sampling cost of the original DDPM model, we adapt a fast ordinary differential equation (ODE) solver for much-improved sampling efficiency. The experiments show that the accelerated DDPM can achieve a 20$\times$ speedup without compromising image quality.
    Rethinking and Recomputing the Value of ML Models. (arXiv:2209.15157v1 [cs.LG])
    In this paper, we argue that the way we have been training and evaluating ML models has largely ignored the fact that they are applied in an organizational or societal context where they provide value to people. We show that with this perspective we fundamentally change how we evaluate, select and deploy ML models -- and, to some extent, even what it means to learn. Specifically, we stress that the notion of value plays a central role in learning and evaluating, and that different models may require different learning practices and provide different values depending on the application context in which they are applied. We also show that this concretely impacts how we select and embed models into human workflows based on experimental datasets. Nothing presented here is hard: to a large extent, it is a series of fairly trivial observations with massive practical implications.
    Empowering the trustworthiness of ML-based critical systems through engineering activities. (arXiv:2209.15438v1 [cs.SE])
    This paper reviews the entire engineering process of trustworthy Machine Learning (ML) algorithms designed to equip critical systems with advanced analytics and decision functions. We start from the fundamental principles of ML and describe the core elements conditioning its trust, particularly through its design: namely domain specification, data engineering, design of the ML algorithms, their implementation, evaluation and deployment. These components are organized into a unified framework for the design of trusted ML systems.
    TT-NF: Tensor Train Neural Fields. (arXiv:2209.15529v1 [cs.LG])
    Learning neural fields has been an active topic in deep learning research, focusing, among other issues, on finding more compact and easy-to-fit representations. In this paper, we introduce a novel low-rank representation termed Tensor Train Neural Fields (TT-NF) for learning neural fields on dense regular grids and efficient methods for sampling from them. Our representation is a TT parameterization of the neural field, trained with backpropagation to minimize a non-convex objective. We analyze the effect of low-rank compression on the downstream task quality metrics in two settings. First, we demonstrate the efficiency of our method in a sandbox task of tensor denoising, which admits comparison with SVD-based schemes designed to minimize reconstruction error. Furthermore, we apply the proposed approach to Neural Radiance Fields, where the low-rank structure of the field corresponding to the best quality can be discovered only through learning.  ( 2 min )
    Equivariant Energy-Guided SDE for Inverse Molecular Design. (arXiv:2209.15408v1 [physics.chem-ph])
    Inverse molecular design is critical in material science and drug discovery, where the generated molecules should satisfy certain desirable properties. In this paper, we propose equivariant energy-guided stochastic differential equations (EEGSDE), a flexible framework for controllable 3D molecule generation under the guidance of an energy function in diffusion models. Formally, we show that EEGSDE naturally exploits the geometric symmetry in 3D molecular conformation, as long as the energy function is invariant to orthogonal transformations. Empirically, under the guidance of designed energy functions, EEGSDE significantly improves the baseline on QM9, in inverse molecular design targeted to quantum properties and molecular structures. Furthermore, EEGSDE is able to generate molecules with multiple target properties by combining the corresponding energy functions linearly.
    Prompt Tuning for Graph Neural Networks. (arXiv:2209.15240v1 [cs.LG])
    In recent years, prompt tuning has set off a research boom in the adaptation of pre-trained models. In this paper, we propose Graph Prompt as an efficient and effective alternative to full fine-tuning for adapting pre-trained GNN models to downstream tasks. To the best of our knowledge, we are the first to explore the effectiveness of prompt tuning on existing pre-trained GNN models. Specifically, without tuning the parameters of the pre-trained GNN model, we train a task-specific graph prompt that provides graph-level transformations on the downstream graphs during the adaptation stage. Then, we introduce a concrete implementation of the graph prompt, called GP-Feature (GPF), which adds learnable perturbations to the feature space of the downstream graph. GPF is expressive enough to modify both the node features and, implicitly, the graph structure. Accordingly, we demonstrate that GPF can achieve an effect approximately equivalent to that of any graph-level transformation under most existing pre-trained GNN models. We validate the effectiveness of GPF on numerous pre-trained GNN models, and the experimental results show that with a small number of tunable parameters (about 0.1% of those used for fine-tuning), GPF achieves performance comparable to fine-tuning, and even obtains significant performance gains in some cases.
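    A minimal sketch of the GPF idea (the prompt form and the stand-in "GNN" below are illustrative assumptions; a real GNN would also consume the graph structure):

        # Illustrative feature-space graph prompt; not the paper's released code.
        import torch
        import torch.nn as nn

        class GPF(nn.Module):
            """Learnable perturbation added to every node's features."""
            def __init__(self, feat_dim):
                super().__init__()
                self.p = nn.Parameter(torch.zeros(feat_dim))

            def forward(self, x):  # x: (num_nodes, feat_dim)
                return x + self.p

        # stand-in for a frozen pre-trained GNN (edges omitted for brevity)
        gnn = nn.Sequential(nn.Linear(16, 32), nn.ReLU(), nn.Linear(32, 2))
        for p in gnn.parameters():
            p.requires_grad_(False)  # only the prompt is tuned

        prompt = GPF(16)  # a tiny fraction of the model's parameter count
        opt = torch.optim.Adam(prompt.parameters(), lr=1e-3)
        x, y = torch.randn(8, 16), torch.randint(0, 2, (8,))
        loss = nn.functional.cross_entropy(gnn(prompt(x)), y)
        loss.backward()
        opt.step()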
    Fed-CBS: A Heterogeneity-Aware Client Sampling Mechanism for Federated Learning via Class-Imbalance Reduction. (arXiv:2209.15245v1 [cs.LG])
    Due to the limited communication capacities of edge devices, most existing federated learning (FL) methods randomly select only a subset of devices to participate in training in each communication round. Compared with engaging all the available clients, this random-selection mechanism can lead to significant performance degradation on non-IID (not independent and identically distributed) data. In this paper, we show that the essential reason for such performance degradation is the class imbalance of the data grouped from randomly selected clients. Based on this key observation, we design an efficient heterogeneity-aware client sampling mechanism, Federated Class-balanced Sampling (Fed-CBS), which can effectively reduce the class imbalance of the dataset grouped from the intentionally selected clients. In particular, we propose a measure of class imbalance and then employ homomorphic encryption to derive this measure in a privacy-preserving way. Based on this measure, we also design a computation-efficient client sampling strategy such that the actively selected clients generate a more class-balanced grouped dataset, with theoretical guarantees. Extensive experimental results demonstrate that Fed-CBS outperforms the status quo approaches. Furthermore, it achieves comparable or even better performance than the ideal setting where all the available clients participate in FL training.
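    The paper's exact class-imbalance measure and its homomorphic-encryption protocol are not reproduced here; one plausible measure conveying the idea, the distance of a candidate group's pooled label distribution from uniform, looks like this:

        # Illustrative imbalance measure; not the paper's exact formula or protocol.
        import numpy as np

        def group_imbalance(client_counts):
            """Squared distance of the pooled label distribution from uniform (0 = balanced)."""
            pooled = np.asarray(client_counts, dtype=float).sum(axis=0)
            p = pooled / pooled.sum()
            return float(np.sum((p - 1.0 / p.size) ** 2))

        # per-client label counts over 3 classes; each client alone is highly skewed
        clients = np.array([[90, 5, 5], [10, 80, 10], [5, 10, 85]])
        print(group_imbalance(clients[:1]))  # a single skewed client: large imbalance
        print(group_imbalance(clients))      # complementary clients together: near zero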
    Where Should I Spend My FLOPS? Efficiency Evaluations of Visual Pre-training Methods. (arXiv:2209.15589v1 [cs.CV])
    Self-supervised methods have achieved remarkable success in transfer learning, often achieving accuracy equal to or better than that of supervised pre-training. Most prior work has done so by increasing pre-training computation through complex data augmentation, multiple views, or lengthy training schedules. In this work, we investigate a related, but orthogonal question: given a \textit{fixed} FLOP budget, what are the best datasets, models, and (self-)supervised training methods for obtaining high accuracy on representative visual tasks? Given the availability of large datasets, this setting is often more relevant for academic and industry labs alike. We examine five large-scale datasets (JFT-300M, ALIGN, ImageNet-1K, ImageNet-21K, and COCO) and six pre-training methods (CLIP, DINO, SimCLR, BYOL, Masked Autoencoding, and supervised). In a like-for-like fashion, we characterize their FLOP and CO$_2$ footprints, relative to their accuracy when transferred to a canonical image segmentation task. Our analysis reveals strong disparities in the computational efficiency of pre-training methods and their dependence on dataset quality. In particular, our results call into question the commonly-held assumption that self-supervised methods inherently scale to large, uncurated data. We therefore advocate for (1) paying closer attention to dataset curation and (2) reporting accuracies in the context of total computational cost.  ( 3 min )
    MobileViTv3: Mobile-Friendly Vision Transformer with Simple and Effective Fusion of Local, Global and Input Features. (arXiv:2209.15159v1 [cs.CV])
    MobileViT (MobileViTv1) combines convolutional neural networks (CNNs) and vision transformers (ViTs) to create light-weight models for mobile vision tasks. Though the main MobileViTv1 block helps to achieve competitive state-of-the-art results, the fusion block inside it creates scaling challenges and poses a complex learning task. We propose simple and effective changes to the fusion block to create the MobileViTv3 block, which addresses the scaling issues and simplifies the learning task. The MobileViTv3-XXS, XS and S models built from our proposed MobileViTv3 block outperform MobileViTv1 on the ImageNet-1k, ADE20K, COCO and PascalVOC2012 datasets. On ImageNet-1K, MobileViTv3-XXS and MobileViTv3-XS surpass MobileViTv1-XXS and MobileViTv1-XS by 2% and 1.9%, respectively. The recently published MobileViTv2 architecture removes the fusion block and uses linear-complexity transformers to perform better than MobileViTv1. We add our proposed fusion block to MobileViTv2 to create the MobileViTv3-0.5, 0.75 and 1.0 models. These new models achieve better accuracy on the ImageNet-1k, ADE20K, COCO and PascalVOC2012 datasets than MobileViTv2. MobileViTv3-0.5 and MobileViTv3-0.75 outperform MobileViTv2-0.5 and MobileViTv2-0.75 by 2.1% and 1.0%, respectively, on the ImageNet-1K dataset. For the segmentation task, MobileViTv3-1.0 achieves 2.07% and 1.1% better mIOU than MobileViTv2-1.0 on the ADE20K and PascalVOC2012 datasets, respectively. Our code and the trained models are available at: https://github.com/micronDLA/MobileViTv3  ( 3 min )
    DynImp: Dynamic Imputation for Wearable Sensing Data Through Sensory and Temporal Relatedness. (arXiv:2209.15415v1 [eess.SP])
    In wearable sensing applications, data are inevitably irregularly sampled or partially missing, which poses challenges for any downstream application. A unique aspect of wearable data is that it is time-series data in which channels can be correlated with one another, such as the x, y, and z axes of an accelerometer. We argue that traditional methods have rarely made use of both the time-series dynamics of the data and the relatedness of features from different sensors. We propose a model, termed DynImp, that handles missingness at different time points using nearest neighbors along the feature axis, and then feeds the data into an LSTM-based denoising autoencoder that reconstructs missing values along the time axis. We evaluate the model in extreme missingness scenarios ($>50\%$ missing rate), which have not been widely tested on wearable data. Our experiments on activity recognition show that the method can exploit the multi-modality of features from related sensors and also learn from historical time-series dynamics to reconstruct the data under extreme missingness.  ( 2 min )
    On Convergence of Average-Reward Off-Policy Control Algorithms in Weakly-Communicating MDPs. (arXiv:2209.15141v1 [cs.LG])
    We show that two average-reward off-policy control algorithms, Differential Q-learning (Wan, Naik, \& Sutton 2021a) and RVI Q-learning (Abounadi, Bertsekas, \& Borkar 2001), converge in weakly-communicating MDPs. Weakly-communicating MDPs are the most general class of MDPs for which a learning algorithm with a single stream of experience can guarantee to obtain a policy achieving the optimal reward rate. The original convergence proofs of the two algorithms require that all optimal policies induce unichains, which is not necessarily true for weakly-communicating MDPs. To the best of our knowledge, our results are the first showing that average-reward off-policy control algorithms converge in weakly-communicating MDPs. As a direct extension, we show that the average-reward options algorithms introduced by Wan, Naik, \& Sutton (2021b) converge if the Semi-MDP induced by the options is weakly-communicating.
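    To make the first algorithm concrete, a toy sketch of the Differential Q-learning update on a two-state MDP (the MDP, step sizes, and exploration schedule are illustrative choices, not from the paper):

        # Illustrative Differential Q-learning run on a toy two-state MDP.
        import numpy as np

        rng = np.random.default_rng(0)
        Q = np.zeros((2, 2))  # Q[state, action], differential action values
        r_bar = 0.0           # running estimate of the reward rate
        alpha, eta, eps = 0.05, 1.0, 0.1

        s = 0
        for _ in range(100000):
            a = rng.integers(2) if rng.random() < eps else int(Q[s].argmax())
            s2 = s if a == 0 else 1 - s               # action 0 stays, action 1 switches
            r = 1.0 if (s == 0 and a == 0) else 0.0   # reward only for staying in state 0
            delta = r - r_bar + Q[s2].max() - Q[s, a] # differential TD error
            Q[s, a] += alpha * delta
            r_bar += eta * alpha * delta              # reward-rate estimate shares delta
            s = s2
        # approaches 1.0, the optimal reward rate (always stay in state 0)
        print(f"estimated optimal reward rate: {r_bar:.2f}")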
  • Open

    Smooth Bilevel Programming for Sparse Regularization. (arXiv:2106.01429v2 [stat.ML] UPDATED)
    Iteratively reweighted least squares (IRLS) is a popular approach to solving sparsity-enforcing regression problems in machine learning. State-of-the-art approaches are more efficient but typically rely on specific coordinate pruning schemes. In this work, we show how a surprisingly simple reparametrization of IRLS, coupled with a bilevel resolution (instead of an alternating scheme), is able to achieve top performance on a wide range of sparsity penalties (such as the Lasso, group Lasso and trace norm regularizations), regularization strengths (including hard constraints), and design matrices (ranging from correlated designs to differential operators). Similarly to IRLS, our method only involves solving linear systems, but in sharp contrast, it corresponds to the minimization of a smooth function. Despite being non-convex, we show that there are no spurious minima and that saddle points are "ridable", so that there always exists a descent direction. We thus advocate for the use of a BFGS quasi-Newton solver, which makes our approach simple, robust and efficient. We perform a numerical benchmark of the convergence speed of our algorithm against state-of-the-art solvers for Lasso, group Lasso, trace norm and linearly constrained problems. These results highlight the versatility of our approach, removing the need to use different solvers depending on the specificity of the ML problem under study.
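    For context, a sketch of the plain IRLS baseline for the Lasso that the paper reparametrizes (the problem sizes and smoothing constant are illustrative; the paper's bilevel solver is not reproduced here):

        # Illustrative plain IRLS for the Lasso; only the baseline idea is shown.
        import numpy as np

        def irls_lasso(A, b, lam, n_iter=100, eps=1e-6):
            x = np.zeros(A.shape[1])
            for _ in range(n_iter):
                w = 1.0 / (np.abs(x) + eps)  # reweighting from the current iterate
                # each step solves a weighted ridge problem, i.e. one linear system
                x = np.linalg.solve(A.T @ A + lam * np.diag(w), A.T @ b)
            return x

        rng = np.random.default_rng(0)
        A = rng.normal(size=(50, 100))
        x_true = np.zeros(100); x_true[:5] = 3.0
        b = A @ x_true + 0.01 * rng.normal(size=50)
        x_hat = irls_lasso(A, b, lam=1.0)
        # typically close to the 5 true nonzeros
        print(np.count_nonzero(np.abs(x_hat) > 1e-3))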
    Riemannian Metric Learning via Optimal Transport. (arXiv:2205.09244v2 [cs.LG] UPDATED)
    We introduce an optimal transport-based model for learning a metric tensor from cross-sectional samples of evolving probability measures on a common Riemannian manifold. We neurally parametrize the metric as a spatially-varying matrix field and efficiently optimize our model's objective using a simple alternating scheme. Using this learned metric, we can nonlinearly interpolate between probability measures and compute geodesics on the manifold. We show that metrics learned using our method improve the quality of trajectory inference on scRNA and bird migration data at the cost of little additional cross-sectional data.
    Risk Control for Online Learning Models. (arXiv:2205.09095v6 [cs.LG] UPDATED)
    To provide rigorous uncertainty quantification for online learning models, we develop a framework for constructing uncertainty sets that provably control risk -- such as coverage of confidence intervals, false negative rate, or F1 score -- in the online setting. This extends conformal prediction to apply to a larger class of online learning problems. Our method guarantees risk control at any user-specified level even when the underlying data distribution shifts drastically, even adversarially, over time in an unknown fashion. The technique we propose is highly flexible as it can be applied with any base online learning algorithm (e.g., a deep neural network trained online), requiring minimal implementation effort and essentially zero additional computational cost. We further extend our approach to control multiple risks simultaneously, so the prediction sets we generate are valid for all given risks. To demonstrate the utility of our method, we conduct experiments on real-world tabular time-series data sets showing that the proposed method rigorously controls various natural risks. Furthermore, we show how to construct valid intervals for an online image-depth estimation problem that previous sequential calibration schemes cannot handle.
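    The flavor of such online calibration can be seen in the following toy sketch, an adaptive-conformal-style threshold recursion under a distribution shift (the step size and score model are assumptions, not the paper's algorithm):

        # Illustrative online threshold recursion; not the paper's exact method.
        import numpy as np

        rng = np.random.default_rng(1)
        alpha, gamma, q = 0.1, 0.05, 1.0  # target risk, step size, initial threshold
        misses = []
        for t in range(20000):
            scale = 1.0 if t < 10000 else 3.0   # drastic shift halfway through
            score = abs(rng.normal(0, scale))   # nonconformity of the new point
            miss = float(score > q)             # 1 if the prediction set missed
            misses.append(miss)
            q += gamma * (miss - alpha)         # widen on a miss, shrink otherwise
        print(f"overall miss rate: {np.mean(misses):.3f} (target {alpha})")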
    Identifying Latent Causal Content for Multi-Source Domain Adaptation. (arXiv:2208.14161v2 [cs.LG] UPDATED)
    Multi-source domain adaptation (MSDA) learns to predict the labels in target domain data, under the setting that data from multiple source domains are labelled and data from the target domain are unlabelled. Most methods for this task focus on learning invariant representations across domains. However, their success relies heavily on the assumption that the label distribution remains consistent across domains, which may not hold in general real-world problems. In this paper, we propose a new and more flexible assumption, termed \textit{latent covariate shift}, where a latent content variable $\mathbf{z}_c$ and a latent style variable $\mathbf{z}_s$ are introduced in the generative process, with the marginal distribution of $\mathbf{z}_c$ changing across domains and the conditional distribution of the label given $\mathbf{z}_c$ remaining invariant across domains. We show that although (completely) identifying the proposed latent causal model is challenging, the latent content variable can be identified up to scaling by using its dependence with labels from source domains, together with the identifiability conditions of nonlinear ICA. This motivates us to propose a novel method for MSDA, which learns the invariant label distribution conditional on the latent content variable, instead of learning invariant representations. Empirical evaluation on simulation and real data demonstrates the effectiveness of the proposed method.
    Finding NEEMo: Geometric Fitting using Neural Estimation of the Energy Mover's Distance. (arXiv:2209.15624v1 [stat.ML])
    A novel neural architecture was recently developed that enforces an exact upper bound on the Lipschitz constant of the model by constraining the norm of its weights in a minimal way, resulting in higher expressiveness compared to other techniques. We present a new and interesting direction for this architecture: estimation of the Wasserstein metric (Earth Mover's Distance) in optimal transport by employing the Kantorovich-Rubinstein duality to enable its use in geometric fitting applications. Specifically, we focus on the field of high-energy particle physics, where it has been shown that a metric for the space of particle-collider events can be defined based on the Wasserstein metric, referred to as the Energy Mover's Distance (EMD). This metrization has the potential to revolutionize data-driven collider phenomenology. The work presented here represents a major step towards realizing this goal by providing a differentiable way of directly calculating the EMD. We show how the flexibility that our approach enables can be used to develop novel clustering algorithms.
    Provable Guarantees against Data Poisoning Using Self-Expansion and Compatibility. (arXiv:2105.03692v2 [cs.LG] UPDATED)
    As deep learning datasets grow larger and less curated, backdoor data poisoning attacks, which inject malicious poisoned data into the training dataset, have drawn increasing attention in both academia and industry. We identify an incompatibility property of the interaction of clean and poisoned data with the training algorithm, specifically that including poisoned data in the training dataset does not improve model accuracy on clean data and vice-versa. Leveraging this property, we develop an algorithm that iteratively refines subsets of the poisoned dataset to obtain subsets that concentrate around either clean or poisoned data. The result is a partition of the original dataset into disjoint subsets, for each of which we train a corresponding model. A voting algorithm over these models identifies the clean data within the larger poisoned dataset. We empirically evaluate our approach and technique for image classification tasks over the GTSRB and CIFAR-10 datasets. The experimental results show that prior dirty-label and clean-label backdoor attacks in the literature produce poisoned datasets that exhibit behavior consistent with the incompatibility property. The results also show that our defense reduces the attack success rate below 1% on 134 out of 165 scenarios in this setting, with only a 2% drop in clean accuracy on CIFAR-10 (and negligible impact on GTSRB).
    Sequential Importance Sampling for Hybrid Model Bayesian Inference to Support Bioprocess Mechanism Learning and Robust Control. (arXiv:2205.02410v4 [stat.ML] UPDATED)
    Driven by the critical needs of biomanufacturing 4.0, we introduce a probabilistic knowledge graph hybrid model characterizing the risk- and science-based understanding of bioprocess mechanisms. It can faithfully capture important properties, including nonlinear reactions, partially observed states, and nonstationary dynamics. Given very limited real process observations, we derive a posterior distribution quantifying model estimation uncertainty. To avoid the evaluation of intractable likelihoods, Approximate Bayesian Computation sampling with Sequential Monte Carlo (ABC-SMC) is utilized to approximate the posterior distribution. Under high stochasticity and model uncertainty, it is computationally expensive to match output trajectories. Therefore, we create a linear Gaussian dynamic Bayesian network (LG-DBN) auxiliary likelihood-based ABC-SMC approach. By matching summary statistics derived through the LG-DBN likelihood, which can capture critical interactions and variations, the proposed algorithm can accelerate hybrid model inference, support process monitoring, and facilitate mechanism learning and robust control.
    Switching One-Versus-the-Rest Loss to Increase the Margin of Logits for Adversarial Robustness. (arXiv:2207.10283v2 [cs.LG] UPDATED)
    Adversarial training is a promising method to improve robustness against adversarial attacks. To enhance its performance, recent methods impose high weights on the cross-entropy loss for important data points near the decision boundary. However, these importance-aware methods are vulnerable to sophisticated attacks, e.g., Auto-Attack. In this paper, we experimentally investigate the cause of their vulnerability via the margins between the logits for the true label and the other labels, because these margins should be large enough to prevent the largest logit from being flipped by the attacks. Our experiments reveal that the histogram of the logit margins of na\"ive adversarial training has two peaks. Thus, the levels of difficulty in increasing logit margins are roughly divided into two: difficult samples (small logit margins) and easy samples (large logit margins). In contrast, only one peak near zero appears in the histogram for importance-aware methods, i.e., they reduce the logit margins of easy samples. To increase the logit margins of difficult samples without reducing those of easy samples, we propose the switching one-versus-the-rest loss (SOVR), which switches from cross-entropy to one-versus-the-rest loss (OVR) for difficult samples. We derive trajectories of logit margins for a simple problem and prove that OVR increases logit margins twice as much as the weighted cross-entropy loss. Thus, SOVR increases the logit margins of difficult samples, unlike existing methods. We experimentally show that SOVR achieves better robustness against Auto-Attack than importance-aware methods.
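    A minimal sketch of the switching rule (the margin threshold and the exact OVR form are illustrative simplifications of the paper's loss):

        # Illustrative SOVR-style switching loss; a simplification, not the paper's code.
        import torch
        import torch.nn.functional as F

        def ovr_loss(logits, targets):
            """One-versus-the-rest: a binary logistic loss per class."""
            onehot = F.one_hot(targets, logits.size(1)).float()
            return F.binary_cross_entropy_with_logits(
                logits, onehot, reduction='none').sum(1)

        def sovr_loss(logits, targets, tau=0.0):
            true = logits.gather(1, targets[:, None]).squeeze(1)
            rest = logits.scatter(1, targets[:, None], float('-inf')).max(1).values
            margin = true - rest  # logit margin per sample
            ce = F.cross_entropy(logits, targets, reduction='none')
            # difficult samples (small margin) get OVR; easy ones keep cross-entropy
            return torch.where(margin < tau, ovr_loss(logits, targets), ce).mean()

        logits = torch.randn(8, 10, requires_grad=True)
        targets = torch.randint(0, 10, (8,))
        sovr_loss(logits, targets).backward()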
    Non-asymptotic Optimal Prediction Error for Growing-dimensional Partially Functional Linear Models. (arXiv:2009.04729v3 [math.ST] UPDATED)
    Under the reproducing kernel Hilbert space (RKHS) framework, we consider the penalized least-squares estimation of partially functional linear models (PFLM), whose predictor contains both functional and traditional multivariate parts, with the multivariate part allowing a divergent number of parameters. From the non-asymptotic point of view, we focus on rate-optimal upper and lower bounds on the prediction error. An exact upper bound for the excess prediction risk is shown in a non-asymptotic form under a more general assumption known as the effective dimension of the model, by which we also show prediction consistency when the number of multivariate covariates $p$ slightly increases with the sample size $n$. Our new finding implies a trade-off between the number of non-functional predictors and the effective dimension of the kernel principal components to ensure prediction consistency in the increasing-dimensional setting. The analysis in our proof hinges on the spectral condition of the sandwich operator of the covariance operator and the reproducing kernel, and on sub-Gaussian and Bernstein concentration inequalities for random elements in Hilbert space. Finally, we derive the non-asymptotic minimax lower bound under the regularity assumption of the Kullback-Leibler divergence of the models.
    Contextual Bandits with Knapsacks for a Conversion Model. (arXiv:2206.00314v2 [cs.LG] UPDATED)
    We consider contextual bandits with knapsacks, with an underlying structure between rewards generated and cost vectors suffered. We do so motivated by sales with commercial discounts. At each round, given the stochastic i.i.d.\ context $\mathbf{x}_t$ and the arm picked $a_t$ (corresponding, e.g., to a discount level), a customer conversion may be obtained, in which case a reward $r(a,\mathbf{x}_t)$ is gained and vector costs $c(a_t,\mathbf{x}_t)$ are suffered (corresponding, e.g., to losses of earnings). Otherwise, in the absence of a conversion, the reward and costs are null. The reward and costs achieved are thus coupled through the binary variable measuring conversion or the absence thereof. This underlying structure between rewards and costs is different from the linear structures considered by Agrawal and Devanur [2016] (but we show that the techniques introduced in the present article may also be applied to the case of these linear structures). The adaptive policies exhibited solve at each round a linear program based on upper-confidence estimates of the probabilities of conversion given $a$ and $\mathbf{x}$. This kind of policy is most natural and achieves a regret bound of the typical order (OPT/$B$) $\sqrt{T}$, where $B$ is the total budget allowed, OPT is the optimal expected reward achievable by a static policy, and $T$ is the number of rounds.  ( 3 min )
    Learn then Test: Calibrating Predictive Algorithms to Achieve Risk Control. (arXiv:2110.01052v5 [cs.LG] UPDATED)
    We introduce a framework for calibrating machine learning models so that their predictions satisfy explicit, finite-sample statistical guarantees. Our calibration algorithms work with any underlying model and (unknown) data-generating distribution and do not require model refitting. The framework addresses, among other examples, false discovery rate control in multi-label classification, intersection-over-union control in instance segmentation, and the simultaneous control of the type-1 error of outlier detection and confidence set coverage in classification or regression. Our main insight is to reframe the risk-control problem as multiple hypothesis testing, enabling techniques and mathematical arguments different from those in the previous literature. We use the framework to provide new calibration methods for several core machine learning tasks, with detailed worked examples in computer vision and tabular medical data.  ( 2 min )
    Adaptive Discretization in Online Reinforcement Learning. (arXiv:2110.15843v2 [stat.ML] UPDATED)
    Discretization-based approaches to solving online reinforcement learning problems have been studied extensively in practice on applications ranging from resource allocation to cache management. Two major questions in designing discretization-based algorithms are how to create the discretization and when to refine it. While there have been several experimental results investigating heuristic solutions to these questions, there has been little theoretical treatment. In this paper we provide a unified theoretical analysis of tree-based hierarchical partitioning methods for online reinforcement learning, providing model-free and model-based algorithms. We show how our algorithms are able to take advantage of inherent problem structure by providing guarantees that scale with respect to the 'zooming dimension' instead of the ambient dimension, an instance-dependent quantity measuring the benignness of the optimal $Q_h^\star$ function. Many applications in computing systems and operations research require algorithms that compete on three facets: low sample complexity, mild storage requirements, and low computational burden. Our algorithms are easily adapted to operating constraints, and our theory provides explicit bounds across each of the three facets. This motivates their use in practical applications, as our approach automatically adapts to underlying problem structure even when very little is known a priori about the system.  ( 3 min )
    $\Phi$-DVAE: Learning Physically Interpretable Representations with Nonlinear Filtering. (arXiv:2209.15609v1 [stat.ML])
    Incorporating unstructured data into physical models is a challenging problem that is emerging in data assimilation. Traditional approaches focus on well-defined observation operators whose functional forms are typically assumed to be known. This prevents these methods from achieving a consistent model-data synthesis in configurations where the mapping from data-space to model-space is unknown. To address these shortcomings, in this paper we develop a physics-informed dynamical variational autoencoder ($\Phi$-DVAE) for embedding diverse data streams into time-evolving physical systems described by differential equations. Our approach combines a standard (possibly nonlinear) filter for the latent state-space model and a VAE, to embed the unstructured data stream into the latent dynamical system. A variational Bayesian framework is used for the joint estimation of the embedding, latent states, and unknown system parameters. To demonstrate the method, we look at three examples: video datasets generated by the advection and Korteweg-de Vries partial differential equations, and a velocity field generated by the Lorenz-63 system. Comparisons with relevant baselines show that the $\Phi$-DVAE provides a data efficient dynamics encoding methodology that is competitive with standard approaches, with the added benefit of incorporating a physically interpretable latent space.  ( 2 min )
    Transfer Learning with Pre-trained Conditional Generative Models. (arXiv:2204.12833v2 [cs.LG] UPDATED)
    Transfer learning is crucial in training deep neural networks on new target tasks. Current transfer learning methods always assume at least one of (i) source and target task label spaces overlap, (ii) source datasets are available, and (iii) target network architectures are consistent with source ones. However, holding these assumptions is difficult in practical settings because the target task rarely has the same labels as the source task, the source dataset access is restricted due to storage costs and privacy, and the target architecture is often specialized to each task. To transfer source knowledge without these assumptions, we propose a transfer learning method that uses deep generative models and is composed of the following two stages: pseudo pre-training (PP) and pseudo semi-supervised learning (P-SSL). PP trains a target architecture with an artificial dataset synthesized by using conditional source generative models. P-SSL applies SSL algorithms to labeled target data and unlabeled pseudo samples, which are generated by cascading the source classifier and generative models to condition them with target samples. Our experimental results indicate that our method can outperform the baselines of scratch training and knowledge distillation.  ( 2 min )
    Truncated Diffusion Probabilistic Models and Diffusion-based Adversarial Auto-Encoders. (arXiv:2202.09671v3 [stat.ML] UPDATED)
    Employing a forward diffusion chain to gradually map the data to a noise distribution, diffusion-based generative models learn how to generate the data by inferring a reverse diffusion chain. However, this approach is slow and costly because it needs many forward and reverse steps. We propose a faster and cheaper approach that adds noise not until the data become pure random noise, but until they reach a hidden noisy-data distribution that we can confidently learn. Then, we use fewer reverse steps to generate data by starting from this hidden distribution that is made similar to the noisy data. We reveal that the proposed model can be cast as an adversarial auto-encoder empowered by both the diffusion process and a learnable implicit prior. Experimental results show even with a significantly smaller number of reverse diffusion steps, the proposed truncated diffusion probabilistic models can provide consistent improvements over the non-truncated ones in terms of performance in both unconditional and text-guided image generations.  ( 2 min )
    Identifying Weight-Variant Latent Causal Models. (arXiv:2208.14153v2 [cs.LG] UPDATED)
    The task of causal representation learning aims to uncover latent higher-level causal representations that affect lower-level observations. Identifying true latent causal representations from observed data, while allowing instantaneous causal relations among latent variables, remains a challenge, however. To this end, we start from the analysis of three intrinsic properties in identifying latent space from observations: transitivity, permutation indeterminacy, and scaling indeterminacy. We find that transitivity plays a key role in impeding the identifiability of latent causal representations. To address the unidentifiability caused by transitivity, we introduce a novel identifiability condition where the underlying latent causal model satisfies a linear-Gaussian model, in which the causal coefficients and the distribution of Gaussian noise are modulated by an additional observed variable. Under some mild assumptions, we show that the latent causal representations can be identified up to trivial permutation and scaling. Furthermore, based on this theoretical result, we propose a novel method, termed Structural caUsAl Variational autoEncoder, which directly learns latent causal representations and the causal relationships among them, together with the mapping from the latent causal variables to the observed ones. We show that the proposed method learns the true parameters asymptotically. Experimental results on synthetic and real data demonstrate the identifiability and consistency results and the efficacy of the proposed method in learning latent causal representations.  ( 3 min )
    One-Shot Adaptation of GAN in Just One CLIP. (arXiv:2203.09301v3 [cs.CV] UPDATED)
    There are many recent research efforts to fine-tune a pre-trained generator with a few target images to generate images of a novel domain. Unfortunately, these methods often suffer from overfitting or under-fitting when fine-tuned with a single target image. To address this, here we present a novel single-shot GAN adaptation method through unified CLIP space manipulations. Specifically, our model employs a two-step training strategy: reference image search in the source generator using a CLIP-guided latent optimization, followed by generator fine-tuning with a novel loss function that imposes CLIP space consistency between the source and adapted generators. To further improve the adapted model to produce spatially consistent samples with respect to the source generator, we also propose contrastive regularization for patchwise relationships in the CLIP space. Experimental results show that our model generates diverse outputs with the target texture and outperforms the baseline models both qualitatively and quantitatively. Furthermore, we show that our CLIP space manipulation strategy allows more effective attribute editing.  ( 2 min )
    The Final Ascent: When Bigger Models Generalize Worse on Noisy-Labeled Data. (arXiv:2208.08003v2 [cs.LG] UPDATED)
    Increasing the size of overparameterized neural networks has been shown to improve their generalization performance. However, real-world datasets often contain a significant fraction of noisy labels, which can drastically harm the performance of models trained on them. In this work, we study how neural networks' test loss changes with model size when the training set contains noisy labels. We show that under a sufficiently large noise-to-sample-size ratio, generalization error eventually increases with model size. First, we provide a theoretical analysis of random feature regression and show that this phenomenon occurs because the variance of the generalization loss experiences a second ascent under a large noise-to-sample-size ratio. Then, we present extensive empirical evidence confirming that our theoretical results hold for neural networks. Furthermore, we empirically observe that the adverse effect of network size is more pronounced when robust training methods are employed to learn from noisy-labeled data. Our results have important practical implications: first, larger models should be employed with extra care, particularly when trained on smaller datasets or with robust learning methods. Second, a large sample size can alleviate the effect of noisy labels and allow larger models to achieve superior performance even under noise.  ( 3 min )
    ReLU Neural Networks Learn the Simplest Models: Neural Isometry and Exact Recovery. (arXiv:2209.15265v1 [cs.LG])
    The practice of deep learning has shown that neural networks generalize remarkably well even with an extreme number of learned parameters. This appears to contradict traditional statistical wisdom, in which a trade-off between model complexity and fit to the data is essential. We set out to resolve this discrepancy from a convex optimization and sparse recovery perspective. We consider the training and generalization properties of two-layer ReLU networks with standard weight decay regularization. Under certain regularity assumptions on the data, we show that ReLU networks with an arbitrary number of parameters learn only simple models that explain the data. This is analogous to the recovery of the sparsest linear model in compressed sensing. For ReLU networks and their variants with skip connections or normalization layers, we present isometry conditions that ensure the exact recovery of planted neurons. For randomly generated data, we show the existence of a phase transition in recovering planted neural network models. The situation is simple: whenever the ratio between the number of samples and the dimension exceeds a numerical threshold, the recovery succeeds with high probability; otherwise, it fails with high probability. Surprisingly, ReLU networks learn simple and sparse models even when the labels are noisy. The phase transition phenomenon is confirmed through numerical experiments.  ( 3 min )
    Evaluation of importance estimators in deep learning classifiers for Computed Tomography. (arXiv:2209.15398v1 [cs.CV])
    Deep learning has shown superb performance in detecting objects and classifying images, holding great promise for analyzing medical imaging. Translating the success of deep learning to medical imaging, in which doctors need to understand the underlying process, requires the capability to interpret and explain the prediction of neural networks. Interpretability of deep neural networks often relies on estimating the importance of input features (e.g., pixels) with respect to the outcome (e.g., class probability). However, a number of importance estimators (also known as saliency maps) have been developed and it is unclear which ones are more relevant for medical imaging applications. In the present work, we investigated the performance of several importance estimators in explaining the classification of computed tomography (CT) images by a convolutional deep network, using three distinct evaluation metrics. First, the model-centric fidelity measures a decrease in the model accuracy when certain inputs are perturbed. Second, concordance between importance scores and the expert-defined segmentation masks is measured on a pixel level by receiver operating characteristic (ROC) curves. Third, we measure a region-wise overlap between an XRAI-based map and the segmentation mask by Dice Similarity Coefficients (DSC). Overall, two versions of SmoothGrad topped the fidelity and ROC rankings, whereas both Integrated Gradients and SmoothGrad excelled in the DSC evaluation. Interestingly, there was a critical discrepancy between model-centric (fidelity) and human-centric (ROC and DSC) evaluation. Expert expectation and intuition embedded in segmentation maps do not necessarily align with how the model arrived at its prediction. Understanding this difference in interpretability would help harness the power of deep learning in medicine.  ( 3 min )
    Self-Stabilization: The Implicit Bias of Gradient Descent at the Edge of Stability. (arXiv:2209.15594v1 [cs.LG])
    Traditional analyses of gradient descent show that when the largest eigenvalue of the Hessian, also known as the sharpness $S(\theta)$, is bounded by $2/\eta$, training is "stable" and the training loss decreases monotonically. Recent works, however, have observed that this assumption does not hold when training modern neural networks with full batch or large batch gradient descent. Most recently, Cohen et al. (2021) observed two important phenomena. The first, dubbed progressive sharpening, is that the sharpness steadily increases throughout training until it reaches the instability cutoff $2/\eta$. The second, dubbed edge of stability, is that the sharpness hovers at $2/\eta$ for the remainder of training while the loss continues decreasing, albeit non-monotonically. We demonstrate that, far from being chaotic, the dynamics of gradient descent at the edge of stability can be captured by a cubic Taylor expansion: as the iterates diverge in the direction of the top eigenvector of the Hessian due to instability, the cubic term in the local Taylor expansion of the loss function causes the curvature to decrease until stability is restored. This property, which we call self-stabilization, is a general property of gradient descent and explains its behavior at the edge of stability. A key consequence of self-stabilization is that gradient descent at the edge of stability implicitly follows projected gradient descent (PGD) under the constraint $S(\theta) \le 2/\eta$. Our analysis provides precise predictions for the loss, sharpness, and deviation from the PGD trajectory throughout training, which we verify both empirically in a number of standard settings and theoretically under mild conditions. Our analysis uncovers the mechanism for gradient descent's implicit bias towards stability.  ( 3 min )
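    A practical aside: the sharpness $S(\theta)$ that all of this revolves around can be monitored without ever forming the Hessian, via power iteration on Hessian-vector products. Below is a minimal PyTorch sketch of that diagnostic; the function and its arguments are illustrative and not taken from the paper.

    ```python
    import torch

    def sharpness(loss, params, iters=20):
        # Estimate S(theta), the top Hessian eigenvalue, by power iteration
        # on Hessian-vector products (the Hessian is never materialized).
        grads = torch.autograd.grad(loss, params, create_graph=True)
        v = [torch.randn_like(p) for p in params]
        lam = None
        for _ in range(iters):
            hv = torch.autograd.grad(grads, params, grad_outputs=v, retain_graph=True)
            lam = torch.sqrt(sum((h ** 2).sum() for h in hv))
            v = [h / lam for h in hv]
        return lam.item()
    ```

    At the edge of stability, one expects the returned value to hover near $2/\eta$.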
    TT-NF: Tensor Train Neural Fields. (arXiv:2209.15529v1 [cs.LG])
    Learning neural fields has been an active topic in deep learning research, focusing, among other issues, on finding more compact and easy-to-fit representations. In this paper, we introduce a novel low-rank representation termed Tensor Train Neural Fields (TT-NF) for learning neural fields on dense regular grids and efficient methods for sampling from them. Our representation is a TT parameterization of the neural field, trained with backpropagation to minimize a non-convex objective. We analyze the effect of low-rank compression on the downstream task quality metrics in two settings. First, we demonstrate the efficiency of our method in a sandbox task of tensor denoising, which admits comparison with SVD-based schemes designed to minimize reconstruction error. Furthermore, we apply the proposed approach to Neural Radiance Fields, where the low-rank structure of the field corresponding to the best quality can be discovered only through learning.  ( 2 min )
    Sparse Random Networks for Communication-Efficient Federated Learning. (arXiv:2209.15328v1 [cs.LG])
    One main challenge in federated learning is the large communication cost of exchanging weight updates from clients to the server at each round. While prior work has made great progress in compressing the weight updates through gradient compression methods, we propose a radically different approach that does not update the weights at all. Instead, our method freezes the weights at their initial \emph{random} values and learns how to sparsify the random network for the best performance. To this end, the clients collaborate in training a \emph{stochastic} binary mask to find the optimal sparse random network within the original one. At the end of the training, the final model is a sparse network with random weights -- or a subnetwork inside the dense random network. We show improvements in accuracy, communication (less than $1$ bit per parameter (bpp)), convergence speed, and final model size (less than $1$ bpp) over relevant baselines on MNIST, EMNIST, CIFAR-10, and CIFAR-100 datasets, in the low bitrate regime under various system configurations.  ( 2 min )
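    The core mechanism (freeze random weights, learn a stochastic binary mask over them) can be sketched in a few lines. This is only an illustrative stand-in, using a straight-through estimator for the non-differentiable Bernoulli sample; none of the names below come from the paper.

    ```python
    import torch

    def masked_linear(x, w_frozen, scores):
        # w_frozen: random weights, never updated.
        # scores: learnable logits, one per weight, defining keep-probabilities.
        probs = torch.sigmoid(scores)
        mask = torch.bernoulli(probs)
        # Straight-through estimator: forward uses the hard binary sample,
        # backward routes gradients to the underlying probabilities.
        mask = mask + probs - probs.detach()
        return x @ (w_frozen * mask)
    ```

    Under a scheme like this, clients only need to exchange updates to the mask scores or the sampled binary masks rather than float weights, which is what makes the sub-1-bit-per-parameter communication in the abstract plausible.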
    Sparsity-Constrained Optimal Transport. (arXiv:2209.15466v1 [stat.ML])
    Regularized optimal transport (OT) is now increasingly used as a loss or as a matching layer in neural networks. Entropy-regularized OT can be computed using the Sinkhorn algorithm but it leads to fully-dense transportation plans, meaning that all sources are (fractionally) matched with all targets. To address this issue, several works have investigated quadratic regularization instead. This regularization preserves sparsity and leads to unconstrained and smooth (semi) dual objectives, that can be solved with off-the-shelf gradient methods. Unfortunately, quadratic regularization does not give direct control over the cardinality (number of nonzeros) of the transportation plan. We propose in this paper a new approach for OT with explicit cardinality constraints on the transportation plan. Our work is motivated by an application to sparse mixture of experts, where OT can be used to match input tokens such as image patches with expert models such as neural networks. Cardinality constraints ensure that at most $k$ tokens are matched with an expert, which is crucial for computational performance reasons. Despite the nonconvexity of cardinality constraints, we show that the corresponding (semi) dual problems are tractable and can be solved with first-order gradient methods. Our method can be thought of as a middle ground between unregularized OT (recovered in the limit case $k=1$) and quadratically-regularized OT (recovered when $k$ is large enough). The smoothness of the objectives increases as $k$ increases, giving rise to a trade-off between convergence speed and sparsity of the optimal plan.  ( 3 min )
    Building Normalizing Flows with Stochastic Interpolants. (arXiv:2209.15571v1 [cs.LG])
    A simple generative model based on a continuous-time normalizing flow between any pair of base and target distributions is proposed. The velocity field of this flow is inferred from the probability current of a time-dependent distribution that interpolates between the base and the target in finite time. Unlike conventional normalizing flow inference methods based on the maximum likelihood principle, which require costly backpropagation through ODE solvers, our interpolant approach leads to a simple quadratic loss for the velocity itself which is expressed in terms of expectations that are readily amenable to empirical estimation. The flow can be used to generate samples from either the base or target, and can be used to estimate the likelihood at any time along the interpolant. The approach is contextualized in its relation to diffusions. In particular, in situations where the base is a Gaussian distribution, we show that the velocity of our normalizing flow can also be used to construct a diffusion model to sample the target as well as to estimate its score. This allows one to map methods based on stochastic differential equations to those of ordinary differential equations, simplifying the mechanics of the model, but capturing equivalent dynamics. Benchmarking on density estimation tasks illustrates that the learned flow can match and surpass maximum likelihood continuous flows at a fraction of the conventional ODE training costs.  ( 3 min )
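    To make the "simple quadratic loss for the velocity" concrete, here is a minimal sketch for the special case of a linear interpolant $x_t = (1-t)x_0 + t x_1$, whose time derivative $x_1 - x_0$ serves as the regression target; the paper's interpolants are more general, and `v_net` is a placeholder for any velocity network.

    ```python
    import torch

    def velocity_loss(v_net, x0, x1):
        # x0: batch from the base distribution, x1: batch from the target.
        t = torch.rand(x0.shape[0], 1)          # random time in [0, 1]
        xt = (1 - t) * x0 + t * x1              # linear interpolant
        target = x1 - x0                        # d/dt of the interpolant
        return ((v_net(xt, t) - target) ** 2).mean()
    ```

    Note there is no ODE solve anywhere in the training loop; that is the advertised advantage over maximum-likelihood continuous flows.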
    Flexible risk design using bi-directional dispersion. (arXiv:2203.14434v2 [stat.ML] UPDATED)
    Many novel notions of "risk" (e.g., CVaR, tilted risk, DRO risk) have been proposed and studied, but these risks are all at least as sensitive as the mean to loss tails on the upside, and tend to ignore deviations on the downside. We study a complementary new risk class that penalizes loss deviations in a bi-directional manner, while having more flexibility in terms of tail sensitivity than is offered by mean-variance. This class lets us derive high-probability learning guarantees without explicit gradient clipping, and empirical tests using both simulated and real data illustrate a high degree of control over key properties of the test loss distribution incurred by gradient-based learners.  ( 2 min )
    Individual Privacy Accounting with Gaussian Differential Privacy. (arXiv:2209.15596v1 [cs.CR])
    Individual privacy accounting enables bounding differential privacy (DP) loss individually for each participant involved in the analysis. This can be informative as often the individual privacy losses are considerably smaller than those indicated by the DP bounds that are based on considering worst-case bounds at each data access. In order to account for the individual privacy losses in a principled manner, we need a privacy accountant for adaptive compositions of randomised mechanisms, where the loss incurred at a given data access is allowed to be smaller than the worst-case loss. This kind of analysis has been carried out for the R\'enyi differential privacy (RDP) by Feldman and Zrnic (2021), however not yet for the so-called optimal privacy accountants. We make first steps in this direction by providing a careful analysis using the Gaussian differential privacy which gives optimal bounds for the Gaussian mechanism, one of the most versatile DP mechanisms. This approach is based on determining a certain supermartingale for the hockey-stick divergence and on extending the R\'enyi divergence-based fully adaptive composition results by Feldman and Zrnic (2021). We also consider measuring the individual $(\varepsilon,\delta)$-privacy losses using the so-called privacy loss distributions. With the help of the Blackwell theorem, we can then make use of the RDP analysis to construct an approximative individual $(\varepsilon,\delta)$-accountant.  ( 3 min )
    Implicit Bias of Large Depth Networks: a Notion of Rank for Nonlinear Functions. (arXiv:2209.15055v1 [stat.ML])
    We show that the representation cost of fully connected neural networks with homogeneous nonlinearities - which describes the implicit bias in function space of networks with $L_2$-regularization or with losses such as the cross-entropy - converges as the depth of the network goes to infinity to a notion of rank over nonlinear functions. We then inquire under which conditions the global minima of the loss recover the `true' rank of the data: we show that for too large depths the global minimum will be approximately rank 1 (underestimating the rank); we then argue that there is a range of depths which grows with the number of datapoints where the true rank is recovered. Finally, we discuss the effect of the rank of a classifier on the topology of the resulting class boundaries and show that autoencoders with optimal nonlinear rank are naturally denoising.  ( 2 min )
    Efficient computation of the Knowledge Gradient for Bayesian Optimization. (arXiv:2209.15367v1 [cs.LG])
    Bayesian optimization is a powerful collection of methods for optimizing stochastic expensive black-box functions. One key component of a Bayesian optimization algorithm is the acquisition function that determines which solution should be evaluated in every iteration. A popular and very effective choice is the Knowledge Gradient acquisition function; however, there is no analytical way to compute it. Several different implementations make different approximations. In this paper, we review and compare the spectrum of Knowledge Gradient implementations and propose One-shot Hybrid KG, a new approach that combines several of the previously proposed ideas and is cheap to compute as well as powerful and efficient. We prove that the new method preserves theoretical properties of previous methods and empirically show the drastically reduced computational overhead with equal or improved performance. All experiments are implemented in BoTorch and code is available on GitHub.  ( 2 min )
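    For intuition about what any KG implementation approximates, here is a plain Monte Carlo estimator on a discrete candidate set with a noiseless Gaussian posterior (mean `mu`, covariance `cov`); this is a pedagogical sketch, not the paper's One-shot Hybrid KG.

    ```python
    import numpy as np

    def mc_knowledge_gradient(mu, cov, x, n_fantasy=64, rng=None):
        # KG(x) = E[max_j mu_post(j)] - max_j mu(j), where mu_post is the
        # posterior mean after observing a "fantasy" sample at candidate x.
        rng = rng or np.random.default_rng()
        s2 = cov[x, x]
        best_now = mu.max()
        acc = 0.0
        for _ in range(n_fantasy):
            y = mu[x] + np.sqrt(s2) * rng.standard_normal()   # fantasy observation
            mu_post = mu + cov[:, x] * (y - mu[x]) / s2        # rank-one mean update
            acc += mu_post.max()
        return acc / n_fantasy - best_now
    ```

    The expensive part is exactly this inner maximization over fantasy samples, which is what the various implementations approximate differently.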
    Sparse tree-based initialization for neural networks. (arXiv:2209.15283v1 [stat.ML])
    Dedicated neural network (NN) architectures have been designed to handle specific data types (such as CNNs for images or RNNs for text), which ranks them among state-of-the-art methods for dealing with these data. Unfortunately, no architecture has been found for dealing with tabular data yet, for which tree ensemble methods (tree boosting, random forests) usually show the best predictive performance. In this work, we propose a new sparse initialization technique for (potentially deep) multilayer perceptrons (MLP): we first train a tree-based procedure to detect feature interactions and use the resulting information to initialize the network, which is subsequently trained via standard stochastic gradient strategies. Numerical experiments on several tabular data sets show that this new, simple and easy-to-use method is a solid competitor, both in terms of generalization capacity and computation time, to default MLP initialization and even to existing complex deep learning solutions. In fact, this well-chosen MLP initialization raises the resulting NN methods to the level of a valid competitor to gradient boosting when dealing with tabular data. Besides, such initializations are able to preserve the sparsity of weights introduced in the first layers of the network through training. This fact suggests that this new initializer performs an implicit regularization during the NN training, and emphasizes that the first layers act as a sparse feature extractor (as with convolutional layers in CNNs).  ( 3 min )
    Safe Exploration Method for Reinforcement Learning under Existence of Disturbance. (arXiv:2209.15452v1 [cs.LG])
    Recent rapid developments in reinforcement learning algorithms have been giving us novel possibilities in many fields. However, due to their exploring property, we have to take the risk into consideration when we apply those algorithms to safety-critical problems, especially in real environments. In this study, we deal with a safe exploration problem in reinforcement learning under the existence of disturbance. We define safety during learning as satisfaction of the constraint conditions explicitly defined in terms of the state, and propose a safe exploration method that uses partial prior knowledge of a controlled object and disturbance. The proposed method assures the satisfaction of the explicit state constraints with a pre-specified probability even if the controlled object is exposed to a stochastic disturbance following a normal distribution. As theoretical results, we introduce sufficient conditions to construct conservative inputs without an exploratory component, which the proposed method uses, and prove that safety in the above sense is guaranteed with the proposed method. Furthermore, we illustrate the validity and effectiveness of the proposed method through numerical simulations of an inverted pendulum and a four-bar parallel link robot manipulator.  ( 3 min )
    Fast Topological Signal Identification and Persistent Cohomological Cycle Matching. (arXiv:2209.15446v1 [math.AT])
    Within the context of topological data analysis, the problems of identifying topological significance and matching signals across datasets are important and useful inferential tasks in many applications. The limitation of existing solutions to these problems, however, is computational speed. In this paper, we harness the state-of-the-art for persistent homology computation by studying the problem of determining topological prevalence and cycle matching using a cohomological approach, which increases their feasibility and applicability to a wider variety of applications and contexts. We demonstrate this on a wide range of real-life, large-scale, and complex datasets. We extend existing notions of topological prevalence and cycle matching to include general non-Morse filtrations. This provides the most general and flexible state-of-the-art adaptation of topological signal identification and persistent cycle matching, which performs comparisons on the order of tens of thousands of sampled points in a matter of minutes on standard institutional HPC CPU facilities.  ( 2 min )
    Leveraging variational autoencoders for multiple data imputation. (arXiv:2209.15321v1 [stat.ML])
    Missing data persists as a major barrier to data analysis across numerous applications. Recently, deep generative models have been used for imputation of missing data, motivated by their ability to capture highly non-linear and complex relationships in the data. In this work, we investigate the ability of deep models, namely variational autoencoders (VAEs), to account for uncertainty in missing data through multiple imputation strategies. We find that VAEs provide poor empirical coverage of missing data, with underestimation and overconfident imputations, particularly for more extreme missing data values. To overcome this, we employ $\beta$-VAEs, which, viewed from a generalized Bayes framework, provide robustness to model misspecification. Assigning a good value of $\beta$ is critical for uncertainty calibration and we demonstrate how this can be achieved using cross-validation. In downstream tasks, we show how multiple imputation with $\beta$-VAEs can avoid false discoveries that arise as artefacts of imputation.  ( 2 min )
    Likelihood adjusted semidefinite programs for clustering heterogeneous data. (arXiv:2209.15097v1 [stat.ML])
    Clustering is a widely deployed unsupervised learning tool. Model-based clustering is a flexible framework to tackle data heterogeneity when the clusters have different shapes. Likelihood-based inference for mixture distributions often involves non-convex and high-dimensional objective functions, posing difficult computational and statistical challenges. The classic expectation-maximization (EM) algorithm is a computationally thrifty iterative method that maximizes a surrogate function minorizing the log-likelihood of observed data in each iteration, which, however, suffers from bad local maxima even in the special case of the standard Gaussian mixture model with common isotropic covariance matrices. On the other hand, recent studies reveal that the unique global solution of a semidefinite programming (SDP) relaxed $K$-means achieves the information-theoretically sharp threshold for perfectly recovering the cluster labels under the standard Gaussian mixture model. In this paper, we extend the SDP approach to a general setting by integrating cluster labels as model parameters and propose an iterative likelihood adjusted SDP (iLA-SDP) method that directly maximizes the \emph{exact} observed likelihood in the presence of data heterogeneity. By lifting the cluster assignment to group-specific membership matrices, iLA-SDP avoids centroids estimation -- a key feature that allows exact recovery under well-separateness of centroids without being trapped by their adversarial configurations. Thus iLA-SDP is less sensitive than EM to initialization and more stable on high-dimensional data. Our numerical experiments demonstrate that iLA-SDP can achieve lower mis-clustering errors than several widely used clustering methods, including $K$-means, SDP and EM algorithms.  ( 3 min )
    On the optimization and generalization of overparameterized implicit neural networks. (arXiv:2209.15562v1 [cs.LG])
    Implicit neural networks have become increasingly attractive in the machine learning community since they can achieve competitive performance while using far fewer computational resources. Recently, a line of theoretical works established global convergence of first-order methods such as gradient descent if the implicit networks are over-parameterized. However, as they train all layers together, their analyses are equivalent to only studying the evolution of the output layer. It is unclear how the implicit layer contributes to the training. Thus, in this paper, we restrict ourselves to only training the implicit layer. We show that global convergence is guaranteed, even if only the implicit layer is trained. On the other hand, the theoretical understanding of when and how the training performance of an implicit neural network can be generalized to unseen data is still under-explored. Although this problem has been studied in standard feed-forward networks, the case of implicit neural networks is still intriguing since implicit networks theoretically have infinitely many layers. Therefore, this paper investigates the generalization error for implicit neural networks. Specifically, we study the generalization of an implicit network activated by the ReLU function over random initialization. We provide a generalization bound that is initialization sensitive. As a result, we show that gradient flow with proper random initialization can train a sufficiently over-parameterized implicit network to achieve arbitrarily small generalization errors.  ( 3 min )
    Learning with MISELBO: The Mixture Cookbook. (arXiv:2209.15514v1 [cs.LG])
    Mixture models in variational inference (VI) are an active field of research. Recent works have established their connection to multiple importance sampling (MIS) through the MISELBO and advanced the use of ensemble approximations for large-scale problems. However, as we show here, independent learning of the ensemble components can lead to suboptimal diversity. Hence, we study the effect of instead using MISELBO as an objective function for learning mixtures, and we propose the first-ever mixture of variational approximations for a normalizing flow-based hierarchical variational autoencoder (VAE) with VampPrior and a PixelCNN decoder network. Two major insights led to the construction of this novel composite model. First, mixture models have potential to be off-the-shelf tools for practitioners to obtain more flexible posterior approximations in VAEs. Therefore, we make them more accessible by demonstrating how to apply them to four popular architectures. Second, the mixture components cooperate in order to cover the target distribution while trying to maximize their diversity when MISELBO is the objective function. We explain this cooperative behavior by drawing a novel connection between VI and adaptive importance sampling. Finally, we demonstrate the superiority of the Mixture VAEs' learned feature representations on both image and single-cell transcriptome data, and obtain state-of-the-art results among VAE architectures in terms of negative log-likelihood on the MNIST and FashionMNIST datasets. Code available here: \url{https://github.com/Lagergren-Lab/MixtureVAEs}.  ( 3 min )
    Ensemble-based gradient inference for particle methods in optimization and sampling. (arXiv:2209.15420v1 [stat.ML])
    We propose an approach based on function evaluations and Bayesian inference to extract higher-order differential information of objective functions from a given ensemble of particles. Pointwise evaluation $\{V(x^i)\}_i$ of some potential $V$ in an ensemble $\{x^i\}_i$ contains implicit information about first or higher order derivatives, which can be made explicit with little computational effort (ensemble-based gradient inference -- EGI). We suggest using this information for the improvement of established ensemble-based numerical methods for optimization and sampling such as Consensus-based optimization and Langevin-based samplers. Numerical studies indicate that the augmented algorithms are often superior to their gradient-free variants; in particular, the augmented methods help the ensembles to escape their initial domain, to explore multimodal, non-Gaussian settings and to speed up the collapse at the end of optimization dynamics. The code for the numerical examples in this manuscript can be found in the paper's Github repository (https://github.com/MercuryBench/ensemble-based-gradient.git).  ( 2 min )
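    The simplest instance of the idea, namely that pointwise evaluations already contain first-order information, is a local least-squares fit over the ensemble. A minimal sketch (the paper's Bayesian construction is more refined):

    ```python
    import numpy as np

    def ensemble_gradient(X, V):
        # X: (J, d) particle positions x^i; V: (J,) evaluations V(x^i).
        # Fit the affine model V(x) ~ c + g @ (x - xbar) by least squares;
        # g then estimates the gradient of V at the ensemble mean.
        Xc = X - X.mean(axis=0)
        A = np.hstack([np.ones((X.shape[0], 1)), Xc])
        coef, *_ = np.linalg.lstsq(A, V, rcond=None)
        return coef[1:]
    ```

    The estimated g can then be plugged into an otherwise gradient-free particle update, which is the spirit of the augmentation described above.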
    Many-Body Approximation for Tensors. (arXiv:2209.15338v1 [stat.ML])
    We propose a nonnegative tensor decomposition that focuses on the relationship between the modes of tensors. Traditional decomposition methods assume low-rankness in the representation, resulting in difficulties in global optimization and target rank selection. To address these problems, we present an alternative way to decompose tensors, a many-body approximation for tensors, based on an information geometric formulation. A tensor is treated via an energy-based model, where the tensor and its mode correspond to a probability distribution and a random variable, respectively, and many-body approximation is performed on it by taking the interaction between variables into account. Our model can be globally optimized in polynomial time in terms of the KL divergence minimization, which is empirically faster than low-rank approximations while keeping comparable reconstruction error. Furthermore, we visualize interactions between modes as tensor networks and reveal a nontrivial relationship between many-body approximation and low-rank approximation.  ( 2 min )
    A deep learning approach to the probabilistic numerical solution of path-dependent partial differential equations. (arXiv:2209.15010v1 [cs.LG])
    Recent work on Path-Dependent Partial Differential Equations (PPDEs) has shown that PPDE solutions can be approximated by a probabilistic representation, implemented in the literature by the estimation of conditional expectations using regression. However, a limitation of this approach is to require the selection of a basis in a function space. In this paper, we overcome this limitation by the use of deep learning methods, and we show that this setting allows for the derivation of error bounds on the approximation of conditional expectations. Numerical examples based on a two-person zero-sum game, as well as on Asian and barrier option pricing, are presented. In comparison with other deep learning approaches, our algorithm appears to be more accurate, especially in large dimensions.  ( 2 min )
    Unsupervised Multi-task and Transfer Learning on Gaussian Mixture Models. (arXiv:2209.15224v1 [stat.ML])
    Unsupervised learning has been widely used in many real-world applications. One of the simplest and most important unsupervised learning models is the Gaussian mixture model (GMM). In this work, we study the multi-task learning problem on GMMs, which aims to leverage potentially similar GMM parameter structures among tasks to obtain improved learning performance compared to single-task learning. We propose a multi-task GMM learning procedure based on the EM algorithm that not only can effectively utilize unknown similarity between related tasks but is also robust against a fraction of outlier tasks from arbitrary sources. The proposed procedure is shown to achieve minimax optimal rate of convergence for both parameter estimation error and the excess mis-clustering error, in a wide range of regimes. Moreover, we generalize our approach to tackle the problem of transfer learning for GMMs, where similar theoretical results are derived. Finally, we demonstrate the effectiveness of our methods through simulations and a real data analysis. To the best of our knowledge, this is the first work studying multi-task and transfer learning on GMMs with theoretical guarantees.  ( 2 min )
    Structured Optimal Variational Inference for Dynamic Latent Space Models. (arXiv:2209.15117v1 [stat.ML])
    We consider a latent space model for dynamic networks, where our objective is to estimate the pairwise inner products of the latent positions. To balance posterior inference and computational scalability, we present a structured mean-field variational inference framework, where the time-dependent properties of the dynamic networks are exploited to facilitate computation and inference. Additionally, an easy-to-implement block coordinate ascent algorithm is developed with message-passing type updates in each block, whereas the complexity per iteration is linear with the number of nodes and time points. To facilitate learning of the pairwise latent distances, we adopt a Gamma prior for the transition variance, unlike the existing literature. To certify the optimality, we demonstrate that the variational risk of the proposed variational inference approach attains the minimax optimal rate under certain conditions. En route, we derive the minimax lower bound, which might be of independent interest. To the best of our knowledge, this is the first such exercise for dynamic latent space models. Simulations and real data analysis demonstrate the efficacy of our methodology and the efficiency of our algorithm. Finally, our proposed methodology can be readily extended to the case where the scales of the latent nodes are learned in a nodewise manner.  ( 2 min )
    Minimalistic Unsupervised Learning with the Sparse Manifold Transform. (arXiv:2209.15261v1 [cs.LG])
    We describe a minimalistic and interpretable method for unsupervised learning, without resorting to data augmentation, hyperparameter tuning, or other engineering designs, that achieves performance close to the SOTA SSL methods. Our approach leverages the sparse manifold transform, which unifies sparse coding, manifold learning, and slow feature analysis. With a one-layer deterministic sparse manifold transform, one can achieve 99.3% KNN top-1 accuracy on MNIST, 81.1% KNN top-1 accuracy on CIFAR-10 and 53.2% on CIFAR-100. With a simple gray-scale augmentation, the model gets 83.2% KNN top-1 accuracy on CIFAR-10 and 57% on CIFAR-100. These results significantly close the gap between simplistic ``white-box'' methods and the SOTA methods. Additionally, we provide visualization to explain how an unsupervised representation transform is formed. The proposed method is closely connected to latent-embedding self-supervised methods and can be treated as the simplest form of VICReg. Though there remains a small performance gap between our simple constructive model and SOTA methods, the evidence points to this as a promising direction for achieving a principled and white-box approach to unsupervised learning.  ( 2 min )
    Improving Generative Flow Networks with Path Regularization. (arXiv:2209.15092v1 [cs.LG])
    Generative Flow Networks (GFlowNets) are recently proposed models for learning stochastic policies that generate compositional objects by sequences of actions with the probability proportional to a given reward function. The central problem of GFlowNets is to improve their exploration and generalization. In this work, we propose a novel path regularization method based on optimal transport theory that places prior constraints on the underlying structure of the GFlowNets. The prior is designed to help the GFlowNets better discover the latent structure of the target distribution or enhance its ability to explore the environment in the context of active learning. The path regularization controls the flow in GFlowNets to generate more diverse and novel candidates via maximizing the optimal transport distances between two forward policies or to improve the generalization via minimizing the optimal transport distances. In addition, we derive an efficient implementation of the regularization by finding its closed form solutions in specific cases and a meaningful upper bound that can be used as an approximation to minimize the regularization term. We empirically demonstrate the advantage of our path regularization on a wide range of tasks, including synthetic hypergrid environment modeling, discrete probabilistic modeling, and biological sequence design.  ( 3 min )
    Diffusion-based Image Translation using Disentangled Style and Content Representation. (arXiv:2209.15264v1 [cs.CV])
    Diffusion-based image translation guided by semantic texts or a single target image has enabled flexible style transfer which is not limited to specific domains. Unfortunately, due to the stochastic nature of diffusion models, it is often difficult to maintain the original content of the image during the reverse diffusion. To address this, here we present a novel diffusion-based unsupervised image translation method using disentangled style and content representation. Specifically, inspired by the splicing Vision Transformer, we extract intermediate keys of the multi-head self-attention layer from a ViT model and use them for a content preservation loss. Then, image-guided style transfer is performed by matching the [CLS] classification token from the denoised samples and the target image, whereas an additional CLIP loss is used for the text-driven style transfer. To further accelerate the semantic change during the reverse diffusion, we also propose a novel semantic divergence loss and resampling strategy. Our experimental results show that the proposed method outperforms state-of-the-art baseline models in both text-guided and image-guided translation tasks.  ( 2 min )
  • Open

    [R] DDIM Reconstruction Confusion
    I'm trying to leverage the "determinism" that is discussed in the DDIM paper in order to go from a clean image (x_0) to its appropriate latent representation (latent as in the x_T noise), and then back. AKA, I want to go from the clean image to the pure noise that directly maps back to the same image when running the DDIM diffusion procedure. I attached a snippet from the DDIM paper that describes this behavior well: https://preview.redd.it/rvi16d91mhr91.png?width=711&format=png&auto=webp&s=e8c0b68efecd9c1e1cc501c8038e665739d5d98c I've been banging my head against this for a while and I have a few questions about the theory and also the implementation: Intuitively, where is the initial noise coming from? If we have a clean image, then how exactly do we derive the noise direction to start noising in? I understand that if we have random noise, then the DDIM sampling procedure will have determinism in the output, but I don't quite understand the reverse direction, since I'm not sure where the initial noise direction is rooted. Implementation-wise, I don't really get where I can get the noise. Does this come from passing the clean image directly into the unet and getting a noise residual, and then adding that to the clean image and iterating from there? When I tried doing this, I got some nasty results. submitted by /u/adham-elarabawy [link] [comments]  ( 104 min )
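    For what it's worth, the usual answer is that with eta = 0 the DDIM update is a deterministic map, so the "initial noise" is not sampled anywhere: you recover x_T by running the same update in the forward direction, reusing the model's noise prediction at each step. A minimal sketch of that inversion loop (here `unet` and `alphas_cumprod` are placeholders for your own model and schedule):

    ```python
    import torch

    @torch.no_grad()
    def ddim_invert(x0, unet, alphas_cumprod, num_steps=50):
        # Deterministic (eta = 0) DDIM run forward: clean image -> latent x_T.
        # unet(x, t) -> predicted noise eps; alphas_cumprod: 1-D tensor of abar_t.
        x = x0
        ts = torch.linspace(0, len(alphas_cumprod) - 1, num_steps).long()
        for t, t_next in zip(ts[:-1], ts[1:]):
            abar, abar_next = alphas_cumprod[t], alphas_cumprod[t_next]
            eps = unet(x, t)
            x0_pred = (x - (1 - abar).sqrt() * eps) / abar.sqrt()
            # Key approximation: reuse eps as the noise estimate at t_next too.
            x = abar_next.sqrt() * x0_pred + (1 - abar_next).sqrt() * eps
        return x  # running standard DDIM sampling from this approximately returns x0
    ```

    So, to the implementation question: yes, the noise comes from the network's own residual prediction, but it is folded into the deterministic update above rather than added to the clean image directly, which is likely why the naive add-and-iterate attempt produced bad results.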

  • Open

    Impact of using sockets to communicate between Python and RL environment
    Hello! When looking into implementing RL in a game environment, I found that both Unity MLAgents and the third-party UnrealCV communicate between the game environments and Python using sockets. I am looking into implementing RL for Unreal and wondering about the performance impact of using sockets vs using RL C++ libraries to keep everything "in-engine"/native. Since the socket connection is local, I assume the actual communication is near-instant. However, how does serializing all input (particularly large inputs like images) for the sockets impact performance? What about multiple agents - like communicating between several agents asynchronously? submitted by /u/AnAIReplacedMe [link] [comments]  ( 104 min )
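    One way to de-risk this up front is to benchmark the serialization cost in isolation, since loopback transport itself is rarely the bottleneck. A tiny sketch (pickle stands in for whatever wire format MLAgents/UnrealCV actually use):

    ```python
    import pickle
    import time

    import numpy as np

    # A typical 84x84x3 uint8 RL observation is about 21 KB.
    obs = np.zeros((84, 84, 3), dtype=np.uint8)
    n = 10_000
    t0 = time.perf_counter()
    for _ in range(n):
        payload = pickle.dumps(obs)   # what would be written to the socket
        _ = pickle.loads(payload)     # what the receiving side would do
    dt = time.perf_counter() - t0
    print(f"{dt / n * 1e6:.1f} us per serialize+deserialize round trip")
    ```

    If the per-step cost stays in the tens of microseconds, the socket hop is unlikely to matter next to the environment's own step time, even with several agents multiplexed over one connection.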
    Good sources that explain the C51 algorithm
    Can someone suggest some good sources that explain the C51 algorithm well? I'm getting a little lost in the details, and the paper is not an easy read :) ​ I believe HSE university posted a few videos on Coursera about this (the Practical RL course), but the course has been removed. submitted by /u/Academic-Rent7800 [link] [comments]  ( 102 min )
    Learning to play "Für Elise" by Beethoven with reinforcement learning, at least the first few notes.
    Hello, I wanted to try a reinforcement learning technique for music generation / imitation. It learns the first few notes after, say, a few hundred episodes, but then somehow it gets stuck and cannot learn the whole piece: https://github.com/githubuser1983/music_generation_with_reinforcement_learning Here are some results, after playing a little bit with some hyperparameters: pdf: https://drive.google.com/file/d/1dB-gc7BPev4cryVbiDFTyBm0qKCGnhq8/view?usp=sharing mp3: https://drive.google.com/file/d/1VF7HUonfQXAVSzMANgu26fBvZCrFCOYQ/view?usp=sharing Any feedback would be very nice! (I am not sure what the right flair is for this post) submitted by /u/musescore1983 [link] [comments]  ( 103 min )
    Reinforcement learning using only observations!
    I own a frame-level annotated (message, stats values parsed from tty) dataset of 30,000 NetHack gameplays but no actions for the corresponding frames. I am looking for papers on doing RL or world-modelling from observations alone. Any ideas? submitted by /u/hocobozos [link] [comments]  ( 102 min )
    RL methods / ideas for optimal stopping
    What RL methods or ideas are relevant or can be used for optimal stopping, aka "the secretary problem", if any at all? Would RL be generally appropriate for this type of problem? submitted by /u/countlinard [link] [comments]  ( 102 min )
    why exponential recency weighted average for non stationary problem?
    I'm reading chapter 2 of the RL introduction by Sutton. I know what it's trying to achieve, but I just don't see the necessity of rearranging into the exponential recency form. The exponential recency form is directly derived from the original form (first line), so isn't the exponential weighting already included in the original form? Why not just use the first line? https://preview.redd.it/pciyagsxvbr91.png?width=552&format=png&auto=webp&s=8c33bacff6bca09022f2bb9406beaf86c22e5f6a submitted by /u/bc0428 [link] [comments]  ( 103 min )
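    For reference, here is the expansion the book is performing; the recursive first line is what you implement, and the exponential form is derived only to show what weighting that recursion implicitly applies (a standard derivation, following Sutton & Barto):

    $$
    \begin{aligned}
    Q_{n+1} &= Q_n + \alpha\,(R_n - Q_n) \\
            &= \alpha R_n + (1-\alpha)\,Q_n \\
            &= \alpha R_n + (1-\alpha)\bigl[\alpha R_{n-1} + (1-\alpha)\,Q_{n-1}\bigr] \\
            &= (1-\alpha)^n Q_1 + \sum_{i=1}^{n} \alpha\,(1-\alpha)^{n-i} R_i .
    \end{aligned}
    $$

    So yes, the weighting is already contained in the first line, and the first line is all you need in code; the rearrangement is an analysis step showing that recent rewards get exponentially more weight, which is exactly the property wanted for nonstationary problems.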
  • Open

    Dua Lipa by artificial intelligence
    submitted by /u/Straight_Soil_747 [link] [comments]  ( 102 min )
    Self-Programming Artificial Intelligence Using Code-Generating...
    submitted by /u/Black_RL [link] [comments]  ( 102 min )
    Stage 1: denial
    submitted by /u/Firm-Earth1633 [link] [comments]  ( 102 min )
    Some lessons from the History of AI
    I’ve recently become very interested in the history of AI, and I’ve been reading a few articles as well as the book “A Brief History of AI” by Michael Wooldridge for a good primer on the subject. I’ve decided to collect some of my thoughts in a very short essay. First, I want to discuss the term “AI” itself. In some sense it feels like a marketing term. It’s a very loaded term, and any time it is used, there is almost always a more precise designation that could be used (machine learning, heuristic search, etc.). Of course any of these more precise terms could further be broken down with even more precision, but I think that when we group so many applications under the umbrella of “AI”, invalid associations are made. For example, if fears about slaughterbots create a negative bias towards …  ( 109 min )
    Opinion - are we just protein based characters?
    Are machines still dreaming of electric sheep? Or have they moved on to replace human workers? submitted by /u/goronmask [link] [comments]  ( 102 min )
    Pattern recognition with neuromorphic computing using magnetic field–induced dynamics of skyrmions
    submitted by /u/hockiklocki [link] [comments]  ( 103 min )
    Biologists Create New "Human Cells"
    submitted by /u/engalinayf [link] [comments]  ( 102 min )
    Entropy modulation
    submitted by /u/marvelmind_robotics [link] [comments]  ( 102 min )
    High fashion campaigns with A.I.
    submitted by /u/Straight_Soil_747 [link] [comments]  ( 102 min )
    Deforum notebook update Now with Dynamic Video Masking for Stable Diffus...
    submitted by /u/prfitofthesngularity [link] [comments]  ( 102 min )
    Another audio reactive animation experiment. This time utilizing piano notes alongside beats to zoom out and switch scenes.
    submitted by /u/dreamingtulpa [link] [comments]  ( 102 min )
    Anyway I can invest in ai art?
    Maybe I sound dumb but is there anyway I can invest money into a ai art company/ project? submitted by /u/SkeletorCrypto [link] [comments]  ( 102 min )
    Disturbing "This person does not exist"
    submitted by /u/Syn_Chronized [link] [comments]  ( 102 min )
    MegaPortraits: High-Res Deepfakes Created From a Single Photo
    submitted by /u/globeworldmap [link] [comments]  ( 103 min )
    Who needs dalle 2 access?
    I got accounts linked to dalle 2 so if anyone needs one hmu submitted by /u/Designer-Career6211 [link] [comments]  ( 102 min )
    Would silicon be able to construct robot brains that house consciousness like that of Star Wars?
    I recently watched a Tim Ventura video about replacing silicon for computer chips, which said that A.I. neural networks would consume too much energy and need to be the size of buildings to create consciousness. That being said, what could replace silicon so we can build the robot brains the droids have in Star Wars? submitted by /u/InfinityScientist [link] [comments]  ( 103 min )
    Question about the future of ai generated video
    I love the new text-to-image generators, I think it's amazing. I love sci-fi and stuff as well, and it all got me thinking about where this is headed and how long it may realistically take till we get there. I could see a world where you could feed a neural network all the episodes of Seinfeld and it could give you an infinite number of new episodes, all rendered in photorealistic detail. Could be any show: King of Queens, Everybody Loves Raymond, Family Guy, Community. You could also have an AI write and generate shows based on current trends with no input from human beings. All the voice work and acting would be generated. Then the show could morph and change in real time based on reception and feedback. So how long until those things are realistically possible, or at least close? I know we already have some video generators but they aren't very good. Thanks submitted by /u/prestigeworldwyd [link] [comments]  ( 112 min )
    Alien PCB - To reimagine technology, the technological innovation - (microfluidic logic circuit, analog of electronic oscillator)
    (Midjourney, unedited, no filter) submitted by /u/Embarrassed_Way_7539 [link] [comments]  ( 102 min )
    (Spider warning) How to recreate this? What algorithm was used?
    submitted by /u/massimo_nyc [link] [comments]  ( 110 min )
  • Open

    [D] Most interesting papers from ICLR 2023 submissions?
    Hey guys, I’m looking for new papers to read when I’m bored, are there any ICLR papers that have got your attention? submitted by /u/billjames1685 [link] [comments]  ( 103 min )
    [P] A simple openAI gym dashboard in the browser
    submitted by /u/vaaal88 [link] [comments]  ( 103 min )
    [D] - Why do Attention layers work so well? Don't weights in DNNs already tell the network how much weight/attention to give to a specific input? (High weight = lots of attention, low weight = little attention)
    So an attention layer has Q, K, and V vectors. My understanding is the goal is to say, for a given query q, how relevant is the value v. From this the network learns which data is relevant to focus on for a given input. But what I don't get is why this is effective. Don't DNNs already do this with weights? A neuron in a hidden layer can be set off by any arbitrary combination of inputs, so in principle something like attention should be able to naturally emerge inside of a DNN. For example, an image recognition neural network may learn to focus on specific patterns of pixels and ignore others. Why does hard-coding this mechanism into the model provide so much benefit? submitted by /u/029187 [link] [comments]  ( 107 min )
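    A concrete way to see the difference: a trained weight matrix is a fixed linear map, whereas attention recomputes its mixing weights from the current input. Toy numpy sketch (all dimensions and names are illustrative):

    ```python
    import numpy as np

    def softmax(z):
        z = z - z.max(axis=-1, keepdims=True)
        e = np.exp(z)
        return e / e.sum(axis=-1, keepdims=True)

    d, n = 8, 4                                    # toy embedding dim, sequence length
    rng = np.random.default_rng(0)
    X = rng.normal(size=(n, d))                    # input sequence of n tokens
    W_q, W_k, W_v = (rng.normal(size=(d, d)) for _ in range(3))  # fixed after training

    Q, K, V = X @ W_q, X @ W_k, X @ W_v
    A = softmax(Q @ K.T / np.sqrt(d))              # attention weights: a function of X
    out = A @ V                                    # tokens mix with input-dependent weights
    ```

    The matrix A changes with every input, giving a multiplicative interaction between tokens; a plain DNN applies the same fixed weights to every input, so it can only approximate that kind of content-dependent routing, and far less parameter-efficiently.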
    [D] TensorRT in C++ vs in Python
    Will I see any significant decrease in runtime if I run the TensorRT inference in C++ instead of Python for my Yolov5 network? How about for my custom convolutional network? submitted by /u/Commercial_Put577 [link] [comments]  ( 103 min )
    [D] Mixing paragraphs of a reading and generate a new meaning
    Would it be possible to take different texts with a similar topic, cut them into several paragraphs, and mix them to create a new text with meaning? If so, would it be too complex to do? submitted by /u/jabertolin [link] [comments]  ( 103 min )
    [D] - Have there been any cutting edge or practical use cases where neuro-evolution was used?
    Hi all, the post title basically says it all. Has neuro-evolution ever been the best-in-class algorithm for a type of problem? submitted by /u/029187 [link] [comments]  ( 103 min )
    [D] Types of Machine Learning Papers
    submitted by /u/Lost-Parfait568 [link] [comments]  ( 105 min )
    Tesla AI day 2022 video link and index [Discussion]
    https://youtu.be/ODSJsviD_SU 17:04 Bot reveal 35:16 crash test 40:19 powertrain 45:48 biologically inspired design 49:43 visual navigation 56:25 motion adaptation 56:51 what's next 58:10 autopilot intro 1:04:00 planning 1:11:28 occlusions 1:12:21 occupancy network 1:17:00 nerf discussion 1:19:07 auto labeling 1:20:00 14k gpus 1:23:22 optimized video training 1:25:29 auto pilot vision 1:28:00 model as language components 1:34:36 sparsification 1:35:39 fsd lanes network in car 1:38:20 1B parameters, compiler tool chain 1:40:32 autolabeling 1:47:09 challenge cases 1:47:52 simulation 1:51:38 unreal engine 1:53:52 data engine, improve autopilot thru data 1:56:46 dojo super computer 2:02:43 dojo accelerator 2:05:43 voltage regulator module 2:07:38 vibrating capacitors 2:09:31 cooling solutions 2:11:00 dojo interface processor 2:12:17 dojo host interface 2:12:41 dojo cabinet 2:13:01 exapod 2:13:55 software stack 2:17:46 dojo compiler 2:20:48 dojo vs a100 2:22:42 ingest, dataloader 2:24:20 72 gpu racks to 4 dojo cabinets 2:26:32 q&a submitted by /u/MLisdabomb [link] [comments]  ( 104 min )
    [P] Talking head animation with StyleGAN!
    submitted by /u/willowill5 [link] [comments]  ( 103 min )
    [D] Has anyone done or found a fair price-quality analysis of modern NLPs?
    I'm not much involved in ML, but I've been tasked with finding the best price-quality text generation solution (basically, for generating ads and product descriptions). What I need is a custom solution. I've learned a bit about the OpenAI, Cohere and Tune the Model APIs. But I couldn't find any decent research about the accuracy of their models or a price-quality analysis based on it. Has anyone done or found such research, or is it impossible to do at all? There's a lot of buzz about content generation, but there are no independent analytics??? If there is no research, can you recommend a tool/tools based on your experience? submitted by /u/alexlash [link] [comments]  ( 104 min )
    [D] Are big models the direction of Strong Artificial Intelligence in the future?
    Big models have shown such powerful abilities: text-to-image, text-to-video, and so on. I thought we could achieve AGI once we can explain these models, but after getting a brief understanding of model interpretation, I think that won't work, because explaining a big model is so hard. If not big models, what is the direction toward AGI? submitted by /u/waa007 [link] [comments]  ( 106 min )
    [D] Hidden unit connected to each other in a single layer
    I am trying to wrap my head around this neural network for performing binary classification. The first layer: input layer. The second layer: hidden layer. The third layer: output layer. For the first hidden unit in the hidden layer, the weighted sum of the inputs is passed into the sigmoid function; but how do I deal with the one input coming from the second hidden unit, given that we don't know the input from that hidden unit? Can anyone help with how to deal with this? https://preview.redd.it/z2jn1s2ofer91.png?width=835&format=png&auto=webp&s=12cf63c98a93886ab95c4572fb511182bf3aef05 submitted by /u/abystoma [link] [comments]  ( 118 min )
    BlenderBot Developers for hire? [D]
    People, I am wondering whether there are any BlenderBot developers for hire? - Am I likely to find them here? - If not, where? Thanks, Phil. submitted by /u/philip_rhoades [link] [comments]  ( 103 min )
    [R] natural and expressive motion generation for digital humans with text-to-motion: "a person turns to his right and paces back and forth"
    submitted by /u/SpatialComputing [link] [comments]  ( 105 min )
    [N] New BetaML v0.8: model definition, hyperparameters tuning and fitting in 2 lines
    Dear ML community, I'm pleased to announce BetaML v0.8. The Beta Machine Learning Toolkit is a package including many algorithms and utilities to implement machine learning workflows in Julia, with a detailed tutorial on its usage from Python or R (no wrapper packages are needed) and an extensive interface to MLJ. Aside from the support of the standard mod = Model([Options]), fit!(mod,X,[Y]), predict(mod,[X]) paradigm for 22 models (see list below), this version brings the implementation of one of the easiest hyperparameter tuning functionalities available in ML libraries. From model definition to tuning, fitting and prediction in just 3 lines of code:

    ```julia
    mod = ModelXX(autotune=true)  # --> control autotune with the parameter `tunemethod`
    fit!(mod,x,[y])               # --> autotune happens here
    ```

    tog…  ( 106 min )
    [D] Gpu for machine translation
    GPU for machine translation. So, I want to build a machine translation rig to make my work easier. I work as a translator and currently use the Google API to reduce my workload. But my country has very few people, so the development of Google Translate for it is extremely bad. I had to fix even the easiest sentences like "Goodnight" since GT translates them wrong. That's why I decided to make my own translation system and use my own translations as the base. So what is the bare minimum GPU required for at least 10,000 pages of translations? Currently I'm considering a P106-100, RX 580, or 1060 6GB. I think these materials are enough, but let me know if they're not. submitted by /u/wrsage [link] [comments]  ( 105 min )
    [N] Electric vehicle charging station hierarchical forecasting hackathon
    For those interested to learn more and go beyond the EV hype, the Smarter Mobility Data Challenge proposes that you tackle one of its most critical assets to manage: charging stations! The challenge is planned to last 2 months, with webinar sessions with EV load experts in order to discover the ins & outs of this particular domain. More details on registration are available on the Codalab platform (Discord server included as well). Note that part of this hackathon is targeted at students from European institutions, but as with all online hackathons, nothing prevents any user from taking it as a learning experience in the domain or methodology (this is quite an uncommon case of a hierarchical forecasting problem). Here's a small extract behind this hackathon's motivation: Transport represents almos…  ( 116 min )
    [D] Most Popular AI Research Sept 2022 - Ranked Based On GitHub Stars
    submitted by /u/cloud_weather [link] [comments]  ( 104 min )
    Do companies/teams accept ppl coming from a completely different field into AI or ML? [D]
    Will companies accept people coming from a wholly different domain or background into the ML or AI field? I'm a fresh grad who has been working as a production support and release-and-deploy engineer for 2.5 years now. I'm learning about ML daily, doing side projects, getting my hands dirty, and what not, to get into an ML career. But how do I convince recruiters that I'm a good fit so they can pass on my resume to the managers? Pretty sure if I apply on a company career website I won't even get shortlisted, since my previous experience is completely different from what I'm applying for. Let me know how you guys made it; that would be really helpful. Every suggestion is welcome. submitted by /u/ritheshgirish9 [link] [comments]  ( 107 min )
    [P] stablediffusion-infinity: Outpainting with Stable Diffusion on an infinite canvas
    submitted by /u/Illustrious_Row_9971 [link] [comments]  ( 105 min )
    [D] DreamBooth Stable Diffusion training in 10 GB VRAM, using xformers, 8bit adam, gradient checkpointing and caching latents.
    Code: https://github.com/ShivamShrirao/diffusers/tree/main/examples/dreambooth Colab: https://colab.research.google.com/github/ShivamShrirao/diffusers/blob/main/examples/dreambooth/DreamBooth_Stable_Diffusion.ipynb https://preview.redd.it/rj70zdpqqar91.png?width=1009&format=png&auto=webp&s=940710714f058f0e0e9707e19e119c79ed7f3ce6 Tested on a Tesla T4 GPU on Google Colab. It is still pretty fast, with no further precision loss from the previous 12 GB version. I have also added a table to choose the best flags according to the memory and speed requirements.

    | fp16 | train_batch_size | gradient_accumulation_steps | gradient_checkpointing | use_8bit_adam | GB VRAM usage | Speed (it/s) |
    |------|------------------|-----------------------------|------------------------|---------------|---------------|--------------|
    | fp16 | 1 | 1 | TRUE  | TRUE  | 9.92  | 0.93 |
    | no   | 1 | 1 | TRUE  | TRUE  | 10.08 | 0.42 |
    | fp16 | 2 | 1 | TRUE  | TRUE  | 10.4  | 0.66 |
    | fp16 | 1 | 1 | FALSE | TRUE  | 11.17 | 1.14 |
    | no   | 1 | 1 | FALSE | TRUE  | 11.17 | 0.49 |
    | fp16 | 1 | 2 | TRUE  | TRUE  | 11.56 | 1    |
    | fp16 | 2 | 1 | FALSE | TRUE  | 13.67 | 0.82 |
    | fp16 | 1 | 2 | FALSE | TRUE  | 13.7  | 0.83 |
    | fp16 | 1 | 1 | TRUE  | FALSE | 15.79 | 0.77 |

    Might also work on a 3080 10GB now but I haven't tested. Let me know if anybody here can test. submitted by /u/0x00groot [link] [comments]  ( 104 min )
    "[N]" Brainchop V1.4.0
    Brainchop wins the TF Community Spotlight Award. Github: https://github.com/neuroneural/brainchop submitted by /u/Character-Rip-5824 [link] [comments]  ( 103 min )
    [P] Interactive Map of ICLR 2023 Submissions
    Here is a map of all submissions to ICLR 2023, organized by abstract contents: https://atlas.nomic.ai/map/01ff9510-d771-47db-b6a0-2108c9fe8ad1/3ceb455b-7971-4495-bb81-8291dc2d8f37 You can share hot-links to any part of the map by clicking the bottom button in the toolbar. Share anything interesting you can find! EDIT: By request here is a map of ICLR 2018-2023, you can get access to make your own maps here! submitted by /u/NomicAI [link] [comments]  ( 103 min )
  • Open

    Can variable ratios as new variable help a NN
    Let's think of an LSTM or a simple fully connected neural network. Say I have variables X and Y to predict Z, but because I know the real problem, I know that X/Y is an important number to look at. The general question is: is it a good idea to add the result of some function f(X, Y) as an input variable to this NN? submitted by /u/CommunityBrave822 [link] [comments]  ( 102 min )
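    A minimal sketch of the idea as a NumPy preprocessing step (all names and data are illustrative):

      import numpy as np

      # Toy data; in practice X, Y, Z come from the real problem.
      rng = np.random.default_rng(0)
      X = rng.random((1000, 1)) + 0.1    # offset avoids division by zero
      Y = rng.random((1000, 1)) + 0.1
      ratio = X / Y                      # domain-informed feature f(X, Y)
      inputs = np.hstack([X, Y, ratio])  # feed all three columns to the network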
    Idea: create all the Monty Python personalities gleaned from deep learning of all their interviews
    In effect, not actually trying to be successful, but as a benchmark to show how close AI can come to replicating the members and their behaviors: mannerisms, speech, delivery of lines, presentation of skits, common pairings like old ladies discussing stupid observations, or interrupting conversations with the stupid shorts Terry Gilliam might imagine. submitted by /u/KiernanHolland [link] [comments]  ( 102 min )
  • Open

    Making flags in Unicode
    I recently found out [1] that the Unicode sequences for flag emoji are created by taking the two-letter country abbreviation (ISO 3166-1 alpha-2) and replacing both letters with their counterparts in the range U+1F1E6 through U+1F1FF. For example, the abbreviation for Canada is CA, and the characters 🇨 (U+1F1E8) and 🇦 (U+1F1E6) together create 🇨🇦. […] Making flags in Unicode first appeared on John D. Cook.  ( 4 min )
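    As a quick illustration (a sketch, not from the original post), the mapping takes a couple of lines of Python:

      # Shift each ASCII letter into the regional-indicator range U+1F1E6..U+1F1FF.
      def flag(code: str) -> str:
          return "".join(chr(0x1F1E6 + ord(c) - ord("A")) for c in code.upper())

      print(flag("CA"))  # 🇨🇦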
  • Open

    Wiggling toward bio-inspired machine intelligence
    Inspired by jellyfish and octopuses, PhD candidate Juncal Arbelaiz investigates the theoretical underpinnings that will enable systems to more efficiently adapt to their environments.  ( 7 min )
  • Open

    Constraining Representations Yields Models That Know What They Don't Know. (arXiv:2208.14488v2 [cs.LG] UPDATED)
    A well-known failure mode of neural networks is that they may confidently return erroneous predictions. Such unsafe behaviour is particularly frequent when the use case slightly differs from the training context, and/or in the presence of an adversary. This work presents a novel direction to address these issues in a broad, general manner: imposing class-aware constraints on a model's internal activation patterns. Specifically, we assign to each class a unique, fixed, randomly-generated binary vector - hereafter called class code - and train the model so that its cross-depths activation patterns predict the appropriate class code according to the input sample's class. The resulting predictors are dubbed total activation classifiers (TAC), and TACs may either be trained from scratch, or used with negligible cost as a thin add-on on top of a frozen, pre-trained neural network. The distance between a TAC's activation pattern and the closest valid code acts as an additional confidence score, besides the default unTAC'ed prediction head's. In the add-on case, the original neural network's inference head is completely unaffected (so its accuracy remains the same) but we now have the option to use TAC's own confidence and prediction when determining which course of action to take in a hypothetical production workflow. In particular, we show that TAC strictly improves the value derived from models allowed to reject/defer. We provide further empirical evidence that TAC works well on multiple types of architectures and data modalities and that it is at least as good as state-of-the-art alternative confidence scores derived from existing models.  ( 3 min )

  • Open

    Disciple- Ray Volpe "Laserbeam" visuals by AI Manifest
    submitted by /u/Available_Tadpole829 [link] [comments]  ( 102 min )
    Golden Body Suit, turned out crazy cool looking!
    submitted by /u/Eradication0 [link] [comments]  ( 102 min )
    Books on and about artificial intelligence
    Your favourite book on this topic? Both fiction and nonfiction. submitted by /u/tiddu [link] [comments]  ( 102 min )
    A few weeks ago I had an opportunity to give a presentation at Harvard Business School for moral and ethical challenges with Artificial Intelligence. If you are interested, here is the link to the video
    submitted by /u/akhtarabas [link] [comments]  ( 86 min )
    Why are we not seeing more AI for boring, repetitive, or technical work where there's a lack of supply?
    DALL-E 2 and its image-generation competitors like Midjourney and Stable Diffusion really opened my eyes and made me feel like AI is coming faster than expected. Maybe it's because I am more interested in the creative field, but I haven't heard much about AI in other fields... Having a program that can generate images just by describing them in any style I want is cool, but it's not something I think anyone really needs, and it will just take jobs away from artists in an already precarious industry with excessive supply. So why isn't AI development focused on making the stuff most of us humans need but don't enjoy doing? That's what tools are for, aren't they? Why can't AI do a business plan, take care of taxes or manage finances, reply to boring work emails, write me a report, make a website, organize my calendar, diagnose me when I feel sick, etc.? I feel like the creative fields are some of the last that need to be automated, since a lot of people enjoy the process of creation even more than the results, and almost no one is crying out for AI to come and save art. In a utopia, AI would be used to automate all the parts of work that we don't enjoy but are necessary, so we could all focus on doing what we love. submitted by /u/Bsides9 [link] [comments]  ( 86 min )
    High fashion campaigns with Stable Diffusion
    submitted by /u/Straight_Soil_747 [link] [comments]  ( 102 min )
    🚨NEW COMIC ALERT 🚨 Issue 1 of my comic “Animoia” is available for pre-order now on Amazon! Art made in Midjourney. Pre-order now for digital release October 5!
    Join George Elmgrove and The Lavender Society as they tackle their biggest threat yet! Experience the Forth Disaster as The Lavender Society take down terrifying monsters that seek to destroy the Earth! submitted by /u/Ideal-Typical [link] [comments]  ( 102 min )
    Twemotion: a web app/bot to automatically measure emotions of tweets and compare them to (appropriate) news headlines
    Twemotion.com is a web app/bot that measures the emotions of users on Twitter, using the most popular trending topics. Emotions are classified into the categories fear, happy, sad, angry, excited, and bored, and are automatically calculated (the output being a score of 0–100) on a daily basis by taking a random sample of the most popular trending topics and the tweets associated with them (by country). The data is broken down into 5 categories according to location: worldwide, USA, Canada, UK, and Australia. These countries were chosen because their citizens primarily tweet in English, which is the only language analyzed at this time. Daily emotions from the specified country's top news headlines are also included to show (graph/table) how events may be portrayed differently in the media compared to h…  ( 104 min )
    AI in Web Design and Development
    AI Implemented in Web Design. With the rise of artificial intelligence in the last couple of years, AI writing and AI imagery have become quite common. Multiple companies have successfully integrated AI into their products, because AI definitely makes work faster and sometimes even better than people in a lot of cases. So my question is: what is stopping companies from implementing AI in web design? Apologies if the question seems naive; I'm neither a design expert nor an AI expert, just a student exploring these fields. So I was wondering whether AI can boost the web design industry. Can't AI be used to improve landing pages, or even create them? If AI text writing, AI painting, and AI illustration software are all readily available, how hard would it be to implement the same concept for web design? TIA for the insight! submitted by /u/SheikhSahb [link] [comments]  ( 105 min )
    Youtube Video: A new political party in Denmark has its policies decided by an AI
    submitted by /u/geo_what [link] [comments]  ( 103 min )
    9 Minutes & 30 Seconds of 100% Stable Diffusion AI Art Glory 💐 | Rain & Piano for a more relaxing experience 💖
    submitted by /u/ArtifulDream [link] [comments]  ( 102 min )
    [Beta Release] Character.AI Beta released
    submitted by /u/roblox22y [link] [comments]  ( 102 min )
    Elon Musk Reveals Tesla Optimus AI Robot | New Meta Text To Video AI
    submitted by /u/kenickh [link] [comments]  ( 102 min )
    What advantages do humans have over AI in terms of intelligence and mental capabilities?
    I was wondering what AI struggles to surpass human minds at. We always hear about how AI performs better than the human brain in certain areas, but what about the areas where it doesn't? submitted by /u/Seven1s [link] [comments]  ( 108 min )
    Tesla Bot Announced
    Tesla Bot Announced - https://medium.com/@wmindramalw/tesla-bot-announced-everything-about-the-tesla-bot-8e19d7106be7 submitted by /u/iNdramal [link] [comments]  ( 102 min )
    What do you think about resume writing services? Especially for roles in AI
    View Poll submitted by /u/the_scientist-7367 [link] [comments]  ( 118 min )
    Turns out AI is great at fanfic mashups (colored text generated by GPT3)
    submitted by /u/weeeh [link] [comments]  ( 102 min )
    What's the best AI site for generating an image description?
    submitted by /u/DecIsMuchJuvenile [link] [comments]  ( 86 min )
    Tesla AI Day 2022
    submitted by /u/6x9isreally42 [link] [comments]  ( 102 min )
  • Open

    What is the use of learning the state-value in D3QN?
    I am taking this from the official paper: "The advantage of the dueling architecture lies partly in its ability to learn the state-value function efficiently. With every update of the Q values in the dueling architecture, the value stream V is updated – this contrasts with the updates in a single-stream architecture where only the value for one of the actions is updated, the values for all other actions remain untouched. This more frequent updating of the value stream in our approach allocates more resources to V, and thus allows for better approximation of the state values, which in turn need to be accurate for temporal difference-based methods like Q-learning to work (Sutton & Barto, 1998)." I think I am getting stuck on how a neural network behaves. Regardless of how well the network learns the state value, the network still predicts the state-action value. So, what is the use of learning the state-value function well? submitted by /u/Academic-Rent7800 [link] [comments]  ( 103 min )
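    To make the quoted point concrete, here is a minimal PyTorch sketch of a dueling head (layer sizes are illustrative). Because V(s) is added into every action's Q-value, every TD update backpropagates through the value stream, which is exactly the "more frequent updating" the paper describes:

      import torch
      import torch.nn as nn

      class DuelingHead(nn.Module):
          """Dueling decomposition: Q(s, a) = V(s) + A(s, a) - mean_a A(s, a)."""
          def __init__(self, hidden_dim: int, n_actions: int):
              super().__init__()
              self.value = nn.Linear(hidden_dim, 1)              # state-value stream V
              self.advantage = nn.Linear(hidden_dim, n_actions)  # advantage stream A

          def forward(self, features: torch.Tensor) -> torch.Tensor:
              v = self.value(features)      # (batch, 1)
              a = self.advantage(features)  # (batch, n_actions)
              # Subtracting the mean advantage keeps V identifiable.
              return v + a - a.mean(dim=1, keepdim=True)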
    how to estimate the transition model and reward function?
    I am trying to use dynamic programming (policy iteration) for research purposes. The environment is stochastic, and the state and action spaces are discrete. Things I have done so far: I sample the environment using a random policy and save the transition data (s, a, s_n, r); the dataset size is 50,000 episodes of 1,000 time steps each. I build the reward function R(s,a) by averaging the reward observed when taking action a in state s over the whole dataset, setting unexplored state-action pairs to the min/max reward. I build the transition probability model P(s_n|s,a) by counting the number of times a specific (s_n, s, a) and (s, a) appear in the dataset and dividing them, setting unexplored parts to 0/1. I then train using policy iteration and test online. I am getting rewards slightly better than the random policy, but nowhere near the rewards I want (which I was easily able to achieve using PPO). I am pretty sure that the transition model and reward function are not estimated properly. I am a beginner in this field and have no idea how it's normally done. Any idea how the estimation is normally done? submitted by /u/ZIGGY-Zz [link] [comments]  ( 103 min )
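    What the post describes is the standard count-based maximum-likelihood estimate; a minimal NumPy sketch (function and variable names are illustrative, and the fallbacks for unvisited pairs are one choice among several):

      import numpy as np

      def estimate_model(transitions, n_states, n_actions):
          """Count-based MLE of P(s'|s,a) and R(s,a) from (s, a, s', r) tuples."""
          counts = np.zeros((n_states, n_actions, n_states))
          reward_sum = np.zeros((n_states, n_actions))
          for s, a, s_next, r in transitions:
              counts[s, a, s_next] += 1
              reward_sum[s, a] += r
          visits = counts.sum(axis=-1)  # N(s, a)
          # Visited pairs: empirical frequencies; unvisited pairs: uniform fallback.
          P = np.where(visits[..., None] > 0,
                       counts / np.maximum(visits[..., None], 1),
                       1.0 / n_states)
          R = reward_sum / np.maximum(visits, 1)  # 0 for unvisited pairs
          return P, R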
    Reinforcement Learning in Predator/Prey Simulation
    submitted by /u/enspiralart [link] [comments]  ( 102 min )
    "Simplifying Model-based RL: Learning Representations, Latent-space Models, and Policies with One Objective", Ghugare et al 2022
    submitted by /u/gwern [link] [comments]  ( 102 min )
    DDPG for MuJoCo Inverted Pendulum
    Keras.io has an example DDPG algorithm for OpenAI Gym's Classic Control Pendulum, and I'm trying to apply it to the MuJoCo Inverted Pendulum, but it is not working. The problem is essentially the same except for the reward structure. The Classic Pendulum reward is a combination of pendulum angle and velocity, with an upright pendulum at zero velocity having the max reward of zero. The MuJoCo Inverted Pendulum rewards +1 for every time step that the pendulum remains within a certain angle. This difference in reward structure is the only thing I can think of that's meaningfully different. Anyone have insight here that could help me out? Keras DDPG example: https://keras.io/examples/rl/ddpg_pendulum/ Classic Control - Pendulum: https://www.gymlibrary.dev/environments/classic_control/pendulum/ MuJoCo - Inverted Pendulum: https://www.gymlibrary.dev/environments/mujoco/inverted_pendulum/ submitted by /u/insignificantBeing0 [link] [comments]  ( 104 min )
    "Dropout Q-Functions for Doubly Efficient Reinforcement Learning", Hiraoka et al 2021
    submitted by /u/gwern [link] [comments]  ( 102 min )
    "Randomized Ensembled Double Q-Learning: Learning Fast Without a Model", Chen et al 2021
    submitted by /u/gwern [link] [comments]  ( 102 min )
    "Controlling Overestimation Bias with Truncated Mixture of Continuous Distributional Quantile Critics", Kuznetsov et al 2020 {Samsung}
    submitted by /u/gwern [link] [comments]  ( 102 min )
  • Open

    [D] 2060 RTX vs. 3060 RTX: Tensor and Cuda Core Selection
    Friends, I would appreciate some insight/guidance in choosing the optimal GPU for general training purposes against some constraints I won't delve into in much detail. I run a bare-metal hypervisor on a Dell R820 and plan to perform GPU passthrough, and I have constraints which restrict me to either an RTX 3060 or an RTX 2060. Cost isn't an issue.

      Card      Memory  Tensor Cores  CUDA Cores  Core Clock  Boost Clock
      RTX 2060  12GB    240           1920        1365 MHz    1680 MHz
      RTX 3060  12GB    112           3584        1320 MHz    1780 MHz

    Considerations: the 2060 has more tensor cores, but the 3060's Ampere tensor cores are roughly 50% faster per core than Turing's. Factoring in clock speeds, I think the 2060 slightly has the edge on tensor cores, or they might be equivalent; the 3060 clearly wins on CUDA cores. I'm likely turd-polishing, but I am leaning towards the 3060 on account of longer-term library support. I also don't have experience with either card, so I don't know if the additional 3060 CUDA cores will make a major difference in TensorFlow/PyTorch. What's your recommendation to maximize value and future reuse for general-purpose training? Thank you in advance and have a splendid weekend. submitted by /u/et_tu_brutits [link] [comments]  ( 119 min )
    [D] How to solve this problem?
    I have two entities for which I'm trying to create a mapping. User: age, location, background (multi-label), favorite author (multi-label), preferred book genre (multi-label). Book: abstract, author, genre. I have to create a probability score for how likely a book is to be picked based on the user configuration. The user is shown 3 books and only one is picked. There is no per-user history; all the data is a one-time matching between a user and a book, and I have around 10k samples. What kind of architecture would I use to train a model for this kind of matching? Note: I already tried training a BERT-based model where I concatenate the user details as unique tokens with the book abstract, genre, and author to create a semantic mapping between them. The issue with this method is that I am not able to input how many historical matches exist for a user with exactly the same configuration and the same book. How do I feed this historical information into the model? submitted by /u/inginx [link] [comments]  ( 103 min )
    [Project] Text to Video with Stable Diffusion
    blue stability, a fork of stability-sdk, adds a bash CLI for checkpointing and automation, like this script:

      blue_stability text_to_video \
        https://www.gutenberg.org/cache/epub/51833/pg51833.txt \
        url,~dryrun,frame_count=100,marker=PART \
        --seed 43 \
        --start_schedule 0.9

    https://i.redd.it/awefb97wr8r91.gif submitted by /u/Ill_Exercise5106 [link] [comments]  ( 103 min )
    [R] An easy-to-read preprint on Fake News Detection during US 2016 elections - Accuracy of 95%+
    The US 2016 election is a common dataset that has attracted many people to research fake news detection using machine learning, so I decided to give it a go. The classifier I used is actually the simplest one: a Naive Bayes classifier. Surprisingly, we got a higher accuracy than all the past publications on the same dataset, even though it was a simple classifier; the catch, in my view, was selecting the right attributes. We paid attention to the metadata of the news publications, and in particular the month of publication was by itself the most informative attribute when it came to classifying news as fake. I'll let readers draw their own conclusions from this finding. The accuracy was 95.38%, and I am sure that with further digging, higher accuracy can be achieved. The preprint is open access: https://papers.ssrn.com/sol3/papers.cfm?abstract_id=4074884 Thanks! submitted by /u/loosefer2905 [link] [comments]  ( 86 min )
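    A hypothetical sketch of the metadata-based Bayesian approach described above (the file and column names are illustrative, not taken from the preprint):

      import pandas as pd
      from sklearn.model_selection import train_test_split
      from sklearn.naive_bayes import MultinomialNB
      from sklearn.pipeline import make_pipeline
      from sklearn.preprocessing import OneHotEncoder

      df = pd.read_csv("news_2016.csv")       # assumed file
      X = df[["month", "source", "author"]]   # assumed metadata columns
      y = df["label"]                         # fake vs. real

      X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=0)
      # One-hot encode categorical metadata, then fit a Naive Bayes classifier.
      clf = make_pipeline(OneHotEncoder(handle_unknown="ignore"), MultinomialNB())
      clf.fit(X_tr, y_tr)
      print(clf.score(X_te, y_te))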
    [Discussion] If we had enough memory to always do full batch gradient descent, would we still need rmsprop/momentum/adam?
    Do optimization techniques like Adam exist primarily to overcome the noise created by mini-batch gradient descent, or would they be beneficial even if we were doing full batches every time? submitted by /u/029187 [link] [comments]  ( 105 min )
    [D] Focal loss - why it scales down the loss of minority class?
    The equation of the α-balanced focal loss (binary in this case, for simplicity) is $\mathrm{FL}(p_t) = -\alpha_t (1 - p_t)^{\gamma} \log(p_t)$, where $p_t$ is the predicted probability of the true class and $\alpha_t$ equals $\alpha$ for class 1 and $1-\alpha$ for class 0. What puzzles me is that the weighting used here seems opposite to what is intuitive when dealing with imbalanced datasets: normally you would scale the loss of class 1 (the minority, i.e. foreground objects in the case of object detection) higher than class 0 (the majority, i.e. background). However, what happens here is that we scale class 1 by 0.25 and class 0 by 0.75. Is this behavior explained anywhere? I don't think I'm getting the foreground/background labels wrong, as I've looked into multiple implementations as well as the original paper. Or maybe I'm missing some crucial detail? Paper for reference: https://arxiv.org/abs/1708.02002 submitted by /u/Lugi [link] [comments]  ( 86 min )
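    For concreteness, a NumPy sketch of the α-balanced binary focal loss as written above (illustrative, with a small epsilon added for numerical safety):

      import numpy as np

      def binary_focal_loss(p, y, alpha=0.25, gamma=2.0, eps=1e-12):
          """Alpha-balanced binary focal loss (Lin et al., 2017), per sample.
          p: predicted probability of class 1; y: label in {0, 1}."""
          p_t = np.where(y == 1, p, 1 - p)
          alpha_t = np.where(y == 1, alpha, 1 - alpha)  # note: 0.25 on the foreground
          return -alpha_t * (1 - p_t) ** gamma * np.log(p_t + eps)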
    [D] Things to do for effective ML teamwork at an early stage startup
    I was reading this blog on "Effective ML Teamwork at an Early Stage Startup". There is more in the post, but in short it says (quoting): don't create APIs for ML, just copy & paste; test every line; if you aren't sure about the design, test first; always keep your experiments reproducible (lineage, data, code, baseline); document everything; be clear, and avoid abbreviations. Reading the post, especially the part about APIs, doesn't make sense to me, but I wonder what other ML professionals think about ML teamwork practices and what they do at their companies, if you don't mind discussing them here. submitted by /u/coinfelix [link] [comments]  ( 104 min )
    [D] What type of entry-level job for an ML career path, with a strong statistics background but no SQL experience?
    Hi, I hold a master's in computational neuroscience and have extensive training in statistics, modelling, and programming, plus a couple of AI projects, but I have no SQL experience. My initial strategy for getting in the door was to look for data analyst jobs, but they all require SQL and Tableau/Power BI skills and experience, which I haven't got, and they seem to involve a very different set of skills and day-to-day activities, where the position would not allow me to expand my statistical modelling or AI-related programming skills and knowledge. What kind of job would you advise me to look for that is more in line with my skills, in order to pursue a career in ML and expand my ML skills? Thank you very much! submitted by /u/NoHalfMeasures33 [link] [comments]  ( 104 min )
    [D] Uni's prestige vs best match programme
    As I am about to take part in the upcoming master's admissions, I am wondering whether to go with a NLP/Computational linguistics or a more "general" machine learning one. The only unis I can afford to go to are within the EU. My undergraduate thesis, as well as my current work as a research scientist, focus solely on NLP. This is definitely the branch that I would like to pursue. The issue is that the unis that offer NLP/CL master's in the EU are scarce and relatively low ranked (as per QS ranking: University of Copenhagen 69th, University of Saarland 447th), whereas there are numerous prestigious unis offering general ML (as per QS: ETH 8th, EPFL 14th). My goal is to get admitted to a top-notch PhD NLP/CL programme after the master's. I know that the prestige of the previously attended university plays a relevant role in the application process for a PhD. For this reason, I am unsure whether I should go for a well-known ML master's, or a no-name NLP/CL one. If there is any information regarding this matter you could share with me, please do so. submitted by /u/Ok-Experience5604 [link] [comments]  ( 86 min )
    [D] Medium Article: Adaptive Learning for Time Series Forecasting
    I've just published my recent article about time-series forecasting in the Towards AI publication. There is no need to stress the importance of time-series forecasting applications in various industries, from energy to healthcare, so let's get to the point directly. One of the complex and difficult challenges we face while working on time-series datasets is the variety in their statistical features, which can lead to shifts in their distributions and, consequently, behaviors that are difficult for models to capture. This article presents a two-stage model to deal with Temporal Covariate Shift (TCS), called AdaRNN (a combination of adaptive learning and RNNs). You can find a thorough explanation of all sections, along with the mathematical formulation, for a better understanding. Frankly, this is the first time we can approach time-series datasets from the distribution perspective. Please share this article with those you think would find it helpful. Link: https://pub.towardsai.net/adaptive-learning-for-time-series-forecasting-b34e640b865b submitted by /u/rezayazdanfar [link] [comments]  ( 104 min )
    [P] If you needed to choose a GPU cloud service to train your models, what would be most important to you?
    View Poll submitted by /u/GogetTheOliveOil [link] [comments]  ( 118 min )
    [P] txtai 5.0 released - build semantic search graphs
    txtai 5.0 is a major new release. It adds the semantic graph, enables external integrations, and includes a number of improvements and bug fixes. Semantic graph: semantic graphs, also known as knowledge graphs or semantic networks, build a graph network with semantic relationships connecting the nodes. In txtai, they can take advantage of the relationships inherently learned within an embeddings index, which opens exciting possibilities for exploring relationships, such as topics and interconnections in a dataset. Semantic graphs in txtai can be used for topic modeling, graph traversal, and analysis. Check out the following links for more: Article | Notebook. External integrations: want to run Weaviate as your txtai vector database, PostgreSQL for database storage, and Neo4j for graphs? 5.0 makes it easier to integrate external vector engines, databases, and graph stores. Check out the following links to explore how modular embeddings index components can be connected together: Article | Notebook. Read more: Release Announcement - https://medium.com/neuml/whats-new-in-txtai-5-0-e5c75a13b101 Release Notes - https://github.com/neuml/txtai/releases/tag/v5.0.0 submitted by /u/davidmezzetti [link] [comments]  ( 104 min )
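    For context, basic txtai indexing and search looks roughly like this (a minimal sketch; the model name and data are illustrative, and the new graph/external-integration configuration is documented in the release notes above):

      from txtai.embeddings import Embeddings

      data = ["US tops 5 million confirmed virus cases",
              "Beijing mobilises invasion craft along coast"]

      # Build an embeddings index over the documents, then run a semantic search.
      embeddings = Embeddings({"path": "sentence-transformers/all-MiniLM-L6-v2"})
      embeddings.index([(uid, text, None) for uid, text in enumerate(data)])
      print(embeddings.search("health crisis", 1))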
    [R] Looking for a survey on text summarization techniques
    I'm looking for a survey paper or similar resource that defines the text summarization problem well and discusses the approaches and models developed to solve it so far. Through Google, I've found some papers behind paywalls and others that weren't what I'm looking for, so your suggestions would definitely help! Thanks in advance! submitted by /u/muhnash [link] [comments]  ( 104 min )
    [D] Building a deep learning image background remover using PyTorch, U2-Net, and the COCO dataset?
    Hey, I am a beginner and I am trying to build a background remover. Using the pretrained u2net model works already fine, but not perfectly. Do you think I can improve performance by training it with the coco dataset? Or is it better to do pre- and postprocessing of the images to get better output? submitted by /u/Head_Sell5554 [link] [comments]  ( 104 min )
    [D] Effectively using Levenberg–Marquardt algorithm on neural nets
    Hi there. For a multi-input, multi-output regression problem, it is known that LM-based optimization methods achieve nearly the same error level with order-of-magnitude smaller models. We know that the LM algorithm is not popular in deep learning because it does not scale with data and model size. However, I successfully use it on my regression (multi-input, multi-output) problem with a relatively small model by dividing the training data into batches. There are two different hyperparameters regarding epochs: 1. how many epochs are needed over the ENTIRE dataset, and 2. how many iterations are needed on each batch for the LM algorithm. So there is a double FOR loop. At each epoch, I shuffle the data so that each time it trains on different splits. Lastly, I left the choice of epoch/iteration parameters to grid search and check their success on a validation set. Am I missing something? Through this I achieve lower errors with order-of-magnitude smaller models (compared to a carefully tuned Adam optimizer). When it comes to LM, data size is mentioned as a possible problem, but batching solves it effectively. submitted by /u/Street_Excitement_14 [link] [comments]  ( 106 min )
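    For readers unfamiliar with the method, a NumPy sketch of a single damped LM step and the double loop described above (the residual and Jacobian functions are placeholders for the network's):

      import numpy as np

      def lm_step(theta, residual_fn, jac_fn, lam=1e-2):
          """One damped Levenberg-Marquardt update:
          theta <- theta - (J^T J + lam*I)^(-1) J^T r."""
          r = residual_fn(theta)                   # residuals, shape (n_residuals,)
          J = jac_fn(theta)                        # Jacobian, shape (n_residuals, n_params)
          H = J.T @ J + lam * np.eye(theta.size)   # damped Gauss-Newton matrix
          return theta - np.linalg.solve(H, J.T @ r)

      # Double-loop structure (pseudo-code):
      # for epoch in range(n_epochs):              # 1. passes over the whole dataset
      #     for batch in shuffled_batches(data):
      #         for _ in range(n_lm_iters):        # 2. LM iterations per batch
      #             theta = lm_step(theta, residuals_on(batch), jacobian_on(batch))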
    [D] What's the most bare bones C++ cloud computing framework?
    What's the most bare-bones C++ cloud computing framework? I've been very confused by the plethora of frameworks. I'm looking for something that's "as close" to a typical C++ API as possible, with as few "funky abstractions" as possible. By "funky abstractions" I mean fruitless GUI tools, pipeline APIs that don't add much, and pointless high-level programming language APIs (e.g. Scala), most of which are possibly only there to differentiate from competitors, not to make the product more useful. Occasionally I've thought that e.g. Intel oneAPI fits this. submitted by /u/mavavilj [link] [comments]  ( 104 min )
    [D] Why is the machine learning community obsessed with the logistic distribution?
    Some of you reading this might not even realize that most of modern machine learning is based on the logistic distribution. What I'm referring to is the sigmoid function. Its technical name is the logistic function, and the version which permeates the ML community is the cumulative distribution function of the logistic distribution with location 0 and scale 1. This little function is used by many to map real numbers into the (0,1) interval, which is extremely useful when trying to predict probabilities. I even came across a statement in the scikit-learn documentation which astounded me. It indicates that the log loss is actually named for the logistic distribution because it is the loss function for logistic regression. https://scikit-learn.org/stable/modules/generated/sklearn.metrics.log_lo…  ( 86 min )
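    The identity the post refers to is easy to check numerically (a small sketch using SciPy):

      import numpy as np
      from scipy.stats import logistic

      x = np.linspace(-5, 5, 11)
      sigmoid = 1.0 / (1.0 + np.exp(-x))             # the familiar ML sigmoid
      print(np.allclose(sigmoid, logistic.cdf(x)))   # True: CDF of Logistic(0, 1)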
    [D] Any ICLR submission that's got your attention ?
    Seems like Twitter is flooded with text2insert modality work. Anyone stumble upon works other than these heavily talked about submissions ? submitted by /u/PaganPasta [link] [comments]  ( 103 min )
    [P] Pokémon text to image, fine tuned stable diffusion model with Gradio UI
    submitted by /u/Illustrious_Row_9971 [link] [comments]  ( 104 min )
    [D] Is it worth attending Neurips 2022?
    I’m an ML software engineer and my work doesn’t involve a lot of research or publishing papers but I do have an interest in doing some research on my own and focusing some time on exploring new approaches, especially in the NLP space. I was wondering if attending Neurips would be a good stepping stone to see what the research world is like? Are the workshops beneficial? Or is it not worth it if I don’t have an accepted paper? submitted by /u/rudimentarythoughts [link] [comments]  ( 104 min )
  • Open

    Visualizing English and Japanese vowels
    Vowel sounds can be visualized in a two-dimensional space according to tongue position. The vertical axis runs from open down to closed, and the horizontal axis runs from front to back. See a linguistics textbook for far more detail. English has five vowel letters, but a lot more than five vowel sounds. Scholars argue about […] Visualizing English and Japanese vowels first appeared on John D. Cook.  ( 6 min )
  • Open

    How to Automate Data Cleaning, in a Nutshell
    Data scientists spend 80% of their time on data cleaning and exploratory analysis. What if you could automate most of this? What if data scientists spent most of their time on higher level tasks, that better justify the salary? I explain here how to do it. Every Data Set Looks Different To the junior data… Read More »How to Automate Data Cleaning, in a Nutshell The post How to Automate Data Cleaning, in a Nutshell appeared first on Data Science Central.  ( 21 min )

  • Open

    Celebrate over 20 years of AI/ML at Innovation Day
    Be our guest as we celebrate 20 years of AI/ML innovation on October 25, 2022, 9:00 AM – 10:30 AM PT.  The first 1,500 people to register will receive $50 of AWS credits. Register here. Over the past 20 years, Amazon has delivered many world firsts for artificial intelligence (AI) and machine learning (ML). ML […]  ( 4 min )
    AWS Panorama now supports NVIDIA JetPack SDK 4.6.2
    AWS Panorama is a collection of machine learning (ML) devices and a software development kit (SDK) that brings computer vision to on-premises internet protocol (IP) cameras. AWS Panorama device options include the AWS Panorama Appliance and the Lenovo ThinkEdge SE70, powered by AWS Panorama. These device options provide you choices in price and performance, depending […]  ( 4 min )
    Build flexible and scalable distributed training architectures using Kubeflow on AWS and Amazon SageMaker
    In this post, we demonstrate how Kubeflow on AWS (an AWS-specific distribution of Kubeflow) used with AWS Deep Learning Containers and Amazon Elastic File System (Amazon EFS) simplifies collaboration and provides flexibility in training deep learning models at scale on both Amazon Elastic Kubernetes Service (Amazon EKS) and Amazon SageMaker utilizing a hybrid architecture approach. […]  ( 19 min )
    Bundesliga Match Fact Pressure Handling: Evaluating players’ performances in high-pressure situations on AWS
    Pressing or pressure in football is a process in which a team seeks to apply stress to the opponent player who possesses the ball. A team applies pressure to limit the time an opposition player has left to make a decision, reduce passing options, and ultimately attempt to turn over ball possession. Although nearly all […]  ( 8 min )
    Bundesliga Match Fact Win Probability: Quantifying the effect of in-game events on winning chances using machine learning on AWS
    Ten years from now, the technological fitness of clubs will be a key contributor towards their success. Today we’re already witnessing the potential of technology to revolutionize the understanding of football. xGoals quantifies and allows comparison of goal scoring potential of any shooting situation, while xThreat and EPV models predict the value of any in-game […]  ( 8 min )
    Unified data preparation, model training, and deployment with Amazon SageMaker Data Wrangler and Amazon SageMaker Autopilot – Part 2
    Depending on the quality and complexity of data, data scientists spend between 45–80% of their time on data preparation tasks. This implies that data preparation and cleansing take valuable time away from real data science work. After a machine learning (ML) model is trained with prepared data and readied for deployment, data scientists must often […]  ( 9 min )
  • Open

    DREAMBOOTH Troubleshooting Guide
    submitted by /u/PuppetHere [link] [comments]  ( 102 min )
    Nick Colosimo discusses Artificial Intelligence Beyond Silicon
    submitted by /u/timothy-ventura [link] [comments]  ( 103 min )
    Anyone else tired of "AI-generated " posts?
    submitted by /u/TheNovicePhilomath [link] [comments]  ( 104 min )
    I reimagined a music video I created using 'AI Technology'. It's taken many hours but I think it's turned out pretty rad. What do you guys think?
    submitted by /u/6Witchy9 [link] [comments]  ( 102 min )
    I feel like the only artist who actually likes creative AI
    Social media in the art realm has just been overrun with constant hate for AI. People saying that it's stealing, "not real art," "not real artists," lazy, dystopian, evil, and just downright immoral. These takes are so baffling to me, because all I see is artists bashing a new artistic medium. Just like what people used to do with digital art. As an artist (not to humble brag) but I worked goddamn hard to do what I do! I practiced really fucking hard and didn't give up. At the same time, I am immensely privileged to have had the time to practice, study, and money spent on classes growing up, and on quality supplies. But not everyone has that. Those people are deserving of having a creative outlet like everyone else. Not just any creative outlet, but one that makes it easy, free, and tha…  ( 111 min )
    Bot Name Generator – here's a free tool to generate names for your chatbot
    Hi there! My team recently created one piece of content, and I'm so excited to share it with you! Bot Name Generator! It is for everyone who wants their chatbot to engage from the moment it introduces itself. You can generate names based on your industry and personality traits. For example, Creative + Travel Industry = Bethany, Mallory, Beep Boop, Anne Droid and more :) Besides, on this page, you will see instructions, tips, recommendations, and helpful guidance on how to name your chatbot based on your industry. People have different expectations when talking to an e-commerce bot and a healthcare virtual assistant. So, if you need some professional opinion, check out this page with ideas and real-world examples! submitted by /u/Avandegraund [link] [comments]  ( 102 min )
  • Open

    [Discussion] EVA: Analyzing Movies using Deep Learning
    Hey! We are developing EVA, a new open-source database system for analyzing movies using deep learning models. EVA allows us to do "deep analysis" of movies by using computer vision models to "look" at the actors in every frame, and even extract their emotions. Here are the results of an emotion detection query over an Interstellar movie scene: [Images: emotion analysis of Interstellar using the EVA database system; emotion palette of an Interstellar scene] We are interested in hearing your ideas on video queries you would be interested in. Do check out EVA at: https://github.com/georgia-tech-db/eva Python Notebook: https://github.com/georgia-tech-db/eva/blob/master/tutorials/04-movie-analysis-interstellar.ipynb (works on Colab) Thanks for your time! submitted by /u/sqlcheck [link] [comments]  ( 104 min )
    [R] Preventing mode collapse and overfitting in Seq2Seq transformers
    I have a Seq2Seq dataset where a given target sentence can occur for multiple source sentences, e.g. S1 -> T1, S2 -> T1, S3 -> T1. I tried to use BartForConditionalGeneration and T5 models as Seq2Seq models, but I notice that after a single epoch the model starts collapsing to produce the same sentence as output regardless of input, and this most likely happens because of the repetition. If I simply eliminate all repetition in my dataset by picking only unique targets and discarding the rest, this mode collapse problem does not happen. Is there a way to train the model on the full dataset and not have it collapse? I tried reducing the learning rate and changing some hyperparameters like gradient clipping and weight decay to prevent overfitting; they don't seem to help. Any advice? Do you know of any papers or existing research that solve this problem? submitted by /u/vikigenius [link] [comments]  ( 104 min )
    [D] Have you ever used deep reinforcement learning on financial data?
    Trading in financial markets seems to fit the usual structure of problems that RL can handle. I have used minutely Forex data to train a DQN with an experience buffer, but after two hours of training there is no single indication of the network learning anything. The state is the past 30 (ask + bid)/2 prices, the type of the current open position (if any), and total net worth. I haven't scaled the input data. I wanted to know about your experiences with utilizing RL in a financial context, and whether there are any rules of thumb for designing and training such models. Thanks. submitted by /u/Kiizmodo [link] [comments]  ( 104 min )
    [Project] Stable Diffusion image generation as an API
    Stable Diffusion models are now available as an API on Tiyaro. Simply search for 'stable diffusion' or click here. The model card (also shown below) gives you the API endpoint and even sample code in different languages to invoke the API. NOTE: If you are interested in trying out the API, simply reach out to me and I can grant you the credits needed to use it. submitted by /u/maheshtro [link] [comments]  ( 103 min )
    [D] Has any algorithm managed to perform inductive inferences like humans do?
    I'm trying to test how different RL algorithms perform in the Snake game, and I see that all the algorithms I tried do not infer inductively but just memorize strategies by exploring them through random/guided sampling. None of them is capable of inductively inferring the dynamics of the game (i.e., discovering that when you eat an apple the length of the snake increases by 1 block). If you let a human play Snake for the first time without knowing anything about the game, it will take only a few games for the human to discover that eating an apple increases the length of the snake. This is fundamental for understanding the game and increases the ability to generalize a policy to unknown states: I don't need to explore states with length 10/11 to know that if the snake eats the apple, the length will increase to 11. However, RL (and the rest of ML) algorithms don't work like this. Some of them might arrive at strategies that work by memorizing thousands/millions of situations, but they won't be able to inductively extrapolate their knowledge to new situations. So my question: do you know any algorithm that infers strategies/models inductively for unknown situations like the one I described? submitted by /u/XRatorX [link] [comments]  ( 119 min )
    [D] Evaluating Image Generation Intelligence: Did Astral Codex Ten Win His Bet on AI Progress?
    Scott at Astral Codex Ten claims that he already won his bet on the accuracy/quality of image generation models given the current capabilities of Imagen — so I ran a series of human feedback tests to evaluate his victory claim more rigorously. Blog: https://www.surgehq.ai/blog/dall-e-vs-imagen-and-evaluating-astral-codex-tens-3000-ai-bet Curious for all of your opinions as well — do Scott's images pass muster? submitted by /u/BB4evaTB12 [link] [comments]  ( 115 min )
    [D] Image processing
    Hey everyone! I'm a computer science and engineering student in my final year, working on a final-year project that uses image processing on DICOM files (from Magnetic Resonance Imaging). The project is about automated knee MRI articular cartilage segmentation and severity grading. Anyone who could help? submitted by /u/Elij_ha [link] [comments]  ( 103 min )
    [D] A Colab to remove noise from audio, preferably with training on your own data
    I’m creating music with OpenAI Jukebox (link), and the results are full of non-standard noises, which don’t succumb to usual denoising filters. So my idea was to create a relatively small (10–20 examples) set of non-noisy audio (real music) together with the same audio put through Jukebox (without any AI generation, just conversion). Then I would need some neural net to “back train” to remove that kind of noise. Do you think this is doable? If so, is there any Colab or Python library available for this? submitted by /u/vzakharov [link] [comments]  ( 104 min )
    [P] High-performance image generation using Stable Diffusion in KerasCV
    We (KerasCV) launched the world's most performant Stable Diffusion inference pipeline (as of September 2022). You can assemble it in three lines of code:

      from tensorflow import keras
      import keras_cv

      keras.mixed_precision.set_global_policy("mixed_float16")
      model = keras_cv.models.StableDiffusion(jit_compile=True)
      images = model.text_to_image("photograph of an astronaut riding a horse", batch_size=3)

    Example output (otter image): https://keras.io/img/guides/generate_images_with_stable_diffusion/generate_images_with_stable_diffusion_23_1.png Check it out! https://keras.io/guides/keras_cv/generate_images_with_stable_diffusion/ submitted by /u/puppet_pals [link] [comments]  ( 104 min )
    [D] From a high level, what's the current status of using neural networks in molecular biology?
    Obviously this is a greedy broad question, but I don't know another way to pose it. I mainly just wanted an entry point into what the research (and commercial, if it exists) world is doing and struggling with in molecular biology and neural networks. I've dipped my toes in both worlds a little background-wise, but never simultaneously, and wanted to have a base sense of the topology of research/work there so I could do my own further exploring. Happy to take suggestions of good blogs/summary papers/etc.! For more specific examples of my interest: I know we've done some work on getting folded protein shape from sequences, but was this practical or just for niche applications? I figure we've probably done a lot of work on tagging functional domains in DNA/amino acid sequences? Curious as to what types of models have worked well: Transformers/LSTMs/convolutions/etc., and for what? I imagine there's been some classification work as well on molecules and the agonist groups they might fall under. I wonder if we're getting to the point of generative models rather than purely descriptive/predictive ones. Thanks! Happy to take whatever you also find interesting or novel in this domain as reading points. submitted by /u/jshkk [link] [comments]  ( 119 min )
    [D] How to find research papers about a specific study?
    Hello, I am new to the research community. I just started working on a research project, and I want to find out if there is a paper written about a specific approach. I tried Google Scholar and Connected papers using the most recent papers relevant to the area I'm planning to work on. However, I am yet to find a way to search for research papers written about a specific approach. Is there any website/ tool I can use for this work? submitted by /u/hirushi_wijesinghe [link] [comments]  ( 105 min )
    [D] I need graduation project ideas
    Hello Reddit, I really need your help. I'm a senior undergraduate student (telecommunications engineering) starting my final project this year, and I haven't decided the topic yet. I'm interested in AI, machine learning, and data science, and I want a graduation project that brings me closer to those topics. Any ideas? submitted by /u/Evening-Noise5691 [link] [comments]  ( 103 min )
    [D] Machine learning in medicine: the easy path to professorship?
    Hey, I'm an associate professor at a medical university. I'm a physicist by training and I do some machine learning (not my job, but I like to do it). To be clear, I don't do any research on machine learning; I just apply it to problems that come my way. I have this friend who is a postdoc. He is a biologist by training, and from time to time he asks me to apply machine learning to his data. I'm always happy to help him because it doesn't take long, usually a couple of hours, and I get a co-authorship. This friend of mine, who to be clear has no idea how to do any machine learning or deep learning on his own, has now applied for an associate professor position at our university. The position is titled something like "machine learning in his field". He got the position and I'm happy for him, but it got me thinking. This guy, who has no clue about machine learning, is a professor of machine learning in medicine. Does stuff like that happen frequently? Do you know similar stories, or is that just an outlier? submitted by /u/CountSnort [link] [comments]  ( 122 min )
    [D] Some of the EMNLP 2022 final decisions are released
    Just heard that some of the EMNLP 2022 final decisions are released. You can change the paperID to yours and see the result in the link https://softconf.com/emnlp2022/papers/user/scmd.cgi?scmd=submitPaperCustom&pageid=10000&paperID=1 Note that only some certain tracks have released the results, so if you see We are sorry, but this Submission has not been accepted. doesn't mean rejection at this time, maybe your track hasn't released the decisions, and you don't need to be worried if you see this. If you can only see You need a passcode to submit some material. then congrats! It's accepted. submitted by /u/BwwwS [link] [comments]  ( 105 min )
    [D] Any less-boilerplate framework for Jax/Flax/Haiku?
    I have been looking for a framework/library similar to PyTorch Lightning. I even checked Awesome-Jax. Is there any such framework for Jax/Flax/Haiku? I mostly need these features: checkpoint saving, reproducibility, and logging. submitted by /u/KingsmanVince [link] [comments]  ( 117 min )
    [R] DreamFusion: Text-to-3D using 2D Diffusion
    Project page: https://dreamfusion3d.github.io/ Paper: https://drive.google.com/file/d/1YC8xQSjxz7r8qyQY6LTuzC9L1AVU5O8V/view?usp=sharing submitted by /u/levng [link] [comments]  ( 103 min )
  • Open

    A Recap of Our Interview with Pieter Abbeel on Deep Reinforcement Learning
    submitted by /u/Open_Data_Science [link] [comments]  ( 102 min )
    Atlas with Reinforcement Learning - RL task completed
    Source code is coming next week. submitted by /u/mrmanmachine [link] [comments]  ( 102 min )
    Google AI Introduces A Novel Reinforcement Learning (RL) Training Paradigm, ‘ActorQ,’ To Speed Up Actor-Learner Distributed RL Training
    Several sequential decision-making challenges, like robotics, gaming, nuclear physics, balloon navigation, etc., have been successfully addressed using deep reinforcement learning. However, despite its potential, prolonged training time is one of its limitations. Although present methods for accelerating RL training on challenging problems use distributed training to scale up to thousands of processing nodes, they still necessitate substantial hardware resources, which increases the cost of RL training while also having a negative environmental impact. Several recent studies show that performance enhancements on existing technology can lessen the carbon footprint of training and inference, and similar system optimization strategies that shorten training times, increase hardware efficiency, and cut carbon dioxide emissions are also advantageous for RL. One such method is quantization, which converts full-precision floating point (FP32) numbers to lower-precision (int8) quantities before calculation. It can reduce the cost and bandwidth of memory storage, enabling quicker and more energy-efficient processing. Quantization has been successfully applied in supervised learning to facilitate the deployment of machine learning models at the edge and to speed up training; however, it had not yet been used in RL training. Continue reading | Check out the paper and reference article. submitted by /u/ai-lover [link] [comments]  ( 103 min )
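    As an illustration of the general idea (uniform symmetric post-training quantization of FP32 values to int8, not ActorQ's specific scheme):

      import numpy as np

      def quantize_int8(w):
          """Uniform symmetric quantization of FP32 weights to int8."""
          scale = np.abs(w).max() / 127.0
          q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
          return q, scale

      def dequantize(q, scale):
          return q.astype(np.float32) * scale

      w = np.random.randn(4, 4).astype(np.float32)
      q, s = quantize_int8(w)
      print(np.abs(w - dequantize(q, s)).max())  # small quantization error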
  • Open

    How Digital Asset Management is putting firms on the front foot?
    With customer demands growing, regulations around health and safety and net zero increasing, and existing infrastructure aging with each…  ( 9 min )
    Hyper-Automation vs. Robotics Process Automation (RPA)
    Automation has taken much of the business world by storm. For a good reason, it presents the digital transformation journey transition…  ( 10 min )
    The Internal Relation Between Blockchain and Artificial Intelligence
    Artificial Intelligence and blockchain have been two of the most promising technologies in recent years. They are still waiting to be fully…  ( 10 min )
    Dreamforce 2022, Salesforce 20th Event Since Its Founding
    Salesforce.com, the BEST software company in the world. Continue reading on Becoming Human: Artificial Intelligence Magazine »  ( 12 min )
  • Open

    Preventing characters from displaying as emoji
    I rarely intentionally use emoji, and yet I often run into them unbidden. This is because some Unicode characters double as emoji. For example, the zodiac symbol for Aries is used both in celestial navigation and in astrology. The latter is much more common, and so when some software sees U+2648 it interprets the character […] Preventing characters from displaying as emoji first appeared on John D. Cook.  ( 5 min )
  • Open

    Distributional Reinforcement Learning via Sinkhorn Iterations. (arXiv:2202.00769v3 [cs.LG] UPDATED)
    Distributional reinforcement learning~(RL) is a class of state-of-the-art algorithms that estimate the entire distribution of the total return rather than only its expectation. The empirical success of distributional RL is determined by the representation of return distributions and the choice of distribution divergence. In this paper, we propose a new class of \textit{Sinkhorn distributional RL~(SinkhornDRL)} algorithms that learn a finite set of statistics, i.e., deterministic samples, from each return distribution and then use Sinkhorn iterations to evaluate the Sinkhorn distance between the current and target Bellman distributions. Sinkhorn divergence features as the interpolation between the Wasserstein distance and Maximum Mean Discrepancy~(MMD). SinkhornDRL finds a sweet spot by taking advantage of the geometry of optimal transport-based distance and the unbiased gradient estimate property of MMD. Finally, compared to state-of-the-art algorithms, SinkhornDRL's competitive performance is demonstrated on the suite of 55 Atari games.  ( 2 min )
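    For reference, generic Sinkhorn iterations for entropy-regularized optimal transport look like this (a NumPy sketch of the classic algorithm, not the paper's distributional-RL instantiation):

      import numpy as np

      def sinkhorn(a, b, C, eps=0.1, n_iters=100):
          """Transport plan between histograms a (n,) and b (m,)
          with cost matrix C (n, m), entropic regularization eps."""
          K = np.exp(-C / eps)
          u = np.ones_like(a)
          for _ in range(n_iters):
              v = b / (K.T @ u)   # alternate scaling to match marginal b
              u = a / (K @ v)     # ... and marginal a
          return u[:, None] * K * v[None, :]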
    Stop Wasting My Time! Saving Days of ImageNet and BERT Training with Latest Weight Averaging. (arXiv:2209.14981v1 [cs.LG])
    Training vision or language models on large datasets can take days, if not weeks. We show that averaging the weights of the k latest checkpoints, each collected at the end of an epoch, can speed up the training progression in terms of loss and accuracy by dozens of epochs, corresponding to time savings up to ~68 and ~30 GPU hours when training a ResNet50 on ImageNet and RoBERTa-Base model on WikiText-103, respectively. We also provide the code and model checkpoint trajectory to reproduce the results and facilitate research on reusing historical weights for faster convergence.  ( 2 min )
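    A minimal PyTorch sketch of the checkpoint-averaging recipe (assuming each file holds a raw state dict; the paths are placeholders):

      import torch

      def average_checkpoints(paths):
          """Average model weights across the k latest checkpoint files."""
          avg = None
          for p in paths:
              state = torch.load(p, map_location="cpu")  # assumed raw state dict
              if avg is None:
                  avg = {k: v.float().clone() for k, v in state.items()}
              else:
                  for k in avg:
                      avg[k] += state[k].float()
          return {k: v / len(paths) for k, v in avg.items()}

      # model.load_state_dict(average_checkpoints(latest_k_checkpoint_paths))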
    Mirror Descent Maximizes Generalized Margin and Can Be Implemented Efficiently. (arXiv:2205.12808v2 [cs.LG] UPDATED)
    Driven by the empirical success and wide use of deep neural networks, understanding the generalization performance of overparameterized models has become an increasingly popular question. To this end, there has been substantial effort to characterize the implicit bias of the optimization algorithms used, such as gradient descent (GD), and the structural properties of their preferred solutions. This paper answers an open question in this literature: For the classification setting, what solution does mirror descent (MD) converge to? Specifically, motivated by its efficient implementation, we consider the family of mirror descent algorithms with potential function chosen as the $p$-th power of the $\ell_p$-norm, which is an important generalization of GD. We call this algorithm $p$-$\textsf{GD}$. For this family, we characterize the solutions it obtains and show that it converges in direction to a generalized maximum-margin solution with respect to the $\ell_p$-norm for linearly separable classification. While the MD update rule is in general expensive to compute and perhaps not suitable for deep learning, $p$-$\textsf{GD}$ is fully parallelizable in the same manner as SGD and can be used to train deep neural networks with virtually no additional computational overhead. Using comprehensive experiments with both linear and deep neural network models, we demonstrate that $p$-$\textsf{GD}$ can noticeably affect the structure and the generalization performance of the learned models.  ( 3 min )
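    For reference, the generic mirror descent step specialized to this potential reads (standard MD form; the paper's exact normalization may differ): with $\psi(w) = \tfrac{1}{p}\|w\|_p^p$, the update is $\nabla \psi(w_{t+1}) = \nabla \psi(w_t) - \eta\, \nabla L(w_t)$, where $[\nabla \psi(w)]_i = \mathrm{sign}(w_i)\,|w_i|^{p-1}$. Since both the map and its inverse act entrywise, the update is as parallelizable as SGD, which is the point made in the abstract.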
    Single-Node Attacks for Fooling Graph Neural Networks. (arXiv:2011.03574v2 [cs.LG] UPDATED)
    Graph neural networks (GNNs) have shown broad applicability in a variety of domains. These domains, e.g., social networks and product recommendations, are fertile ground for malicious users and behavior. In this paper, we show that GNNs are vulnerable to the extremely limited (and thus quite realistic) scenario of a single-node adversarial attack, where the perturbed node cannot be chosen by the attacker. That is, an attacker can force the GNN to classify any target node to a chosen label by only slightly perturbing the features or the neighbor list of another single, arbitrary node in the graph, even without being able to select that specific attacker node. When the adversary is allowed to select the attacker node, these attacks are even more effective. We demonstrate empirically that our attack is effective across various common GNN types (e.g., GCN, GraphSAGE, GAT, GIN) and robustly optimized GNNs (e.g., Robust GCN, SM GCN, GAL, LAT-GCN), outperforming previous attacks on different real-world datasets in both targeted and non-targeted settings. Our code is available at https://github.com/benfinkelshtein/SINGLE .  ( 3 min )
    Evolutionary Echo State Network: evolving reservoirs in the Fourier space. (arXiv:2206.04951v2 [cs.NE] UPDATED)
    The Echo State Network (ESN) is a class of recurrent neural networks with a large number of hidden-hidden weights (the so-called reservoir). Canonical ESNs and their variations have recently received significant attention due to their remarkable success in modeling non-linear dynamical systems. The reservoir is randomly connected with fixed weights that do not change during learning; only the weights from the reservoir to the output are trained. Since the reservoir is fixed during training, we may wonder whether the computational power of the recurrent structure is fully harnessed. In this article, we propose a new computational model of the ESN type that represents the reservoir weights in the Fourier space and fine-tunes these weights by applying genetic algorithms in the frequency domain. The main interest is that this procedure works in a much smaller space than the classical ESN, thus providing a dimensionality-reduction transformation of the initial method. The proposed technique allows us to exploit the benefits of the large recurrent structure while avoiding the training problems of gradient-based methods. We provide a detailed experimental study that demonstrates the good performance of our approach on well-known chaotic systems and real-world data.  ( 3 min )
    Optimistic MLE -- A Generic Model-based Algorithm for Partially Observable Sequential Decision Making. (arXiv:2209.14997v1 [cs.LG])
    This paper introduces a simple and efficient learning algorithm for general sequential decision making. The algorithm combines Optimism for exploration with Maximum Likelihood Estimation for model estimation, and is thus named OMLE. We prove that OMLE learns near-optimal policies for an enormously rich class of sequential decision making problems in a polynomial number of samples. This rich class includes not only a majority of known tractable model-based Reinforcement Learning (RL) problems (such as tabular MDPs, factored MDPs, low witness rank problems, tabular weakly-revealing/observable POMDPs and multi-step decodable POMDPs), but also many new challenging RL problems, especially in the partially observable setting, that were not previously known to be tractable. Notably, the new problems addressed by this paper include (1) observable POMDPs with continuous observation and function approximation, where we achieve the first sample complexity that is completely independent of the size of the observation space; (2) well-conditioned low-rank sequential decision making problems (also known as Predictive State Representations (PSRs)), which include and generalize all known tractable POMDP examples under a more intrinsic representation; (3) general sequential decision making problems under the SAIL condition, which unifies our existing understanding of model-based RL in both fully observable and partially observable settings. The SAIL condition, identified in this paper, can be viewed as a natural generalization of Bellman/witness rank to address partial observability.  ( 3 min )
    D-HYPR: Harnessing Neighborhood Modeling and Asymmetry Preservation for Digraph Representation Learning. (arXiv:2112.11734v2 [cs.LG] UPDATED)
    Digraph Representation Learning (DRL) aims to learn representations for directed homogeneous graphs (digraphs). Prior work in DRL is largely constrained (e.g., limited to directed acyclic graphs), or has poor generalizability across tasks (e.g., evaluated solely on one task). Most Graph Neural Networks (GNNs) exhibit poor performance on digraphs due to the neglect of modeling neighborhoods and preserving asymmetry. In this paper, we address these notable challenges by leveraging hyperbolic collaborative learning from multi-ordered and partitioned neighborhoods, and regularizers inspired by socio-psychological factors. Our resulting formalism, Digraph Hyperbolic Networks (D-HYPR) - albeit conceptually simple - generalizes to digraphs where cycles and non-transitive relations are common, and is applicable to multiple downstream tasks including node classification, link presence prediction, and link property prediction. In order to assess the effectiveness of D-HYPR, extensive evaluations were performed across 8 real-world digraph datasets involving 21 prior techniques. D-HYPR statistically significantly outperforms the current state of the art. We release our code at https://github.com/hongluzhou/dhypr  ( 2 min )
    From Kepler to Newton: Explainable AI for Science Discovery. (arXiv:2111.12210v6 [cs.AI] UPDATED)
    The Observation--Hypothesis--Prediction--Experimentation loop paradigm for scientific research has been practiced by researchers for years towards scientific discoveries. However, with the data explosion in both mega-scale and milli-scale scientific research, it has sometimes become very difficult to manually analyze the data and propose new hypotheses to drive the cycle of scientific discovery. In this paper, we discuss the role of Explainable AI in the scientific discovery process by demonstrating an Explainable AI-based paradigm for science discovery. The key is to use Explainable AI to help derive data or model interpretations, hypotheses, as well as scientific discoveries or insights. We show how computational and data-intensive methodology -- together with experimental and theoretical methodology -- can be seamlessly integrated for scientific research. To demonstrate the AI-based science discovery process, and to pay our respects to some of the greatest minds in human history, we show how Kepler's laws of planetary motion and Newton's law of universal gravitation can be rediscovered by (Explainable) AI based on Tycho Brahe's astronomical observation data; the work of these scientists led the scientific revolution of the 16th-17th centuries. This work also highlights the important role of Explainable AI (as compared to Blackbox AI) in science discovery, helping humans prevent or better prepare for a possible technological singularity, since science is not only about the know-how, but also the know-why. A presentation of the work is available at https://slideslive.com/38986142/from-kepler-to-newton-explainable-ai-for-science-discovery.  ( 3 min )
    Deep Neural Networks for Rank-Consistent Ordinal Regression Based On Conditional Probabilities. (arXiv:2111.08851v4 [cs.LG] UPDATED)
    In recent times, deep neural networks have achieved outstanding predictive performance on various classification and pattern recognition tasks. However, many real-world prediction problems have ordinal response variables, and this ordering information is ignored by conventional classification losses such as the multi-category cross-entropy. Ordinal regression methods for deep neural networks address this. One such method is the CORAL method, which is based on an earlier binary label extension framework and achieves rank consistency among its output layer tasks by imposing a weight-sharing constraint. However, while earlier experiments showed that CORAL's rank consistency is beneficial for performance, it is limited by a weight-sharing constraint in a neural network's fully connected output layer. We propose a new method for rank-consistent ordinal regression without this limitation. Our rank-consistent ordinal regression framework (CORN) achieves rank consistency by a novel training scheme. This training scheme uses conditional training sets to obtain the unconditional rank probabilities through applying the chain rule for conditional probability distributions. Experiments on various datasets demonstrate the efficacy of the proposed method in utilizing the ordinal target information, and the absence of the weight-sharing restriction improves performance substantially compared to the CORAL reference approach.  ( 3 min )
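    At inference time, the chain rule turns the conditional task outputs into unconditional rank probabilities; a sketch of that step (mirroring the rule described above, not necessarily the authors' exact code):

    ```python
    import torch

    def rank_from_corn_logits(logits):
        """logits: (batch, num_classes - 1), one binary task per rank threshold.

        P(y > r_k | x) = prod_{j <= k} sigmoid(logit_j); thresholding at 0.5
        and summing the exceedances yields the predicted rank.
        """
        cond = torch.sigmoid(logits)          # conditional P(y > r_k | y > r_{k-1})
        uncond = torch.cumprod(cond, dim=1)   # unconditional P(y > r_k) via chain rule
        return torch.sum(uncond > 0.5, dim=1)
    ```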
    Dataset Summarization by K Principal Concepts. (arXiv:2104.03952v2 [cs.CV] UPDATED)
    We propose the new task of K principal concept identification for dataset summarization. The objective is to find a set of K concepts that best explain the variation within the dataset. Concepts are high-level human-interpretable terms such as "tiger", "kayaking" or "happy". The K concepts are selected from a (potentially long) input list of candidates, which we denote the concept-bank. The concept-bank may be taken from a generic dictionary or constructed by task-specific prior knowledge. An image-language embedding method (e.g. CLIP) is used to map the images and the concept-bank into a shared feature space. To select the K concepts that best explain the data, we formulate our problem as a K-uncapacitated facility location problem. An efficient optimization technique is used to scale the local search algorithm to very large concept-banks. The output of our method is a set of K principal concepts that summarize the dataset. Our approach provides a more explicit summary in comparison to selecting K representative images, which are often ambiguous. As a further application of our method, the K principal concepts can be used to classify the dataset into K groups. Extensive experiments demonstrate the efficacy of our approach.  ( 3 min )
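    A greedy stand-in for the facility-location step (the paper uses a scaled local-search optimizer; the function and variable names here are ours):

    ```python
    import numpy as np

    def select_k_concepts(img_emb, concept_emb, k):
        """Pick K concepts maximizing the sum over images of the best similarity.

        img_emb: (n_images, d), concept_emb: (n_concepts, d), L2-normalized
        (e.g., CLIP embeddings), so inner products are cosine similarities.
        """
        sims = img_emb @ concept_emb.T
        best = np.full(img_emb.shape[0], -np.inf)  # best coverage so far, per image
        chosen = []
        for _ in range(k):
            scores = np.maximum(sims, best[:, None]).sum(axis=0)  # objective if added
            scores[chosen] = -np.inf                              # never re-pick
            j = int(np.argmax(scores))
            chosen.append(j)
            best = np.maximum(best, sims[:, j])
        return chosen
    ```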
    Gradient flows and randomised thresholding: sparse inversion and classification. (arXiv:2203.11555v2 [math.NA] UPDATED)
    Sparse inversion and classification problems are ubiquitous in modern data science and imaging. They are often formulated as non-smooth minimisation problems. In sparse inversion, we minimise, e.g., the sum of a data fidelity term and an L1/LASSO regulariser. In classification, we consider, e.g., the sum of a data fidelity term and a non-smooth Ginzburg--Landau energy. Standard (sub)gradient descent methods have been shown to be inefficient when approaching such problems. Splitting techniques are much more useful: here, the target function is partitioned into a sum of two subtarget functions -- each of which can be efficiently optimised. Splitting proceeds by performing optimisation steps alternately with respect to each of the two subtarget functions. In this work, we study splitting from a stochastic continuous-time perspective. Indeed, we define a differential inclusion that follows the negative subdifferential of one of the two subtarget functions at each point in time. The choice of the subtarget function is controlled by a binary continuous-time Markov process. The resulting dynamical system is a stochastic approximation of the underlying subgradient flow. We investigate this stochastic approximation for an L1-regularised sparse inversion flow and for a discrete Allen-Cahn equation minimising a Ginzburg--Landau energy. In both cases, we study the long-time behaviour of the stochastic dynamical system and its ability to approximate the underlying subgradient flow at any accuracy. We illustrate our theoretical findings in a simple sparse estimation problem and also in low- and high-dimensional classification problems.  ( 3 min )
    Contrastive Unsupervised Learning of World Model with Invariant Causal Features. (arXiv:2209.14932v1 [cs.LG])
    In this paper we present a world model that learns causal features using the invariance principle. In particular, we use contrastive unsupervised learning to learn invariant causal features, enforcing invariance across augmentations of irrelevant parts or styles of the observation. World-model-based reinforcement learning methods optimize representation learning and the policy independently, so a naive contrastive loss implementation collapses due to a lack of supervisory signals to the representation learning module. We propose an intervention-invariant auxiliary task to mitigate this issue. Specifically, we utilize depth prediction to explicitly enforce the invariance and use data augmentation as style intervention on the RGB observation space. Our design leverages unsupervised representation learning to learn the world model with invariant causal features. Our proposed method significantly outperforms current state-of-the-art model-based and model-free reinforcement learning methods on out-of-distribution point navigation tasks on the iGibson dataset. Moreover, our proposed model excels at sim-to-real transfer of the perception learning module. Finally, we evaluate our approach on the DeepMind control suite, where invariance can only be enforced implicitly since depth is not available. Nevertheless, our proposed model performs on par with the state-of-the-art counterpart.  ( 2 min )
    Similarity Contrastive Estimation for Self-Supervised Soft Contrastive Learning. (arXiv:2111.14585v2 [cs.CV] UPDATED)
    Contrastive representation learning has proven to be an effective self-supervised learning method. Most successful approaches are based on Noise Contrastive Estimation (NCE) and use different views of an instance as positives that should be contrasted with other instances, called negatives, that are considered as noise. However, several instances in a dataset are drawn from the same distribution and share underlying semantic information. A good data representation should contain relations, or semantic similarity, between the instances. Contrastive learning implicitly learns relations but considering all negatives as noise harms the quality of the learned relations. To circumvent this issue, we propose a novel formulation of contrastive learning using semantic similarity between instances called Similarity Contrastive Estimation (SCE). Our training objective is a soft contrastive learning one. Instead of hard classifying positives and negatives, we estimate from one view of a batch a continuous distribution to push or pull instances based on their semantic similarities. This target similarity distribution is sharpened to eliminate noisy relations. The model predicts for each instance, from another view, the target distribution while contrasting its positive with negatives. Experimental results show that SCE is Top-1 on the ImageNet linear evaluation protocol at 100 pretraining epochs with 72.1% accuracy and is competitive with state-of-the-art algorithms by reaching 75.4% for 200 epochs with multi-crop. We also show that SCE is able to generalize to several tasks. Source code is available here: https://github.com/CEA-LIST/SCE.  ( 3 min )
    No Free Lunch in "Privacy for Free: How does Dataset Condensation Help Privacy". (arXiv:2209.14987v1 [cs.LG])
    New methods designed to preserve data privacy require careful scrutiny. Failure to preserve privacy is hard to detect, and yet can lead to catastrophic results when a system implementing a "privacy-preserving" method is attacked. A recent work selected for an Outstanding Paper Award at ICML 2022 (Dong et al., 2022) claims that dataset condensation (DC) significantly improves data privacy when training machine learning models. This claim is supported by theoretical analysis of a specific dataset condensation technique and an empirical evaluation of resistance to some existing membership inference attacks. In this note we examine the claims in the work of Dong et al. (2022) and describe major flaws in the empirical evaluation of the method and its theoretical analysis. These flaws imply that their work does not provide statistically significant evidence that DC improves the privacy of training ML models over a naive baseline. Moreover, previously published results show that DP-SGD, the standard approach to privacy preserving ML, simultaneously gives better accuracy and achieves a (provably) lower membership attack success rate.  ( 2 min )
    False Data Injection Threats in Active Distribution Systems: A Comprehensive Survey. (arXiv:2111.14251v2 [cs.CR] UPDATED)
    With the proliferation of smart devices and revolutions in communications, electrical distribution systems are gradually shifting from passive, manually-operated and inflexible ones, to a massively interconnected cyber-physical smart grid to address the energy challenges of the future. However, the integration of several cutting-edge technologies has introduced several security and privacy vulnerabilities due to the large-scale complexity and resource limitations of deployments. Recent research trends have shown that False Data Injection (FDI) attacks are becoming one of the most malicious cyber threats within the entire smart grid paradigm. Therefore, this paper presents a comprehensive survey of the recent advances in FDI attacks within active distribution systems and proposes a taxonomy to classify the FDI threats with respect to smart grid targets. The related studies are contrasted and summarized in terms of the attack methodologies and implications on the electrical power distribution networks. Finally, we identify some research gaps and recommend a number of future research directions to guide and motivate prospective researchers.  ( 3 min )
    Unsupervised Learning From Incomplete Measurements for Inverse Problems. (arXiv:2201.12151v4 [stat.ML] UPDATED)
    In many real-world inverse problems, only incomplete measurement data are available for training which can pose a problem for learning a reconstruction function. Indeed, unsupervised learning using a fixed incomplete measurement process is impossible in general, as there is no information in the nullspace of the measurement operator. This limitation can be overcome by using measurements from multiple operators. While this idea has been successfully applied in various applications, a precise characterization of the conditions for learning is still lacking. In this paper, we fill this gap by presenting necessary and sufficient conditions for learning the underlying signal model needed for reconstruction which indicate the interplay between the number of distinct measurement operators, the number of measurements per operator, the dimension of the model and the dimension of the signals. Furthermore, we propose a novel and conceptually simple unsupervised learning loss which only requires access to incomplete measurement data and achieves a performance on par with supervised learning when the sufficient condition is verified. We validate our theoretical bounds and demonstrate the advantages of the proposed unsupervised loss compared to previous methods via a series of experiments on various imaging inverse problems, such as accelerated magnetic resonance imaging, compressed sensing and image inpainting.  ( 3 min )
    Does Zero-Shot Reinforcement Learning Exist?. (arXiv:2209.14935v1 [cs.LG])
    A zero-shot RL agent is an agent that can solve any RL task in a given environment, instantly with no additional planning or learning, after an initial reward-free learning phase. This marks a shift from the reward-centric RL paradigm towards "controllable" agents that can follow arbitrary instructions in an environment. Current RL agents can solve families of related tasks at best, or require planning anew for each task. Strategies for approximate zero-shot RL have been suggested using successor features (SFs) [BBQ+ 18] or forward-backward (FB) representations [TO21], but testing has been limited. After clarifying the relationships between these schemes, we introduce improved losses and new SF models, and test the viability of zero-shot RL schemes systematically on tasks from the Unsupervised RL benchmark [LYL+21]. To disentangle universal representation learning from exploration, we work in an offline setting and repeat the tests on several existing replay buffers. SFs appear to suffer from the choice of the elementary state features. SFs with Laplacian eigenfunctions do well, while SFs based on auto-encoders, inverse curiosity, transition models, low-rank transition matrix, contrastive learning, or diversity (APS) perform inconsistently. In contrast, FB representations jointly learn the elementary and successor features from a single, principled criterion. They perform best and consistently across the board, reaching 85% of supervised RL performance with a good replay buffer, in a zero-shot manner.  ( 2 min )
    On the influence of stochastic roundoff errors on the convergence of the gradient descent method with low-precision floating-point computation. (arXiv:2202.12276v2 [cs.LG] UPDATED)
    When implementing the gradient descent method in low precision, the use of stochastic rounding schemes helps to prevent stagnation of convergence caused by the vanishing gradient effect. Unbiased stochastic rounding yields zero bias by preserving small updates with probabilities proportional to their relative magnitudes. This study provides a theoretical explanation for the stagnation of the gradient descent method in low-precision computation. Additionally, we propose two new stochastic rounding schemes that trade the zero-bias property for a larger probability of preserving small gradients. Our methods yield a constant rounding bias that, on average, lies in a descent direction. For convex problems, we prove that the proposed rounding methods typically have a beneficial effect on the convergence rate of gradient descent. We validate our theoretical analysis by comparing the performance of various rounding schemes when optimizing a multinomial logistic regression model and when training a simple neural network with an 8-bit floating-point format.  ( 2 min )
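    A sketch of the unbiased scheme on a uniform grid (a stand-in for a real 8-bit float format, whose spacing is non-uniform): rounding up happens with probability equal to the fractional distance, so the result is correct in expectation and small updates survive occasionally instead of always vanishing.

    ```python
    import numpy as np

    def stochastic_round(x, num_levels=256, lo=-1.0, hi=1.0):
        """Unbiased stochastic rounding to a uniform grid: E[round(x)] = x."""
        step = (hi - lo) / (num_levels - 1)
        scaled = (np.clip(x, lo, hi) - lo) / step
        floor = np.floor(scaled)
        prob_up = scaled - floor                 # fractional distance to lower point
        up = np.random.rand(*np.shape(x)) < prob_up
        return lo + (floor + up) * step
    ```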
    Learning Causal Models from Conditional Moment Restrictions by Importance Weighting. (arXiv:2108.01312v2 [econ.EM] UPDATED)
    We consider learning causal relationships under conditional moment restrictions. Unlike causal inference under unconditional moment restrictions, conditional moment restrictions pose serious challenges for causal inference, especially in high-dimensional settings. To address this issue, we propose a method that transforms conditional moment restrictions to unconditional moment restrictions through importance weighting, using a conditional density ratio estimator. Using this transformation, we successfully estimate nonparametric functions defined under conditional moment restrictions. Our proposed framework is general and can be applied to a wide range of methods, including neural networks. We analyze the estimation error, providing theoretical support for our proposed method. In experiments, we confirm the soundness of our proposed method.  ( 2 min )
    Graph Neural Networks in Network Neuroscience. (arXiv:2106.03535v2 [cs.LG] UPDATED)
    Noninvasive medical neuroimaging has yielded many discoveries about brain connectivity. Several substantial techniques mapping morphological, structural and functional brain connectivities were developed to create a comprehensive road map of neuronal activities in the human brain -- namely, the brain graph. Thanks to its non-Euclidean data type, the graph neural network (GNN) provides a clever way of learning the deep graph structure, and it is rapidly becoming the state of the art, leading to enhanced performance in various network neuroscience tasks. Here we review current GNN-based methods, highlighting the ways they have been used in several applications related to brain graphs, such as missing brain graph synthesis and disease classification. We conclude by charting a path toward a better application of GNN models in the network neuroscience field for neurological disorder diagnosis and population graph integration. The list of papers cited in our work is available at https://github.com/basiralab/GNNs-in-Network-Neuroscience.  ( 2 min )
    Understanding the Role of Nonlinearity in Training Dynamics of Contrastive Learning. (arXiv:2206.01342v2 [cs.LG] UPDATED)
    While the empirical success of self-supervised learning (SSL) heavily relies on the usage of deep nonlinear models, existing theoretical works on SSL understanding still focus on linear ones. In this paper, we study the role of nonlinearity in the training dynamics of contrastive learning (CL) on one- and two-layer nonlinear networks with homogeneous activation $h(x) = h'(x)x$. We have two major theoretical discoveries. First, the presence of nonlinearity can lead to many local optima even in the 1-layer setting, each corresponding to certain patterns from the data distribution, while with linear activation, only one major pattern can be learned. This suggests that models with lots of parameters can be regarded as a \emph{brute-force} way to find these local optima induced by nonlinearity. Second, in the 2-layer case, linear activation is proven incapable of learning specialized weights for diverse patterns, demonstrating the importance of nonlinearity. In addition, for the 2-layer setting, we also discover \emph{global modulation}: those local patterns discriminative from the perspective of global-level patterns are prioritized to learn, further characterizing the learning process. Simulations verify our theoretical findings.
    Revisiting Global Pooling through the Lens of Optimal Transport. (arXiv:2201.09191v2 [cs.LG] UPDATED)
    Global pooling is one of the most significant operations in many machine learning models and tasks, whose implementation, however, is often empirical in practice. In this study, we develop a novel and solid global pooling framework through the lens of optimal transport. We demonstrate that most existing global pooling methods are equivalent to solving some specializations of an unbalanced optimal transport (UOT) problem. Making the parameters of the UOT problem learnable, we unify various global pooling methods in the same framework, and accordingly, propose a generalized global pooling layer called UOT-Pooling (UOTP) for neural networks. Besides implementing the UOTP layer based on the classic Sinkhorn-scaling algorithm, we design a new model architecture based on the Bregman ADMM algorithm, which has better numerical stability and can reproduce existing pooling layers more effectively. We test our UOTP layers in several application scenarios, including multi-instance learning, graph classification, and image classification. Our UOTP layers can either imitate conventional global pooling layers or learn some new pooling mechanisms leading to better performance.
    Algorithms that get old: the case of generative deep neural networks. (arXiv:2202.03008v3 [stat.ML] UPDATED)
    Generative deep neural networks used in machine learning, like Variational Auto-Encoders (VAEs) and Generative Adversarial Networks (GANs), produce new objects each time they are asked to, under the constraint that the new objects remain similar to a list of examples given as input. However, this behavior is unlike that of human artists, who change their style as time goes by and seldom return to the style of their initial creations. We investigate a situation where VAEs are used to sample from a probability measure described by some empirical dataset. Based on recent works on Radon-Sobolev statistical distances, we propose a numerical paradigm, to be used in conjunction with a generative algorithm, that satisfies the following two requirements: the objects created do not repeat and evolve to fill the entire target probability distribution.
    GemNet-OC: Developing Graph Neural Networks for Large and Diverse Molecular Simulation Datasets. (arXiv:2204.02782v2 [cs.LG] UPDATED)
    Recent years have seen the advent of molecular simulation datasets that are orders of magnitude larger and more diverse. These new datasets differ substantially in four aspects of complexity: 1. Chemical diversity (number of different elements), 2. system size (number of atoms per sample), 3. dataset size (number of data samples), and 4. domain shift (similarity of the training and test set). Despite these large differences, benchmarks on small and narrow datasets remain the predominant method of demonstrating progress in graph neural networks (GNNs) for molecular simulation, likely due to cheaper training compute requirements. This raises the question -- does GNN progress on small and narrow datasets translate to these more complex datasets? This work investigates this question by first developing the GemNet-OC model based on the large Open Catalyst 2020 (OC20) dataset. GemNet-OC outperforms the previous state-of-the-art on OC20 by 16% while reducing training time by a factor of 10. We then compare the impact of 18 model components and hyperparameter choices on performance in multiple datasets. We find that the resulting model would be drastically different depending on the dataset used for making model choices. To isolate the source of this discrepancy we study six subsets of the OC20 dataset that individually test each of the above-mentioned four dataset aspects. We find that results on the OC-2M subset correlate well with the full OC20 dataset while being substantially cheaper to train on. Our findings challenge the common practice of developing GNNs solely on small datasets, but highlight ways of achieving fast development cycles and generalizable results via moderately-sized, representative datasets such as OC-2M and efficient models such as GemNet-OC. Our code and pretrained model weights are open-sourced.
    Heterogeneous Graph-Based Multimodal Brain Network Learning. (arXiv:2110.08465v5 [cs.LG] UPDATED)
    Graph neural networks (GNNs) provide powerful insights for brain neuroimaging technology from the view of graphical networks. However, most existing GNN-based models assume that the neuroimaging-produced brain connectome network is a homogeneous graph with single types of nodes and edges. In fact, emerging studies have reported and emphasized the significance of heterogeneity among human brain activities, especially between the two cerebral hemispheres. Thus, homogeneous-structured brain network-based graph methods are insufficient for modelling complicated cerebral activity states. To overcome this problem, in this paper, we present a heterogeneous graph neural network (HebrainGNN) for multimodal brain neuroimaging fusion learning. We first model the brain network as a heterogeneous graph with multitype nodes (i.e., left and right hemispheric nodes) and multitype edges (i.e., intra- and interhemispheric edges). Then, we propose a self-supervised pretraining strategy based on a heterogeneous brain network to address the potential overfitting problem caused by the conflict between a large parameter size and a small medical data sample size. Our results show the superiority of the proposed model over other existing methods in brain-related disease prediction tasks. Ablation experiments show that our heterogeneous graph-based model attaches more importance to hemispheric connections that may be neglected due to their low strength by previous homogeneous graph models. Other experiments also indicate that our proposed model with a pretraining strategy alleviates the problem of limited labelled data and yields a significant improvement in accuracy.
    Equivariant maps from invariant functions. (arXiv:2209.14991v1 [stat.ML])
    In equivariant machine learning the idea is to restrict the learning to a hypothesis class where all the functions are equivariant with respect to some group action. Irreducible representations or invariant theory are typically used to parameterize the space of such functions. In this note, we explicate a general procedure, attributed to Malgrange, to express all polynomial maps between linear spaces that are equivariant with respect to the action of a group $G$, given a characterization of the invariant polynomials on a bigger space. The method also parametrizes smooth equivariant maps in the case that $G$ is a compact Lie group.
    Differentiable and Transportable Structure Learning. (arXiv:2206.06354v2 [cs.LG] UPDATED)
    Directed acyclic graphs (DAGs) encode a lot of information about a particular distribution in their structure. However, the compute required to infer these structures is typically super-exponential in the number of variables, as inference requires a sweep of a combinatorially large space of potential structures. That was true until recent advances made it possible to search this space using a differentiable metric, drastically reducing search time. While this technique -- named NOTEARS -- is widely considered a seminal work in DAG-discovery, it concedes an important property in favour of differentiability: transportability. To be transportable, the structures discovered on one dataset must apply to another dataset from the same domain. In our paper, we introduce D-Struct, which recovers transportability in the discovered structures through a novel architecture and loss function while remaining completely differentiable. Because D-Struct remains differentiable, our method can be easily adopted in existing differentiable architectures, as was previously done with NOTEARS. In our experiments, we empirically validate D-Struct with respect to edge accuracy and structural Hamming distance in a variety of settings.
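    For reference, the differentiable acyclicity metric that the NOTEARS line of work relies on can be written in a few lines (this is the published formula of Zheng et al., 2018, not D-Struct's code):

    ```python
    import numpy as np
    from scipy.linalg import expm

    def notears_acyclicity(W):
        """h(W) = tr(exp(W * W)) - d for a weighted adjacency W; zero iff W is a DAG.

        W * W is the elementwise (Hadamard) square, so the trace counts weighted
        closed walks of every length; any cycle makes h(W) > 0.
        """
        d = W.shape[0]
        return float(np.trace(expm(W * W)) - d)
    ```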
    ELEVATER: A Benchmark and Toolkit for Evaluating Language-Augmented Visual Models. (arXiv:2204.08790v4 [cs.CV] UPDATED)
    Learning visual representations from natural language supervision has recently shown great promise in a number of pioneering works. In general, these language-augmented visual models demonstrate strong transferability to a variety of datasets and tasks. However, it remains challenging to evaluate the transferability of these models due to the lack of easy-to-use evaluation toolkits and public benchmarks. To tackle this, we build ELEVATER (Evaluation of Language-augmented Visual Task-level Transfer), the first benchmark and toolkit for evaluating (pre-trained) language-augmented visual models. ELEVATER is composed of three components. (i) Datasets. As downstream evaluation suites, it consists of 20 image classification datasets and 35 object detection datasets, each of which is augmented with external knowledge. (ii) Toolkit. An automatic hyper-parameter tuning toolkit is developed to facilitate model evaluation on downstream tasks. (iii) Metrics. A variety of evaluation metrics are used to measure sample-efficiency (zero-shot and few-shot) and parameter-efficiency (linear probing and full model fine-tuning). ELEVATER is a platform for Computer Vision in the Wild (CVinW), and is publicly released at https://computer-vision-in-the-wild.github.io/ELEVATER/
    Regularizing Neural Network Training via Identity-wise Discriminative Feature Suppression. (arXiv:2209.14553v1 [cs.CV])
    It is well-known that a deep neural network has a strong fitting capability and can easily achieve a low training error even with randomly assigned class labels. When the number of training samples is small, or the class labels are noisy, networks tend to memorize patterns specific to individual instances to minimize the training error. This leads to the issue of overfitting and poor generalisation performance. This paper explores a remedy by suppressing the network's tendency to rely on instance-specific patterns for empirical error minimisation. The proposed method is based on an adversarial training framework. It suppresses features that can be utilized to identify individual instances among samples within each class. This leads to classifiers only using features that are both discriminative across classes and common within each class. We call our method Adversarial Suppression of Identity Features (ASIF), and demonstrate the usefulness of this technique in boosting generalisation accuracy when faced with small datasets or noisy labels. Our source code is available.
    Variance-Aware Sparse Linear Bandits. (arXiv:2205.13450v2 [cs.LG] UPDATED)
    It is well-known that for sparse linear bandits, when ignoring the dependency on sparsity which is much smaller than the ambient dimension, the worst-case minimax regret is $\widetilde{\Theta}\left(\sqrt{dT}\right)$ where $d$ is the ambient dimension and $T$ is the number of rounds. On the other hand, in the benign setting where there is no noise and the action set is the unit sphere, one can use divide-and-conquer to achieve $\widetilde{\mathcal O}(1)$ regret, which is (nearly) independent of $d$ and $T$. In this paper, we present the first variance-aware regret guarantee for sparse linear bandits: $\widetilde{\mathcal O}\left(\sqrt{d\sum_{t=1}^T \sigma_t^2} + 1\right)$, where $\sigma_t^2$ is the variance of the noise at the $t$-th round. This bound naturally interpolates the regret bounds for the worst-case constant-variance regime (i.e., $\sigma_t \equiv \Omega(1)$) and the benign deterministic regimes (i.e., $\sigma_t \equiv 0$). To achieve this variance-aware regret guarantee, we develop a general framework that converts any variance-aware linear bandit algorithm to a variance-aware algorithm for sparse linear bandits in a "black-box" manner. Specifically, we take two recent algorithms as black boxes to illustrate that the claimed bounds indeed hold, where the first algorithm can handle unknown-variance cases and the second one is more efficient.
    Adversarial confound regression and uncertainty measurements to classify heterogeneous clinical MRI in Mass General Brigham. (arXiv:2205.02885v2 [cs.LG] UPDATED)
    Automated disease detection in neuroimaging holds promise to improve the diagnostic ability of radiologists, but routinely collected clinical data frequently contains technical and demographic confounding factors that cause data to both differ between sites and be systematically associated with the disease of interest, thus negatively affecting the robustness of diagnostic models. There is a critical need for diagnostic deep learning models that can train on such imbalanced datasets without being influenced by these confounds. In this work, we introduce a novel deep learning architecture, MUCRAN (Multi-Confound Regression Adversarial Network), to train a deep learning model on clinical brain MRI while regressing out demographic and technical confounding factors. We trained MUCRAN using 17,076 clinical T1 axial brain MRIs collected from Massachusetts General Hospital before 2019 and demonstrated that MUCRAN can successfully regress out major confounding factors in this vast clinical dataset. We also applied a method for quantifying uncertainty across an ensemble of these models to automatically exclude out-of-distribution data in AD detection. By combining MUCRAN and the uncertainty quantification method, we showed consistent and significant increases in AD detection accuracy for newly collected MGH data (post-2019) and for data from other hospitals. MUCRAN offers a generalizable approach for deep-learning-based automatic disease detection on heterogeneous clinical data.
    Denoising Diffusion Probabilistic Models for Styled Walking Synthesis. (arXiv:2209.14828v1 [cs.CV])
    Generating realistic motions for digital humans is time-consuming for many graphics applications. Data-driven motion synthesis approaches have seen solid progress in recent years through deep generative models. These results offer high-quality motions but typically suffer in motion style diversity. For the first time, we propose a framework using the denoising diffusion probabilistic model (DDPM) to synthesize styled human motions, integrating two tasks into one pipeline with increased style diversity compared with traditional motion synthesis methods. Experimental results show that our system can generate high-quality and diverse walking motions.
    Continuous PDE Dynamics Forecasting with Implicit Neural Representations. (arXiv:2209.14855v1 [cs.LG])
    Effective data-driven PDE forecasting methods often rely on fixed spatial and/or temporal discretizations. This raises limitations in real-world applications like weather prediction where flexible extrapolation at arbitrary spatiotemporal locations is required. We address this problem by introducing a new data-driven approach, DINo, that models a PDE's flow with continuous-time dynamics of spatially continuous functions. This is achieved by embedding spatial observations independently of their discretization via Implicit Neural Representations in a small latent space temporally driven by a learned ODE. This separate and flexible treatment of time and space makes DINo the first data-driven model to combine the following advantages. It extrapolates at arbitrary spatial and temporal locations; it can learn from sparse irregular grids or manifolds; at test time, it generalizes to new grids or resolutions. DINo outperforms alternative neural PDE forecasters in a variety of challenging generalization scenarios on representative PDE systems.
    Dataset Distillation for Medical Dataset Sharing. (arXiv:2209.14603v1 [cs.CR])
    Sharing medical datasets between hospitals is challenging because of the privacy-protection problem and the massive cost of transmitting and storing many high-resolution medical images. However, dataset distillation can synthesize a small dataset such that models trained on it achieve comparable performance with the original large dataset, which shows potential for solving the existing medical sharing problems. Hence, this paper proposes a novel dataset distillation-based method for medical dataset sharing. Experimental results on a COVID-19 chest X-ray image dataset show that our method can achieve high detection performance even using scarce anonymized chest X-ray images.
    Power and limitations of single-qubit native quantum neural networks. (arXiv:2205.07848v2 [quant-ph] UPDATED)
    Quantum neural networks (QNNs) have emerged as a leading strategy to establish applications in machine learning, chemistry, and optimization. While the applications of QNN have been widely investigated, its theoretical foundation remains less understood. In this paper, we formulate a theoretical framework for the expressive ability of data re-uploading quantum neural networks that consist of interleaved encoding circuit blocks and trainable circuit blocks. First, we prove that single-qubit quantum neural networks can approximate any univariate function by mapping the model to a partial Fourier series. We in particular establish the exact correlations between the parameters of the trainable gates and the Fourier coefficients, resolving an open problem on the universal approximation property of QNN. Second, we discuss the limitations of single-qubit native QNNs on approximating multivariate functions by analyzing the frequency spectrum and the flexibility of Fourier coefficients. We further demonstrate the expressivity and limitations of single-qubit native QNNs via numerical experiments. We believe these results would improve our understanding of QNNs and provide a helpful guideline for designing powerful QNNs for machine learning tasks.
    The Survival Bandit Problem. (arXiv:2206.03019v2 [cs.LG] UPDATED)
    We study the survival bandit problem, a variant of the multi-armed bandit problem introduced in an open problem by Perotto et al. (2019), with a constraint on the cumulative reward; at each time step, the agent receives a (possibly negative) reward and if the cumulative reward becomes lower than a prespecified threshold, the procedure stops, and this phenomenon is called ruin. This is the first paper studying a framework where the ruin might occur but not always. We first discuss that a sublinear regret is unachievable under a naive definition of the regret. Next, we provide tight lower bounds on the probability of ruin (as well as matching policies). Based on this lower bound, we define the survival regret as an objective to minimize and provide a policy achieving a sublinear survival regret (at least in the case of integral rewards) when the time horizon $T$ is known.
    Calibration Matters: Tackling Maximization Bias in Large-scale Advertising Recommendation Systems. (arXiv:2205.09809v2 [cs.LG] UPDATED)
    Calibration is defined as the ratio of the average predicted click rate to the true click rate. The optimization of calibration is essential to many online advertising recommendation systems because it directly affects the downstream bids in ads auctions and the amount of money charged to advertisers. Despite its importance, calibration optimization often suffers from a problem called "maximization bias". Maximization bias refers to the phenomenon that the maximum of predicted values overestimates the true maximum. The problem is introduced because the calibration is computed on the set selected by the prediction model itself. It persists even if unbiased predictions can be achieved on every datapoint and worsens when covariate shifts exist between the training and test sets. To mitigate this problem, we theorize the quantification of maximization bias and propose a variance-adjusting debiasing (VAD) meta-algorithm in this paper. The algorithm is efficient, robust, and practical as it is able to mitigate maximization bias problems under covariate shifts, neither incurring additional online serving costs nor compromising the ranking performance. We demonstrate the effectiveness of the proposed algorithm using a state-of-the-art recommendation neural network model on a large-scale real-world dataset.
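    Maximization bias is easy to reproduce: even with unbiased per-ad predictions, calibration computed on the model-selected winners exceeds 1. A small simulation (all numbers are illustrative assumptions):

    ```python
    import numpy as np

    rng = np.random.default_rng(0)
    true_ctr = rng.uniform(0.01, 0.05, size=(100_000, 10))       # 10 candidates per query
    pred_ctr = true_ctr + rng.normal(0.0, 0.01, true_ctr.shape)  # unbiased noisy predictions

    winner = pred_ctr.argmax(axis=1)       # the system serves its own argmax
    rows = np.arange(len(winner))
    calibration = pred_ctr[rows, winner].mean() / true_ctr[rows, winner].mean()
    print(f"calibration on the selected set: {calibration:.3f}")  # > 1: selection favors upward noise
    ```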
    REST: REtrieve & Self-Train for generative action recognition. (arXiv:2209.15000v1 [cs.CV])
    This work is on training a generative action/video recognition model whose output is a free-form action-specific caption describing the video (rather than an action class label). A generative approach has practical advantages like producing more fine-grained and human-readable output, and being naturally open-world. To this end, we propose to adapt a pre-trained generative Vision & Language (V&L) Foundation Model for video/action recognition. While recently there have been a few attempts to adapt V&L models trained with contrastive learning (e.g. CLIP) for video/action, to the best of our knowledge, we propose the very first method that sets out to accomplish this goal for a generative model. We firstly show that direct fine-tuning of a generative model to produce action classes suffers from severe overfitting. To alleviate this, we introduce REST, a training framework consisting of two key components: (a) an unsupervised method for adapting the generative model to action/video by means of pseudo-caption generation and Self-training, i.e. without using any action-specific labels; and (b) a Retrieval approach based on CLIP for discovering a diverse set of pseudo-captions for each video to train the model. Importantly, we show that both components are necessary to obtain high accuracy. We evaluate REST on the problem of zero-shot action recognition where we show that our approach is very competitive when compared to contrastive learning-based methods. Code will be made available.
    A Survey on Multimodal Disinformation Detection. (arXiv:2103.12541v2 [cs.MM] UPDATED)
    Recent years have witnessed the proliferation of offensive content online such as fake news, propaganda, misinformation, and disinformation. While initially this was mostly about textual content, over time images and videos gained popularity, as they are much easier to consume, attract more attention, and spread further than text. As a result, researchers started leveraging different modalities and combinations thereof to tackle online multimodal offensive content. In this study, we offer a survey on the state-of-the-art on multimodal disinformation detection covering various combinations of modalities: text, images, speech, video, social media network structure, and temporal information. Moreover, while some studies focused on factuality, others investigated how harmful the content is. While these two components in the definition of disinformation (i) factuality, and (ii) harmfulness, are equally important, they are typically studied in isolation. Thus, we argue for the need to tackle disinformation detection by taking into account multiple modalities as well as both factuality and harmfulness, in the same framework. Finally, we discuss current challenges and future research directions.
    Learning Parsimonious Dynamics for Generalization in Reinforcement Learning. (arXiv:2209.14781v1 [cs.LG])
    Humans are skillful navigators: We aptly maneuver through new places, realize when we are back at a location we have seen before, and can even conceive of shortcuts that go through parts of our environments we have never visited. Current methods in model-based reinforcement learning on the other hand struggle with generalizing about environment dynamics out of the training distribution. We argue that two principles can help bridge this gap: latent learning and parsimonious dynamics. Humans tend to think about environment dynamics in simple terms -- we reason about trajectories not in reference to what we expect to see along a path, but rather in an abstract latent space, containing information about the places' spatial coordinates. Moreover, we assume that moving around in novel parts of our environment works the same way as in parts we are familiar with. These two principles work together in tandem: it is in the latent space that the dynamics show parsimonious characteristics. We develop a model that learns such parsimonious dynamics. Using a variational objective, our model is trained to reconstruct experienced transitions in a latent space using locally linear transformations, while encouraged to invoke as few distinct transformations as possible. Using our framework, we demonstrate the utility of learning parsimonious latent dynamics models in a range of policy learning and planning tasks.
    PnP-ReG: Learned Regularizing Gradient for Plug-and-Play Gradient Descent. (arXiv:2204.13940v2 [eess.IV] UPDATED)
    The Plug-and-Play (PnP) framework makes it possible to integrate advanced image denoising priors into optimization algorithms, to efficiently solve a variety of image restoration tasks generally formulated as Maximum A Posteriori (MAP) estimation problems. The Plug-and-Play alternating direction method of multipliers (ADMM) and the Regularization by Denoising (RED) algorithms are two examples of such methods that made a breakthrough in image restoration. However, while the former method only applies to proximal algorithms, it has recently been shown that there exists no regularization that explains the RED algorithm when the denoisers lack Jacobian symmetry, which happens to be the case for most practical denoisers. To the best of our knowledge, there exists no method for training a network that directly represents the gradient of a regularizer, which can be directly used in Plug-and-Play gradient-based algorithms. We show that it is possible to train a network directly modeling the gradient of a MAP regularizer while jointly training the corresponding MAP denoiser. We use this network in gradient-based optimization methods and obtain better results compared to other generic Plug-and-Play approaches. We also show that the regularizer can be used as a pre-trained network for unrolled gradient descent. Lastly, we show that the resulting denoiser allows for a better convergence of the Plug-and-Play ADMM.
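    The payoff is that the learned gradient drops straight into first-order solvers; a one-line sketch of the resulting update (the symbols here are ours, not the paper's notation):

    ```python
    def pnp_gradient_step(x, grad_f, g_theta, eta, lam):
        """x <- x - eta * (grad_f(x) + lam * g_theta(x)), where g_theta is a
        network trained to model the gradient of a MAP regularizer."""
        return x - eta * (grad_f(x) + lam * g_theta(x))
    ```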
    A Causal Approach to Detecting Multivariate Time-series Anomalies and Root Causes. (arXiv:2206.15033v2 [cs.LG] UPDATED)
    Detecting anomalies and the corresponding root causes in multivariate time series plays an important role in monitoring the behaviors of various real-world systems, e.g., IT system operations or manufacturing industry. Previous anomaly detection approaches model the joint distribution without considering the underlying mechanism of multivariate time series, making them computationally hungry and hard to identify root causes. In this paper, we formulate the anomaly detection problem from a causal perspective and view anomalies as instances that do not follow the regular causal mechanism to generate the multivariate data. We then propose a causality-based framework for detecting anomalies and root causes. It first learns the causal structure from data and then infers whether an instance is an anomaly relative to the local causal mechanism whose conditional distribution can be directly estimated from data. In light of the modularity property of causal systems (the causal processes to generate different variables are irrelevant modules), the original problem is divided into a series of separate, simpler, and low-dimensional anomaly detection problems so that where an anomaly happens (root causes) can be directly identified. We evaluate our approach with both simulated and public datasets as well as a case study on real-world AIOps applications, showing its efficacy, robustness, and practical feasibility.
    Neighborhood Gradient Clustering: An Efficient Decentralized Learning Method for Non-IID Data Distributions. (arXiv:2209.14390v1 [cs.LG])
    Decentralized learning algorithms enable the training of deep learning models over large distributed datasets generated at different devices and locations, without the need for a central server. In practical scenarios, the distributed datasets can have significantly different data distributions across the agents. The current state-of-the-art decentralized algorithms mostly assume the data distributions to be Independent and Identically Distributed (IID). This paper focuses on improving decentralized learning over non-IID data distributions with minimal compute and memory overheads. We propose Neighborhood Gradient Clustering (NGC), a novel decentralized learning algorithm that modifies the local gradients of each agent using self- and cross-gradient information. In particular, the proposed method replaces the local gradients of the model with the weighted mean of the self-gradients, model-variant cross-gradients (derivatives of the received neighbors' model parameters with respect to the local dataset), and data-variant cross-gradients (derivatives of the local model with respect to its neighbors' datasets). Further, we present CompNGC, a compressed version of NGC that reduces the communication overhead by $32 \times$ by compressing the cross-gradients. We demonstrate the empirical convergence and efficiency of the proposed technique over non-IID data distributions sampled from the CIFAR-10 dataset on various model architectures and graph topologies. Our experiments demonstrate that NGC and CompNGC outperform the existing state-of-the-art (SoTA) decentralized learning algorithm over non-IID data by $1-5\%$ with significantly less compute and memory requirements. Further, we also show that the proposed NGC method outperforms the baseline by $5-40\%$ with no additional communication.
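    A sketch of the gradient replacement at one agent (the paper's exact per-term weighting is not reproduced here; a single mixing weight alpha is a placeholder):

    ```python
    import torch

    def ngc_local_gradient(self_grad, cross_grads, alpha=0.5):
        """Replace the local gradient with a weighted mean of the self-gradient
        and the model-/data-variant cross-gradients received from neighbors."""
        cross_mean = torch.stack(cross_grads).mean(dim=0)
        return alpha * self_grad + (1.0 - alpha) * cross_mean
    ```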
    Transformer Meets Boundary Value Inverse Problems. (arXiv:2209.14977v1 [cs.LG])
    A Transformer-based deep direct sampling method is proposed for solving a class of boundary value inverse problems. A real-time reconstruction is achieved by evaluating the learned inverse operator between carefully designed data and the reconstructed images. An effort is made to give a case study for a fundamental and critical question: whether and how one can benefit from the theoretical structure of a mathematical problem to develop task-oriented and structure-conforming deep neural networks? Inspired by direct sampling methods for inverse problems, the 1D boundary data are preprocessed by a partial differential equation-based feature map to yield 2D harmonic extensions in different frequency input channels. Then, by introducing a learnable non-local kernel, the approximation of direct sampling is recast as a modified attention mechanism. The proposed method is then applied to electrical impedance tomography, a well-known severely ill-posed nonlinear inverse problem. The new method achieves superior accuracy over its predecessors and contemporary operator learners, and shows robustness with respect to noise. This research strengthens the insight that the attention mechanism, despite being invented for natural language processing tasks, offers great flexibility to be modified in conformity with a priori mathematical knowledge, which ultimately leads to the design of more physics-compatible neural architectures.
    Fool SHAP with Stealthily Biased Sampling. (arXiv:2205.15419v2 [cs.LG] UPDATED)
SHAP explanations aim at identifying which features contribute the most to the difference in model prediction at a specific input versus a background distribution. Recent studies have shown that they can be manipulated by malicious adversaries to produce arbitrary desired explanations. However, existing attacks focus solely on altering the black-box model itself. In this paper, we propose a complementary family of attacks that leave the model intact and manipulate SHAP explanations using stealthily biased sampling of the data points used to approximate expectations w.r.t. the background distribution. In the context of a fairness audit, we show that our attack can reduce the importance of a sensitive feature when explaining the difference in outcomes between groups while remaining undetected. These results highlight the manipulability of SHAP explanations and encourage auditors to treat them with skepticism.
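    The mechanism is easiest to see on a linear model, where the exact SHAP value of feature i is w_i * (x_i - E_background[x_i]): reweighting the background so that the sensitive feature's mean matches x_i drives that feature's attribution toward zero without touching the model. The toy below is our illustration of this idea, not the paper's attack code.

```python
import numpy as np

rng = np.random.default_rng(0)
w = np.array([1.0, 2.0, -1.5])          # linear model f(x) = w @ x
background = rng.normal(size=(500, 3))  # honest background sample
x = np.array([1.2, -0.4, 0.8])          # instance being explained
s = 0                                   # index of the sensitive feature

def linear_shap(x, bg, weights=None):
    mu = np.average(bg, axis=0, weights=weights)
    return w * (x - mu)                 # exact SHAP values for a linear model

print("honest SHAP:", linear_shap(x, background))

# Stealthy bias: softmax weights favoring background points whose
# sensitive feature is close to x[s]; other features are barely affected.
logits = -5.0 * (background[:, s] - x[s]) ** 2
weights = np.exp(logits - logits.max())
weights /= weights.sum()
print("biased SHAP:", linear_shap(x, background, weights))
```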
    VC Theoretical Explanation of Double Descent. (arXiv:2205.15549v3 [stat.ML] UPDATED)
There has been growing interest in the generalization performance of large multilayer neural networks that can be trained to achieve zero training error, while generalizing well on test data. This regime is known as 'second descent' and it appears to contradict the conventional view that optimal model complexity should reflect an optimal balance between underfitting and overfitting, i.e., the bias-variance trade-off. This paper presents a VC-theoretical analysis of double descent and shows that it can be fully explained by classical VC-generalization bounds. We illustrate an application of analytic VC-bounds for modeling double descent for classification, using empirical results for several learning methods, such as SVM, Least Squares, and Multilayer Perceptron classifiers. In addition, we discuss several reasons for the misinterpretation of VC-theoretical results in the deep learning community.
    A Decision Support System for Safer Airplane Landings: Predicting Runway Conditions Using XGBoost and Explainable AI. (arXiv:2107.04010v2 [cs.CY] UPDATED)
The presence of snow and ice on runway surfaces reduces the available tire-pavement friction needed for retardation and directional control, and causes potential economic and safety threats for the aviation industry during the winter seasons. To activate appropriate safety procedures, pilots need accurate and timely information on the actual runway surface conditions. In this study, XGBoost is used to create a combined runway assessment system, which includes a classification model to identify slippery conditions and a regression model to predict the level of slipperiness. The models are trained on weather data and runway reports. The runway surface conditions are represented by the tire-pavement friction coefficient, which is estimated from flight sensor data from landing aircraft. The XGBoost models are combined with SHAP approximations to provide a reliable decision support system for airport operators, which can contribute to safer and more economic operations of airport runways. To evaluate the performance of the prediction models, they are compared to several state-of-the-art runway assessment methods. The XGBoost models identify slippery runway conditions with a ROC AUC of 0.95, predict the friction coefficient with an MAE of 0.0254, and outperform all the previous methods. The results show the strong ability of machine learning methods to model complex physical phenomena with good accuracy. Published version: https://doi.org/10.1016/j.coldregions.2022.103556.
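    A minimal sketch of the two-model setup follows, with synthetic placeholder features and thresholds; only the overall structure (an XGBoost classifier plus a regressor, with SHAP values on top) mirrors the paper.

```python
import numpy as np
import shap
import xgboost as xgb
from sklearn.metrics import mean_absolute_error, roc_auc_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X = rng.normal(size=(2000, 8))                  # weather/runway features
friction = 0.5 + 0.1 * X[:, 0] - 0.05 * X[:, 1] + rng.normal(0, 0.02, 2000)
slippery = (friction < 0.5).astype(int)         # placeholder threshold

X_tr, X_te, f_tr, f_te, s_tr, s_te = train_test_split(
    X, friction, slippery, test_size=0.25, random_state=0)

clf = xgb.XGBClassifier(n_estimators=200, max_depth=4).fit(X_tr, s_tr)
reg = xgb.XGBRegressor(n_estimators=200, max_depth=4).fit(X_tr, f_tr)

print("ROC AUC:", roc_auc_score(s_te, clf.predict_proba(X_te)[:, 1]))
print("MAE:", mean_absolute_error(f_te, reg.predict(X_te)))

# SHAP values for the decision-support layer
shap_values = shap.TreeExplainer(clf).shap_values(X_te)
```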
    Enumeration of max-pooling responses with generalized permutohedra. (arXiv:2209.14978v1 [math.CO])
    We investigate the combinatorics of max-pooling layers, which are functions that downsample input arrays by taking the maximum over shifted windows of input coordinates, and which are commonly used in convolutional neural networks. We obtain results on the number of linearity regions of these functions by equivalently counting the number of vertices of certain Minkowski sums of simplices. We characterize the faces of such polytopes and obtain generating functions and closed formulas for the number of vertices and facets in a 1D max-pooling layer depending on the size of the pooling windows and stride, and for the number of vertices in a special case of 2D max-pooling.
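    For intuition, the linearity regions of a 1D max-pooling layer correspond to the achievable patterns of window-wise argmax indices, and a Monte Carlo sweep like the one below gives a lower-bound count for small cases; this brute-force check is our illustration, not the paper's polytope machinery.

```python
import numpy as np

def num_regions(n_in=6, window=3, stride=1, trials=200_000):
    """Estimate the number of linearity regions of a 1D max-pooling layer
    by counting distinct argmax patterns across the pooling windows."""
    starts = range(0, n_in - window + 1, stride)
    patterns = set()
    rng = np.random.default_rng(0)
    for _ in range(trials):
        x = rng.normal(size=n_in)
        patterns.add(tuple(int(np.argmax(x[s:s + window])) for s in starts))
    return len(patterns)

print(num_regions())  # Monte Carlo lower bound on the region count
```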
    Dilated Neighborhood Attention Transformer. (arXiv:2209.15001v1 [cs.CV])
Transformers are quickly becoming one of the most heavily applied deep learning architectures across modalities, domains, and tasks. In vision, on top of ongoing efforts into plain transformers, hierarchical transformers have also gained significant attention, thanks to their performance and easy integration into existing frameworks. These models typically employ localized attention mechanisms, such as the sliding-window Neighborhood Attention (NA) or Swin Transformer's Shifted Window Self Attention. While effective at reducing self attention's quadratic complexity, local attention weakens two of the most desirable properties of self attention: long range inter-dependency modeling, and global receptive field. In this paper, we introduce Dilated Neighborhood Attention (DiNA), a natural, flexible and efficient extension to NA that can capture more global context and expand receptive fields exponentially at no additional cost. NA's local attention and DiNA's sparse global attention complement each other, and therefore we introduce Dilated Neighborhood Attention Transformer (DiNAT), a new hierarchical vision transformer built upon both. DiNAT variants enjoy significant improvements over attention-based baselines such as NAT and Swin, as well as the modern convolutional baseline ConvNeXt. Our Large model is ahead of its Swin counterpart by 1.5% box AP in COCO object detection, 1.3% mask AP in COCO instance segmentation, and 1.1% mIoU in ADE20K semantic segmentation, while also being faster in throughput. We believe combinations of NA and DiNA have the potential to empower various tasks beyond those presented in this paper. To support and encourage research in this direction, in vision and beyond, we open-source our project at: https://github.com/SHI-Labs/Neighborhood-Attention-Transformer.
    PiFold: Toward effective and efficient protein inverse folding. (arXiv:2209.12643v2 [cs.AI] UPDATED)
How can we design protein sequences that fold into desired structures both effectively and efficiently? Structure-based protein design has attracted increasing attention in recent years; however, few methods can simultaneously improve accuracy and efficiency, owing to the lack of expressive features and the cost of autoregressive sequence decoding. To address these issues, we propose PiFold, which contains a novel residue featurizer and PiGNN layers to generate protein sequences in a one-shot way with improved recovery. Experiments show that PiFold achieves 51.66\% recovery on CATH 4.2, while its inference speed is 70 times faster than that of autoregressive competitors. In addition, PiFold achieves 58.72\% and 60.42\% recovery scores on TS50 and TS500, respectively. We conduct comprehensive ablation studies to reveal the role of different types of protein features and model designs, inspiring further simplification and improvement.
    On Transfer Learning in Functional Linear Regression. (arXiv:2206.04277v2 [stat.ML] UPDATED)
    This work studies the problem of transfer learning under the functional linear model framework, which aims to improve the fit of the target model by leveraging the knowledge from related source models. We measure the relatedness between target and source models using Reproducing Kernel Hilbert Spaces, allowing the type of knowledge being transferred to be interpreted by the structure of the spaces. Two algorithms are proposed: one transfers knowledge when the index of transferable sources is known, while the other one utilizes aggregation to achieve knowledge transfer without prior information about the sources. Furthermore, we establish the optimal convergence rates for excess risk, making the statistical gain via transfer learning mathematically provable. The effectiveness of the proposed algorithms is demonstrated on synthetic data as well as real financial data.
    Spectral Bias in Practice: The Role of Function Frequency in Generalization. (arXiv:2110.02424v4 [cs.LG] UPDATED)
    Despite their ability to represent highly expressive functions, deep learning models seem to find simple solutions that generalize surprisingly well. Spectral bias -- the tendency of neural networks to prioritize learning low frequency functions -- is one possible explanation for this phenomenon, but so far spectral bias has primarily been observed in theoretical models and simplified experiments. In this work, we propose methodologies for measuring spectral bias in modern image classification networks on CIFAR-10 and ImageNet. We find that these networks indeed exhibit spectral bias, and that interventions that improve test accuracy on CIFAR-10 tend to produce learned functions that have higher frequencies overall but lower frequencies in the vicinity of examples from each class. This trend holds across variation in training time, model architecture, number of training examples, data augmentation, and self-distillation. We also explore the connections between function frequency and image frequency and find that spectral bias is sensitive to the low frequencies prevalent in natural images. On ImageNet, we find that learned function frequency also varies with internal class diversity, with higher frequencies on more diverse classes. Our work enables measuring and ultimately influencing the spectral behavior of neural networks used for image classification, and is a step towards understanding why deep models generalize well.
    Physics-informed neural networks for solving parametric magnetostatic problems. (arXiv:2202.04041v2 [cs.CE] UPDATED)
The objective of this paper is to investigate the ability of physics-informed neural networks to learn the magnetic field response as a function of design parameters in the context of a two-dimensional (2-D) magnetostatic problem. Our approach is as follows. First, we present a functional whose minimization is equivalent to solving parametric magnetostatic problems. Subsequently, we use a deep neural network (DNN) to represent the magnetic field as a function of space and parameters that describe geometric features and operating points. We train the DNN by minimizing the physics-informed functional using stochastic gradient descent. Lastly, we demonstrate our approach on a ten-dimensional EI-core electromagnet problem with parameterized geometry. We evaluate the accuracy of the DNN by comparing its predictions to those of finite element analysis.
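    A minimal sketch of the overall recipe, shown for a 1-D Poisson-type energy functional with one extra parameter input rather than the paper's 2-D magnetostatic functional; the network size, sampling scheme, and boundary handling are placeholder choices.

```python
import torch

net = torch.nn.Sequential(
    torch.nn.Linear(2, 64), torch.nn.Tanh(),
    torch.nn.Linear(64, 64), torch.nn.Tanh(),
    torch.nn.Linear(64, 1))
opt = torch.optim.Adam(net.parameters(), lr=1e-3)

def energy(n=1024):
    # sample space x in (0,1) and a design parameter p in (0,1)
    xp = torch.rand(n, 2, requires_grad=True)
    u = xp[:, :1] * (1 - xp[:, :1]) * net(xp)   # hard-code u = 0 at x = 0, 1
    du = torch.autograd.grad(u.sum(), xp, create_graph=True)[0][:, :1]
    f = xp[:, 1:]                                # parameter-dependent source
    return (0.5 * du**2 - f * u).mean()          # Monte Carlo functional

for step in range(2000):                         # SGD on the functional
    opt.zero_grad()
    loss = energy()
    loss.backward()
    opt.step()
```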
    A New Index for Clustering Evaluation Based on Density Estimation. (arXiv:2207.01294v3 [cs.LG] UPDATED)
A new index for internal evaluation of clustering is introduced. The index is defined as a mixture of two sub-indices. The first sub-index $ I_a $ is called the Ambiguous Index; the second sub-index $ I_s $ is called the Similarity Index. Calculation of the two sub-indices is based on density estimation for each cluster of a partition of the data. An experiment is conducted to test the performance of the new index and to compare it with six other internal clustering evaluation indices -- the Calinski-Harabasz index, the Silhouette coefficient, the Davies-Bouldin index, CDbw, DBCV, and VIASCKDE -- on a set of 145 datasets. The results show that the new index significantly outperforms the other internal clustering evaluation indices.
    Why do networks have inhibitory/negative connections?. (arXiv:2208.03211v2 [cs.LG] UPDATED)
Why do brains have inhibitory connections? Neuroscientists may answer: to balance excitatory connections, to memorize, to decide, to avoid constant seizure, and many more. There seem to be many function-specific stories for the necessity of inhibitory connections. However, in its most general form, a theoretical result on why brains have inhibitory connections is lacking. Leveraging deep neural networks (DNNs), a well-established model for the brain, we ask: why do networks have negative weights? Our answer is: to learn more functions. We prove that, in the absence of negative weights, neural networks are \textit{not} universal approximators. Further, we provide insights into the geometric properties of the representation space that non-negative DNNs cannot represent. While this may be an intuitive result, to the best of our knowledge, there is no formal theory, in either the machine learning or the neuroscience literature, that demonstrates \textit{why} negative weights are crucial in the context of representation capacity. Our result provides the first theoretical justification of why inhibitory connections in brains and negative weights in DNNs are important for networks to represent all functions.
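    The monotonicity obstruction behind this result can be checked numerically in a few lines: with non-negative weights and a monotone activation such as ReLU, the network output is nondecreasing in every input, so a decreasing target like f(x) = -x is unrepresentable. The small construction below is our illustration of that geometric point, not the paper's proof.

```python
import numpy as np

rng = np.random.default_rng(0)
W1 = np.abs(rng.normal(size=(16, 1)))  # non-negative weights only
W2 = np.abs(rng.normal(size=(1, 16)))
b1 = rng.normal(size=(16, 1))          # biases may be arbitrary

def net(x):
    """Scalar-in, scalar-out ReLU network with non-negative weights."""
    return float(W2 @ np.maximum(W1 * x + b1, 0.0))

xs = np.linspace(-3, 3, 201)
ys = np.array([net(x) for x in xs])
# Output is monotone nondecreasing, so it can never match f(x) = -x.
assert np.all(np.diff(ys) >= -1e-12)
```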
    FedorAS: Federated Architecture Search under system heterogeneity. (arXiv:2206.11239v3 [cs.LG] UPDATED)
Federated learning (FL) has recently gained considerable attention due to its ability to learn on decentralised data while preserving client privacy. However, it also poses additional challenges related to the heterogeneity of the participating devices, both in terms of their computational capabilities and contributed data. Meanwhile, Neural Architecture Search (NAS) has been successfully used with centralised datasets, producing state-of-the-art results in constrained or unconstrained settings. However, such centralised datasets may not always be available for training. Most recent work at the intersection of NAS and FL attempts to alleviate this issue in a cross-silo federated setting, which assumes homogeneous compute environments with datacenter-grade hardware. In this paper we explore the question of whether we can design architectures of different footprints in a cross-device federated setting, where the device landscape, availability and scale are very different. To this end, we design our system, FedorAS, to discover and train promising architectures in a resource-aware manner when dealing with devices of varying capabilities holding non-IID distributed data. We present empirical evidence of its effectiveness across different settings, spanning three different modalities (vision, speech, text), and showcase its better performance compared to state-of-the-art federated solutions, while maintaining resource efficiency.
    Robustness to corruption in pre-trained Bayesian neural networks. (arXiv:2206.12361v2 [cs.LG] UPDATED)
    We develop ShiftMatch, a new training-data-dependent likelihood for robustness to corruption in Bayesian neural networks (BNNs). ShiftMatch is inspired by the training-data-dependent "EmpCov" priors from Izmailov et al. (2021a), and efficiently matches test-time spatial correlations to those at training time. Critically, ShiftMatch is designed to leave the neural network's training time likelihood unchanged, allowing it to use publicly available samples from pre-trained BNNs. Using pre-trained HMC samples, ShiftMatch gives strong performance improvements on CIFAR-10-C, outperforms EmpCov priors (though ShiftMatch uses extra information from a minibatch of corrupted test points), and is perhaps the first Bayesian method capable of convincingly outperforming plain deep ensembles.
    Look where you look! Saliency-guided Q-networks for visual RL tasks. (arXiv:2209.09203v2 [cs.LG] UPDATED)
Deep reinforcement learning policies, despite their outstanding efficiency in simulated visual control tasks, have shown disappointing ability to generalize across disturbances in the input training images. Changes in image statistics or distracting background elements are pitfalls that prevent generalization and real-world applicability of such control policies. We elaborate on the intuition that a good visual policy should be able to identify which pixels are important for its decision, and preserve this identification of important sources of information across images. This implies that training of a policy with a small generalization gap should focus on such important pixels and ignore the others. This leads to the introduction of saliency-guided Q-networks (SGQN), a generic method for visual reinforcement learning, that is compatible with any value function learning method. SGQN vastly improves the generalization capability of Soft Actor-Critic agents and outperforms existing state-of-the-art methods on the Deepmind Control Generalization benchmark, setting a new reference in terms of training efficiency, generalization gap, and policy interpretability.
    The Role of Local Steps in Local SGD. (arXiv:2203.06798v3 [cs.LG] UPDATED)
    We consider the distributed stochastic optimization problem where $n$ agents want to minimize a global function given by the sum of agents' local functions, and focus on the heterogeneous setting when agents' local functions are defined over non-i.i.d. data sets. We study the Local SGD method, where agents perform a number of local stochastic gradient steps and occasionally communicate with a central node to improve their local optimization tasks. We analyze the effect of local steps on the convergence rate and the communication complexity of Local SGD. In particular, instead of assuming a fixed number of local steps across all communication rounds, we allow the number of local steps during the $i$-th communication round, $H_i$, to be different and arbitrary numbers. Our main contribution is to characterize the convergence rate of Local SGD as a function of $\{H_i\}_{i=1}^R$ under various settings of strongly convex, convex, and nonconvex local functions, where $R$ is the total number of communication rounds. Based on this characterization, we provide sufficient conditions on the sequence $\{H_i\}_{i=1}^R$ such that Local SGD can achieve linear speed-up with respect to the number of workers. Furthermore, we propose a new communication strategy with increasing local steps superior to existing communication strategies for strongly convex local functions. On the other hand, for convex and nonconvex local functions, we argue that fixed local steps are the best communication strategy for Local SGD and recover state-of-the-art convergence rate results. Finally, we justify our theoretical results through extensive numerical experiments.
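    Schematically, the algorithm under analysis looks as follows, with a possibly different number of local steps per round; all names here are illustrative.

```python
import torch

def local_sgd(workers, H, lr=0.05):
    """workers: list of (loss_fn, data_iter) pairs sharing one model shape;
    H: list of local step counts per round, e.g. an increasing schedule."""
    w = torch.zeros(10)                      # shared initial model
    for H_i in H:                            # communication rounds
        locals_ = []
        for loss_fn, data in workers:        # runs in parallel in practice
            w_k = w.clone().requires_grad_(True)
            for _ in range(H_i):             # H_i local SGD steps
                loss = loss_fn(w_k, next(data))
                g, = torch.autograd.grad(loss, w_k)
                w_k = (w_k - lr * g).detach().requires_grad_(True)
            locals_.append(w_k.detach())
        w = torch.stack(locals_).mean(0)     # server averages the models
    return w
```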
    Pyramidal Denoising Diffusion Probabilistic Models. (arXiv:2208.01864v2 [cs.CV] UPDATED)
Recently, diffusion models have demonstrated impressive image generation performance, and have been extensively studied in various computer vision tasks. Unfortunately, training and evaluating diffusion models consume a lot of time and computational resources. To address this problem, here we present a novel pyramidal diffusion model that can generate high resolution images starting from much coarser resolution images using a {\em single} score function trained with a positional embedding. This enables the neural network to be much lighter and also enables time-efficient image generation without compromising performance. Furthermore, we show that the proposed approach can also be used efficiently for the multi-scale super-resolution problem using a single score function.
    Joint Embedding Self-Supervised Learning in the Kernel Regime. (arXiv:2209.14884v1 [cs.LG])
    The fundamental goal of self-supervised learning (SSL) is to produce useful representations of data without access to any labels for classifying the data. Modern methods in SSL, which form representations based on known or constructed relationships between samples, have been particularly effective at this task. Here, we aim to extend this framework to incorporate algorithms based on kernel methods where embeddings are constructed by linear maps acting on the feature space of a kernel. In this kernel regime, we derive methods to find the optimal form of the output representations for contrastive and non-contrastive loss functions. This procedure produces a new representation space with an inner product denoted as the induced kernel which generally correlates points which are related by an augmentation in kernel space and de-correlates points otherwise. We analyze our kernel model on small datasets to identify common features of self-supervised learning algorithms and gain theoretical insights into their performance on downstream tasks.
    Understanding Collapse in Non-Contrastive Learning. (arXiv:2209.15007v1 [cs.LG])
Contrastive methods have led a recent surge in the performance of self-supervised representation learning (SSL). Recent methods like BYOL or SimSiam purportedly distill these contrastive methods down to their essence, removing bells and whistles, including the negative examples, that do not contribute to downstream performance. These "non-contrastive" methods work surprisingly well without using negatives even though the global minimum lies at trivial collapse. We empirically analyze these non-contrastive methods and find that SimSiam is extraordinarily sensitive to dataset and model size. In particular, SimSiam representations undergo partial dimensional collapse if the model is too small relative to the dataset size. We propose a metric to measure the degree of this collapse and show that it can be used to forecast the downstream task performance without any fine-tuning or labels. We further analyze architectural design choices and their effect on the downstream performance. Finally, we demonstrate that shifting to a continual learning setting acts as a regularizer and prevents collapse, and a hybrid between continual and multi-epoch training can improve linear probe accuracy by as much as 18 percentage points using ResNet-18 on ImageNet.
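    One simple, standard way to quantify such partial dimensional collapse (the paper's exact metric may differ) is the entropy-based effective rank of the representation matrix:

```python
import numpy as np

def effective_rank(Z):
    """Z: (n_samples, dim) matrix of representations."""
    Z = Z - Z.mean(0)                       # center the embeddings
    s = np.linalg.svd(Z, compute_uv=False)  # singular values
    p = s / s.sum()                         # normalized spectrum
    return float(np.exp(-(p * np.log(p + 1e-12)).sum()))

rng = np.random.default_rng(0)
healthy = rng.normal(size=(1000, 128))
collapsed = healthy @ np.diag([1.0] * 8 + [1e-3] * 120)  # 8 live dims
print(effective_rank(healthy), effective_rank(collapsed))
```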
    Multiple Modes for Continual Learning. (arXiv:2209.14996v1 [cs.LG])
Adapting model parameters to incoming streams of data is a crucial factor in deep learning scalability. Interestingly, prior continual learning strategies in online settings inadvertently anchor their updated parameters to a local parameter subspace in order to remember old tasks, or else drift away from the subspace and forget. From this observation, we formulate a trade-off between constructing multiple parameter modes and allocating tasks per mode. Mode-Optimized Task Allocation (MOTA), our contributed adaptation strategy, trains multiple modes in parallel, then optimizes task allocation per mode. We empirically demonstrate improvements over baseline continual learning strategies and across varying distribution shifts, namely sub-population, domain, and task shift.
    Joint Optimization of Energy Consumption and Completion Time in Federated Learning. (arXiv:2209.14900v1 [cs.LG])
    Federated Learning (FL) is an intriguing distributed machine learning approach due to its privacy-preserving characteristics. To balance the trade-off between energy and execution latency, and thus accommodate different demands and application scenarios, we formulate an optimization problem to minimize a weighted sum of total energy consumption and completion time through two weight parameters. The optimization variables include bandwidth, transmission power and CPU frequency of each device in the FL system, where all devices are linked to a base station and train a global model collaboratively. Through decomposing the non-convex optimization problem into two subproblems, we devise a resource allocation algorithm to determine the bandwidth allocation, transmission power, and CPU frequency for each participating device. We further present the convergence analysis and computational complexity of the proposed algorithm. Numerical results show that our proposed algorithm not only has better performance at different weight parameters (i.e., different demands) but also outperforms the state of the art.
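    Schematically, the problem has the weighted-sum form below, where b, p, and f collect per-device bandwidths, transmit powers, and CPU frequencies; the symbols and the constraint set are illustrative, not the paper's exact formulation.

```latex
\min_{b,\,p,\,f}\;\; \lambda_1\, E_{\mathrm{total}}(b,p,f) \;+\; \lambda_2\, T_{\mathrm{total}}(b,p,f)
\qquad \text{s.t.}\quad \sum_k b_k \le B,\quad 0 \le p_k \le p_k^{\max},\quad f_k^{\min} \le f_k \le f_k^{\max}.
```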
    Graph Anomaly Detection with Graph Neural Networks: Current Status and Challenges. (arXiv:2209.14930v1 [cs.LG])
Graphs are used widely to model complex systems, and detecting anomalies in a graph is an important task in the analysis of complex systems. Graph anomalies are patterns in a graph that do not conform to the normal patterns expected of the attributes and/or structures of the graph. In recent years, graph neural networks (GNNs) have been studied extensively and have successfully performed difficult machine learning tasks in node classification, link prediction, and graph classification, thanks to their highly expressive capability of learning graph representations effectively via message passing. To solve the graph anomaly detection problem, GNN-based methods leverage information about the graph attributes (or features) and/or structures to learn to score anomalies appropriately. In this survey, we review the recent advances made in detecting graph anomalies using GNN models. Specifically, we summarize GNN-based methods according to the graph type (i.e., static and dynamic), the anomaly type (i.e., node, edge, subgraph, and whole graph), and the network architecture (e.g., graph autoencoder, graph convolutional network). To the best of our knowledge, this survey is the first comprehensive review of graph anomaly detection methods based on GNNs.
    Partially Observable RL with B-Stability: Unified Structural Condition and Sharp Sample-Efficient Algorithms. (arXiv:2209.14990v1 [cs.LG])
Partial Observability -- where agents can only observe partial information about the true underlying state of the system -- is ubiquitous in real-world applications of Reinforcement Learning (RL). Theoretically, learning a near-optimal policy under partial observability is known to be hard in the worst case due to an exponential sample complexity lower bound. Recent work has identified several tractable subclasses that are learnable with polynomial samples, such as Partially Observable Markov Decision Processes (POMDPs) with certain revealing or decodability conditions. However, this line of research is still in its infancy, where (1) unified structural conditions enabling sample-efficient learning are lacking; (2) existing sample complexities for known tractable subclasses are far from sharp; and (3) fewer sample-efficient algorithms are available than in fully observable RL. This paper advances all three aspects above for Partially Observable RL in the general setting of Predictive State Representations (PSRs). First, we propose a natural and unified structural condition for PSRs called \emph{B-stability}. B-stable PSRs encompass the vast majority of known tractable subclasses such as weakly revealing POMDPs, low-rank future-sufficient POMDPs, decodable POMDPs, and regular PSRs. Next, we show that any B-stable PSR can be learned with polynomial samples in relevant problem parameters. When instantiated in the aforementioned subclasses, our sample complexities improve substantially over the current best ones. Finally, our results are achieved by three algorithms simultaneously: Optimistic Maximum Likelihood Estimation, Estimation-to-Decisions, and Model-Based Optimistic Posterior Sampling. The latter two algorithms are new for sample-efficient learning of POMDPs/PSRs.
    Surface Similarity Parameter: A New Machine Learning Loss Metric for Oscillatory Spatio-Temporal Data. (arXiv:2204.06843v2 [cs.LG] UPDATED)
Supervised machine learning approaches require the formulation of a loss functional to be minimized in the training phase. Sequential data are ubiquitous across many fields of research, and are often treated with Euclidean distance-based loss functions that were designed for tabular data. For smooth oscillatory data, those conventional approaches lack the ability to penalize amplitude, frequency and phase prediction errors at the same time, and tend to be biased towards amplitude errors. We introduce the surface similarity parameter (SSP) as a novel loss function that is especially useful for training machine learning models on smooth oscillatory sequences. Our extensive experiments on chaotic spatio-temporal dynamical systems indicate that the SSP is beneficial for shaping gradients, thereby accelerating the training process, reducing the final prediction error, increasing weight initialization robustness, and implementing a stronger regularization effect compared to using classical loss functions. The results indicate the potential of the novel loss metric particularly for highly complex and chaotic data, such as data stemming from the nonlinear two-dimensional Kuramoto-Sivashinsky equation and the linear propagation of dispersive surface gravity waves in fluids.
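    A sketch of the loss follows, using the common definition SSP = ||y - y_hat|| / (||y|| + ||y_hat||), which is bounded in [0, 1]; the paper's exact normalization and norm choice may differ.

```python
import torch

def ssp_loss(y_hat, y, eps=1e-8):
    """Surface similarity parameter between two sequences (0 = identical,
    1 = maximally dissimilar, e.g. anti-phase)."""
    num = torch.linalg.norm(y_hat - y)
    den = torch.linalg.norm(y_hat) + torch.linalg.norm(y) + eps
    return num / den

t = torch.linspace(0, 6.28, 200)
y = torch.sin(t)
print(ssp_loss(torch.sin(t + 0.1), y))   # small phase error -> small SSP
print(ssp_loss(-torch.sin(t), y))        # anti-phase -> SSP close to 1
```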
    Bridging the Gap to Real-World Object-Centric Learning. (arXiv:2209.14860v1 [cs.CV])
Humans naturally decompose their environment into entities at the appropriate level of abstraction to act in the world. Allowing machine learning algorithms to derive this decomposition in an unsupervised way has become an important line of research. However, current methods are restricted to simulated data or require additional information in the form of motion or depth in order to successfully discover objects. In this work, we overcome this limitation by showing that reconstructing features from models trained in a self-supervised manner is a sufficient training signal for object-centric representations to arise in a fully unsupervised way. Our approach, DINOSAUR, significantly outperforms existing object-centric learning models on simulated data and is the first unsupervised object-centric model that scales to real-world datasets such as COCO and PASCAL VOC. DINOSAUR is conceptually simple and shows competitive performance compared to more involved pipelines from the computer vision literature.
Statistical Learning and Inverse Problems: A Stochastic Gradient Approach. (arXiv:2209.14967v1 [stat.ML])
Inverse problems are paramount in Science and Engineering. In this paper, we consider the setup of the Statistical Inverse Problem (SIP) and demonstrate how Stochastic Gradient Descent (SGD) algorithms can be used in the linear SIP setting. We provide consistency and finite sample bounds for the excess risk. We also propose a modification of the SGD algorithm where we leverage machine learning methods to smooth the stochastic gradients and improve empirical performance. We exemplify the algorithm in a setting of great interest nowadays: the Functional Linear Regression model. We consider a synthetic-data example as well as a real-data classification problem.
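    A toy version of the linear SIP setting makes the recipe concrete (the paper's gradient-smoothing modification is omitted here; the decay exponent and dimensions are placeholders):

```python
import numpy as np

rng = np.random.default_rng(0)
d, n = 50, 5000
theta_star = rng.normal(size=d) / np.sqrt(d)
A = rng.normal(size=(n, d))                 # rows of the forward operator
y = A @ theta_star + 0.1 * rng.normal(size=n)

theta = np.zeros(d)
for i in range(n):
    step = 1.0 / (1 + i) ** 0.6             # polynomially decaying rate
    g = (A[i] @ theta - y[i]) * A[i]        # stochastic gradient
    theta -= step * g

print("excess risk ~", np.mean((A @ (theta - theta_star)) ** 2))
```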
    Deep Isolation Forest for Anomaly Detection. (arXiv:2206.06602v2 [cs.LG] UPDATED)
Isolation forest (iForest) has been emerging as arguably the most popular anomaly detector in recent years due to its general effectiveness across different benchmarks and strong scalability. Nevertheless, its linear axis-parallel isolation method often leads to (i) failure in detecting hard anomalies that are difficult to isolate in high-dimensional/non-linear-separable data space, and (ii) notorious algorithmic bias that assigns unexpectedly lower anomaly scores to artefact regions. These issues contribute to high false negative errors. Several iForest extensions are introduced, but they essentially still employ shallow, linear data partition, restricting their power in isolating true anomalies. Therefore, this paper proposes deep isolation forest. We introduce a new representation scheme that utilises randomly initialised neural networks to map original data into random representation ensembles, where random axis-parallel cuts are subsequently applied to perform the data partition. This representation scheme facilitates high freedom of the partition in the original data space (equivalent to non-linear partition on subspaces of varying sizes), encouraging a unique synergy between random representations and random partition-based isolation. Extensive experiments show that our model achieves significant improvement over state-of-the-art isolation-based methods and deep detectors on tabular, graph and time series datasets; our model also inherits desired scalability from iForest.
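    The core idea can be prototyped in a few lines: push the data through an ensemble of untrained, randomly initialized networks and run a standard isolation forest on each random representation, averaging the scores. The widths, depths, and score fusion below are placeholders, not the paper's configuration.

```python
import numpy as np
from sklearn.ensemble import IsolationForest

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 10))
X[:20] += 4.0                               # inject a few anomalies

def random_representation(X, out_dim=20, seed=0):
    """One untrained random ReLU network as a nonlinear feature map."""
    r = np.random.default_rng(seed)
    W1 = r.normal(size=(X.shape[1], 32))
    W2 = r.normal(size=(32, out_dim))
    return np.maximum(X @ W1, 0.0) @ W2

scores = np.zeros(len(X))
for seed in range(10):                      # representation ensemble
    Z = random_representation(X, seed=seed)
    f = IsolationForest(random_state=seed).fit(Z)
    scores += -f.score_samples(Z)           # higher = more anomalous

print("top anomalies:", np.argsort(scores)[-20:])
```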
    Reinforcement Learning Algorithms: An Overview and Classification. (arXiv:2209.14940v1 [cs.LG])
The desire to make applications and machines more intelligent and the aspiration to enable their operation without human interaction have been driving innovations in neural networks, deep learning, and other machine learning techniques. Although reinforcement learning has been primarily used in video games, recent advancements and the development of diverse and powerful reinforcement algorithms have enabled the reinforcement learning community to move from playing video games to solving complex real-life problems in autonomous systems such as self-driving cars, delivery drones, and automated robotics. Understanding the environment of an application and the algorithms' limitations plays a vital role in selecting the appropriate reinforcement learning algorithm that successfully solves the problem at hand in an efficient manner. Consequently, in this study, we identify three main environment types and classify reinforcement learning algorithms according to those environment types. Moreover, within each category, we identify relationships between algorithms. The overview of each algorithm provides insight into the algorithms' foundations and reviews similarities and differences among algorithms. This study provides a perspective on the field and helps practitioners and researchers to select the appropriate algorithm for their use case.
    Greybox XAI: a Neural-Symbolic learning framework to produce interpretable predictions for image classification. (arXiv:2209.14974v1 [cs.CV])
Although Deep Neural Networks (DNNs) have great generalization and prediction capabilities, their functioning does not allow a detailed explanation of their behavior. Opaque deep learning models are increasingly used to make important predictions in critical environments, and the danger is that they make and use predictions that cannot be justified or legitimized. Several eXplainable Artificial Intelligence (XAI) methods that separate explanations from machine learning models have emerged, but they have shortcomings in faithfulness to the model's actual functioning and in robustness. As a result, there is widespread agreement on the importance of endowing Deep Learning models with explanatory capabilities so that they can themselves provide an answer to why a particular prediction was made. First, we address the problem of the lack of universal criteria for XAI by formalizing what an explanation is. We also introduce a set of axioms and definitions to clarify XAI from a mathematical perspective. Finally, we present the Greybox XAI, a framework that composes a DNN and a transparent model thanks to the use of a symbolic Knowledge Base (KB). We extract a KB from the dataset and use it to train a transparent model (i.e., a logistic regression). An encoder-decoder architecture is trained on RGB images to produce an output similar to the KB used by the transparent model. Once the two models are trained independently, they are used compositionally to form an explainable predictive model. We show that this new architecture is accurate and explainable on several datasets.
    Evaluating the temporal understanding of neural networks on event-based action recognition with DVS-Gesture-Chain. (arXiv:2209.14915v1 [cs.CV])
Enabling artificial neural networks (ANNs) to have temporal understanding in visual tasks is an essential requirement in order to achieve complete perception of video sequences. A wide range of benchmark datasets is available to allow for the evaluation of such capabilities when using conventional frame-based video sequences. In contrast, evaluating them for systems targeting neuromorphic data is still a challenge due to the lack of appropriate datasets. In this work we define a new benchmark task for action recognition in event-based video sequences, DVS-Gesture-Chain (DVS-GC), which is based on the temporal combination of multiple gestures from the widely used DVS-Gesture dataset. This methodology makes it possible to create datasets that are arbitrarily complex in the temporal dimension. Using our newly defined task, we evaluate the spatio-temporal understanding of different feed-forward convolutional ANNs and convolutional Spiking Neural Networks (SNNs). Our study shows that the original DVS-Gesture benchmark can be solved by networks without temporal understanding, unlike the new DVS-GC, which demands an understanding of the ordering of events. From there, we provide a study showing how certain elements, such as spiking neurons or time-dependent weights, allow for temporal understanding in feed-forward networks without the need for recurrent connections. Code available at: https://github.com/VicenteAlex/DVS-Gesture-Chain
    Training Normalizing Flows from Dependent Data. (arXiv:2209.14933v1 [cs.LG])
    Normalizing flows are powerful non-parametric statistical models that function as a hybrid between density estimators and generative models. Current learning algorithms for normalizing flows assume that data points are sampled independently, an assumption that is frequently violated in practice, which may lead to erroneous density estimation and data generation. We propose a likelihood objective of normalizing flows incorporating dependencies between the data points, for which we derive a flexible and efficient learning algorithm suitable for different dependency structures. We show that respecting dependencies between observations can improve empirical results on both synthetic and real-world data.
    NAG-GS: Semi-Implicit, Accelerated and Robust Stochastic Optimizers. (arXiv:2209.14937v1 [math.OC])
Classical machine learning models such as deep neural networks are usually trained by using Stochastic Gradient Descent-based (SGD) algorithms. The classical SGD can be interpreted as a discretization of the stochastic gradient flow. In this paper we propose a novel, robust and accelerated stochastic optimizer that relies on two key elements: (1) an accelerated Nesterov-like Stochastic Differential Equation (SDE) and (2) its semi-implicit Gauss-Seidel type discretization. The convergence and stability of the obtained method, referred to as NAG-GS, are first studied extensively in the case of the minimization of a quadratic function. This analysis allows us to come up with an optimal step size (or learning rate) in terms of the rate of convergence while ensuring the stability of NAG-GS. This is achieved by a careful analysis of the spectral radius of the iteration matrix and the covariance matrix at stationarity with respect to all hyperparameters of our method. We show that NAG-GS is competitive with state-of-the-art methods such as momentum SGD with weight decay and AdamW for the training of machine learning models such as the logistic regression model, residual network models on standard computer vision datasets, and Transformers on the GLUE benchmark.
    Hyperspectral Remote Sensing Benchmark Database for Oil Spill Detection with an Isolation Forest-Guided Unsupervised Detector. (arXiv:2209.14971v1 [cs.CV])
Oil spill detection has attracted increasing attention in recent years since marine oil spill accidents severely affect environments, natural resources, and the lives of coastal inhabitants. Hyperspectral remote sensing images provide rich spectral information which is beneficial for the monitoring of oil spills in complex ocean scenarios. However, most of the existing approaches are based on supervised and semi-supervised frameworks to detect oil spills from hyperspectral images (HSIs), which require a huge amount of effort to annotate a certain number of high-quality training sets. In this study, we make the first attempt to develop an unsupervised oil spill detection method based on isolation forest for HSIs. First, considering that the noise level varies among different bands, a noise variance estimation method is exploited to evaluate the noise level of different bands, and the bands corrupted by severe noise are removed. Second, kernel principal component analysis (KPCA) is employed to reduce the high dimensionality of the HSIs. Then, the probability of each pixel belonging to one of the classes of seawater and oil spills is estimated with the isolation forest, and a set of pseudo-labeled training samples is automatically produced using a clustering algorithm on the detected probability. Finally, an initial detection map can be obtained by performing a support vector machine (SVM) on the dimension-reduced data, and then the initial detection result is further optimized with the extended random walker (ERW) model so as to improve the detection accuracy of oil spills. Experiments on airborne hyperspectral oil spill data (HOSD) that we created demonstrate that the proposed method obtains superior detection performance with respect to other state-of-the-art detection approaches.
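    The backbone of the pipeline can be sketched as follows; band selection and the ERW refinement are omitted, and all hyperparameters and array sizes are placeholders.

```python
import numpy as np
from sklearn.decomposition import KernelPCA
from sklearn.ensemble import IsolationForest
from sklearn.svm import SVC

rng = np.random.default_rng(0)
pixels = rng.normal(size=(5000, 60))        # HSI pixels x spectral bands

# dimensionality reduction, then unsupervised anomaly probabilities
Z = KernelPCA(n_components=10, kernel="rbf").fit_transform(pixels)
prob = -IsolationForest(random_state=0).fit(Z).score_samples(Z)

# pseudo-labels from the extreme ends of the anomaly-score distribution
oil = np.argsort(prob)[-200:]               # most anomalous pixels
sea = np.argsort(prob)[:200]                # most normal pixels
X_lab = np.vstack([Z[oil], Z[sea]])
y_lab = np.hstack([np.ones(200), np.zeros(200)])

# supervised refinement on the pseudo-labeled set
detection = SVC().fit(X_lab, y_lab).predict(Z)
```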
    Extracting Dynamical Models from Data. (arXiv:2110.06917v3 [cs.LG] UPDATED)
The problem of determining the underlying dynamics of a system when only given data of its state over time has challenged scientists for decades. In this paper, the approach of using machine learning to model the {\em updates} of the phase space variables is introduced; this is done as a function of the phase space variables. (More generally, the modeling is done over the jet space of the variables.) This approach is shown to accurately replicate the dynamics for the examples of the harmonic oscillator, the pendulum, and the Duffing oscillator; the underlying differential equation is also accurately recovered in each example. In addition, the results in no way depend on how the data is sampled over time (i.e., regularly or irregularly). It is demonstrated that this approach (named "FJet") is similar to the model resulting from a Taylor series expansion of the Runge-Kutta (RK) numerical integration scheme. This analogy confers the advantage of explicitly revealing the appropriate functions to use in the modeling, as well as revealing the error estimate for the updates. Thus, this new approach can be thought of as a way to determine the coefficients of an RK scheme by machine learning. Finally, it is shown in the undamped harmonic oscillator example that the updates remain stable for $10^9$ times longer than with $4$th-order RK.
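    A minimal stand-in for the update-learning idea, shown for the harmonic oscillator with a linear model of the per-step updates (the paper's FJet models and jet-space features are richer than this):

```python
import numpy as np
from sklearn.linear_model import LinearRegression

dt, n = 0.01, 20000
x, v = 1.0, 0.0
states, deltas = [], []
for _ in range(n):                          # generate trajectory data with a
    x_new = x + v * dt                      # simple Euler integrator (for
    v_new = v - x * dt                      # brevity; the paper uses better data)
    states.append([x, v]); deltas.append([x_new - x, v_new - v])
    x, v = x_new, v_new

# learn the update map (dx, dv) as a function of (x, v)
model = LinearRegression().fit(np.array(states), np.array(deltas))

s = np.array([1.0, 0.0])                    # roll out the learned dynamics
for _ in range(1000):
    s = s + model.predict(s[None])[0]
```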
    Wukong: A 100 Million Large-scale Chinese Cross-modal Pre-training Benchmark. (arXiv:2202.06767v4 [cs.CV] UPDATED)
Vision-Language Pre-training (VLP) models have shown remarkable performance on various downstream tasks. Their success heavily relies on the scale of pre-trained cross-modal datasets. However, the lack of large-scale datasets and benchmarks in Chinese hinders the development of Chinese VLP models and broader multilingual applications. In this work, we release a large-scale Chinese cross-modal dataset named Wukong, which contains 100 million Chinese image-text pairs collected from the web. Wukong aims to benchmark different multi-modal pre-training methods to facilitate the VLP research and community development. Furthermore, we release a group of models pre-trained with various image encoders (ViT-B/ViT-L/SwinT) and also apply advanced pre-training techniques into VLP such as locked-image text tuning, token-wise similarity in contrastive learning, and reduced-token interaction. Extensive experiments and a benchmarking of different downstream tasks including a new largest human-verified image-text test dataset are also provided. Experiments show that Wukong can serve as a promising Chinese pre-training dataset and benchmark for different cross-modal learning methods. For the zero-shot image classification task on 10 datasets, $Wukong_{ViT-L}$ achieves an average accuracy of 73.03%. For the image-text retrieval task, it achieves a mean recall of 71.6% on AIC-ICC, which is 12.9% higher than WenLan 2.0. Also, our Wukong models are benchmarked on downstream tasks with other variants on multiple datasets, e.g., Flickr8K-CN, Flickr-30K-CN, COCO-CN, etc. More information is available at: https://wukong-dataset.github.io/wukong-dataset/.
    Deep Unfolding for Iterative Stripe Noise Removal. (arXiv:2209.14973v1 [eess.IV])
    The non-uniform photoelectric response of infrared imaging systems results in fixed-pattern stripe noise being superimposed on infrared images, which severely reduces image quality. As the applications of degraded infrared images are limited, it is crucial to effectively preserve original details. Existing image destriping methods struggle to concurrently remove all stripe noise artifacts, preserve image details and structures, and balance real-time performance. In this paper we propose a novel algorithm for destriping degraded images, which takes advantage of neighbouring column signal correlation to remove independent column stripe noise. This is achieved through an iterative deep unfolding algorithm where the estimated noise of one network iteration is used as input to the next iteration. This progression substantially reduces the search space of possible function approximations, allowing for efficient training on larger datasets. The proposed method allows for a more precise estimation of stripe noise to preserve scene details more accurately. Extensive experimental results demonstrate that the proposed model outperforms existing destriping methods on artificially corrupted images on both quantitative and qualitative assessments.
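    The unfolding structure can be sketched as below, where each stage's noise estimate feeds the next stage; the tiny per-stage convolution is a placeholder for the paper's architecture.

```python
import torch

class UnfoldedDestriper(torch.nn.Module):
    def __init__(self, n_stages=5):
        super().__init__()
        # one small conv per stage; stripes are column-wise, so the kernel
        # looks across neighbouring columns to exploit their correlation
        self.stages = torch.nn.ModuleList(
            torch.nn.Conv2d(1, 1, kernel_size=(1, 5), padding=(0, 2))
            for _ in range(n_stages))

    def forward(self, img):
        noise = torch.zeros_like(img)
        for stage in self.stages:           # the estimated noise of one
            noise = stage(img - noise)      # iteration feeds the next
        return img - noise                  # destriped image

img = torch.rand(1, 1, 64, 64)
img_striped = img + 0.2 * torch.rand(1, 1, 1, 64)  # column stripe noise
clean = UnfoldedDestriper()(img_striped)
```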
    DreamFusion: Text-to-3D using 2D Diffusion. (arXiv:2209.14988v1 [cs.CV])
    Recent breakthroughs in text-to-image synthesis have been driven by diffusion models trained on billions of image-text pairs. Adapting this approach to 3D synthesis would require large-scale datasets of labeled 3D data and efficient architectures for denoising 3D data, neither of which currently exist. In this work, we circumvent these limitations by using a pretrained 2D text-to-image diffusion model to perform text-to-3D synthesis. We introduce a loss based on probability density distillation that enables the use of a 2D diffusion model as a prior for optimization of a parametric image generator. Using this loss in a DeepDream-like procedure, we optimize a randomly-initialized 3D model (a Neural Radiance Field, or NeRF) via gradient descent such that its 2D renderings from random angles achieve a low loss. The resulting 3D model of the given text can be viewed from any angle, relit by arbitrary illumination, or composited into any 3D environment. Our approach requires no 3D training data and no modifications to the image diffusion model, demonstrating the effectiveness of pretrained image diffusion models as priors.
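    At the heart of the method is a score-distillation-style gradient, which can be sketched as follows; the weighting w(t) and all function signatures here are assumptions for illustration, not DreamFusion's code.

```python
import torch

def sds_grad(render, diffusion_eps, params, text_emb, t, alpha_bar):
    """render(params) -> image; diffusion_eps(x_t, t, text_emb) -> eps_hat.
    Accumulates the distillation gradient into params via the renderer."""
    img = render(params)
    eps = torch.randn_like(img)
    # forward-diffuse the rendering to noise level t
    x_t = alpha_bar[t].sqrt() * img + (1 - alpha_bar[t]).sqrt() * eps
    with torch.no_grad():                   # the diffusion model stays frozen
        eps_hat = diffusion_eps(x_t, t, text_emb)
    w = 1 - alpha_bar[t]                    # a common weighting choice
    # gradient of the distillation loss w.r.t. the image, chained
    # through the differentiable renderer into the 3D parameters
    img.backward(gradient=w * (eps_hat - eps))
    return [p.grad for p in params]
```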
    Causal Inference via Nonlinear Variable Decorrelation for Healthcare Applications. (arXiv:2209.14975v1 [cs.LG])
Causal inference and model interpretability research are gaining increasing attention, especially in the domains of healthcare and bioinformatics. Despite recent successes in this field, decorrelating features in nonlinear environments with human-interpretable representations has not been adequately investigated. To address this issue, we introduce a novel method with a variable decorrelation regularizer to handle both linear and nonlinear confounding. Moreover, we employ association rules as new representations, obtained via association rule mining on the original features, to better approximate human decision patterns and increase model interpretability. Extensive experiments are conducted on four healthcare datasets (one synthetically generated and three real-world collections on different diseases). Quantitative results in comparison to baseline approaches on parameter estimation and causality computation indicate the model's superior performance. Furthermore, expert evaluation given by healthcare professionals validates the effectiveness and interpretability of the proposed model.
    Variance Covariance Regularization Enforces Pairwise Independence in Self-Supervised Representations. (arXiv:2209.14905v1 [cs.LG])
Self-Supervised Learning (SSL) methods such as VICReg, Barlow Twins or W-MSE avoid collapse of their joint embedding architectures by constraining or regularizing the covariance matrix of their projector's output. This study highlights important properties of such strategy, which we coin Variance-Covariance regularization (VCReg). More precisely, we show that VCReg enforces pairwise independence between the features of the learned representation. This result emerges by bridging VCReg applied on the projector's output to kernel independence criteria applied on the projector's input. This provides the first theoretical motivations and explanations of VCReg. We empirically validate our findings where (i) we observe that SSL methods employing VCReg learn visual representations with greater pairwise independence than other methods, (ii) we put in evidence which projector's characteristics favor pairwise independence, and show it to emerge independently from learning the projector, (iii) we use these findings to obtain nontrivial performance gains for VICReg, and (iv) we demonstrate that the scope of VCReg goes beyond SSL by using it to solve Independent Component Analysis. We hope that our findings will support the adoption of VCReg in SSL and beyond.
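    For concreteness, a VICReg-style instantiation of VCReg on a batch of projector outputs Z looks like this (the coefficients are illustrative defaults, not the paper's settings):

```python
import torch

def vc_reg(Z, std_target=1.0, var_w=25.0, cov_w=1.0):
    """Variance-covariance regularizer: a hinge keeps each feature's std
    above a target, and off-diagonal covariance entries are pushed to 0."""
    Z = Z - Z.mean(0)
    n, d = Z.shape
    std = Z.var(0).add(1e-4).sqrt()
    var_loss = torch.relu(std_target - std).mean()
    cov = (Z.T @ Z) / (n - 1)
    off_diag = cov - torch.diag(torch.diag(cov))
    cov_loss = off_diag.pow(2).sum() / d
    return var_w * var_loss + cov_w * cov_loss

Z = torch.randn(256, 64, requires_grad=True)
vc_reg(Z).backward()
```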
    Applying Machine Learning for Duplicate Detection, Throttling and Prioritization of Equipment Commissioning Audits at Fulfillment Network. (arXiv:2209.14409v1 [cs.LG])
VQ (Vendor Qualification) and IOQ (Installation and Operation Qualification) audits are implemented in warehouses to ensure all equipment being turned over in the fulfillment network meets the quality standards. Audit checks are likely to be skipped if there are many checks to be performed in a short time. In addition, exploratory data analysis reveals several instances of similar checks being performed on the same assets, thus duplicating the effort. In this work, Natural Language Processing and Machine Learning are applied to trim a large checklist dataset for a network of warehouses by identifying similarities and duplicates, and to predict the non-critical checks with a high passing rate. The study proposes ML classifiers to identify checks that have a high passing probability for IOQ and VQ and to prioritize checks when time is not available to perform all of them. This research proposes using an NLP-based BlazingText classifier to throttle the checklists with a high passing rate, which can reduce 10%-37% of the checks and achieve significant cost reduction. The applied algorithm outperforms Random Forest and Neural Network classifiers and achieves an area under the curve of 90%. Because of imbalanced data, down-sampling and upweighting have shown a positive impact on model accuracy measured by the F1 score, which improves from 8% to 75%. In addition, the proposed duplicate detection process identifies 17% possibly redundant checks to be trimmed.
    polyBERT: A chemical language model to enable fully machine-driven ultrafast polymer informatics. (arXiv:2209.14803v1 [cond-mat.mtrl-sci])
    Polymers are a vital part of everyday life. Their chemical universe is so large that it presents unprecedented opportunities as well as significant challenges to identify suitable application-specific candidates. We present a complete end-to-end machine-driven polymer informatics pipeline that can search this space for suitable candidates at unprecedented speed and accuracy. This pipeline includes a polymer chemical fingerprinting capability called polyBERT (inspired by Natural Language Processing concepts), and a multitask learning approach that maps the polyBERT fingerprints to a host of properties. polyBERT is a chemical linguist that treats the chemical structure of polymers as a chemical language. The present approach outstrips the best presently available concepts for polymer property prediction based on handcrafted fingerprint schemes in speed by two orders of magnitude while preserving accuracy, thus making it a strong candidate for deployment in scalable architectures including cloud infrastructures.
    Towards Lightweight Black-Box Attacks against Deep Neural Networks. (arXiv:2209.14826v1 [cs.LG])
Black-box attacks can generate adversarial examples without accessing the parameters of the target model, largely exacerbating the threats of deployed deep neural networks (DNNs). However, previous works state that black-box attacks fail to mislead target models when their training data and outputs are inaccessible. In this work, we argue that black-box attacks can pose practical attacks in this extremely restrictive scenario where only several test samples are available. Specifically, we find that attacking the shallow layers of DNNs trained on a few test samples can generate powerful adversarial examples. As only a few samples are required, we refer to these attacks as lightweight black-box attacks. The main challenge in promoting lightweight attacks is to mitigate the adverse impact caused by the approximation error of shallow layers. As it is hard to mitigate the approximation error with few available samples, we propose Error TransFormer (ETF) for lightweight attacks. Namely, ETF transforms the approximation error in the parameter space into a perturbation in the feature space and alleviates the error by disturbing features. In experiments, lightweight black-box attacks with the proposed ETF achieve surprising results. For example, even if only 1 sample per category is available, the attack success rate of lightweight black-box attacks is only about 3% lower than that of black-box attacks with complete training data.
    Sparse PCA With Multiple Components. (arXiv:2209.14790v1 [math.OC])
Sparse Principal Component Analysis is a cardinal technique for obtaining combinations of features, or principal components (PCs), that explain the variance of high-dimensional datasets in an interpretable manner. At its heart, this involves solving a sparsity and orthogonality constrained convex maximization problem, which is extremely computationally challenging. Most existing work addresses sparse PCA via heuristics such as iteratively computing one sparse PC and deflating the covariance matrix, which does not guarantee the orthogonality, let alone the optimality, of the resulting solution. We challenge this status quo by reformulating the orthogonality conditions as rank constraints and optimizing over the sparsity and rank constraints simultaneously. We design tight semidefinite relaxations and propose tractable second-order cone versions of these relaxations which supply high-quality upper bounds. We also design valid second-order cone inequalities which hold when each PC's individual sparsity is specified, and demonstrate that these inequalities tighten our relaxations significantly. Moreover, we propose exact methods and rounding mechanisms that exploit these relaxations' tightness to obtain solutions with a bound gap on the order of 1%-5% for real-world datasets with $p$ in the hundreds or thousands of features and $r \in \{2, 3\}$ components. We investigate the performance of our methods in spiked covariance settings and demonstrate that simultaneously considering the orthogonality and sparsity constraints leads to improvements in the Area Under the ROC curve of 2%-8% compared to state-of-the-art deflation methods. All in all, our approach solves sparse PCA problems with multiple components to certifiable (near) optimality in a practically tractable fashion.
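    For reference, the core problem being relaxed can be written as follows (one common formulation with a global sparsity budget k; the paper also treats per-PC sparsity):

```latex
\max_{U \in \mathbb{R}^{p \times r}} \; \operatorname{tr}\!\left(U^\top \Sigma U\right)
\quad \text{s.t.} \quad U^\top U = I_r, \qquad \|U\|_0 \le k.
```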
    META-STORM: Generalized Fully-Adaptive Variance Reduced SGD for Unbounded Functions. (arXiv:2209.14853v1 [cs.LG])
We study the application of variance reduction (VR) techniques to general non-convex stochastic optimization problems. In this setting, the recent work STORM [Cutkosky-Orabona '19] overcomes the drawback of having to compute gradients of "mega-batches" that earlier VR methods rely on. There, STORM utilizes recursive momentum to achieve the VR effect and is then later made fully adaptive in STORM+ [Levy et al., '21], where full-adaptivity removes the requirement for obtaining certain problem-specific parameters such as the smoothness of the objective and bounds on the variance and norm of the stochastic gradients in order to set the step size. However, STORM+ crucially relies on the assumption that the function values are bounded, excluding a large class of useful functions. In this work, we propose META-STORM, a generalized framework of STORM+ that removes this bounded function values assumption while still attaining the optimal convergence rate for non-convex optimization. META-STORM not only maintains full-adaptivity, removing the need to obtain problem-specific parameters, but also improves the convergence rate's dependency on the problem parameters. Furthermore, META-STORM can utilize a large range of parameter settings that subsumes previous methods, allowing for more flexibility in a wider range of settings. Finally, we demonstrate the effectiveness of META-STORM through experiments across common deep learning tasks. Our algorithm improves upon the previous work STORM+ and is competitive with widely used algorithms after the addition of per-coordinate update and exponential moving average heuristics.
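    The recursive-momentum estimator that the STORM family builds on can be sketched as below; META-STORM's contribution lies in the adaptive choices of the momentum parameter a and the step size, which are not shown here.

```python
import torch

def storm_step(f, x, x_prev, d_prev, sample, a, lr):
    """f(x, sample) -> stochastic loss. The SAME sample is evaluated at
    both x and x_prev, which is what produces the variance reduction."""
    g = torch.autograd.grad(f(x, sample), x)[0]
    g_prev = torch.autograd.grad(f(x_prev, sample), x_prev)[0]
    d = g + (1 - a) * (d_prev - g_prev)     # recursive momentum estimator
    x_new = (x - lr * d).detach().requires_grad_(True)
    return x_new, d
```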
    Spotlight: Mobile UI Understanding using Vision-Language Models with a Focus. (arXiv:2209.14927v1 [cs.CV])
    Mobile UI understanding is important for enabling various interaction tasks such as UI automation and accessibility. Previous mobile UI modeling often depends on the view hierarchy information of a screen, which directly provides the structural data of the UI, in the hope of bypassing the challenging task of visual modeling from screen pixels. However, view hierarchies are not always available, and are often corrupted with missing object descriptions or misaligned bounding box positions. As a result, although using view hierarchy offers some short-term gains, it may ultimately hinder the applicability and performance of the model. In this paper, we propose Spotlight, a vision-only approach for mobile UI understanding. Specifically, we enhance a vision-language model that only takes the screenshot of the UI and a region of interest on the screen -- the focus -- as the input. This general architecture is easily scalable and capable of performing a range of UI modeling tasks. Our experiments show that our model obtains SoTA results on several representative UI tasks and outperforms previous methods that use both screenshots and view hierarchies as input. Furthermore, we explore the multi-task learning and few-shot prompting capacity of the proposed models, demonstrating promising results in the multi-task learning direction.
    Sequential Attention for Feature Selection. (arXiv:2209.14881v1 [cs.LG])
    Feature selection is the problem of selecting a subset of features for a machine learning model that maximizes model quality subject to a resource budget constraint. For neural networks, prior methods, including those based on $\ell_1$ regularization, attention, and stochastic gates, typically select all of the features in one evaluation round, ignoring the residual value of the features during selection (i.e., the marginal contribution of a feature conditioned on the previously selected features). We propose a feature selection algorithm called Sequential Attention that achieves state-of-the-art empirical results for neural networks. This algorithm is based on an efficient implementation of greedy forward selection and uses attention weights at each step as a proxy for marginal feature importance. We provide theoretical insights into our Sequential Attention algorithm for linear regression models by showing that an adaptation to this setting is equivalent to the classical Orthogonal Matching Pursuit algorithm [PRK1993], and thus inherits all of its provable guarantees. Lastly, our theoretical and empirical analyses provide new explanations for the effectiveness of attention and its connections to overparameterization, which might be of independent interest.
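    Since the abstract relates the linear-regression adaptation of Sequential Attention to Orthogonal Matching Pursuit, here is a minimal OMP sketch showing the greedy, residual-driven selection it inherits; the correlation proxy and least-squares refit are the standard OMP steps, not code from the paper.
```python
import numpy as np

def omp(X, y, n_features):
    """Orthogonal Matching Pursuit: greedy forward feature selection."""
    selected, residual = [], y.copy()
    beta = None
    for _ in range(n_features):
        scores = np.abs(X.T @ residual)          # marginal relevance proxy
        scores[selected] = -np.inf               # skip already-chosen ones
        selected.append(int(np.argmax(scores)))
        beta, *_ = np.linalg.lstsq(X[:, selected], y, rcond=None)
        residual = y - X[:, selected] @ beta     # refit on selected support
    return selected, beta
```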
    R2C-GAN: Restore-to-Classify GANs for Blind X-Ray Restoration and COVID-19 Classification. (arXiv:2209.14770v1 [eess.IV])
    Restoration of poor-quality images with a blended set of artifacts plays a vital role in reliable diagnosis. Existing studies have focused on specific restoration problems such as image deblurring, denoising, and exposure correction, where there is usually a strong assumption on the artifact type and severity. As a pioneering study in blind X-ray restoration, we propose a joint model for generic image restoration and classification: Restore-to-Classify Generative Adversarial Networks (R2C-GANs). Such a jointly optimized model keeps any disease intact after the restoration, which naturally leads to higher diagnosis performance thanks to the improved X-ray image quality. To accomplish this crucial objective, we define the restoration task as an image-to-image translation problem from a poor-quality domain of noisy, blurry, or over-/under-exposed images to a high-quality image domain. The proposed R2C-GAN model is able to learn forward and inverse transforms between the two domains using unpaired training samples. Simultaneously, the joint classification preserves the disease label during restoration. Moreover, the R2C-GANs are equipped with operational layers/neurons that reduce the network depth and further boost both restoration and classification performance. The proposed joint model is extensively evaluated over the QaTa-COV19 dataset for Coronavirus Disease 2019 (COVID-19) classification. The proposed restoration approach achieves an over 90% F1-Score, which is significantly higher than that of any competing deep model. Moreover, in the qualitative analysis, the restoration performance of R2C-GANs was confirmed by a group of medical doctors. We share the software implementation at https://github.com/meteahishali/R2C-GAN.
    Diffusion Posterior Sampling for General Noisy Inverse Problems. (arXiv:2209.14687v1 [stat.ML])
    Diffusion models have been recently studied as powerful generative inverse problem solvers, owing to their high-quality reconstructions and the ease of combining them with existing iterative solvers. However, most works focus on solving simple linear inverse problems in noiseless settings, which significantly under-represents the complexity of real-world problems. In this work, we extend diffusion solvers to efficiently handle general noisy (non)linear inverse problems via the Laplace approximation of the posterior sampling. Interestingly, the resulting posterior sampling scheme is a blended version of diffusion sampling with the manifold constrained gradient without a strict measurement consistency projection step, yielding a more desirable generative path in noisy settings compared to the previous studies. Our method demonstrates that diffusion models can incorporate various measurement noise statistics such as Gaussian and Poisson, and also efficiently handle noisy nonlinear inverse problems such as Fourier phase retrieval and non-uniform deblurring.
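    A hedged sketch of one posterior-sampling step in the spirit described above: take an unconditional reverse-diffusion step, then nudge the iterate with the gradient of the measurement residual evaluated through the denoised estimate. The `score`, forward operator `A`, schedule constant `alpha_bar_t` (a tensor), and integrator `step_prev` are assumed placeholders, not the paper's exact interfaces.
```python
import torch

def dps_step(x_t, t, y, score, A, alpha_bar_t, step_prev, zeta=1.0):
    """One measurement-guided reverse step (illustrative sketch)."""
    x_t = x_t.detach().requires_grad_(True)
    # Tweedie denoised estimate from the learned score
    x0_hat = (x_t + (1 - alpha_bar_t) * score(x_t, t)) / alpha_bar_t.sqrt()
    # unconditional reverse step (any reverse-S/ODE integrator works here)
    x_next = step_prev(x_t, t, x0_hat)
    # measurement-consistency gradient through the denoiser
    resid = torch.linalg.vector_norm(y - A(x0_hat))
    grad = torch.autograd.grad(resid, x_t)[0]
    return x_next - zeta * grad
```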
    Optimal Stopping with Gaussian Processes. (arXiv:2209.14738v1 [stat.ML])
    We propose a novel group of Gaussian Process based algorithms for fast approximate optimal stopping of time series with specific applications to financial markets. We show that structural properties commonly exhibited by financial time series (e.g., the tendency to mean-revert) allow the use of Gaussian and Deep Gaussian Process models that further enable us to analytically evaluate optimal stopping value functions and policies. We additionally quantify uncertainty in the value function by propagating the price model through the optimal stopping analysis. We compare and contrast our proposed methods against a sampling-based method, as well as a deep learning based benchmark that is currently considered the state-of-the-art in the literature. We show that our family of algorithms outperforms benchmarks on three historical time series datasets that include intra-day and end-of-day equity asset prices as well as the daily US treasury yield curve rates.
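    As an illustration of the general recipe (not the paper's exact algorithm), here is a Longstaff-Schwartz-style backward induction in which the continuation value is regressed with a Gaussian Process; the payoff interface, the GP kernel defaults, and the discounting are simplifying assumptions.
```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor

def gp_optimal_stopping(paths, payoff, discount=0.999):
    """Estimate the value of optimally stopping simulated price paths.

    paths: array of shape (n_paths, n_steps); payoff: vectorized function.
    """
    n_paths, n_steps = paths.shape
    value = payoff(paths[:, -1])                  # exercise at the horizon
    for t in range(n_steps - 2, 0, -1):
        X = paths[:, t].reshape(-1, 1)
        gp = GaussianProcessRegressor().fit(X, discount * value)
        continuation = gp.predict(X)              # estimated holding value
        exercise = payoff(paths[:, t])
        stop = exercise > continuation            # exercise when better
        value = np.where(stop, exercise, discount * value)
    return value.mean()                           # Monte Carlo stopping value
```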
    Dataset Complexity Assessment Based on Cumulative Maximum Scaled Area Under Laplacian Spectrum. (arXiv:2209.14743v1 [cs.CV])
    Dataset complexity assessment aims to predict classification performance on a dataset with complexity calculation before training a classifier, which can also be used for classifier selection and dataset reduction. The training process of deep convolutional neural networks (DCNNs) is iterative and time-consuming because of hyperparameter uncertainty and the domain shift introduced by different datasets. Hence, it is meaningful to predict classification performance by assessing the complexity of datasets effectively before training DCNN models. This paper proposes a novel method called cumulative maximum scaled Area Under Laplacian Spectrum (cmsAULS), which can achieve state-of-the-art complexity assessment performance on six datasets.
    On the Convergence of AdaGrad on $\mathbb{R}^d$: Beyond Convexity, Non-Asymptotic Rate and Acceleration. (arXiv:2209.14827v1 [cs.LG])
    Existing analysis of AdaGrad and other adaptive methods for smooth convex optimization is typically for functions with bounded domain diameter. In unconstrained problems, previous works guarantee an asymptotic convergence rate without an explicit constant factor that holds true for the entire function class. Furthermore, in the stochastic setting, only a modified version of AdaGrad, different from the one commonly used in practice, in which the latest gradient is not used to update the stepsize, has been analyzed. Our paper aims at bridging these gaps and developing a deeper understanding of AdaGrad and its variants in the standard setting of smooth convex functions as well as the more general setting of quasar convex functions. First, we demonstrate new techniques to explicitly bound the convergence rate of the vanilla AdaGrad for unconstrained problems in both deterministic and stochastic settings. Second, we propose a variant of AdaGrad for which we can show the convergence of the last iterate, instead of the average iterate. Finally, we give new accelerated adaptive algorithms and their convergence guarantee in the deterministic setting with explicit dependency on the problem parameters, improving upon the asymptotic rate shown in previous works.
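    For reference, the vanilla AdaGrad recursion that the analysis targets, sketched for the unconstrained setting: note that the latest gradient is included in the step-size accumulator, matching the variant used in practice rather than the modified version analyzed in prior work.
```python
import numpy as np

def adagrad(x0, grad, steps, eta=1.0, eps=1e-8):
    """Vanilla (scalar-free) AdaGrad for an unconstrained problem."""
    x = x0.copy()
    acc = np.zeros_like(x)                # running sum of squared gradients
    for _ in range(steps):
        g = grad(x)
        acc += g * g                      # accumulate BEFORE the update
        x -= eta * g / (np.sqrt(acc) + eps)
    return x
```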
    Offline Reinforcement Learning via High-Fidelity Generative Behavior Modeling. (arXiv:2209.14548v1 [cs.LG])
    In offline reinforcement learning, weighted regression is a common method to ensure the learned policy stays close to the behavior policy and to prevent selecting out-of-sample actions. In this work, we show that due to the limited distributional expressivity of policy models, previous methods might still select unseen actions during training, which deviates from their initial motivation. To address this problem, we adopt a generative approach by decoupling the learned policy into two parts: an expressive generative behavior model and an action evaluation model. The key insight is that such decoupling avoids learning an explicitly parameterized policy model with a closed-form expression. Directly learning the behavior policy allows us to leverage existing advances in generative modeling, such as diffusion-based methods, to model diverse behaviors. As for action evaluation, we combine our method with an in-sample planning technique to further avoid selecting out-of-sample actions and increase computational efficiency. Experimental results on D4RL datasets show that our proposed method achieves competitive or superior performance compared with state-of-the-art offline RL methods, especially in complex tasks such as AntMaze. We also empirically demonstrate that our method can successfully learn from a heterogeneous dataset containing multiple distinctive but similarly successful strategies, whereas previous unimodal policies fail.
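    A minimal sketch of the decoupling described above: draw candidate actions from a learned generative behavior model and act greedily with respect to a separately learned action-evaluation model. The `behavior.sample` and `q_value` interfaces are assumptions for illustration; the paper additionally combines this with in-sample planning.
```python
import torch

@torch.no_grad()
def act(state, behavior, q_value, n_candidates=32):
    """Greedy action over samples from the generative behavior model.

    state: 1-D tensor; behavior.sample and q_value are assumed interfaces.
    """
    states = state.unsqueeze(0).repeat(n_candidates, 1)
    actions = behavior.sample(states)      # in-support candidate actions
    scores = q_value(states, actions)      # evaluate each candidate
    return actions[scores.argmax()]        # pick the best in-sample action
```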
    Analyzing Diffusion as Serial Reproduction. (arXiv:2209.14821v1 [cs.LG])
    Diffusion models are a class of generative models that learn to synthesize samples by inverting a diffusion process that gradually maps data into noise. While these models have enjoyed great success recently, a full theoretical understanding of their observed properties is still lacking, in particular, their weak sensitivity to the choice of noise family and the role of adequate scheduling of noise levels for good synthesis. By identifying a correspondence between diffusion models and a well-known paradigm in cognitive science known as serial reproduction, whereby human agents iteratively observe and reproduce stimuli from memory, we show how the aforementioned properties of diffusion models can be explained as a natural consequence of this correspondence. We then complement our theoretical analysis with simulations that exhibit these key features. Our work highlights how classic paradigms in cognitive science can shed light on state-of-the-art machine learning problems.
    Towards Equalised Odds as Fairness Metric in Academic Performance Prediction. (arXiv:2209.14670v1 [cs.LG])
    The literature on fairness-aware machine learning offers a plethora of different fairness notions. It is, however, well known that it is impossible to satisfy all of them, as certain notions contradict each other. In this paper, we take a closer look at academic performance prediction (APP) systems and try to distil which fairness notions suit this task most. For this, we scan recent literature proposing guidelines as to which fairness notion to use and apply these guidelines to APP. Our findings suggest equalised odds as the most suitable notion for APP, based on APP's WYSIWYG worldview as well as potential long-term improvements for the population.
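    For concreteness, a minimal sketch of checking equalised odds on binary predictions: the criterion holds when true- and false-positive rates match across groups, so the maximal gaps computed below should be (near) zero.
```python
import numpy as np

def equalised_odds_gaps(y_true, y_pred, group):
    """Max TPR/FPR gaps across groups for binary y_true/y_pred arrays;
    both gaps equal 0 exactly when equalised odds holds."""
    tprs, fprs = [], []
    for g in np.unique(group):
        m = group == g
        tprs.append(y_pred[m & (y_true == 1)].mean())  # P(yhat=1 | y=1, g)
        fprs.append(y_pred[m & (y_true == 0)].mean())  # P(yhat=1 | y=0, g)
    return max(tprs) - min(tprs), max(fprs) - min(fprs)
```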
    Model Zoos: A Dataset of Diverse Populations of Neural Network Models. (arXiv:2209.14764v1 [cs.LG])
    In recent years, neural networks (NNs) have evolved from laboratory environments to the state of the art for many real-world problems. It was shown that NN models (i.e., their weights and biases) evolve on unique trajectories in weight space during training. Consequently, a population of such neural network models (referred to as a model zoo) forms structures in weight space. We think that the geometry, curvature and smoothness of these structures contain information about the state of training and can reveal latent properties of individual models. With such model zoos, one could investigate novel approaches for (i) model analysis, (ii) discovering unknown learning dynamics, (iii) learning rich representations of such populations, or (iv) exploiting the model zoos for generative modelling of NN weights and biases. Unfortunately, the lack of standardized model zoos and available benchmarks significantly increases the friction for further research on populations of NNs. With this work, we publish a novel dataset of model zoos containing systematically generated and diverse populations of NN models for further research. In total, the proposed model zoo dataset is based on eight image datasets, consists of 27 model zoos trained with varying hyperparameter combinations, and includes 50,360 unique NN models as well as their sparsified twins, resulting in over 3,844,360 collected model states. In addition to the model zoo data, we provide an in-depth analysis of the zoos and provide benchmarks for multiple downstream tasks. The dataset can be found at www.modelzoos.cc.
    Neural Networks Efficiently Learn Low-Dimensional Representations with SGD. (arXiv:2209.14863v1 [stat.ML])
    We study the problem of training a two-layer neural network (NN) of arbitrary width using stochastic gradient descent (SGD) where the input $\boldsymbol{x}\in \mathbb{R}^d$ is Gaussian and the target $y \in \mathbb{R}$ follows a multiple-index model, i.e., $y=g(\langle\boldsymbol{u_1},\boldsymbol{x}\rangle,...,\langle\boldsymbol{u_k},\boldsymbol{x}\rangle)$ with a noisy link function $g$. We prove that the first-layer weights of the NN converge to the $k$-dimensional principal subspace spanned by the vectors $\boldsymbol{u_1},...,\boldsymbol{u_k}$ of the true model, when online SGD with weight decay is used for training. This phenomenon has several important consequences when $k \ll d$. First, by employing uniform convergence on this smaller subspace, we establish a generalization error bound of $\mathcal{O}(\sqrt{{kd}/{T}})$ after $T$ iterations of SGD, which is independent of the width of the NN. We further demonstrate that, SGD-trained ReLU NNs can learn a single-index target of the form $y=f(\langle\boldsymbol{u},\boldsymbol{x}\rangle) + \epsilon$ by recovering the principal direction, with a sample complexity linear in $d$ (up to log factors), where $f$ is a monotonic function with at most polynomial growth, and $\epsilon$ is the noise. This is in contrast to the known $d^{\Omega(p)}$ sample requirement to learn any degree $p$ polynomial in the kernel regime, and it shows that NNs trained with SGD can outperform the neural tangent kernel at initialization. Finally, we also provide compressibility guarantees for NNs using the approximate low-rank structure produced by SGD.
    Hyper-Representations as Generative Models: Sampling Unseen Neural Network Weights. (arXiv:2209.14733v1 [cs.LG])
    Learning representations of neural network weights given a model zoo is an emerging and challenging area with many potential applications from model inspection, to neural architecture search or knowledge distillation. Recently, an autoencoder trained on a model zoo was able to learn a hyper-representation, which captures intrinsic and extrinsic properties of the models in the zoo. In this work, we extend hyper-representations for generative use to sample new model weights. We propose layer-wise loss normalization, which we demonstrate is key to generating high-performing models, and several sampling methods based on the topology of hyper-representations. The models generated using our methods are diverse, performant, and capable of outperforming strong baselines as evaluated on several downstream tasks: initialization, ensemble sampling and transfer learning. Our results indicate the potential of knowledge aggregation from model zoos to new models via hyper-representations, thereby paving the way for novel research directions.
    Access Control with Encrypted Feature Maps for Object Detection Models. (arXiv:2209.14831v1 [cs.CV])
    In this paper, we propose, for the first time, an access control method with a secret key for object detection models, so that unauthorized users without the secret key cannot benefit from the performance of trained models. The method enables us not only to provide high detection performance to authorized users but also to degrade the performance for unauthorized users. The use of transformed images was previously proposed for access control of image classification models, but such images cannot be used for object detection models due to performance degradation. Accordingly, in this paper, selected feature maps are encrypted with a secret key for training and testing models, instead of input images. In an experiment, the protected models allowed authorized users to obtain almost the same performance as non-protected models, while remaining robust against unauthorized access without the key.
    Facial Landmark Predictions with Applications to Metaverse. (arXiv:2209.14698v1 [cs.CV])
    This research aims to make metaverse characters more realistic by adding lip animations learnt from videos in the wild. To achieve this, our approach is to extend Tacotron 2 text-to-speech synthesizer to generate lip movements together with mel spectrogram in one pass. The encoder and gate layer weights are pre-trained on LJ Speech 1.1 data set while the decoder is retrained on 93 clips of TED talk videos extracted from LRS 3 data set. Our novel decoder predicts displacement in 20 lip landmark positions across time, using labels automatically extracted by OpenFace 2.0 landmark predictor. Training converged in 7 hours using less than 5 minutes of video. We conducted ablation study for Pre/Post-Net and pre-trained encoder weights to demonstrate the effectiveness of transfer learning between audio and visual speech data.
    Learning Gradient-based Mixup towards Flatter Minima for Domain Generalization. (arXiv:2209.14742v1 [cs.LG])
    To address the distribution shifts between training and test data, domain generalization (DG) leverages multiple source domains to learn a model that generalizes well to unseen domains. However, existing DG methods generally suffer from overfitting to the source domains, partly due to the limited coverage of the expected region in feature space. Motivated by this, we propose to perform mixup with data interpolation and extrapolation to cover the potential unseen regions. To prevent the detrimental effects of unconstrained extrapolation, we carefully design a policy to generate the instance weights, named Flatness-aware Gradient-based Mixup (FGMix). The policy employs a gradient-based similarity to assign greater weights to instances that carry more invariant information, and learns the similarity function towards flatter minima for better generalization. On the DomainBed benchmark, we validate the efficacy of various designs of FGMix and demonstrate its superiority over other DG algorithms.
    Non-contrastive approaches to similarity learning: positive examples are all you need. (arXiv:2209.14750v1 [cs.AI])
    The similarity learning problem in the oil & gas industry aims to construct a model that estimates similarity between interval measurements for logging data. Previous attempts are mostly based on empirical rules, so our goal is to automate this process and exclude expensive and time-consuming expert labelling. One approach to similarity learning is self-supervised learning (SSL). In contrast to the supervised paradigm, SSL requires few or no labels for the data, so we can learn such models even if data labelling is absent or scarce. Modern SSL approaches fall into two families: contrastive and non-contrastive. Due to the possible mislabelling of positive and negative samples, contrastive methods do not scale well with the number of objects, whereas non-contrastive methods do not rely on negative samples at all. Such approaches are actively used in computer vision. We introduce non-contrastive SSL for time series data. In particular, we build on top of the BYOL and Barlow Twins methods, which avoid using negative pairs and focus only on matching positive pairs. The crucial part of these methods is the augmentation strategy: different augmentations of time series exist, and their effect on performance can be either positive or negative. Our augmentation strategies, together with our adaptations of BYOL and Barlow Twins, allow us to achieve a higher quality (ARI $= 0.49$) than other self-supervised methods (ARI $= 0.34$), proving the usefulness of the proposed non-contrastive self-supervised approach for the interval similarity problem and time series representation learning in general.
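    A minimal sketch of the BYOL-style non-contrastive objective adapted here to time series: two augmented views, an online network with a predictor head, a stop-gradient target network, and a cosine loss on positive pairs only. The network and augmentation interfaces are assumptions; the paper's contribution lies largely in the augmentation strategies themselves.
```python
import torch
import torch.nn.functional as F

def byol_loss(online, predictor, target, series, augment):
    """Negative cosine similarity between cross-view positive pairs."""
    v1, v2 = augment(series), augment(series)   # two views of each series
    p1, p2 = predictor(online(v1)), predictor(online(v2))
    with torch.no_grad():                       # target gives no gradient
        t1, t2 = target(v1), target(v2)
    loss = -(F.cosine_similarity(p1, t2, dim=-1).mean()
             + F.cosine_similarity(p2, t1, dim=-1).mean())
    return 0.5 * loss                           # symmetrized over both views
```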
    An Equal-Size Hard EM Algorithm for Diverse Dialogue Generation. (arXiv:2209.14627v1 [cs.CL])
    Open-domain dialogue systems aim to interact with humans through natural language texts in an open-ended fashion. However, the widely successful neural networks may not work well for dialogue systems, as they tend to generate generic responses. In this work, we propose an Equal-size Hard Expectation--Maximization (EqHard-EM) algorithm to train a multi-decoder model for diverse dialogue generation. Our algorithm assigns a sample to a decoder in a hard manner and additionally imposes an equal-assignment constraint to ensure that all decoders are well-trained. We provide detailed theoretical analysis to justify our approach. Further, experiments on two large-scale, open-domain dialogue datasets verify that our EqHard-EM algorithm generates high-quality diverse responses.
    Meta Knowledge Condensation for Federated Learning. (arXiv:2209.14851v1 [cs.LG])
    Existing federated learning paradigms usually extensively exchange distributed models at a central solver to achieve a more powerful model. However, this incurs a severe communication burden between a server and multiple clients, especially when data distributions are heterogeneous. As a result, current federated learning methods often require a large number of communication rounds in training. Unlike existing paradigms, we introduce an alternative perspective to significantly decrease the communication cost in federated learning. In this work, we first introduce a meta knowledge representation method that extracts meta knowledge from distributed clients. The extracted meta knowledge encodes essential information that can be used to improve the current model. As the training progresses, the contributions of training samples to a federated model also vary. Thus, we introduce a dynamic weight assignment mechanism that enables samples to contribute adaptively to the current model update. Then, informative meta knowledge from all active clients is sent to the server for model updating. Training a model on the combined meta knowledge without exposing original data among different clients can significantly mitigate the heterogeneity issues. Moreover, to further ameliorate data heterogeneity, we also exchange meta knowledge among clients as conditional initialization for local meta knowledge extraction. Extensive experiments demonstrate the effectiveness and efficiency of our proposed method. Remarkably, our method outperforms the state-of-the-art by a large margin (from $74.07\%$ to $92.95\%$) on MNIST with a restricted communication budget (i.e., 10 rounds).
    Creative Painting with Latent Diffusion Models. (arXiv:2209.14697v1 [cs.CV])
    Artistic painting has seen significant progress in recent years through the application of hundreds of GAN variants. However, adversarial training has been reported to be notoriously unstable and can lead to mode collapse. Recently, diffusion models have achieved GAN-level sample quality without adversarial training. Using autoencoders to project the original images into compressed latent spaces and a cross-attention-enhanced U-Net as the backbone of diffusion, latent diffusion models have achieved stable and high-fidelity image generation. In this paper, we focus on enhancing the creative painting ability of current latent diffusion models in two directions: textual condition extension and model retraining with the Wikiart dataset. Through textual condition extension, users' input prompts are expanded in temporal and spatial directions for a deeper understanding and explanation of the prompts. The Wikiart dataset contains 80K famous artworks created over the past 400 years by more than 1,000 famous artists in a rich variety of styles and genres. Through retraining, we can ask the model to produce novel and creative paintings on modern topics in these artists' styles.
    Dynamic Prompt Learning via Policy Gradient for Semi-structured Mathematical Reasoning. (arXiv:2209.14610v1 [cs.LG])
    Mathematical reasoning, a core ability of human intelligence, presents unique challenges for machines in abstract thinking and logical reasoning. Recent large pre-trained language models such as GPT-3 have achieved remarkable progress on mathematical reasoning tasks written in text form, such as math word problems (MWP). However, it is unknown whether the models can handle more complex problems that involve math reasoning over heterogeneous information, such as tabular data. To fill the gap, we present Tabular Math Word Problems (TabMWP), a new dataset containing 38,431 open-domain grade-level problems that require mathematical reasoning on both textual and tabular data. Each question in TabMWP is aligned with a tabular context, which is presented as an image, semi-structured text, and a structured table. There are two types of questions: free-text and multi-choice, and each problem is annotated with gold solutions to reveal the multi-step reasoning process. We evaluate different pre-trained models on TabMWP, including the GPT-3 model in a few-shot setting. As earlier studies suggest, since few-shot GPT-3 relies on the selection of in-context examples, its performance is unstable and can degrade to near chance. This instability is more severe when handling complex problems like TabMWP. To mitigate this, we further propose a novel approach, PromptPG, which utilizes policy gradient to learn to select in-context examples from a small amount of training data and then constructs the corresponding prompt for the test example. Experimental results show that our method outperforms the best baseline by 5.31% on the accuracy metric and reduces the prediction variance significantly compared to random selection, which verifies its effectiveness in the selection of in-context examples.
    Masked Multi-Step Multivariate Time Series Forecasting with Future Information. (arXiv:2209.14413v1 [cs.LG])
    In this paper, we introduce Masked Multi-Step Multivariate Forecasting (MMMF), a novel and general self-supervised learning framework for time series forecasting with known future information. In many real-world forecasting scenarios, some future information is known, e.g., the weather information when making a short-to-mid-term electricity demand forecast, or the oil price forecasts when making an airplane departure forecast. Existing machine learning forecasting frameworks can be categorized into (1) sample-based approaches where each forecast is made independently, and (2) time series regression approaches where the future information is not fully incorporated. To overcome the limitations of existing approaches, we propose MMMF, a framework to train any neural network model capable of generating a sequence of outputs, that combines both the temporal information from the past and the known information about the future to make better predictions. Experiments are performed on two real-world datasets for (1) mid-term electricity demand forecasting, and (2) two-month ahead flight departures forecasting. They show that the proposed MMMF framework outperforms not only sample-based methods but also existing time series forecasting models with the exact same base models. Furthermore, once a neural network model is trained with MMMF, its inference speed is similar to that of the same model trained with traditional regression formulations, thus making MMMF a better alternative to existing regression-trained time series forecasting models if there is some available future information.
    Causal inference in drug discovery and development. (arXiv:2209.14664v1 [q-bio.QM])
    To discover new drugs is to seek and to prove causality. As an emerging approach leveraging human knowledge and creativity, data, and machine intelligence, causal inference holds the promise of reducing cognitive bias and improving decision making in drug discovery. While it has been applied across the value chain, the concepts and practice of causal inference remain obscure to many practitioners. This article offers a non-technical introduction to causal inference, reviews its recent applications, and discusses opportunities and challenges of adopting the causal language in drug discovery and development.
    Minimax Optimal Kernel Operator Learning via Multilevel Training. (arXiv:2209.14430v1 [cs.LG])
    Learning mappings between infinite-dimensional function spaces has achieved empirical success in many disciplines of machine learning, including generative modeling, functional data analysis, causal inference, and multi-agent reinforcement learning. In this paper, we study the statistical limit of learning a Hilbert-Schmidt operator between two infinite-dimensional Sobolev reproducing kernel Hilbert spaces. We establish the information-theoretic lower bound in terms of the Sobolev Hilbert-Schmidt norm and show that a regularization that learns the spectral components below the bias contour and ignores the ones that are above the variance contour can achieve the optimal learning rate. At the same time, the spectral components between the bias and variance contours give us flexibility in designing computationally feasible machine learning algorithms. Based on this observation, we develop a multilevel kernel operator learning algorithm that is optimal when learning linear operators between infinite-dimensional function spaces.
    DiGress: Discrete Denoising diffusion for graph generation. (arXiv:2209.14734v1 [cs.LG])
    This work introduces DiGress, a discrete denoising diffusion model for generating graphs with categorical node and edge attributes. Our model defines a diffusion process that progressively edits a graph with noise (adding or removing edges, changing the categories), and a graph transformer network that learns to revert this process. With these two ingredients in place, we reduce distribution learning over graphs to a simple sequence of classification tasks. We further improve sample quality by proposing a new Markovian noise model that preserves the marginal distribution of node and edge types during diffusion, and by adding auxiliary graph-theoretic features derived from the noisy graph at each diffusion step. Finally, we propose a guidance procedure for conditioning the generation on graph-level features. Overall, DiGress achieves state-of-the-art performance on both molecular and non-molecular datasets, with up to 3x validity improvement on a dataset of planar graphs. In particular, it is the first model that scales to the large GuacaMol dataset containing 1.3M drug-like molecules without using a molecule-specific representation such as SMILES or fragments.
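    A hedged sketch of the marginal-preserving noise model described above: with probability alpha_t a category is kept, and otherwise it is resampled from the empirical marginal m of node or edge types, so category marginals are preserved at every diffusion step. The exact schedule and tensorization are assumptions for illustration.
```python
import numpy as np

def marginal_transition(alpha_t, m):
    """Q_t = alpha_t * I + (1 - alpha_t) * 1 m^T  (each row sums to 1)."""
    k = len(m)
    return alpha_t * np.eye(k) + (1.0 - alpha_t) * np.outer(np.ones(k), m)

def noise_categories(one_hot, alpha_t, m, rng=np.random.default_rng(0)):
    """Apply one diffusion step to one-hot category rows of shape (n, k)."""
    probs = one_hot @ marginal_transition(alpha_t, m)  # per-item distribution
    return np.array([rng.choice(len(m), p=p) for p in probs])
```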
    Computational Complexity of Sub-linear Convergent Algorithms. (arXiv:2209.14558v1 [cs.LG])
    Optimizing the machine learning algorithms used to minimize an objective function has been of great interest. Several approaches to optimize common algorithms, such as gradient descent and stochastic gradient descent, have been explored. One of these approaches reduces gradient variance through adaptive sampling to solve the empirical risk minimization (ERM) problems of large-scale optimization. In this paper, we explore starting with a small sample, geometrically increasing its size, and using the solution of the previous sample's ERM to warm-start the computation of the new ERM. This solves ERM problems with sublinearly convergent first-order optimization algorithms at lower computational complexity. The paper begins with a theoretical analysis of the approach, followed by two experiments comparing gradient descent with its adaptive-sampling variant, and ADAM with adaptive-sampling ADAM, on different datasets.
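    A minimal sketch of the scheme under the stated assumptions: start from a small subsample, run a first-order solver warm-started from the previous solution, and grow the sample geometrically until the full dataset is used; `grad_erm` is an assumed ERM gradient oracle.
```python
import numpy as np

def adaptive_sampling_gd(X, y, grad_erm, n0=64, growth=2.0,
                         inner_steps=50, eta=0.1):
    """Geometric sample growth with warm-started gradient descent."""
    n, d = X.shape
    w = np.zeros(d)
    m = n0
    while True:
        Xs, ys = X[:m], y[:m]               # current (growing) subsample
        for _ in range(inner_steps):        # inner solver, warm-started at w
            w -= eta * grad_erm(w, Xs, ys)
        if m >= n:
            return w
        m = min(n, int(growth * m))         # geometric sample growth
```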
    Increasing Model Generalizability for Unsupervised Domain Adaptation. (arXiv:2209.14644v1 [cs.LG])
    A dominant approach for addressing unsupervised domain adaptation is to map data points for the source and the target domains into an embedding space which is modeled as the output-space of a shared deep encoder. The encoder is trained to make the embedding space domain-agnostic so that a source-trained classifier generalizes to the target domain. A secondary mechanism to further improve UDA performance is to make the source domain distribution more compact, improving model generalizability. We demonstrate that increasing the interclass margins in the embedding space can help to develop a UDA algorithm with improved performance. We estimate the internally learned multi-modal distribution for the source domain, learned as a result of pretraining, and use it to increase the interclass separation in the source domain to reduce the effect of domain shift. We demonstrate that using our approach leads to improved model generalizability on four standard benchmark UDA image classification datasets and compares favorably against existing methods.
    NVRadarNet: Real-Time Radar Obstacle and Free Space Detection for Autonomous Driving. (arXiv:2209.14499v1 [cs.CV])
    Detecting obstacles is crucial for safe and efficient autonomous driving. To this end, we present NVRadarNet, a deep neural network (DNN) that detects dynamic obstacles and drivable free space using automotive RADAR sensors. The network utilizes temporally accumulated data from multiple RADAR sensors to detect dynamic obstacles and compute their orientation in a top-down bird's-eye view (BEV). The network also regresses drivable free space to detect unclassified obstacles. Our DNN is the first of its kind to utilize sparse RADAR signals in order to perform obstacle and free space detection in real time from RADAR data only. The network has been successfully used for perception on our autonomous vehicles in real self-driving scenarios. The network runs faster than real time on an embedded GPU and shows good generalization across geographic regions.
    Parameterized Quantum Circuits with Quantum Kernels for Machine Learning: A Hybrid Quantum-Classical Approach. (arXiv:2209.14449v1 [quant-ph])
    Quantum machine learning (QML) is the use of quantum computing for the computation of machine learning algorithms. With the prevalence and importance of classical data, a hybrid quantum-classical approach to QML is called for. Parameterized Quantum Circuits (PQCs), and particularly Quantum Kernel PQCs, are generally used in the hybrid approach to QML. In this paper we discuss some important aspects of PQCs with quantum kernels including PQCs, quantum kernels, quantum kernels with quantum advantage, and the trainability of quantum kernels. We conclude that quantum kernels with hybrid kernel methods, a.k.a. quantum kernel methods, offer distinct advantages as a hybrid approach to QML. Not only do they apply to Noisy Intermediate-Scale Quantum (NISQ) devices, but they also can be used to solve all types of machine learning problems including regression, classification, clustering, and dimension reduction. Furthermore, beyond quantum utility, quantum advantage can be attained if the quantum kernels, i.e., the quantum feature encodings, are classically intractable.
    Diffusion Adversarial Representation Learning for Self-supervised Vessel Segmentation. (arXiv:2209.14566v1 [eess.IV])
    Vessel segmentation in medical images is one of the important tasks in the diagnosis of vascular diseases and therapy planning. Although learning-based segmentation approaches have been extensively studied, a large amount of ground-truth labels are required in supervised methods and confusing background structures make neural networks hard to segment vessels in an unsupervised manner. To address this, here we introduce a novel diffusion adversarial representation learning (DARL) model that leverages a denoising diffusion probabilistic model with adversarial learning, and apply it for vessel segmentation. In particular, for self-supervised vessel segmentation, DARL learns background image distribution using a diffusion module, which lets a generation module effectively provide vessel representations. Also, by adversarial learning based on the proposed switchable spatially-adaptive denormalization, our model estimates synthetic fake vessel images as well as vessel segmentation masks, which further makes the model capture vessel-relevant semantic information. Once the proposed model is trained, the model generates segmentation masks by one step and can be applied to general vascular structure segmentation of coronary angiography and retinal images. Experimental results on various datasets show that our method significantly outperforms existing unsupervised and self-supervised methods in vessel segmentation.
    Proportional Multicalibration. (arXiv:2209.14613v1 [cs.LG])
    Multicalibration is a desirable fairness criterion that constrains calibration error among flexibly-defined groups in the data while maintaining overall calibration. However, when outcome probabilities are correlated with group membership, multicalibrated models can exhibit a higher percent calibration error among groups with lower base rates than among groups with higher base rates. As a result, it remains possible for a decision-maker to learn to trust or distrust model predictions for specific groups. To alleviate this, we propose proportional multicalibration, a criterion that constrains the percent calibration error among groups and within prediction bins. We prove that satisfying proportional multicalibration bounds a model's multicalibration as well as its differential calibration, a stronger fairness criterion inspired by the fairness notion of sufficiency. We provide an efficient algorithm for post-processing risk prediction models for proportional multicalibration and evaluate it empirically. We conduct simulation studies and investigate a real-world application of proportional multicalibration (PMC) post-processing to the prediction of emergency department patient admissions. We observe that proportional multicalibration is a promising criterion for controlling simultaneous measures of calibration fairness of a model over intersectional groups with virtually no cost in terms of classification performance.
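    A hedged sketch of the quantity proportional multicalibration constrains: the percent calibration error of risk predictions per group and per prediction bin, i.e., the absolute calibration gap divided by the bin's observed outcome rate. The binning scheme and small-sample handling here are illustrative choices, not the paper's exact procedure.
```python
import numpy as np

def percent_calibration_errors(y, p, group, n_bins=10, eps=1e-8):
    """Percent calibration error per (group, prediction bin) pair."""
    bins = np.minimum((p * n_bins).astype(int), n_bins - 1)
    errors = {}
    for g in np.unique(group):
        for b in range(n_bins):
            m = (group == g) & (bins == b)
            if m.sum() == 0:
                continue                         # bin empty for this group
            obs = y[m].mean()                    # observed outcome rate
            gap = abs(p[m].mean() - obs)         # calibration gap in the bin
            errors[(g, b)] = gap / max(obs, eps) # percent (relative) error
    return errors
```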
    Rectified Flow: A Marginal Preserving Approach to Optimal Transport. (arXiv:2209.14577v1 [stat.ML])
    We present a flow-based approach to the optimal transport (OT) problem between two continuous distributions $\pi_0,\pi_1$ on $\mathbb{R}^d$, of minimizing a transport cost $\mathbb{E}[c(X_1-X_0)]$ in the set of couplings $(X_0,X_1)$ whose marginal distributions on $X_0,X_1$ equal $\pi_0,\pi_1$, respectively, where $c$ is a cost function. Our method iteratively constructs a sequence of neural ordinary differential equations (ODEs), each learned by solving a simple unconstrained regression problem, which monotonically reduces the transport cost while automatically preserving the marginal constraints. This yields a monotonic interior approach that traverses inside the set of valid couplings to decrease the transport cost, which distinguishes itself from most existing approaches that enforce the coupling constraints from the outside. The main idea of the method draws from rectified flow, a recent approach that simultaneously decreases the whole family of transport costs induced by convex functions $c$ (and is hence multi-objective in nature), but is not tailored to minimize a specific transport cost. Our method is a single-objective variant of rectified flow that guarantees to solve the OT problem for a fixed, user-specified convex cost function $c$.
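    For background, a minimal sketch of the rectified-flow regression step that the method builds on: on linear interpolations $X_t = tX_1 + (1-t)X_0$, a velocity field is regressed onto the displacement $X_1 - X_0$. The paper's single-objective variant modifies this basic recipe to target a specific convex cost $c$; this sketch shows only the base objective.
```python
import torch

def rectified_flow_loss(v_net, x0, x1):
    """Regress v(X_t, t) onto X1 - X0 along linear interpolation paths.

    x0, x1: paired samples of shape (batch, dim); v_net(x, t) is assumed.
    """
    t = torch.rand(x0.shape[0], 1)           # uniform interpolation times
    x_t = t * x1 + (1.0 - t) * x0            # point on the straight path
    target = x1 - x0                         # constant-speed displacement
    return ((v_net(x_t, t) - target) ** 2).mean()
```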
    Feature Selection via the Intervened Interpolative Decomposition and its Application in Diversifying Quantitative Strategies. (arXiv:2209.14532v1 [cs.LG])
    In this paper, we propose a probabilistic model for computing an interpolative decomposition (ID) in which each column of the observed matrix has its own priority or importance, so that the end result of the decomposition finds a set of features that are representative of the entire set of features, and the selected features also have higher priority than others. This approach is commonly used for low-rank approximation, feature selection, and extracting hidden patterns in data, where the matrix factors are latent variables associated with each data dimension. Gibbs sampling for Bayesian inference is applied to carry out the optimization. We evaluate the proposed models on real-world datasets, including ten Chinese A-share stocks, and demonstrate that the proposed Bayesian ID algorithm with intervention (IID) produces comparable reconstructive errors to existing Bayesian ID algorithms while selecting features with higher scores or priority.
    Denoising MCMC for Accelerating Diffusion-Based Generative Models. (arXiv:2209.14593v1 [cs.LG])
    Diffusion models are powerful generative models that simulate the reverse of diffusion processes using score functions to synthesize data from noise. The sampling process of diffusion models can be interpreted as solving the reverse stochastic differential equation (SDE) or the ordinary differential equation (ODE) of the diffusion process, which often requires up to thousands of discretization steps to generate a single image. This has sparked a great interest in developing efficient integration techniques for reverse-S/ODEs. Here, we propose an orthogonal approach to accelerating score-based sampling: Denoising MCMC (DMCMC). DMCMC first uses MCMC to produce samples in the product space of data and variance (or diffusion time). Then, a reverse-S/ODE integrator is used to denoise the MCMC samples. Since MCMC traverses close to the data manifold, the computation cost of producing a clean sample for DMCMC is much less than that of producing a clean sample from noise. To verify the proposed concept, we show that Denoising Langevin Gibbs (DLG), an instance of DMCMC, successfully accelerates all six reverse-S/ODE integrators considered in this work on the tasks of CIFAR10 and CelebA-HQ-256 image generation. Notably, combined with integrators of Karras et al. (2022) and pre-trained score models of Song et al. (2021b), DLG achieves SOTA results. In the limited number of score function evaluation (NFE) settings on CIFAR10, we have $3.86$ FID with $\approx 10$ NFE and $2.63$ FID with $\approx 20$ NFE. On CelebA-HQ-256, we have $6.99$ FID with $\approx 160$ NFE, which beats the current best record of Kim et al. (2022) among score-based models, $7.16$ FID with $4000$ NFE. Code: https://github.com/1202kbs/DMCMC
    Dataset Distillation using Parameter Pruning. (arXiv:2209.14609v1 [cs.CV])
    The acquisition of advanced models relies on large datasets in many fields, which makes storing datasets and training models expensive. As a solution, dataset distillation can synthesize a small dataset such that models trained on it achieve high performance on par with the original large dataset. The recently proposed dataset distillation method by matching network parameters has been proved effective for several datasets. However, a few parameters in the distillation process are difficult to match, which harms the distillation performance. Based on this observation, this paper proposes a new method to solve the problem using parameter pruning. The proposed method can synthesize more robust distilled datasets and improve the distillation performance by pruning difficult-to-match parameters in the distillation process. Experimental results on three datasets show that the proposed method outperformed other SOTA dataset distillation methods.
    Low-Stabilizer-Complexity Quantum States Are Not Pseudorandom. (arXiv:2209.14530v1 [quant-ph])
    We show that quantum states with "low stabilizer complexity" can be efficiently distinguished from Haar-random. Specifically, given an $n$-qubit pure state $|\psi\rangle$, we give an efficient algorithm that distinguishes whether $|\psi\rangle$ is (i) Haar-random or (ii) a state with stabilizer fidelity at least $\frac{1}{k}$ (i.e., has fidelity at least $\frac{1}{k}$ with some stabilizer state), promised that one of these is the case. With black-box access to $|\psi\rangle$, our algorithm uses $O\!\left( k^{12} \log(1/\delta)\right)$ copies of $|\psi\rangle$ and $O\!\left(n k^{12} \log(1/\delta)\right)$ time to succeed with probability at least $1-\delta$, and, with access to a state preparation unitary for $|\psi\rangle$ (and its inverse), $O\!\left( k^{3} \log(1/\delta)\right)$ queries and $O\!\left(n k^{3} \log(1/\delta)\right)$ time suffice. As a corollary, we prove that $\omega(\log(n))$ $T$-gates are necessary for any Clifford+$T$ circuit to prepare computationally pseudorandom quantum states, a first-of-its-kind lower bound.
    A Multi-Agent Framework for the Asynchronous and Collaborative Extension of Multitask ML Systems. (arXiv:2209.14745v1 [cs.LG])
    Traditional ML development methodology does not enable a large number of contributors, each with distinct objectives, to work collectively on the creation and extension of a shared intelligent system. Enabling such a collaborative methodology can accelerate the rate of innovation, increase the accessibility of ML technologies, and enable the emergence of novel capabilities. We believe that this can be achieved through the definition of abstraction boundaries and a modularized representation of ML models and methods. We present a multi-agent framework for collaborative and asynchronous extension of dynamic large-scale multitask intelligent systems.
    Neural Methods for Logical Reasoning Over Knowledge Graphs. (arXiv:2209.14464v1 [cs.AI])
    Reasoning is a fundamental problem for computers and is deeply studied in Artificial Intelligence. In this paper, we specifically focus on answering multi-hop logical queries on Knowledge Graphs (KGs). This is a complicated task because, in real-world scenarios, the graphs tend to be large and incomplete. Most previous works have been unable to create models that accept full First-Order Logical (FOL) queries, which include negative queries, and have only been able to process a limited set of query structures. Additionally, most methods present logic operators that can only perform the logical operation they are made for. We introduce a set of models that use Neural Networks to create one-point vector embeddings to answer the queries. The versatility of neural networks allows the framework to handle FOL queries with Conjunction ($\wedge$), Disjunction ($\vee$) and Negation ($\neg$) operators. We demonstrate the performance of our model through extensive experimentation on well-known benchmarking datasets. Besides having more versatile operators, the models achieve a 10\% relative increase over the best-performing state of the art and more than 30\% over the original method based on single-point vector embeddings.
    Generalized Kernel Regularized Least Squares. (arXiv:2209.14355v1 [stat.ML])
    Kernel Regularized Least Squares (KRLS) is a popular method for flexibly estimating models that may have complex relationships between variables. However, its usefulness to many researchers is limited for two reasons. First, existing approaches are inflexible and do not allow KRLS to be combined with theoretically-motivated extensions such as fixed effects or non-linear outcomes. Second, estimation is extremely computationally intensive for even modestly sized datasets. Our paper addresses both concerns by introducing generalized KRLS (gKRLS). We note that KRLS can be re-formulated as a hierarchical model thereby allowing easy inference and modular model construction. Computationally, we also implement random sketching to dramatically accelerate estimation while incurring a limited penalty in estimation quality. We demonstrate that gKRLS can be fit on datasets with tens of thousands of observations in under one minute. Further, state-of-the-art techniques that require fitting the model over a dozen times (e.g. meta-learners) can be estimated quickly.
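    To illustrate why sketching helps, here is a Nystrom-style sketched kernel ridge solver that replaces the $n \times n$ kernel system with an $m \times m$ one built from random landmark columns; the RBF kernel, landmark count, and regularization are illustrative assumptions rather than gKRLS's exact implementation.
```python
import numpy as np

def rbf(A, B, gamma=1.0):
    """RBF kernel matrix between row-sets A (n, d) and B (m, d)."""
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * d2)

def sketched_krls(X, y, m=200, lam=1e-3, rng=np.random.default_rng(0)):
    """Nystrom-sketched kernel ridge: solve an m x m system, not n x n."""
    n = X.shape[0]
    idx = rng.choice(n, size=min(m, n), replace=False)  # random landmarks
    K_nm = rbf(X, X[idx])
    K_mm = rbf(X[idx], X[idx])
    # sketched normal equations: (K_nm^T K_nm + lam K_mm) alpha = K_nm^T y
    alpha = np.linalg.solve(K_nm.T @ K_nm + lam * K_mm, K_nm.T @ y)
    return lambda X_new: rbf(X_new, X[idx]) @ alpha     # predictor closure
```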
    Dynamic Surrogate Switching: Sample-Efficient Search for Factorization Machine Configurations in Online Recommendations. (arXiv:2209.14598v1 [cs.LG])
    Hyperparameter optimization is the process of identifying the appropriate hyperparameter configuration of a given machine learning model with regard to a given learning task. For smaller data sets, an exhaustive search is possible; however, when the data size and model complexity increase, the number of configuration evaluations becomes the main computational bottleneck. A promising paradigm for tackling this type of problem is surrogate-based optimization. The main idea underlying this paradigm considers an incrementally updated model of the relation between the hyperparameter space and the output (target) space; the data for this model are obtained by evaluating the main learning engine, which is, for example, a factorization machine-based model. By learning to approximate the hyperparameter-target relation, the surrogate (machine learning) model can be used to score large amounts of hyperparameter configurations, exploring parts of the configuration space beyond the reach of direct machine learning engine evaluation. Commonly, a surrogate is selected prior to optimization initialization and remains the same during the search. We investigated whether dynamic switching of surrogates during the optimization itself is a sensible idea of practical relevance for selecting the most appropriate factorization machine-based models for large-scale online recommendation. We conducted benchmarks on data sets containing hundreds of millions of instances against established baselines such as Random Forest- and Gaussian process-based surrogates. The results indicate that surrogate switching can offer good performance while considering fewer learning engine evaluations.
    Breaking Time Invariance: Assorted-Time Normalization for RNNs. (arXiv:2209.14439v1 [cs.LG])
    Methods such as Layer Normalization (LN) and Batch Normalization (BN) have proven to be effective in improving the training of Recurrent Neural Networks (RNNs). However, existing methods normalize using only the instantaneous information at one particular time step, and the result of the normalization is a preactivation state with a time-independent distribution. This implementation fails to account for certain temporal differences inherent in the inputs and the architecture of RNNs. Since these networks share weights across time steps, it may also be desirable to account for the connections between time steps in the normalization scheme. In this paper, we propose a normalization method called Assorted-Time Normalization (ATN), which preserves information from multiple consecutive time steps and normalizes using them. This setup allows us to introduce longer time dependencies into traditional normalization methods without introducing any new trainable parameters. We present theoretical derivations for the gradient propagation and prove the weight scaling invariance property. Our experiments applying ATN to LN demonstrate consistent improvements on various tasks, such as the Adding, Copying, and Denoising problems, as well as language modeling.
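    A hedged sketch of the idea: instead of normalizing a recurrent preactivation with statistics from the current step only (as in LN), pool the mean and variance over the last $k$ consecutive preactivations, introducing longer time dependencies without new trainable parameters. The shapes and exact pooling here are assumptions, not the paper's precise formulation.
```python
import torch

def atn(hidden_history, eps=1e-5):
    """Normalize the newest preactivation with multi-step statistics.

    hidden_history: tensor of shape (k, batch, features) holding the
    last k consecutive RNN preactivations.
    """
    # pool statistics over both the time window and the feature dimension
    mu = hidden_history.mean(dim=(0, 2), keepdim=True)   # (1, batch, 1)
    var = hidden_history.var(dim=(0, 2), keepdim=True)   # (1, batch, 1)
    return (hidden_history[-1] - mu[0]) / torch.sqrt(var[0] + eps)
```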
    Semantics-Guided Object Removal for Facial Images: with Broad Applicability and Robust Style Preservation. (arXiv:2209.14479v1 [cs.CV])
    Object removal and image inpainting in facial images is a task in which objects that occlude a facial image are specifically targeted, removed, and replaced by a properly reconstructed facial image. Two different approaches, utilizing a U-net and a modulated generator respectively, have been widely endorsed for this task, each with unique advantages and innate disadvantages. U-net, a conventional approach for conditional GANs, retains fine details of unmasked regions, but the style of the reconstructed image is inconsistent with the rest of the original image, and it only works robustly when the size of the occluding object is small enough. In contrast, the modulated generative approach can deal with a larger occluded area in an image and provides a more consistent style, yet it usually misses out on most of the detailed features. This trade-off necessitates a model that can be applied to any size of mask while maintaining a consistent style and preserving minute details of facial features. Here, we propose the Semantics-Guided Inpainting Network (SGIN), itself a modification of the modulated generator, aiming to take advantage of its advanced generative capability while preserving the high-fidelity details of the original image. By using the guidance of a semantic map, our model is capable of manipulating facial features, which offers a handle on the one-to-many problem and improves practical applicability.
    How Does Value Distribution in Distributional Reinforcement Learning Help Optimization?. (arXiv:2209.14513v1 [cs.LG])
    We consider the problem of learning a set of probability distributions from the Bellman dynamics in distributional reinforcement learning~(RL), which learns the whole return distribution compared with only its expectation in classical RL. Despite its success in obtaining superior performance, we still have a poor understanding of how the value distribution in distributional RL works. In this study, we analyze the optimization benefits of distributional RL by leveraging the additional value distribution information over classical RL within the Neural Fitted Z-Iteration~(Neural FZI) framework. To begin with, we demonstrate that the distribution loss of distributional RL has desirable smoothness characteristics and hence enjoys stable gradients, which is in line with its tendency to promote optimization stability. Furthermore, the acceleration effect of distributional RL is revealed by decomposing the return distribution. It turns out that distributional RL can perform favorably if the value distribution approximation is appropriate, measured by the variance of gradient estimates in each environment for any specific distributional RL algorithm. Rigorous experiments validate the stable optimization behaviors of distributional RL, contributing to its acceleration effects compared to classical RL. The findings of our research illuminate how the value distribution in distributional RL algorithms helps the optimization.
    Efficient Approximation of Gromov-Wasserstein Distance using Importance Sparsification. (arXiv:2205.13573v2 [cs.LG] UPDATED)
    As a valid metric of metric-measure spaces, Gromov-Wasserstein (GW) distance has shown the potential for matching problems of structured data like point clouds and graphs. However, its application in practice is limited due to its high computational complexity. To overcome this challenge, we propose a novel importance sparsification method, called Spar-GW, to approximate GW distance efficiently. In particular, instead of considering a dense coupling matrix, our method leverages a simple but effective sampling strategy to construct a sparse coupling matrix and update it with few computations. We demonstrate that the proposed Spar-GW method is applicable to the GW distance with arbitrary ground cost, and it reduces the complexity from $\mathcal{O}(n^4)$ to $\mathcal{O}(n^{2+\delta})$ for an arbitrary small $\delta>0$. In addition, this method can be extended to approximate the variants of GW distance, including the entropic GW distance, the fused GW distance, and the unbalanced GW distance. Experiments show the superiority of our Spar-GW to state-of-the-art methods in both synthetic and real-world tasks.
    Pareto Actor-Critic for Equilibrium Selection in Multi-Agent Reinforcement Learning. (arXiv:2209.14344v1 [cs.LG])
    Equilibrium selection in multi-agent games refers to the problem of selecting a Pareto-optimal equilibrium. It has been shown that many state-of-the-art multi-agent reinforcement learning (MARL) algorithms are prone to converging to Pareto-dominated equilibria due to the uncertainty each agent has about the policy of the other agents during training. To address suboptimal equilibrium selection, we propose Pareto-AC (PAC), an actor-critic algorithm that utilises a simple principle of no-conflict games (a superset of cooperative games with identical rewards): each agent can assume the others will choose actions that will lead to a Pareto-optimal equilibrium. We evaluate PAC in a diverse set of multi-agent games and show that it converges to higher episodic returns compared to alternative MARL algorithms, as well as successfully converging to a Pareto-optimal equilibrium in a range of matrix games. Finally, we propose a graph neural network extension which is shown to efficiently scale in games with up to 15 agents.
    GeONet: a neural operator for learning the Wasserstein geodesic. (arXiv:2209.14440v1 [cs.LG])
    Optimal transport (OT) offers a versatile framework to compare complex data distributions in a geometrically meaningful way. Traditional methods for computing the Wasserstein distance and geodesic between probability measures require mesh-dependent domain discretization and suffer from the curse-of-dimensionality. We present GeONet, a mesh-invariant deep neural operator network that learns the non-linear mapping from the input pair of initial and terminal distributions to the Wasserstein geodesic connecting the two endpoint distributions. In the offline training stage, GeONet learns the saddle point optimality conditions for the dynamic formulation of the OT problem in the primal and dual spaces that are characterized by a coupled PDE system. The subsequent inference stage is instantaneous and can be deployed for real-time predictions in the online learning setting. We demonstrate that GeONet achieves comparable testing accuracy to the standard OT solvers on a simulation example and the CIFAR-10 dataset with considerably reduced inference-stage computational cost by orders of magnitude.
    Label driven Knowledge Distillation for Federated Learning with non-IID Data. (arXiv:2209.14520v1 [cs.LG])
    In real-world applications, Federated Learning (FL) faces two challenges: (1) scalability, especially when applied to massive IoT networks, and (2) robustness to environments with heterogeneous data. To address the first challenge, we design a novel FL framework named Full-stack FL (F2L). More specifically, F2L utilizes a hierarchical network architecture, which makes it possible to extend the FL network without reconstructing the whole system. Moreover, leveraging the advantages of this hierarchical design, we propose a new label-driven knowledge distillation (LKD) technique at the global server to address the second challenge. In contrast to current knowledge distillation techniques, LKD is capable of training a student model that consolidates good knowledge from all teacher models. Therefore, our proposed algorithm can effectively extract knowledge of the regional data distributions (i.e., the regional aggregated models) to reduce the divergence between clients' models when operating under an FL system with non-independent identically distributed data. Extensive experimental results reveal that (i) our F2L method can significantly improve overall FL efficiency in all global distillations, and (ii) F2L converges rapidly as global distillation stages occur, rather than requiring improvement on every communication cycle.
    A Secure Federated Learning Framework for Residential Short Term Load Forecasting. (arXiv:2209.14547v1 [cs.CR])
    Smart meter measurements, though critical for accurate demand forecasting, face several drawbacks, including threats to consumers' privacy and the risk of data breaches. Recent literature has explored Federated Learning (FL) as a promising privacy-preserving machine learning alternative that enables collaborative learning of a model for short term load forecasting without exposing private raw data. Despite its virtues, standard FL is still vulnerable to an intractable cyber threat known as the Byzantine attack, carried out by faulty and/or malicious clients. Therefore, to improve the robustness of federated short-term load forecasting against Byzantine threats, we develop a state-of-the-art differentially private secured FL-based framework that ensures the privacy of individual smart meters' data while protecting the security of FL models and architecture. Our proposed framework leverages the idea of gradient quantization through the Sign Stochastic Gradient Descent (SignSGD) algorithm, where the clients transmit only the `sign' of the gradient to the control centre after local model training. As we highlight through our experiments involving benchmark neural networks and a set of Byzantine attack models, our proposed approach mitigates such threats quite effectively and thus outperforms conventional Fed-SGD models.
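    A minimal sketch of the aggregation idea (an assumed toy setup, not the full proposed framework): each client transmits only gradient signs, and the control centre takes an element-wise majority vote, which bounds the influence of Byzantine clients:
    ```python
    import numpy as np

    rng = np.random.default_rng(0)
    n_clients, dim, n_byzantine = 20, 10, 3
    true_grad = rng.normal(size=dim)

    client_signs = []
    for c in range(n_clients):
        local_grad = true_grad + rng.normal(scale=0.5, size=dim)  # honest estimate
        if c < n_byzantine:
            local_grad = -local_grad   # Byzantine client flips its update
        client_signs.append(np.sign(local_grad))

    # Majority vote over signs; the server never sees raw gradients.
    aggregate = np.sign(np.sum(client_signs, axis=0))
    w = np.zeros(dim)
    w -= 0.01 * aggregate              # server update step
    print(aggregate)
    ```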
    Text Independent Speaker Identification System for Access Control. (arXiv:2209.14335v1 [eess.AS])
    Even the human intelligence system fails to offer 100% accuracy in identifying speech from a specific individual. Machine intelligence attempts to mimic humans on the speaker identification problem through various approaches to speech feature extraction and speech modeling. This paper presents a text-independent speaker identification system that employs Mel Frequency Cepstral Coefficients (MFCC) for feature extraction and k-Nearest Neighbor (kNN) for classification. The maximum cross-validation accuracy obtained was 60%, which we aim to improve upon in subsequent research.
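    A minimal sketch of the described pipeline, assuming librosa and scikit-learn; the utterances here are synthetic stand-ins for real recordings:
    ```python
    import numpy as np
    import librosa
    from sklearn.model_selection import cross_val_score
    from sklearn.neighbors import KNeighborsClassifier

    sr = 16000
    rng = np.random.default_rng(0)

    def mfcc_features(y):
        mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13)
        return mfcc.mean(axis=1)           # average over frames -> fixed-length vector

    # Synthetic stand-ins for two speakers' recordings (real data: load .wav files).
    def fake_utterance(f0):
        t = np.arange(sr) / sr
        return np.sin(2 * np.pi * f0 * t) + 0.1 * rng.normal(size=sr)

    X = np.stack([mfcc_features(fake_utterance(f))
                  for f in (110, 115, 120, 220, 225, 230)])
    y = np.array([0, 0, 0, 1, 1, 1])       # speaker labels

    knn = KNeighborsClassifier(n_neighbors=1)
    print(cross_val_score(knn, X, y, cv=3).mean())   # cross-validated accuracy
    ```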
    Optimistic Posterior Sampling for Reinforcement Learning with Few Samples and Tight Guarantees. (arXiv:2209.14414v1 [stat.ML])
    We consider reinforcement learning in an environment modeled by an episodic, finite, stage-dependent Markov decision process of horizon $H$ with $S$ states and $A$ actions. The performance of an agent is measured by the regret after interacting with the environment for $T$ episodes. We propose an optimistic posterior sampling algorithm for reinforcement learning (OPSRL), a simple variant of posterior sampling that only needs a number of posterior samples logarithmic in $H$, $S$, $A$, and $T$ per state-action pair. For OPSRL we guarantee a high-probability regret bound of order at most $\widetilde{\mathcal{O}}(\sqrt{H^3SAT})$ ignoring $\text{poly}\log(HSAT)$ terms. The key novel technical ingredient is a new sharp anti-concentration inequality for linear forms which may be of independent interest. Specifically, we extend the normal approximation-based lower bound for Beta distributions by Alfers and Dinges [1984] to Dirichlet distributions. Our bound matches the lower bound of order $\Omega(\sqrt{H^3SAT})$, thereby answering the open problems raised by Agrawal and Jia [2017b] for the episodic setting.
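    The core posterior-sampling idea can be sketched as follows (a toy illustration, not the exact OPSRL procedure): draw a small number of transition models from a Dirichlet posterior per state-action pair and act according to the most optimistic sample:
    ```python
    import numpy as np

    rng = np.random.default_rng(0)
    S, A, H, n_samples = 5, 2, 10, 3
    counts = np.ones((S, A, S))            # Dirichlet posterior pseudo-counts
    rewards = rng.uniform(size=(S, A))     # rewards assumed known, for simplicity

    def value_of(P):                       # finite-horizon value iteration
        V = np.zeros(S)
        for _ in range(H):
            V = (rewards + P @ V).max(axis=1)   # Q[s, a] = r[s, a] + sum_s' P V
        return V

    values = []
    for _ in range(n_samples):
        P = np.array([[rng.dirichlet(counts[s, a]) for a in range(A)]
                      for s in range(S)])       # one sampled transition model
        values.append(value_of(P))
    print(np.max(values, axis=0))          # optimism: max over posterior samples
    ```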
    Variational Bayes for robust radar single object tracking. (arXiv:2209.14397v1 [eess.SP])
    We address object tracking by radar and the robustness of the current state-of-the-art methods to process outliers. Standard tracking algorithms extract detections from radar image space and use them in the filtering stage. Filtering is performed by a Kalman filter, which assumes Gaussian distributed noise. However, this assumption does not account for large modeling errors and leads to poor tracking performance during abrupt motions. We take the Gaussian Sum Filter (the single-object variant of the Multi Hypothesis Tracker) as our baseline and propose a modification that models process noise with a distribution that has heavier tails than a Gaussian. Variational Bayes provides a fast, computationally cheap inference algorithm. Our simulations show that, in the presence of process outliers, the robust tracker outperforms the Gaussian Sum Filter when tracking single objects.
    Feature Decoupling in Self-supervised Representation Learning for Open Set Recognition. (arXiv:2209.14385v1 [cs.CV])
    Assuming unknown classes could be present during classification, the open set recognition (OSR) task aims to classify an instance into a known class or reject it as unknown. In this paper, we use a two-stage training strategy for the OSR problems. In the first stage, we introduce a self-supervised feature decoupling method that finds the content features of the input samples from the known classes. Specifically, our feature decoupling approach learns a representation that can be split into content features and transformation features. In the second stage, we fine-tune the content features with the class labels. The fine-tuned content features are then used for the OSR problems. Moreover, we consider an unsupervised OSR scenario, where we cluster the content features learned from the first stage. To measure representation quality, we introduce intra-inter ratio (IIR). Our experimental results indicate that our proposed self-supervised approach outperforms others in image and malware OSR problems. Also, our analyses indicate that IIR is correlated with OSR performance.
    Using Multivariate Linear Regression for Biochemical Oxygen Demand Prediction in Waste Water. (arXiv:2209.14297v1 [q-bio.OT])
    Multivariate Linear Regression (MLR) offers opportunities for predicting Biochemical Oxygen Demand (BOD) in waste water from diverse water quality parameters used as input variables. The goal of this work is to examine the capability of MLR for BOD prediction through four input variables: Dissolved Oxygen (DO), Nitrogen, Fecal Coliform, and Total Coliform; of the seven parameters examined, these four showed the strongest correlation with BOD. Machine Learning (ML) was performed with 80% and 90% of the data as the training set and the remaining 20% and 10% as the test set, respectively. MLR performance was evaluated through the correlation coefficient (r), the Root Mean Square Error (RMSE), and the percentage accuracy of the BOD predictions. The performance indices for the four input variables were RMSE = 6.77 mg/L, r = 0.60, and accuracy of 70.3% for the 80% training split, and RMSE = 6.74 mg/L, r = 0.60, and accuracy of 87.5% for the 90% training split. It was found that increasing the training split above 80% of the dataset improved the reported accuracy but did not have a significant impact on the model's predictive capacity. The results show that an MLR model can be successfully employed to estimate BOD in waste water using appropriately selected input parameters.
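    A minimal sketch of this setup in scikit-learn, using synthetic stand-in data for the four water-quality inputs:
    ```python
    import numpy as np
    from sklearn.linear_model import LinearRegression
    from sklearn.model_selection import train_test_split
    from sklearn.metrics import mean_squared_error

    rng = np.random.default_rng(0)
    n = 200
    X = rng.normal(size=(n, 4))            # DO, Nitrogen, Fecal/Total Coliform
    bod = X @ np.array([1.5, -0.8, 0.6, 0.4]) + rng.normal(scale=1.0, size=n)

    X_tr, X_te, y_tr, y_te = train_test_split(X, bod, test_size=0.2, random_state=0)
    model = LinearRegression().fit(X_tr, y_tr)   # the 80/20 split from the paper
    pred = model.predict(X_te)

    rmse = mean_squared_error(y_te, pred) ** 0.5
    r = np.corrcoef(y_te, pred)[0, 1]            # correlation coefficient
    print(f"RMSE={rmse:.2f}  r={r:.2f}")
    ```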
    Intrinsic Dimensionality Estimation within Tight Localities: A Theoretical and Experimental Analysis. (arXiv:2209.14475v1 [cs.LG])
    Accurate estimation of Intrinsic Dimensionality (ID) is of crucial importance in many data mining and machine learning tasks, including dimensionality reduction, outlier detection, similarity search and subspace clustering. However, since their convergence generally requires sample sizes (that is, neighborhood sizes) on the order of hundreds of points, existing ID estimation methods may have only limited usefulness for applications in which the data consists of many natural groups of small size. In this paper, we propose a local ID estimation strategy stable even for `tight' localities consisting of as few as 20 sample points. The estimator applies MLE techniques over all available pairwise distances among the members of the sample, based on a recent extreme-value-theoretic model of intrinsic dimensionality, the Local Intrinsic Dimension (LID). Our experimental results show that our proposed estimation technique can achieve notably smaller variance, while maintaining comparable levels of bias, at much smaller sample sizes than state-of-the-art estimators.
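    For reference, a sketch of the classical MLE (Hill-type) LID estimator that the proposed method builds on; the paper's estimator additionally pools all pairwise distances within the locality, which this simplified version omits:
    ```python
    import numpy as np

    def lid_mle(dists):
        """dists: positive distances from a reference point to its k neighbors."""
        d = np.sort(np.asarray(dists))
        k, r_max = len(d), d[-1]
        return -k / np.sum(np.log(d / r_max))

    # Points uniform in a 5-dimensional ball: distances to the center ~ U^(1/5),
    # so the estimate should be close to 5 on average, even with only 20 samples.
    rng = np.random.default_rng(0)
    print(lid_mle(rng.uniform(size=20) ** (1 / 5)))
    ```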
    Bidirectional Language Models Are Also Few-shot Learners. (arXiv:2209.14500v1 [cs.LG])
    Large language models such as GPT-3 (Brown et al., 2020) can perform arbitrary tasks without undergoing fine-tuning after being prompted with only a few labeled examples. An arbitrary task can be reformulated as a natural language prompt, and a language model can be asked to generate the completion, indirectly performing the task in a paradigm known as prompt-based learning. To date, emergent prompt-based learning capabilities have mainly been demonstrated for unidirectional language models. However, bidirectional language models pre-trained on denoising objectives such as masked language modeling produce stronger learned representations for transfer learning. This motivates the possibility of prompting bidirectional models, but their pre-training objectives have made them largely incompatible with the existing prompting paradigm. We present SAP (Sequential Autoregressive Prompting), a technique that enables the prompting of bidirectional models. Utilizing the machine translation task as a case study, we prompt the bidirectional mT5 model (Xue et al., 2021) with SAP and demonstrate its few-shot and zero-shot translations outperform the few-shot translations of unidirectional models like GPT-3 and XGLM (Lin et al., 2021), despite mT5's approximately 50% fewer parameters. We further show SAP is effective on question answering and summarization. For the first time, our results demonstrate prompt-based learning is an emergent property of a broader class of language models, rather than only unidirectional models.
    How Powerful is Implicit Denoising in Graph Neural Networks. (arXiv:2209.14514v1 [cs.LG])
    Graph Neural Networks (GNNs), which aggregate features from neighbors, are widely used for processing graph-structured data due to their powerful representation learning capabilities. It is generally believed that GNNs can implicitly remove non-predictive noise. However, the analysis of the implicit denoising effect in graph neural networks remains open. In this work, we conduct a comprehensive theoretical study and analyze when and why implicit denoising happens in GNNs. Specifically, we study the convergence properties of the noise matrix. Our theoretical analysis suggests that implicit denoising largely depends on the connectivity, the graph size, and the GNN architecture. Moreover, we formally define and propose the adversarial graph signal denoising (AGSD) problem, which extends the graph signal denoising problem. By solving this problem, we derive a robust graph convolution in which the smoothness of the node representations and the implicit denoising effect are enhanced. Extensive empirical evaluations verify our theoretical analyses and the effectiveness of our proposed model.
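    For background, the explicit graph signal denoising problem that AGSD extends has a simple closed form, sketched here: smooth noisy node features X by minimizing a reconstruction term plus a Laplacian smoothness penalty, giving X' = (I + lam*L)^(-1) X:
    ```python
    import numpy as np

    rng = np.random.default_rng(0)
    n = 6
    A = (rng.uniform(size=(n, n)) < 0.4).astype(float)
    A = np.triu(A, 1); A = A + A.T              # symmetric adjacency, no self-loops
    L = np.diag(A.sum(axis=1)) - A              # combinatorial graph Laplacian
    X = rng.normal(size=(n, 3))                 # noisy node features

    lam = 1.0                                   # smoothness weight
    X_denoised = np.linalg.solve(np.eye(n) + lam * L, X)
    print(X_denoised.round(2))
    ```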
    Hierarchical Training of Deep Ensemble Policies for Reinforcement Learning in Continuous Spaces. (arXiv:2209.14488v1 [cs.LG])
    Many actor-critic deep reinforcement learning (DRL) algorithms have achieved cutting-edge performance on challenging reinforcement learning (RL) problems, including complex control tasks with high-dimensional continuous state and action spaces. Despite widely reported successes, existing DRL algorithms often suffer from ineffective exploration, resulting in limited learning stability and performance. To address this limitation, several ensemble DRL algorithms have recently been proposed to boost exploration and stabilize the learning process. However, many existing ensemble algorithms train each base learner individually, without explicitly controlling collaboration among the trained base learners. In this paper, we propose a new technique to train an ensemble of base learners based on multi-step integration methods. This multi-step training technique enables us to develop a hierarchical training algorithm for ensemble DRL that promotes inter-learner collaboration through explicit inter-learner parameter sharing. The design of the algorithm is verified theoretically, and it is shown empirically to outperform several cutting-edge DRL algorithms on multiple benchmark RL problems.
    DiffuseMorph: Unsupervised Deformable Image Registration Using Diffusion Model. (arXiv:2112.05149v2 [eess.IV] UPDATED)
    Deformable image registration is one of the fundamental tasks in medical imaging. Classical registration algorithms usually require a high computational cost for iterative optimization. Although deep-learning-based methods have been developed for fast image registration, it remains challenging to obtain realistic continuous deformations from a moving image to a fixed image while limiting topological folding. To address this, we present a novel diffusion-model-based image registration method, called DiffuseMorph. DiffuseMorph not only generates synthetic deformed images through reverse diffusion but also performs image registration via deformation fields. Specifically, the deformation fields are generated by the conditional score function of the deformation between the moving and fixed images, so that registration can be carried out with continuous deformation by simply scaling the latent feature of the score. Experimental results on 2D facial and 3D medical image registration tasks demonstrate that our method provides flexible deformations with topology-preservation capability.
    Downstream Datasets Make Surprisingly Good Pretraining Corpora. (arXiv:2209.14389v1 [cs.CL])
    For most natural language processing tasks, the dominant practice is to finetune large pretrained transformer models (e.g., BERT) using smaller downstream datasets. Despite the success of this approach, it remains unclear to what extent these gains are attributable to the massive background corpora employed for pretraining versus to the pretraining objectives themselves. This paper introduces a large-scale study of self-pretraining, where the same (downstream) training data is used for both pretraining and finetuning. In experiments addressing both ELECTRA and RoBERTa models and 10 distinct downstream datasets, we observe that self-pretraining rivals standard pretraining on the BookWiki corpus (despite using around $10\times$--$500\times$ less data), outperforming the latter on $7$ and $5$ datasets, respectively. Surprisingly, these task-specific pretrained models often perform well on other tasks, including the GLUE benchmark. Our results suggest that in many scenarios, performance gains attributable to pretraining are driven primarily by the pretraining objective itself and are not always attributable to the incorporation of massive datasets. These findings are especially relevant in light of concerns about intellectual property and offensive content in web-scale pretraining data.
    Improving alignment of dialogue agents via targeted human judgements. (arXiv:2209.14375v1 [cs.LG])
    We present Sparrow, an information-seeking dialogue agent trained to be more helpful, correct, and harmless compared to prompted language model baselines. We use reinforcement learning from human feedback to train our models with two new additions to help human raters judge agent behaviour. First, to make our agent more helpful and harmless, we break down the requirements for good dialogue into natural language rules the agent should follow, and ask raters about each rule separately. We demonstrate that this breakdown enables us to collect more targeted human judgements of agent behaviour and allows for more efficient rule-conditional reward models. Second, our agent provides evidence from sources supporting factual claims when collecting preference judgements over model statements. For factual questions, evidence provided by Sparrow supports the sampled response 78% of the time. Sparrow is preferred more often than baselines while being more resilient to adversarial probing by humans, violating our rules only 8% of the time when probed. Finally, we conduct extensive analyses showing that though our model learns to follow our rules it can exhibit distributional biases.
    RADACS: Towards Higher-Order Reasoning using Action Recognition in Autonomous Vehicles. (arXiv:2209.14408v1 [cs.CV])
    When applied to autonomous vehicle settings, action recognition can help enrich an environment model's understanding of the world and improve plans for future action. Towards these improvements in autonomous vehicle decision-making, we propose in this work a novel two-stage online action recognition system, termed RADACS. RADACS formulates the problem of active agent detection and adapts ideas about actor-context relations from human activity recognition in a straightforward two-stage pipeline for action detection and classification. We show that our proposed scheme can outperform the baseline on the ICCV2021 Road Challenge dataset and by deploying it on a real vehicle platform, we demonstrate how a higher-order understanding of agent actions in an environment can improve decisions on a real autonomous vehicle.
    Scalably learning quantum many-body Hamiltonians from dynamical data. (arXiv:2209.14328v1 [quant-ph])
    The physics of a closed quantum mechanical system is governed by its Hamiltonian. However, in most practical situations, this Hamiltonian is not precisely known, and ultimately all that is available are data obtained from measurements on the system. In this work, we introduce a highly scalable, data-driven approach to learning families of interacting many-body Hamiltonians from dynamical data, by bringing together gradient-based optimization techniques from machine learning with efficient quantum state representations in terms of tensor networks. Our approach is highly practical, experimentally friendly, and intrinsically scalable, allowing for system sizes above 100 spins. In particular, we demonstrate on synthetic data that the algorithm works even if one is restricted to one simple initial state, a small number of single-qubit observables, and time evolution up to relatively short times. For the concrete example of the one-dimensional Heisenberg model, our algorithm exhibits an error constant in the system size and scaling as the inverse square root of the size of the data set.
    Fast Nonlinear Vector Quantile Regression. (arXiv:2205.14977v2 [stat.CO] UPDATED)
    Quantile regression (QR) is a powerful tool for estimating one or more conditional quantiles of a target variable $\mathrm{Y}$ given explanatory features $\boldsymbol{\mathrm{X}}$. A limitation of QR is that it is only defined for scalar target variables, due to the formulation of its objective function, and since the notion of quantiles has no standard definition for multivariate distributions. Recently, vector quantile regression (VQR) was proposed as an extension of QR for vector-valued target variables, thanks to a meaningful generalization of the notion of quantiles to multivariate distributions via optimal transport. Despite its elegance, VQR is arguably not applicable in practice due to several limitations: (i) it assumes a linear model for the quantiles of the target $\boldsymbol{\mathrm{Y}}$ given the features $\boldsymbol{\mathrm{X}}$; (ii) its exact formulation is intractable even for modestly-sized problems in terms of target dimensions, number of regressed quantile levels, or number of features, and its relaxed dual formulation may violate the monotonicity of the estimated quantiles; (iii) no fast or scalable solvers for VQR currently exist. In this work we fully address these limitations, namely: (i) We extend VQR to the non-linear case, showing substantial improvement over linear VQR; (ii) We propose vector monotone rearrangement, a method which ensures the quantile functions estimated by VQR are monotone functions; (iii) We provide fast, GPU-accelerated solvers for linear and nonlinear VQR which maintain a fixed memory footprint, and demonstrate that they scale to millions of samples and thousands of quantile levels; (iv) We release an optimized python package of our solvers to promote the widespread use of VQR in real-world applications.
    Active Learning in Bayesian Neural Networks with Balanced Entropy Learning Principle. (arXiv:2105.14559v2 [cs.LG] UPDATED)
    Acquiring labeled data is challenging in many machine learning applications with limited budgets. Active learning gives a procedure to select the most informative data points and improve data efficiency by reducing the cost of labeling. The info-max learning principle of maximizing mutual information, as in BALD, has been successful and widely adopted in various active learning applications. However, this pool-based objective inherently introduces redundant selection and further requires a high computational cost for batch selection. In this paper, we design and propose a new uncertainty measure, Balanced Entropy Acquisition (BalEntAcq), which captures the information balance between the uncertainty of the underlying softmax probability and the label variable. To do this, we approximate each marginal distribution by a Beta distribution. The Beta approximation enables us to formulate BalEntAcq as a ratio between an augmented entropy and the marginalized joint entropy. The closed-form expression of BalEntAcq facilitates parallelization by estimating two parameters in each marginal Beta distribution. BalEntAcq is a purely standalone measure without requiring any relational computations with other data points. Nevertheless, BalEntAcq captures a well-diversified selection near the decision boundary with a margin, unlike other existing uncertainty measures such as BALD, Entropy, or Mean Standard Deviation (MeanSD). Finally, we demonstrate that our balanced entropy learning principle with BalEntAcq consistently outperforms well-known linearly scalable active learning methods, including a recently proposed PowerBALD, a simple but diversified version of BALD, by showing experimental results obtained from MNIST, CIFAR-100, SVHN, and TinyImageNet datasets.
    Biological connectomes as a representation for the architecture of artificial neural networks. (arXiv:2209.14406v1 [cs.NE])
    Grand efforts in neuroscience are working toward mapping the connectomes of many new species, including the nearly completed connectome of Drosophila melanogaster. It is important to ask whether these models could benefit artificial intelligence. In this work we ask two fundamental questions: (1) where and when biological connectomes can be useful in machine learning, and (2) which design principles are necessary for extracting a good representation of the connectome. Toward this end, we translate the motor circuit of the C. elegans nematode into artificial neural networks at varying levels of biophysical realism and evaluate the outcome of training these networks on motor and non-motor behavioral tasks. We demonstrate that biophysical realism need not be upheld to attain the advantages of using biological circuits. We also establish that, even if the exact wiring diagram is not retained, the architectural statistics provide a valuable prior. Finally, we show that while the C. elegans locomotion circuit provides a powerful inductive bias on locomotion problems, its structure may hinder performance on tasks unrelated to locomotion, such as visual classification problems.
    Signed Network Embedding with Application to Simultaneous Detection of Communities and Anomalies. (arXiv:2207.09324v2 [cs.SI] UPDATED)
    Signed networks are frequently observed in real life with additional sign information associated with each edge, yet such information has been largely ignored in existing network models. This paper develops a unified embedding model for signed networks to disentangle the intertwined balance structure and anomaly effect, which can greatly facilitate the downstream analysis, including community detection, anomaly detection, and network inference. The proposed model captures both balance structure and anomaly effect through a low rank plus sparse matrix decomposition, which are jointly estimated via a regularized formulation. Its theoretical guarantees are established in terms of asymptotic consistency and finite-sample probability bounds for network embedding, community detection and anomaly detection. The advantage of the proposed embedding model is also demonstrated through extensive numerical experiments on both synthetic networks and an international relation network.
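    A toy sketch of the low-rank-plus-sparse idea via alternating thresholding (robust-PCA style); the paper's regularized joint estimator and its theoretical guarantees are not reproduced here:
    ```python
    import numpy as np

    def low_rank_plus_sparse(M, rank=2, thresh=3.0, n_iter=50):
        L = np.zeros_like(M)
        for _ in range(n_iter):
            # Sparse part: entrywise hard-threshold of the residual (anomalies).
            R = M - L
            S = np.where(np.abs(R) > thresh, R, 0.0)
            # Low-rank part: best rank-r approximation of the remainder (balance).
            U, s, Vt = np.linalg.svd(M - S, full_matrices=False)
            L = (U[:, :rank] * s[:rank]) @ Vt[:rank]
        return L, S

    rng = np.random.default_rng(0)
    B = rng.normal(size=(20, 2)) @ rng.normal(size=(2, 20))  # balance structure
    anomaly = np.zeros((20, 20)); anomaly[3, 7] = 10.0       # one injected anomaly
    L, S = low_rank_plus_sparse(B + anomaly)
    print(round(S[3, 7], 2))   # the anomaly should dominate the sparse part
    ```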
    Patients' Severity States Classification based on Electronic Health Record (EHR) Data using Multiple Machine Learning and Deep Learning Approaches. (arXiv:2209.14907v1 [cs.LG])
    This research examines the categorization of patients' severity states based on their electronic health records over a certain time range, using multiple machine learning and deep learning approaches. The suggested method uses an EHR dataset collected from an open-source platform to categorize severity. Several tools were used in this research: OpenRefine for pre-processing, RapidMiner for implementing three algorithms (Fast Large Margin, Generalized Linear Model, Multi-layer Feed-forward Neural Network), Tableau for visualizing the data, and Google Colab for implementing the remaining algorithms. We implemented several supervised and unsupervised algorithms along with semi-supervised and deep learning algorithms. The experimental results reveal that the hyperparameter-tuned Random Forest outperformed all the other supervised machine learning algorithms with 76% accuracy, and the Generalized Linear Model achieved the highest precision score of 78%, whereas the hyperparameter-tuned Hierarchical Clustering, with an 86% precision score, and the Gaussian Mixture Model, with 61% accuracy, outperformed the other unsupervised approaches. Dimensionality reduction substantially improved results for most unsupervised techniques. For deep learning we employed a multi-layer feed-forward neural network, and the Fast Large Margin approach was used for semi-supervised learning. The Fast Large Margin performed well with a recall score of 84% and an F1 score of 78%. Finally, the Multi-layer Feed-forward Neural Network performed admirably with 75% accuracy, 75% precision, 87% recall, and an 81% F1 score.
    Predicting hot-electron free energies from ground-state data. (arXiv:2205.05591v2 [cond-mat.mtrl-sci] UPDATED)
    Machine-learning potentials are usually trained on the ground-state, Born-Oppenheimer energy surface, which depends exclusively on the atomic positions and not on the simulation temperature. This disregards the effect of thermally excited electrons, which is important in metals and essential to the description of warm dense matter. An accurate physical description of these effects requires that the nuclei move on a temperature-dependent electronic free energy. We propose a method to obtain machine-learning predictions of this free energy at an arbitrary electron temperature using exclusively training data from ground-state calculations, avoiding the need to train temperature-dependent potentials, and benchmark it on metallic liquid hydrogen at the conditions of the core of gas giants and brown dwarfs. This work demonstrates the advantages of hybrid schemes that use physical considerations to combine machine-learning predictions, providing a blueprint for the development of similar approaches that extend the reach of atomistic modelling by removing the barrier between physics and data-driven methodologies.
    LaplaceNet: A Hybrid Graph-Energy Neural Network for Deep Semi-Supervised Classification. (arXiv:2106.04527v3 [cs.LG] UPDATED)
    Semi-supervised learning has received a lot of recent attention as it alleviates the need for large amounts of labelled data, which can often be expensive, require expert knowledge, and be time-consuming to collect. Recent developments in deep semi-supervised classification have reached unprecedented performance, and the gap between supervised and semi-supervised learning is ever-decreasing. This improvement in performance has been based on the inclusion of numerous technical tricks, strong augmentation techniques, and costly optimisation schemes with multi-term loss functions. We propose a new framework, LaplaceNet, for deep semi-supervised classification that has a greatly reduced model complexity. We utilise a hybrid approach where pseudo-labels are produced by minimising the Laplacian energy on a graph. These pseudo-labels are then used to iteratively train a neural-network backbone. Our model outperforms state-of-the-art methods for deep semi-supervised classification over several benchmark datasets. Furthermore, we consider the application of strong augmentations to neural networks theoretically and justify the use of a multi-sampling approach for semi-supervised learning. We demonstrate, through rigorous experimentation, that a multi-sampling augmentation approach improves generalisation and reduces the sensitivity of the network to augmentation.
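    A minimal sketch of the graph step described above: pseudo-labels from Laplacian-energy minimisation, i.e. classic label propagation with a normalised adjacency (our simplified illustration, not the full LaplaceNet pipeline):
    ```python
    import numpy as np

    rng = np.random.default_rng(0)
    n, n_classes = 8, 2
    A = (rng.uniform(size=(n, n)) < 0.35).astype(float)
    A = np.triu(A, 1); A = A + A.T
    deg = np.clip(A.sum(axis=1), 1, None)
    S = A / np.sqrt(deg)[:, None] / np.sqrt(deg)[None, :]   # D^-1/2 A D^-1/2

    Y = np.zeros((n, n_classes))
    Y[0, 0] = 1.0; Y[1, 1] = 1.0             # two labelled nodes

    alpha = 0.9                              # propagation strength
    F = np.linalg.solve(np.eye(n) - alpha * S, Y)
    pseudo_labels = F.argmax(axis=1)         # used to train the backbone
    print(pseudo_labels)
    ```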
    Trading off Quality for Efficiency of Community Detection: An Inductive Method across Graphs. (arXiv:2209.14825v1 [cs.SI])
    Many network applications can be formulated as NP-hard combinatorial optimization problems of community detection (CD). Due to this NP-hardness, balancing CD quality and efficiency remains a challenge. Most existing CD methods are transductive, optimized independently for CD on a single graph. Some of these methods use advanced machine learning techniques to obtain high-quality CD results but usually have high complexity; other approaches use fast heuristic approximation to ensure low runtime but may suffer from quality degradation. In contrast to these transductive methods, we propose an alternative inductive community detection (ICD) method that works across the graphs of a system or scenario to alleviate the NP-hard challenge. ICD first conducts offline training of an adversarial dual GNN on historical graphs to capture key properties of the system. The trained model is then directly generalized to new unseen graphs for online CD without additional optimization, achieving a better trade-off between quality and efficiency. ICD can also capture permutation-invariant community labels in offline training and tackle online CD on new graphs with a non-fixed number of nodes and communities. Experiments on a set of benchmarks demonstrate that ICD achieves a significantly better trade-off between quality and efficiency than various baselines.
    Make-A-Video: Text-to-Video Generation without Text-Video Data. (arXiv:2209.14792v1 [cs.CV])
    We propose Make-A-Video -- an approach for directly translating the tremendous recent progress in Text-to-Image (T2I) generation to Text-to-Video (T2V). Our intuition is simple: learn what the world looks like and how it is described from paired text-image data, and learn how the world moves from unsupervised video footage. Make-A-Video has three advantages: (1) it accelerates training of the T2V model (it does not need to learn visual and multimodal representations from scratch), (2) it does not require paired text-video data, and (3) the generated videos inherit the vastness (diversity in aesthetic, fantastical depictions, etc.) of today's image generation models. We design a simple yet effective way to build on T2I models with novel and effective spatial-temporal modules. First, we decompose the full temporal U-Net and attention tensors and approximate them in space and time. Second, we design a spatial temporal pipeline to generate high resolution and frame rate videos with a video decoder, interpolation model and two super resolution models that can enable various applications besides T2V. In all aspects, spatial and temporal resolution, faithfulness to text, and quality, Make-A-Video sets the new state-of-the-art in text-to-video generation, as determined by both qualitative and quantitative measures.
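    The space-time decomposition can be sketched as a factorized ("pseudo-3D") convolution: a 2D convolution over space followed by a 1D convolution over time. Channel sizes and wiring below are illustrative assumptions, not Make-A-Video's exact modules:
    ```python
    import torch
    import torch.nn as nn

    class Pseudo3DConv(nn.Module):
        def __init__(self, channels):
            super().__init__()
            self.spatial = nn.Conv2d(channels, channels, kernel_size=3, padding=1)
            self.temporal = nn.Conv1d(channels, channels, kernel_size=3, padding=1)

        def forward(self, x):                            # x: (B, C, T, H, W)
            b, c, t, h, w = x.shape
            # Spatial conv applied frame-by-frame.
            y = self.spatial(x.transpose(1, 2).reshape(b * t, c, h, w))
            y = y.reshape(b, t, c, h, w).permute(0, 3, 4, 2, 1)   # (B, H, W, C, T)
            # Temporal conv applied pixel-by-pixel.
            y = self.temporal(y.reshape(b * h * w, c, t))
            return y.reshape(b, h, w, c, t).permute(0, 3, 4, 1, 2)

    video = torch.randn(2, 16, 8, 32, 32)                # batch, C, T, H, W
    print(Pseudo3DConv(16)(video).shape)                 # shape is preserved
    ```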
    FastPacket: Towards Pre-trained Packets Embedding based on FastText for next-generation NIDS. (arXiv:2209.14727v1 [cs.CR])
    New attacks are used by attackers every day, but many of them are not detected by Intrusion Detection Systems, as most IDSs ignore raw packet information and only consider basic statistical features extracted from PCAP files. Using networking programs to extract fixed statistical features from packets is useful, but may not be enough to detect today's challenges. We believe it is time to utilize big data and deep learning for automatic, dynamic feature extraction from packets, and to take inspiration from pre-trained deep learning models in computer vision and natural language processing, so that deep learning solutions for security will have their own models pre-trained on big datasets for use in future research. In this paper, we propose a new approach for embedding packets based on character-level embeddings, inspired by the success of FastText on text data. We call this approach FastPacket. Results are measured on subsets of the CIC-IDS-2017 dataset, but we expect promising results from pre-trained models on big data. We suggest building a pre-trained FastPacket model on the large MAWI dataset and making it available to the community, similar to FastText, in order to outperform currently used NIDSs and start a new era of packet-level NIDSs that can better detect complex attacks.
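    A minimal sketch of the stated idea with gensim's FastText, treating each packet as a sentence of hex-byte tokens; the packets shown are hypothetical and the preprocessing is our assumption:
    ```python
    from gensim.models import FastText

    # Hypothetical packets rendered as sequences of hex-byte tokens.
    packets = [
        ["45", "00", "00", "3c", "1c", "46"],
        ["45", "00", "00", "28", "ab", "cd"],
        ["60", "00", "00", "00", "00", "14"],
    ]

    # Character-level n-grams (min_n/max_n) give the "FastText" subword effect.
    model = FastText(sentences=packets, vector_size=32, window=3,
                     min_count=1, min_n=1, max_n=2, epochs=20)

    # Mean of token vectors as a simple packet embedding for a downstream NIDS.
    packet_vec = sum(model.wv[tok] for tok in packets[0]) / len(packets[0])
    print(packet_vec.shape)   # (32,)
    ```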
    Batch Normalization Explained. (arXiv:2209.14778v1 [cs.LG])
    A critically important, ubiquitous, and yet poorly understood ingredient in modern deep networks (DNs) is batch normalization (BN), which centers and normalizes the feature maps. To date, only limited progress has been made understanding why BN boosts DN learning and inference performance; work has focused exclusively on showing that BN smooths a DN's loss landscape. In this paper, we study BN theoretically from the perspective of function approximation; we exploit the fact that most of today's state-of-the-art DNs are continuous piecewise affine (CPA) splines that fit a predictor to the training data via affine mappings defined over a partition of the input space (the so-called "linear regions"). We demonstrate that BN is an unsupervised learning technique that -- independent of the DN's weights or gradient-based learning -- adapts the geometry of a DN's spline partition to match the data. BN provides a "smart initialization" that boosts the performance of DN learning, because it adapts even a DN initialized with random weights to align its spline partition with the data. We also show that the variation of BN statistics between mini-batches introduces a dropout-like random perturbation to the partition boundaries and hence the decision boundary for classification problems. This per mini-batch perturbation reduces overfitting and improves generalization by increasing the margin between the training samples and the decision boundary.
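    For readers unfamiliar with the mechanics, a minimal sketch of the training-time centering and normalization BN performs (the paper's contribution is the spline-geometric analysis, not this recipe):
    ```python
    import numpy as np

    def batchnorm_forward(x, gamma, beta, eps=1e-5):
        mu = x.mean(axis=0)                  # per-feature mini-batch mean
        var = x.var(axis=0)                  # per-feature mini-batch variance
        x_hat = (x - mu) / np.sqrt(var + eps)
        return gamma * x_hat + beta          # learnable rescale and shift

    rng = np.random.default_rng(0)
    feats = rng.normal(loc=3.0, scale=5.0, size=(64, 8))   # one mini-batch
    out = batchnorm_forward(feats, gamma=np.ones(8), beta=np.zeros(8))
    print(out.mean(axis=0).round(3), out.std(axis=0).round(3))  # ~0 and ~1
    ```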
    Rethinking Counterfactual Explanations as Local and Regional Counterfactual Policies. (arXiv:2209.14568v1 [stat.ML])
    Among the challenges not yet resolved for Counterfactual Explanations (CE) are stability, the synthesis of the various CE, and the lack of plausibility/sparsity guarantees. From a more practical point of view, recent studies show that the prescribed counterfactual recourses are often not implemented exactly by individuals, and demonstrate that most state-of-the-art CE algorithms are very likely to fail in this noisy environment. To address these issues, we propose a probabilistic framework that gives a sparse local counterfactual rule for each observation: instead of giving diverse CE, we provide rules that give a range of values that can change the decision with a given high probability. In addition, the recourses derived from these rules are robust by construction. These local rules are aggregated into a regional counterfactual rule to ensure the stability of the counterfactual explanations across observations. Our local and regional rules guarantee that the recourses are faithful to the data distribution because our rules use a consistent estimator of the probability of changing the decision based on a Random Forest. In addition, these probabilities yield interpretable and sparse rules, as we select the smallest set of variables having a given probability of changing the decision. Code for computing our counterfactual rules is available, and we compare their relevance with standard CE and recent similar attempts.
    Bayesian Neural Network Versus Ex-Post Calibration For Prediction Uncertainty. (arXiv:2209.14594v1 [cs.LG])
    Probabilistic predictions from neural networks that account for predictive uncertainty during classification are crucial in many real-world and high-impact decision making settings. However, in practice most models are non-probabilistic neural networks which by default do not capture this inherent uncertainty. This well-known problem has led to the development of post-hoc calibration procedures, such as Platt (logistic) scaling, isotonic, and beta calibration, which transform the scores into well-calibrated empirical probabilities. A plausible alternative to the calibration approach is to use Bayesian neural networks, which directly model a predictive distribution. Although they have been applied to image and text datasets, they have seen limited adoption in the tabular and small-data regime. In this paper, we demonstrate that Bayesian neural networks yield competitive performance when compared to calibrated neural networks, and we conduct experiments across a wide array of datasets.
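    A minimal sketch of the post-hoc calibration baselines discussed, using scikit-learn's CalibratedClassifierCV on synthetic data:
    ```python
    from sklearn.calibration import CalibratedClassifierCV
    from sklearn.datasets import make_classification
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import train_test_split
    from sklearn.metrics import brier_score_loss

    X, y = make_classification(n_samples=500, random_state=0)
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

    for method in ("sigmoid", "isotonic"):   # Platt scaling / isotonic calibration
        clf = CalibratedClassifierCV(LogisticRegression(max_iter=1000),
                                     method=method, cv=3).fit(X_tr, y_tr)
        p = clf.predict_proba(X_te)[:, 1]
        print(method, round(brier_score_loss(y_te, p), 4))   # lower is better
    ```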
    On Quantum Speedups for Nonconvex Optimization via Quantum Tunneling Walks. (arXiv:2209.14501v1 [quant-ph])
    Classical algorithms are often not effective for solving nonconvex optimization problems where local minima are separated by high barriers. In this paper, we explore possible quantum speedups for nonconvex optimization by leveraging the global effect of quantum tunneling. Specifically, we introduce a quantum algorithm termed the quantum tunneling walk (QTW) and apply it to nonconvex problems where local minima are approximately global minima. We show that QTW achieves quantum speedup over classical stochastic gradient descents (SGD) when the barriers between different local minima are high but thin and the minima are flat. Based on this observation, we construct a specific double-well landscape, where classical algorithms cannot efficiently hit one target well knowing the other well but QTW can when given proper initial states near the known well. Finally, we corroborate our findings with numerical experiments.
    Convergence of the mini-batch SIHT algorithm. (arXiv:2209.14536v1 [cs.LG])
    The Iterative Hard Thresholding (IHT) algorithm has been studied extensively as an effective deterministic algorithm for solving sparse optimization problems. The IHT algorithm benefits from the information of the batch (full) gradient at each point, and this information is a crucial key for the convergence analysis of the generated sequence. However, this strength becomes a weakness in machine learning and high-dimensional statistical applications, because calculating the batch gradient at each iteration is computationally expensive or impractical. Fortunately, in these applications the objective function has a summation structure that can be exploited to approximate the batch gradient by a stochastic mini-batch gradient. In this paper, we study the mini-batch Stochastic IHT (SIHT) algorithm for sparse optimization. As opposed to previous works, where an increasing and variable mini-batch size is necessary for the derivation, we fix the mini-batch size according to a lower bound that we derive. To prove stochastic convergence of the objective value function, we first establish a critical sparse stochastic gradient descent property. Using this property, we show that the sequence generated by the stochastic mini-batch SIHT is a supermartingale and converges with probability one. Unlike previous work, we do not assume the function to be restricted strongly convex. To the best of our knowledge, in the regime of sparse optimization, this is the first time in the literature that the sequence of stochastic function values is shown to converge with probability one while fixing the mini-batch size for all steps.
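    A toy sketch of mini-batch SIHT on a sparse least-squares problem with a fixed mini-batch size; the step size and iteration count are illustrative choices, and the paper's convergence analysis is not reproduced here:
    ```python
    import numpy as np

    rng = np.random.default_rng(0)
    n, d, s, batch = 500, 50, 5, 32
    x_true = np.zeros(d); x_true[:s] = rng.normal(size=s)
    A = rng.normal(size=(n, d)); y = A @ x_true

    x = np.zeros(d)
    for _ in range(300):
        idx = rng.choice(n, size=batch, replace=False)    # fixed mini-batch size
        grad = A[idx].T @ (A[idx] @ x - y[idx]) / batch   # mini-batch gradient
        x -= 0.05 * grad
        small = np.argsort(np.abs(x))[:-s]                # hard thresholding:
        x[small] = 0.0                                    # keep the s largest entries

    print(np.linalg.norm(x - x_true))                     # should be small
    ```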
    Compressed Gastric Image Generation Based on Soft-Label Dataset Distillation for Medical Data Sharing. (arXiv:2209.14635v1 [cs.CV])
    Background and objective: Sharing of medical data is required to enable the cross-agency flow of healthcare information and construct high-accuracy computer-aided diagnosis systems. However, the large sizes of medical datasets, the massive amount of memory of saved deep convolutional neural network (DCNN) models, and patients' privacy protection are problems that can lead to inefficient medical data sharing. Therefore, this study proposes a novel soft-label dataset distillation method for medical data sharing. Methods: The proposed method distills valid information of medical image data and generates several compressed images with different data distributions for anonymous medical data sharing. Furthermore, our method can extract essential weights of DCNN models to reduce the memory required to save trained models for efficient medical data sharing. Results: The proposed method can compress tens of thousands of images into several soft-label images and reduce the size of a trained model to a few hundredths of its original size. The compressed images obtained after distillation have been visually anonymized; therefore, they do not contain the private information of the patients. Furthermore, we can realize high-detection performance with a small number of compressed images. Conclusions: The experimental results show that the proposed method can improve the efficiency and security of medical data sharing.
    The Chamber Ensemble Generator: Limitless High-Quality MIR Data via Generative Modeling. (arXiv:2209.14458v1 [cs.SD])
    Data is the lifeblood of modern machine learning systems, including for those in Music Information Retrieval (MIR). However, MIR has long been mired by small datasets and unreliable labels. In this work, we propose to break this bottleneck using generative modeling. By pipelining a generative model of notes (Coconet trained on Bach Chorales) with a structured synthesis model of chamber ensembles (MIDI-DDSP trained on URMP), we demonstrate a system capable of producing unlimited amounts of realistic chorale music with rich annotations including mixes, stems, MIDI, note-level performance attributes (staccato, vibrato, etc.), and even fine-grained synthesis parameters (pitch, amplitude, etc.). We call this system the Chamber Ensemble Generator (CEG), and use it to generate a large dataset of chorales from four different chamber ensembles (CocoChorales). We demonstrate that data generated using our approach improves state-of-the-art models for music transcription and source separation, and we release both the system and the dataset as an open-source foundation for future work in the MIR community.
    Re-Imagen: Retrieval-Augmented Text-to-Image Generator. (arXiv:2209.14491v1 [cs.CV])
    Research on text-to-image generation has witnessed significant progress in generating diverse and photo-realistic images, driven by diffusion and auto-regressive models trained on large-scale image-text data. Though state-of-the-art models can generate high-quality images of common entities, they often have difficulty generating images of uncommon entities, such as `Chortai (dog)' or `Picarones (food)'. To tackle this issue, we present the Retrieval-Augmented Text-to-Image Generator (Re-Imagen), a generative model that uses retrieved information to produce high-fidelity and faithful images, even for rare or unseen entities. Given a text prompt, Re-Imagen accesses an external multi-modal knowledge base to retrieve relevant (image, text) pairs, and uses them as references to generate the image. With this retrieval step, Re-Imagen is augmented with the knowledge of high-level semantics and low-level visual details of the mentioned entities, and thus improves its accuracy in generating the entities' visual appearances. We train Re-Imagen on a constructed dataset containing (image, text, retrieval) triples to teach the model to ground on both the text prompt and the retrieval. Furthermore, we develop a new sampling strategy that interleaves classifier-free guidance for the text and retrieval conditions to balance text and retrieval alignment. Re-Imagen achieves new SoTA FID results on two image generation benchmarks, namely COCO (i.e., FID = 5.25) and WikiImage (i.e., FID = 5.82), without fine-tuning. To further evaluate the capabilities of the model, we introduce EntityDrawBench, a new benchmark that evaluates image generation for diverse entities, from frequent to rare, across multiple visual domains. Human evaluation on EntityDrawBench shows that Re-Imagen performs on par with the best prior models in photo-realism, but with significantly better faithfulness, especially on less frequent entities.
    FIRE: A Failure-Adaptive Reinforcement Learning Framework for Edge Computing Migrations. (arXiv:2209.14399v1 [cs.NI])
    In edge computing, users' service profiles must be migrated in response to user mobility, and reinforcement learning (RL) frameworks have been proposed to do so. Nevertheless, these frameworks do not consider occasional server failures, which, although rare, can prevent the smooth and safe functioning of latency-sensitive edge computing applications such as autonomous driving and real-time obstacle detection, because users' computing jobs can no longer be completed. As these failures occur with low probability, it is difficult for RL algorithms, which are inherently data-driven, to learn an optimal service migration solution for both the typical and rare event scenarios. We therefore introduce FIRE, a rare-events-adaptive resilience framework that integrates importance sampling into reinforcement learning to place backup services. We sample rare events at a rate proportional to their contribution to the value function in order to learn an optimal policy. Our framework balances service migration trade-offs between delay and migration costs against the costs of failure and the costs of backup placement and migration. We propose an importance-sampling-based Q-learning algorithm and prove its boundedness and convergence to optimality. We then propose novel eligibility traces, linear function approximation, and deep Q-learning versions of our algorithm to ensure it scales to real-world scenarios. We also extend our framework to cater to users with different risk tolerances towards failure. Finally, we use trace-driven experiments to show that our algorithm yields cost reductions in the event of failures.
    Machine Learning for Optical Motion Capture-driven Musculoskeletal Modeling from Inertial Motion Capture Data. (arXiv:2209.14456v1 [cs.LG])
    Marker-based Optical Motion Capture (OMC) systems and the associated musculoskeletal (MSK) modeling predictions have offered the ability to gain insights into in vivo joint and muscle loading non-invasively as well as aid clinical decision-making. However, an OMC system is lab-based, expensive, and requires a line of sight. A widely used alternative is the Inertial Motion Capture (IMC) system, which is portable, user-friendly, and relatively low cost, although it is not as accurate as an OMC system. Irrespective of the choice of motion capture technique, one needs to use an MSK model to obtain the kinematic and kinetic outputs, which is a computationally expensive tool increasingly well approximated by machine learning (ML) methods. Here, we present an ML approach to map IMC data to the human upper-extremity MSK outputs computed from OMC input data. Essentially, we attempt to predict high-quality MSK outputs from the relatively easier-to-obtain IMC data. We use OMC and IMC data simultaneously collected for the same subjects to train an ML (feed-forward multi-layer perceptron) model that predicts OMC-based MSK outputs from IMC measurements. We demonstrate that our ML predictions have a high degree of agreement with the desired OMC-based MSK estimates. Thus, this approach will be instrumental in getting the technology from 'lab to field' where OMC-based systems are infeasible.
    Learning to Explain Graph Neural Networks. (arXiv:2209.14402v1 [cs.LG])
    Graph Neural Networks (GNNs) are a popular class of machine learning models. Inspired by the learning to explain (L2X) paradigm, we propose L2XGNN, a framework for explainable GNNs which provides faithful explanations by design. L2XGNN learns a mechanism for selecting explanatory subgraphs (motifs) which are exclusively used in the GNN's message-passing operations. L2XGNN is able to select, for each input graph, a subgraph with specific properties such as being sparse and connected. Imposing such constraints on the motifs often leads to more interpretable and effective explanations. Experiments on several datasets suggest that L2XGNN achieves the same classification accuracy as baseline methods using the entire input graph while ensuring that only the provided explanations are used to make predictions. Moreover, we show that L2XGNN is able to identify motifs responsible for the graph properties it is intended to predict.
    medigan: A Python Library of Pretrained Generative Models for Enriched Data Access in Medical Imaging. (arXiv:2209.14472v1 [eess.IV])
    Synthetic data generated by generative models can enhance the performance and capabilities of data-hungry deep learning models in medical imaging. However, there is (1) limited availability of (synthetic) datasets and (2) generative models are complex to train, which hinders their adoption in research and clinical applications. To reduce this entry barrier, we propose medigan, a one-stop shop for pretrained generative models implemented as an open-source framework-agnostic Python library. medigan allows researchers and developers to create, increase, and domain-adapt their training data in just a few lines of code. Guided by design decisions based on gathered end-user requirements, we implement medigan based on modular components for generative model (i) execution, (ii) visualisation, (iii) search & ranking, and (iv) contribution. The library's scalability and design is demonstrated by its growing number of integrated and readily-usable pretrained generative models consisting of 21 models utilising 9 different Generative Adversarial Network architectures trained on 11 datasets from 4 domains, namely, mammography, endoscopy, x-ray, and MRI. Furthermore, 3 applications of medigan are analysed in this work, which include (a) enabling community-wide sharing of restricted data, (b) investigating generative model evaluation metrics, and (c) improving clinical downstream tasks. In (b), extending on common medical image synthesis assessment and reporting standards, we show Fr\'echet Inception Distance variability based on image normalisation and radiology-specific feature extraction.
    Is Complexity Required for Neural Network Pruning? A Case Study on Global Magnitude Pruning. (arXiv:2209.14624v1 [cs.LG])
    Pruning neural networks has become popular in the last decade when it was shown that a large number of weights can be safely removed from modern neural networks without compromising accuracy. Numerous pruning methods have been proposed since then, each claiming to be better than the previous. Many state-of-the-art (SOTA) techniques today rely on complex pruning methodologies utilizing importance scores, getting feedback through back-propagation or having heuristics-based pruning rules amongst others. We question this pattern of introducing complexity in order to achieve better pruning results. We benchmark these SOTA techniques against Global Magnitude Pruning (Global MP), a naive pruning baseline, to evaluate whether complexity is really needed to achieve higher performance. Global MP ranks weights in order of their magnitudes and prunes the smallest ones. Hence, in its vanilla form, it is one of the simplest pruning techniques. Surprisingly, we find that vanilla Global MP outperforms all the other SOTA techniques and achieves a new SOTA result. It also achieves good performance on FLOPs sparsification, which we find is enhanced, when pruning is conducted in a gradual fashion. We also find that Global MP is generalizable across tasks, datasets and models with superior performance. Moreover, a common issue that many pruning algorithms run into at high sparsity rates, namely, layer-collapse, can be easily fixed in Global MP by setting a minimum threshold of weights to be retained in each layer. Lastly, unlike many other SOTA techniques, Global MP does not require any additional algorithm specific hyper-parameters and is very straightforward to tune and implement. We showcase our findings on various models (WRN-28-8, ResNet-32, ResNet-50, MobileNet-V1 and FastGRNN) and multiple datasets (CIFAR-10, ImageNet and HAR-2). Code is available at https://github.com/manasgupta-1/GlobalMP.
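    Global MP is simple enough to sketch in a few lines: pool all weights, compute the magnitude threshold for the target sparsity, and zero everything below it (layer names and shapes here are placeholders):
    ```python
    import numpy as np

    rng = np.random.default_rng(0)
    layers = {"conv1": rng.normal(size=(16, 3, 3, 3)),
              "fc": rng.normal(size=(10, 144))}

    sparsity = 0.9                                    # prune 90% of all weights
    all_mags = np.concatenate([np.abs(w).ravel() for w in layers.values()])
    threshold = np.quantile(all_mags, sparsity)       # global, not per-layer

    pruned = {name: np.where(np.abs(w) >= threshold, w, 0.0)
              for name, w in layers.items()}
    for name, w in pruned.items():
        print(name, f"{(w == 0).mean():.2%} zeros")   # per-layer sparsity varies
    ```
    The layer-collapse fix mentioned in the abstract would simply add a per-layer floor on the number of retained weights before applying the global threshold.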
    A case study of spatiotemporal forecasting techniques for weather forecasting. (arXiv:2209.14782v1 [cs.LG])
    The majority of real-world processes are spatiotemporal, and the data generated by them exhibits both spatial and temporal evolution. Weather is one of the most important processes that fall under this domain, and forecasting it has become a crucial part of our daily routine. Weather data analysis is considered one of the most complex and challenging tasks. Although numerical weather prediction models are currently state-of-the-art, they are resource intensive and time-consuming. Numerous studies have proposed time-series-based models as a viable alternative to numerical forecasts. Recent research has primarily focused on forecasting weather at a specific location, so such models can only capture temporal correlations. This self-contained paper explores various methods for regional data-driven weather forecasting, i.e., forecasting over multiple latitude-longitude points to capture spatiotemporal correlations. The results showed that spatiotemporal prediction models reduced computational cost while improving accuracy; in particular, the proposed tensor train dynamic mode decomposition-based forecasting model has comparable accuracy to ConvLSTM without the need for training. We use the NASA POWER meteorological dataset to evaluate the models and compare them with the current state of the art.
    Out-of-Distribution Detection for LiDAR-based 3D Object Detection. (arXiv:2209.14435v1 [cs.CV])
    3D object detection is an essential part of automated driving, and deep neural networks (DNNs) have achieved state-of-the-art performance for this task. However, deep models are notorious for assigning high confidence scores to out-of-distribution (OOD) inputs, that is, inputs that are not drawn from the training distribution. Detecting OOD inputs is challenging and essential for the safe deployment of models. OOD detection has been studied extensively for the classification task, but it has not received enough attention for the object detection task, specifically LiDAR-based 3D object detection. In this paper, we focus on the detection of OOD inputs for LiDAR-based 3D object detection. We formulate what OOD inputs mean for object detection and propose to adapt several OOD detection methods for object detection. We accomplish this via our proposed feature extraction method. To evaluate OOD detection methods, we develop a simple but effective technique of generating OOD objects for a given object detection model. Our evaluation based on the KITTI dataset shows that different OOD detection methods have biases toward detecting specific OOD objects. This emphasizes the importance of combining OOD detection methods and of further research in this direction.  ( 3 min )
  • Open

    VC Theoretical Explanation of Double Descent. (arXiv:2205.15549v3 [stat.ML] UPDATED)
    There has been growing interest in the generalization performance of large multilayer neural networks that can be trained to achieve zero training error while generalizing well on test data. This regime is known as 'second descent', and it appears to contradict the conventional view that optimal model complexity should reflect an optimal balance between underfitting and overfitting, i.e., the bias-variance trade-off. This paper presents a VC-theoretical analysis of double descent and shows that it can be fully explained by classical VC-generalization bounds. We illustrate an application of analytic VC-bounds for modeling double descent for classification, using empirical results for several learning methods, such as SVM, Least Squares, and Multilayer Perceptron classifiers. In addition, we discuss several reasons for the misinterpretation of VC-theoretical results in the Deep Learning community.
    A New Index for Clustering Evaluation Based on Density Estimation. (arXiv:2207.01294v3 [cs.LG] UPDATED)
    A new index for internal evaluation of clustering is introduced. The index is defined as a mixture of two sub-indices. The first sub-index $ I_a $ is called the Ambiguous Index; the second sub-index $ I_s $ is called the Similarity Index. Calculation of the two sub-indices is based on density estimation for each cluster of a partition of the data. An experiment is conducted to test the performance of the new index and to compare it with six other internal clustering evaluation indices -- the Calinski-Harabasz index, the Silhouette coefficient, the Davies-Bouldin index, CDbw, DBCV, and VIASCKDE -- on a set of 145 datasets. The results show that the new index significantly outperforms the other internal clustering evaluation indices.
    The Survival Bandit Problem. (arXiv:2206.03019v2 [cs.LG] UPDATED)
    We study the survival bandit problem, a variant of the multi-armed bandit problem introduced in an open problem by Perotto et al. (2019), with a constraint on the cumulative reward; at each time step, the agent receives a (possibly negative) reward, and if the cumulative reward falls below a prespecified threshold, the procedure stops; this phenomenon is called ruin. This is the first paper to study a framework where ruin may occur but is not certain. We first show that sublinear regret is unachievable under a naive definition of the regret. Next, we provide tight lower bounds on the probability of ruin (as well as matching policies). Based on this lower bound, we define the survival regret as an objective to minimize and provide a policy achieving a sublinear survival regret (at least in the case of integral rewards) when the time horizon $T$ is known.
    Pyramidal Denoising Diffusion Probabilistic Models. (arXiv:2208.01864v2 [cs.CV] UPDATED)
    Recently, diffusion models have demonstrated impressive image generation performance and have been extensively studied in various computer vision tasks. Unfortunately, training and evaluating diffusion models consume a lot of time and computational resources. To address this problem, we present a novel pyramidal diffusion model that can generate high-resolution images starting from much coarser-resolution images using a single score function trained with a positional embedding. This enables the neural network to be much lighter and also enables time-efficient image generation without compromising performance. Furthermore, we show that the proposed approach can also be used efficiently for the multi-scale super-resolution problem using a single score function.
    Signed Network Embedding with Application to Simultaneous Detection of Communities and Anomalies. (arXiv:2207.09324v2 [cs.SI] UPDATED)
    Signed networks are frequently observed in real life with additional sign information associated with each edge, yet such information has been largely ignored in existing network models. This paper develops a unified embedding model for signed networks to disentangle the intertwined balance structure and anomaly effect, which can greatly facilitate the downstream analysis, including community detection, anomaly detection, and network inference. The proposed model captures both balance structure and anomaly effect through a low rank plus sparse matrix decomposition, which are jointly estimated via a regularized formulation. Its theoretical guarantees are established in terms of asymptotic consistency and finite-sample probability bounds for network embedding, community detection and anomaly detection. The advantage of the proposed embedding model is also demonstrated through extensive numerical experiments on both synthetic networks and an international relation network.
    Differentiable and Transportable Structure Learning. (arXiv:2206.06354v2 [cs.LG] UPDATED)
    Directed acyclic graphs (DAGs) encode a lot of information about a particular distribution in their structure. However, the compute required to infer these structures is typically super-exponential in the number of variables, as inference requires a sweep of a combinatorially large space of potential structures. That is, until recent advances made it possible to search this space using a differentiable metric, drastically reducing search time. While this technique -- named NOTEARS -- is widely considered a seminal work in DAG-discovery, it concedes an important property in favour of differentiability: transportability. To be transportable, the structures discovered on one dataset must apply to another dataset from the same domain. In our paper, we introduce D-Struct, which recovers transportability in the discovered structures through a novel architecture and loss function, while remaining completely differentiable. Because D-Struct remains differentiable, our method can be easily adopted in existing differentiable architectures, as was previously done with NOTEARS. In our experiments, we empirically validate D-Struct with respect to edge accuracy and structural Hamming distance in a variety of settings.
    Diffusion Posterior Sampling for General Noisy Inverse Problems. (arXiv:2209.14687v1 [stat.ML])
    Diffusion models have been recently studied as powerful generative inverse problem solvers, owing to their high-quality reconstructions and the ease of combining them with existing iterative solvers. However, most works focus on solving simple linear inverse problems in noiseless settings, which significantly under-represents the complexity of real-world problems. In this work, we extend diffusion solvers to efficiently handle general noisy (non)linear inverse problems via the Laplace approximation of the posterior sampling. Interestingly, the resulting posterior sampling scheme is a blended version of diffusion sampling with the manifold constrained gradient without a strict measurement consistency projection step, yielding a more desirable generative path in noisy settings compared to the previous studies. Our method demonstrates that diffusion models can incorporate various measurement noise statistics such as Gaussian and Poisson, and also efficiently handle noisy nonlinear inverse problems such as Fourier phase retrieval and non-uniform deblurring.  ( 2 min )
    Learning Causal Models from Conditional Moment Restrictions by Importance Weighting. (arXiv:2108.01312v2 [econ.EM] UPDATED)
    We consider learning causal relationships under conditional moment restrictions. Unlike causal inference under unconditional moment restrictions, conditional moment restrictions pose serious challenges for causal inference, especially in high-dimensional settings. To address this issue, we propose a method that transforms conditional moment restrictions to unconditional moment restrictions through importance weighting, using a conditional density ratio estimator. Using this transformation, we successfully estimate nonparametric functions defined under conditional moment restrictions. Our proposed framework is general and can be applied to a wide range of methods, including neural networks. We analyze the estimation error, providing theoretical support for our proposed method. In experiments, we confirm the soundness of our proposed method.  ( 2 min )
    Continuous PDE Dynamics Forecasting with Implicit Neural Representations. (arXiv:2209.14855v1 [cs.LG])
    Effective data-driven PDE forecasting methods often rely on fixed spatial and/or temporal discretizations. This poses limitations in real-world applications like weather prediction where flexible extrapolation at arbitrary spatiotemporal locations is required. We address this problem by introducing a new data-driven approach, DINo, that models a PDE's flow with continuous-time dynamics of spatially continuous functions. This is achieved by embedding spatial observations independently of their discretization via Implicit Neural Representations in a small latent space temporally driven by a learned ODE. This separate and flexible treatment of time and space makes DINo the first data-driven model to combine the following advantages. It extrapolates at arbitrary spatial and temporal locations; it can learn from sparse irregular grids or manifolds; at test time, it generalizes to new grids or resolutions. DINo outperforms alternative neural PDE forecasters in a variety of challenging generalization scenarios on representative PDE systems.  ( 2 min )
    Rectified Flow: A Marginal Preserving Approach to Optimal Transport. (arXiv:2209.14577v1 [stat.ML])
    We present a flow-based approach to the optimal transport (OT) problem between two continuous distributions $\pi_0,\pi_1$ on $\mathbb{R}^d$, of minimizing a transport cost $\mathbb{E}[c(X_1-X_0)]$ over the set of couplings $(X_0,X_1)$ whose marginal distributions on $X_0,X_1$ equal $\pi_0,\pi_1$, respectively, where $c$ is a cost function. Our method iteratively constructs a sequence of neural ordinary differential equations (ODEs), each learned by solving a simple unconstrained regression problem, which monotonically reduce the transport cost while automatically preserving the marginal constraints. This yields a monotonic interior approach that traverses inside the set of valid couplings to decrease the transport cost, which distinguishes itself from most existing approaches that enforce the coupling constraints from the outside. The main idea of the method draws from rectified flow, a recent approach that simultaneously decreases the whole family of transport costs induced by convex functions $c$ (and is hence multi-objective in nature), but is not tailored to minimize a specific transport cost. Our method is a single-objective variant of rectified flow that is guaranteed to solve the OT problem for a fixed, user-specified convex cost function $c$.  ( 2 min )
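    The "simple unconstrained regression problem" solved at each iteration can be sketched as follows (a minimal sketch of the rectified-flow regression step for flattened data vectors; velocity_net is an assumed model mapping $(x_t, t)$ to a velocity):

        import torch

        def rectified_flow_step(velocity_net, optimizer, x0, x1):
            """One regression step: fit v(x_t, t) to the displacement x1 - x0
            along the straight-line interpolation x_t = t*x1 + (1-t)*x0."""
            t = torch.rand(x0.shape[0], 1)      # uniform time samples in [0, 1]
            xt = t * x1 + (1 - t) * x0          # linear interpolation
            target = x1 - x0                    # straight-line velocity
            loss = ((velocity_net(xt, t) - target) ** 2).mean()
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
            return loss.item()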
    Fast Nonlinear Vector Quantile Regression. (arXiv:2205.14977v2 [stat.CO] UPDATED)
    Quantile regression (QR) is a powerful tool for estimating one or more conditional quantiles of a target variable $\mathrm{Y}$ given explanatory features $\boldsymbol{\mathrm{X}}$. A limitation of QR is that it is only defined for scalar target variables, due to the formulation of its objective function, and since the notion of quantiles has no standard definition for multivariate distributions. Recently, vector quantile regression (VQR) was proposed as an extension of QR for vector-valued target variables, thanks to a meaningful generalization of the notion of quantiles to multivariate distributions via optimal transport. Despite its elegance, VQR is arguably not applicable in practice due to several limitations: (i) it assumes a linear model for the quantiles of the target $\boldsymbol{\mathrm{Y}}$ given the features $\boldsymbol{\mathrm{X}}$; (ii) its exact formulation is intractable even for modestly-sized problems in terms of target dimensions, number of regressed quantile levels, or number of features, and its relaxed dual formulation may violate the monotonicity of the estimated quantiles; (iii) no fast or scalable solvers for VQR currently exist. In this work we fully address these limitations, namely: (i) We extend VQR to the non-linear case, showing substantial improvement over linear VQR; (ii) We propose {vector monotone rearrangement}, a method which ensures the quantile functions estimated by VQR are monotone functions; (iii) We provide fast, GPU-accelerated solvers for linear and nonlinear VQR which maintain a fixed memory footprint, and demonstrate that they scale to millions of samples and thousands of quantile levels; (iv) We release an optimized Python package of our solvers to promote widespread use of VQR in real-world applications.  ( 3 min )
    On Transfer Learning in Functional Linear Regression. (arXiv:2206.04277v2 [stat.ML] UPDATED)
    This work studies the problem of transfer learning under the functional linear model framework, which aims to improve the fit of the target model by leveraging the knowledge from related source models. We measure the relatedness between target and source models using Reproducing Kernel Hilbert Spaces, allowing the type of knowledge being transferred to be interpreted by the structure of the spaces. Two algorithms are proposed: one transfers knowledge when the index of transferable sources is known, while the other one utilizes aggregation to achieve knowledge transfer without prior information about the sources. Furthermore, we establish the optimal convergence rates for excess risk, making the statistical gain via transfer learning mathematically provable. The effectiveness of the proposed algorithms is demonstrated on synthetic data as well as real financial data.  ( 2 min )
    DreamFusion: Text-to-3D using 2D Diffusion. (arXiv:2209.14988v1 [cs.CV])
    Recent breakthroughs in text-to-image synthesis have been driven by diffusion models trained on billions of image-text pairs. Adapting this approach to 3D synthesis would require large-scale datasets of labeled 3D data and efficient architectures for denoising 3D data, neither of which currently exist. In this work, we circumvent these limitations by using a pretrained 2D text-to-image diffusion model to perform text-to-3D synthesis. We introduce a loss based on probability density distillation that enables the use of a 2D diffusion model as a prior for optimization of a parametric image generator. Using this loss in a DeepDream-like procedure, we optimize a randomly-initialized 3D model (a Neural Radiance Field, or NeRF) via gradient descent such that its 2D renderings from random angles achieve a low loss. The resulting 3D model of the given text can be viewed from any angle, relit by arbitrary illumination, or composited into any 3D environment. Our approach requires no 3D training data and no modifications to the image diffusion model, demonstrating the effectiveness of pretrained image diffusion models as priors.  ( 2 min )
    Deep Neural Networks for Rank-Consistent Ordinal Regression Based On Conditional Probabilities. (arXiv:2111.08851v4 [cs.LG] UPDATED)
    In recent times, deep neural networks have achieved outstanding predictive performance on various classification and pattern recognition tasks. However, many real-world prediction problems have ordinal response variables, and this ordering information is ignored by conventional classification losses such as the multi-category cross-entropy. Ordinal regression methods for deep neural networks address this. One such method is the CORAL method, which is based on an earlier binary label extension framework and achieves rank consistency among its output layer tasks by imposing a weight-sharing constraint. However, while earlier experiments showed that CORAL's rank consistency is beneficial for performance, it is limited by a weight-sharing constraint in a neural network's fully connected output layer. We propose a new method for rank-consistent ordinal regression without this limitation. Our rank-consistent ordinal regression framework (CORN) achieves rank consistency by a novel training scheme. This training scheme uses conditional training sets to obtain the unconditional rank probabilities by applying the chain rule for conditional probability distributions. Experiments on various datasets demonstrate the efficacy of the proposed method to utilize the ordinal target information, and the absence of the weight-sharing restriction improves the performance substantially compared to the CORAL reference approach.  ( 3 min )
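    The chain-rule step at the heart of the training scheme is easy to sketch (a minimal sketch of converting the K-1 conditional outputs into unconditional rank probabilities; tensor names are assumptions):

        import torch

        def corn_rank_probas(logits):
            """Turn conditional logits for P(y > k | y > k-1) into unconditional
            rank probabilities P(y > k) via the chain rule; logits has shape
            (batch, K-1) for K ordinal labels."""
            cond = torch.sigmoid(logits)          # conditional probabilities
            return torch.cumprod(cond, dim=1)     # P(y>k) = prod_{j<=k} P(y>j | y>j-1)

        # predicted rank = number of P(y > k) estimates exceeding 0.5:
        # ranks = (corn_rank_probas(logits) > 0.5).sum(dim=1)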
    On the influence of stochastic roundoff errors on the convergence of the gradient descent method with low-precision floating-point computation. (arXiv:2202.12276v2 [cs.LG] UPDATED)
    When implementing the gradient descent method in low precision, the employment of stochastic rounding schemes helps to prevent stagnation of convergence caused by the vanishing gradient effect. Unbiased stochastic rounding yields zero bias by preserving small updates with probabilities proportional to their relative magnitudes. This study provides a theoretical explanation for the stagnation of the gradient descent method in low-precision computation. Additionally, we propose two new stochastic rounding schemes that trade the zero-bias property for a larger probability of preserving small gradients. Our methods yield a constant rounding bias that, on average, lies in a descent direction. For convex problems, we prove that the proposed rounding methods typically have a beneficial effect on the convergence rate of gradient descent. We validate our theoretical analysis by comparing the performance of various rounding schemes when optimizing a multinomial logistic regression model and when training a simple neural network with an 8-bit floating-point format.  ( 2 min )
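    Unbiased stochastic rounding, the baseline the proposed schemes modify, can be sketched as follows (a minimal sketch on a fixed grid; rounding to a low-precision floating-point grid works analogously with an exponent-dependent step size):

        import numpy as np

        def stochastic_round(x, step=2.0 ** -8, rng=np.random.default_rng(0)):
            """Round x to a grid of spacing `step`, rounding up with probability
            proportional to the residual, so that E[round(x)] = x (zero bias)."""
            lo = np.floor(x / step) * step
            p_up = (x - lo) / step              # probability of rounding up
            up = rng.random(np.shape(x)) < p_up
            return lo + step * up

    The paper's biased variants would inflate p_up for small updates, trading zero bias for a higher chance of preserving small gradients.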
    Variance-Aware Sparse Linear Bandits. (arXiv:2205.13450v2 [cs.LG] UPDATED)
    It is well-known that for sparse linear bandits, when ignoring the dependency on sparsity which is much smaller than the ambient dimension, the worst-case minimax regret is $\widetilde{\Theta}\left(\sqrt{dT}\right)$ where $d$ is the ambient dimension and $T$ is the number of rounds. On the other hand, in the benign setting where there is no noise and the action set is the unit sphere, one can use divide-and-conquer to achieve $\widetilde{\mathcal O}(1)$ regret, which is (nearly) independent of $d$ and $T$. In this paper, we present the first variance-aware regret guarantee for sparse linear bandits: $\widetilde{\mathcal O}\left(\sqrt{d\sum_{t=1}^T \sigma_t^2} + 1\right)$, where $\sigma_t^2$ is the variance of the noise at the $t$-th round. This bound naturally interpolates the regret bounds for the worst-case constant-variance regime (i.e., $\sigma_t \equiv \Omega(1)$) and the benign deterministic regimes (i.e., $\sigma_t \equiv 0$). To achieve this variance-aware regret guarantee, we develop a general framework that converts any variance-aware linear bandit algorithm to a variance-aware algorithm for sparse linear bandits in a "black-box" manner. Specifically, we take two recent algorithms as black boxes to illustrate that the claimed bounds indeed hold, where the first algorithm can handle unknown-variance cases and the second one is more efficient.  ( 2 min )
    Sequential Attention for Feature Selection. (arXiv:2209.14881v1 [cs.LG])
    Feature selection is the problem of selecting a subset of features for a machine learning model that maximizes model quality subject to a resource budget constraint. For neural networks, prior methods, including those based on $\ell_1$ regularization, attention, and stochastic gates, typically select all of the features in one evaluation round, ignoring the residual value of the features during selection (i.e., the marginal contribution of a feature conditioned on the previously selected features). We propose a feature selection algorithm called Sequential Attention that achieves state-of-the-art empirical results for neural networks. This algorithm is based on an efficient implementation of greedy forward selection and uses attention weights at each step as a proxy for marginal feature importance. We provide theoretical insights into our Sequential Attention algorithm for linear regression models by showing that an adaptation to this setting is equivalent to the classical Orthogonal Matching Pursuit algorithm [PRK1993], and thus inherits all of its provable guarantees. Lastly, our theoretical and empirical analyses provide new explanations towards the effectiveness of attention and its connections to overparameterization, which might be of independent interest.  ( 2 min )
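    The greedy loop described above can be sketched schematically (train_with_attention is a hypothetical helper that trains the model with trainable attention weights over the remaining candidate features, given the already-selected ones, and returns one weight per candidate):

        def sequential_attention(num_features, k, train_with_attention):
            """Greedy forward selection using attention weights as a proxy
            for marginal feature importance."""
            selected = []
            for _ in range(k):
                candidates = [j for j in range(num_features) if j not in selected]
                attn = train_with_attention(selected, candidates)
                # keep the candidate with the largest attention weight
                best = candidates[max(range(len(candidates)), key=lambda i: attn[i])]
                selected.append(best)
            return selected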
    Partially Observable RL with B-Stability: Unified Structural Condition and Sharp Sample-Efficient Algorithms. (arXiv:2209.14990v1 [cs.LG])
    Partial Observability -- where agents can only observe partial information about the true underlying state of the system -- is ubiquitous in real-world applications of Reinforcement Learning (RL). Theoretically, learning a near-optimal policy under partial observability is known to be hard in the worst case due to an exponential sample complexity lower bound. Recent work has identified several tractable subclasses that are learnable with polynomial samples, such as Partially Observable Markov Decision Processes (POMDPs) with certain revealing or decodability conditions. However, this line of research is still in its infancy, where (1) unified structural conditions enabling sample-efficient learning are lacking; (2) existing sample complexities for known tractable subclasses are far from sharp; and (3) fewer sample-efficient algorithms are available than in fully observable RL. This paper advances all three aspects above for Partially Observable RL in the general setting of Predictive State Representations (PSRs). First, we propose a natural and unified structural condition for PSRs called B-stability. B-stable PSRs encompass the vast majority of known tractable subclasses such as weakly revealing POMDPs, low-rank future-sufficient POMDPs, decodable POMDPs, and regular PSRs. Next, we show that any B-stable PSR can be learned with polynomial samples in relevant problem parameters. When instantiated in the aforementioned subclasses, our sample complexities improve substantially over the current best ones. Finally, our results are achieved by three algorithms simultaneously: Optimistic Maximum Likelihood Estimation, Estimation-to-Decisions, and Model-Based Optimistic Posterior Sampling. The latter two algorithms are new for sample-efficient learning of POMDPs/PSRs.  ( 3 min )
    Optimistic MLE -- A Generic Model-based Algorithm for Partially Observable Sequential Decision Making. (arXiv:2209.14997v1 [cs.LG])
    This paper introduces a simple and efficient learning algorithm for general sequential decision making. The algorithm combines Optimism for exploration with Maximum Likelihood Estimation for model estimation, and is thus named OMLE. We prove that OMLE learns near-optimal policies for an enormously rich class of sequential decision making problems in a polynomial number of samples. This rich class includes not only a majority of known tractable model-based Reinforcement Learning (RL) problems (such as tabular MDPs, factored MDPs, low witness rank problems, tabular weakly-revealing/observable POMDPs and multi-step decodable POMDPs), but also many new challenging RL problems, especially in the partially observable setting, that were not previously known to be tractable. Notably, the new problems addressed by this paper include (1) observable POMDPs with continuous observation and function approximation, where we achieve the first sample complexity that is completely independent of the size of the observation space; (2) well-conditioned low-rank sequential decision making problems (also known as Predictive State Representations (PSRs)), which include and generalize all known tractable POMDP examples under a more intrinsic representation; (3) general sequential decision making problems under the SAIL condition, which unifies our existing understandings of model-based RL in both fully observable and partially observable settings. The SAIL condition is identified by this paper and can be viewed as a natural generalization of Bellman/witness rank to address partial observability.  ( 3 min )
    GeONet: a neural operator for learning the Wasserstein geodesic. (arXiv:2209.14440v1 [cs.LG])
    Optimal transport (OT) offers a versatile framework to compare complex data distributions in a geometrically meaningful way. Traditional methods for computing the Wasserstein distance and geodesic between probability measures require mesh-dependent domain discretization and suffer from the curse-of-dimensionality. We present GeONet, a mesh-invariant deep neural operator network that learns the non-linear mapping from the input pair of initial and terminal distributions to the Wasserstein geodesic connecting the two endpoint distributions. In the offline training stage, GeONet learns the saddle point optimality conditions for the dynamic formulation of the OT problem in the primal and dual spaces that are characterized by a coupled PDE system. The subsequent inference stage is instantaneous and can be deployed for real-time predictions in the online learning setting. We demonstrate that GeONet achieves comparable testing accuracy to the standard OT solvers on a simulation example and the CIFAR-10 dataset with considerably reduced inference-stage computational cost by orders of magnitude.  ( 2 min )
    Bayesian Neural Network Versus Ex-Post Calibration For Prediction Uncertainty. (arXiv:2209.14594v1 [cs.LG])
    Probabilistic predictions from neural networks which account for predictive uncertainty during classification are crucial in many real-world and high-impact decision-making settings. However, in practice most models are non-probabilistic neural networks which by default do not capture this inherent uncertainty. This well-known problem has led to the development of post-hoc calibration procedures, such as Platt scaling (logistic), isotonic, and beta calibration, which transform the scores into well-calibrated empirical probabilities. A plausible alternative to the calibration approach is to use Bayesian neural networks, which directly model a predictive distribution. Although they have been applied to images and text datasets, they have seen limited adoption in the tabular and small-data regime. In this paper, we demonstrate that Bayesian neural networks yield competitive performance when compared to calibrated neural networks, and we conduct experiments across a wide array of datasets.  ( 2 min )
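    Of the post-hoc calibrators listed above, Platt scaling is the simplest to sketch (a minimal sketch with scikit-learn; calibration must be fit on held-out validation scores, never on the training set):

        import numpy as np
        from sklearn.linear_model import LogisticRegression

        def platt_scale(val_scores, val_labels):
            """Fit a logistic map from uncalibrated scores to probabilities
            (Platt scaling) and return a calibration function."""
            calibrator = LogisticRegression()
            calibrator.fit(np.asarray(val_scores).reshape(-1, 1), val_labels)
            return lambda s: calibrator.predict_proba(
                np.asarray(s).reshape(-1, 1))[:, 1]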
    Algorithms that get old : the case of generative deep neural networks. (arXiv:2202.03008v3 [stat.ML] UPDATED)
    Generative deep neural networks used in machine learning, like Variational Auto-Encoders (VAEs) and Generative Adversarial Networks (GANs), produce new objects each time they are asked to do so, with the constraint that the new objects remain similar to some list of examples given as input. However, this behavior is unlike that of human artists, who change their style as time goes by and seldom return to the style of their initial creations. We investigate a situation where VAEs are used to sample from a probability measure described by some empirical dataset. Based on recent works on Radon-Sobolev statistical distances, we propose a numerical paradigm, to be used in conjunction with a generative algorithm, that satisfies the following two requirements: the objects created do not repeat and they evolve to fill the entire target probability distribution.  ( 2 min )
    Exact Recovery of Community Detection in dependent Gaussian Mixture Models. (arXiv:2209.14859v1 [math.ST])
    We study the community detection problem on a Gaussian mixture model, in which (1) vertices are divided into $k\geq 2$ distinct communities that are not necessarily equally-sized; (2) the Gaussian perturbations for different entries in the observation matrix are not necessarily independent or identically distributed. We prove necessary and sufficient conditions for the exact recovery of the maximum likelihood estimation (MLE), and discuss the cases in which these necessary and sufficient conditions yield a sharp threshold. Applications include community detection on a graph where the Gaussian perturbation of the observation on each edge is the sum of i.i.d. Gaussian random variables on its end vertices, in which case we explicitly obtain the threshold for the exact recovery of the MLE.  ( 2 min )
    Efficient Approximation of Gromov-Wasserstein Distance using Importance Sparsification. (arXiv:2205.13573v2 [cs.LG] UPDATED)
    As a valid metric of metric-measure spaces, Gromov-Wasserstein (GW) distance has shown the potential for matching problems of structured data like point clouds and graphs. However, its application in practice is limited due to its high computational complexity. To overcome this challenge, we propose a novel importance sparsification method, called Spar-GW, to approximate GW distance efficiently. In particular, instead of considering a dense coupling matrix, our method leverages a simple but effective sampling strategy to construct a sparse coupling matrix and update it with few computations. We demonstrate that the proposed Spar-GW method is applicable to the GW distance with arbitrary ground cost, and it reduces the complexity from $\mathcal{O}(n^4)$ to $\mathcal{O}(n^{2+\delta})$ for an arbitrary small $\delta>0$. In addition, this method can be extended to approximate the variants of GW distance, including the entropic GW distance, the fused GW distance, and the unbalanced GW distance. Experiments show the superiority of our Spar-GW to state-of-the-art methods in both synthetic and real-world tasks.  ( 2 min )
    Training Normalizing Flows from Dependent Data. (arXiv:2209.14933v1 [cs.LG])
    Normalizing flows are powerful non-parametric statistical models that function as a hybrid between density estimators and generative models. Current learning algorithms for normalizing flows assume that data points are sampled independently, an assumption that is frequently violated in practice, which may lead to erroneous density estimation and data generation. We propose a likelihood objective of normalizing flows incorporating dependencies between the data points, for which we derive a flexible and efficient learning algorithm suitable for different dependency structures. We show that respecting dependencies between observations can improve empirical results on both synthetic and real-world data.  ( 2 min )
    Stop Wasting My Time! Saving Days of ImageNet and BERT Training with Latest Weight Averaging. (arXiv:2209.14981v1 [cs.LG])
    Training vision or language models on large datasets can take days, if not weeks. We show that averaging the weights of the k latest checkpoints, each collected at the end of an epoch, can speed up the training progression in terms of loss and accuracy by dozens of epochs, corresponding to time savings of up to ~68 and ~30 GPU hours when training a ResNet50 on ImageNet and a RoBERTa-Base model on WikiText-103, respectively. We also provide the code and model checkpoint trajectory to reproduce the results and facilitate research on reusing historical weights for faster convergence.  ( 2 min )
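    The averaging recipe itself takes only a few lines (a minimal sketch; state_dicts is assumed to hold the k most recent end-of-epoch checkpoints of the same model):

        import torch

        def average_checkpoints(state_dicts):
            """Parameter-wise average of the k latest checkpoints."""
            avg = {k: v.detach().clone().float() for k, v in state_dicts[0].items()}
            for sd in state_dicts[1:]:
                for k in avg:
                    avg[k] += sd[k].float()
            for k in avg:
                avg[k] /= len(state_dicts)
            # integer buffers (e.g. BatchNorm counters) would need casting back
            # to their original dtype before model.load_state_dict(avg)
            return avg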
    Statistical Learning and Inverse Problems: A Stochastic Gradient Approach. (arXiv:2209.14967v1 [stat.ML])
    Inverse problems are paramount in Science and Engineering. In this paper, we consider the setup of Statistical Inverse Problems (SIP) and demonstrate how Stochastic Gradient Descent (SGD) algorithms can be used in the linear SIP setting. We provide consistency and finite-sample bounds for the excess risk. We also propose a modification of the SGD algorithm in which we leverage machine learning methods to smooth the stochastic gradients and improve empirical performance. We exemplify the algorithm in a setting of great interest nowadays: the Functional Linear Regression model. In this case we consider a synthetic-data example as well as a real-data classification problem.  ( 2 min )
    Optimistic Posterior Sampling for Reinforcement Learning with Few Samples and Tight Guarantees. (arXiv:2209.14414v1 [stat.ML])
    We consider reinforcement learning in an environment modeled by an episodic, finite, stage-dependent Markov decision process of horizon $H$ with $S$ states and $A$ actions. The performance of an agent is measured by the regret after interacting with the environment for $T$ episodes. We propose an optimistic posterior sampling algorithm for reinforcement learning (OPSRL), a simple variant of posterior sampling that only needs a number of posterior samples logarithmic in $H$, $S$, $A$, and $T$ per state-action pair. For OPSRL we guarantee a high-probability regret bound of order at most $\widetilde{\mathcal{O}}(\sqrt{H^3SAT})$ ignoring $\text{poly}\log(HSAT)$ terms. The key novel technical ingredient is a new sharp anti-concentration inequality for linear forms which may be of independent interest. Specifically, we extend the normal approximation-based lower bound for Beta distributions by Alfers and Dinges [1984] to Dirichlet distributions. Our bound matches the lower bound of order $\Omega(\sqrt{H^3SAT})$, thereby answering the open problems raised by Agrawal and Jia [2017b] for the episodic setting.  ( 2 min )
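    The posterior-sampling primitive underlying the algorithm can be sketched as follows (a minimal sketch of Dirichlet posterior sampling for a tabular transition kernel; OPSRL's optimistic variant draws several such samples per state-action pair, while the code below shows only the generic sampling step):

        import numpy as np

        def sample_transitions(counts, prior=1.0, rng=np.random.default_rng(0)):
            """Draw one posterior sample of the transition kernel, where
            counts[s, a, s'] holds observed transition counts and the posterior
            for each (s, a) is Dirichlet(prior + counts[s, a])."""
            S, A, _ = counts.shape
            p = np.empty_like(counts, dtype=float)
            for s in range(S):
                for a in range(A):
                    p[s, a] = rng.dirichlet(prior + counts[s, a])
            return p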
    Equivariant maps from invariant functions. (arXiv:2209.14991v1 [stat.ML])
    In equivariant machine learning the idea is to restrict the learning to a hypothesis class where all the functions are equivariant with respect to some group action. Irreducible representations or invariant theory are typically used to parameterize the space of such functions. In this note, we explicate a general procedure, attributed to Malgrange, to express all polynomial maps between linear spaces that are equivariant with respect to the action of a group $G$, given a characterization of the invariant polynomials on a bigger space. The method also parametrizes smooth equivariant maps in the case that $G$ is a compact Lie group.  ( 2 min )
    Batch Normalization Explained. (arXiv:2209.14778v1 [cs.LG])
    A critically important, ubiquitous, and yet poorly understood ingredient in modern deep networks (DNs) is batch normalization (BN), which centers and normalizes the feature maps. To date, only limited progress has been made understanding why BN boosts DN learning and inference performance; work has focused exclusively on showing that BN smooths a DN's loss landscape. In this paper, we study BN theoretically from the perspective of function approximation; we exploit the fact that most of today's state-of-the-art DNs are continuous piecewise affine (CPA) splines that fit a predictor to the training data via affine mappings defined over a partition of the input space (the so-called "linear regions"). We demonstrate that BN is an unsupervised learning technique that -- independent of the DN's weights or gradient-based learning -- adapts the geometry of a DN's spline partition to match the data. BN provides a "smart initialization" that boosts the performance of DN learning, because it adapts even a DN initialized with random weights to align its spline partition with the data. We also show that the variation of BN statistics between mini-batches introduces a dropout-like random perturbation to the partition boundaries and hence the decision boundary for classification problems. This per mini-batch perturbation reduces overfitting and improves generalization by increasing the margin between the training samples and the decision boundary.  ( 3 min )
    Optimal Stopping with Gaussian Processes. (arXiv:2209.14738v1 [stat.ML])
    We propose a novel group of Gaussian Process based algorithms for fast approximate optimal stopping of time series with specific applications to financial markets. We show that structural properties commonly exhibited by financial time series (e.g., the tendency to mean-revert) allow the use of Gaussian and Deep Gaussian Process models that further enable us to analytically evaluate optimal stopping value functions and policies. We additionally quantify uncertainty in the value function by propagating the price model through the optimal stopping analysis. We compare and contrast our proposed methods against a sampling-based method, as well as a deep learning based benchmark that is currently considered the state-of-the-art in the literature. We show that our family of algorithms outperforms benchmarks on three historical time series datasets that include intra-day and end-of-day equity asset prices as well as the daily US treasury yield curve rates.  ( 2 min )
    Neural Networks Efficiently Learn Low-Dimensional Representations with SGD. (arXiv:2209.14863v1 [stat.ML])
    We study the problem of training a two-layer neural network (NN) of arbitrary width using stochastic gradient descent (SGD) where the input $\boldsymbol{x}\in \mathbb{R}^d$ is Gaussian and the target $y \in \mathbb{R}$ follows a multiple-index model, i.e., $y=g(\langle\boldsymbol{u_1},\boldsymbol{x}\rangle,...,\langle\boldsymbol{u_k},\boldsymbol{x}\rangle)$ with a noisy link function $g$. We prove that the first-layer weights of the NN converge to the $k$-dimensional principal subspace spanned by the vectors $\boldsymbol{u_1},...,\boldsymbol{u_k}$ of the true model, when online SGD with weight decay is used for training. This phenomenon has several important consequences when $k \ll d$. First, by employing uniform convergence on this smaller subspace, we establish a generalization error bound of $\mathcal{O}(\sqrt{{kd}/{T}})$ after $T$ iterations of SGD, which is independent of the width of the NN. We further demonstrate that, SGD-trained ReLU NNs can learn a single-index target of the form $y=f(\langle\boldsymbol{u},\boldsymbol{x}\rangle) + \epsilon$ by recovering the principal direction, with a sample complexity linear in $d$ (up to log factors), where $f$ is a monotonic function with at most polynomial growth, and $\epsilon$ is the noise. This is in contrast to the known $d^{\Omega(p)}$ sample requirement to learn any degree $p$ polynomial in the kernel regime, and it shows that NNs trained with SGD can outperform the neural tangent kernel at initialization. Finally, we also provide compressibility guarantees for NNs using the approximate low-rank structure produced by SGD.  ( 3 min )
    Unsupervised Learning From Incomplete Measurements for Inverse Problems. (arXiv:2201.12151v4 [stat.ML] UPDATED)
    In many real-world inverse problems, only incomplete measurement data are available for training which can pose a problem for learning a reconstruction function. Indeed, unsupervised learning using a fixed incomplete measurement process is impossible in general, as there is no information in the nullspace of the measurement operator. This limitation can be overcome by using measurements from multiple operators. While this idea has been successfully applied in various applications, a precise characterization of the conditions for learning is still lacking. In this paper, we fill this gap by presenting necessary and sufficient conditions for learning the underlying signal model needed for reconstruction which indicate the interplay between the number of distinct measurement operators, the number of measurements per operator, the dimension of the model and the dimension of the signals. Furthermore, we propose a novel and conceptually simple unsupervised learning loss which only requires access to incomplete measurement data and achieves a performance on par with supervised learning when the sufficient condition is verified. We validate our theoretical bounds and demonstrate the advantages of the proposed unsupervised loss compared to previous methods via a series of experiments on various imaging inverse problems, such as accelerated magnetic resonance imaging, compressed sensing and image inpainting.  ( 3 min )
    Active Learning in Bayesian Neural Networks with Balanced Entropy Learning Principle. (arXiv:2105.14559v2 [cs.LG] UPDATED)
    Acquiring labeled data is challenging in many machine learning applications with limited budgets. Active learning gives a procedure to select the most informative data points and improve data efficiency by reducing the cost of labeling. The info-max learning principle maximizing mutual information such as BALD has been successful and widely adopted in various active learning applications. However, this pool-based specific objective inherently introduces a redundant selection and further requires a high computational cost for batch selection. In this paper, we design and propose a new uncertainty measure, Balanced Entropy Acquisition (BalEntAcq), which captures the information balance between the uncertainty of the underlying softmax probability and the label variable. To do this, we approximate each marginal distribution by a Beta distribution. This Beta approximation enables us to formulate BalEntAcq as a ratio between an augmented entropy and the marginalized joint entropy. The closed-form expression of BalEntAcq facilitates parallelization by estimating two parameters in each marginal Beta distribution. BalEntAcq is a purely standalone measure without requiring any relational computations with other data points. Nevertheless, BalEntAcq captures a well-diversified selection near the decision boundary with a margin, unlike other existing uncertainty measures such as BALD, Entropy, or Mean Standard Deviation (MeanSD). Finally, we demonstrate that our balanced entropy learning principle with BalEntAcq consistently outperforms well-known linearly scalable active learning methods, including a recently proposed PowerBALD, a simple but diversified version of BALD, by showing experimental results obtained from MNIST, CIFAR-100, SVHN, and TinyImageNet datasets.  ( 3 min )
    Joint Embedding Self-Supervised Learning in the Kernel Regime. (arXiv:2209.14884v1 [cs.LG])
    The fundamental goal of self-supervised learning (SSL) is to produce useful representations of data without access to any labels for classifying the data. Modern methods in SSL, which form representations based on known or constructed relationships between samples, have been particularly effective at this task. Here, we aim to extend this framework to incorporate algorithms based on kernel methods where embeddings are constructed by linear maps acting on the feature space of a kernel. In this kernel regime, we derive methods to find the optimal form of the output representations for contrastive and non-contrastive loss functions. This procedure produces a new representation space with an inner product denoted as the induced kernel which generally correlates points which are related by an augmentation in kernel space and de-correlates points otherwise. We analyze our kernel model on small datasets to identify common features of self-supervised learning algorithms and gain theoretical insights into their performance on downstream tasks.  ( 2 min )
    Analyzing Diffusion as Serial Reproduction. (arXiv:2209.14821v1 [cs.LG])
    Diffusion models are a class of generative models that learn to synthesize samples by inverting a diffusion process that gradually maps data into noise. While these models have enjoyed great success recently, a full theoretical understanding of their observed properties is still lacking, in particular, their weak sensitivity to the choice of noise family and the role of adequate scheduling of noise levels for good synthesis. By identifying a correspondence between diffusion models and a well-known paradigm in cognitive science known as serial reproduction, whereby human agents iteratively observe and reproduce stimuli from memory, we show how the aforementioned properties of diffusion models can be explained as a natural consequence of this correspondence. We then complement our theoretical analysis with simulations that exhibit these key features. Our work highlights how classic paradigms in cognitive science can shed light on state-of-the-art machine learning problems.  ( 2 min )
    Rethinking Counterfactual Explanations as Local and Regional Counterfactual Policies. (arXiv:2209.14568v1 [stat.ML])
    Among the challenges not yet resolved for Counterfactual Explanations (CE) are stability, synthesis of the various CE, and the lack of plausibility/sparsity guarantees. From a more practical point of view, recent studies show that the prescribed counterfactual recourses are often not implemented exactly by the individuals, and they demonstrate that most state-of-the-art CE algorithms are very likely to fail in this noisy environment. To address these issues, we propose a probabilistic framework that gives a sparse local counterfactual rule for each observation: instead of giving diverse CE, we provide rules that give a range of values that can change the decision with a given high probability. In addition, the recourses derived from these rules are robust by construction. These local rules are aggregated into a regional counterfactual rule to ensure the stability of the counterfactual explanations across observations. Our local and regional rules guarantee that the recourses are faithful to the data distribution because our rules use a consistent estimator of the probabilities of changing the decision based on a Random Forest. In addition, these probabilities give interpretable and sparse rules, as we select the smallest set of variables having a given probability of changing the decision. Code for computing our counterfactual rules is available, and we compare their relevancy with standard CE and recent similar attempts.  ( 3 min )
    Sparse PCA With Multiple Components. (arXiv:2209.14790v1 [math.OC])
    Sparse Principal Component Analysis is a cardinal technique for obtaining combinations of features, or principal components (PCs), that explain the variance of high-dimensional datasets in an interpretable manner. At its heart, this involves solving a sparsity and orthogonality constrained convex maximization problem, which is extremely computationally challenging. Most existing work addresses sparse PCA via heuristics such as iteratively computing one sparse PC and deflating the covariance matrix, which does not guarantee the orthogonality, let alone the optimality, of the resulting solution. We challenge this status quo by reformulating the orthogonality conditions as rank constraints and optimizing over the sparsity and rank constraints simultaneously. We design tight semidefinite relaxations and propose tractable second-order cone versions of these relaxations which supply high-quality upper bounds. We also design valid second-order cone inequalities which hold when each PC's individual sparsity is specified, and demonstrate that these inequalities tighten our relaxations significantly. Moreover, we propose exact methods and rounding mechanisms that exploit these relaxations' tightness to obtain solutions with a bound gap on the order of 1%-5% for real-world datasets with $p = 100$s or $1000$s of features and $r \in \{2, 3\}$ components. We investigate the performance of our methods in spiked covariance settings and demonstrate that simultaneously considering the orthogonality and sparsity constraints leads to improvements in the Area Under the ROC curve of 2%-8% compared to state-of-the-art deflation methods. All in all, our approach solves sparse PCA problems with multiple components to certifiable (near) optimality in a practically tractable fashion.  ( 3 min )
    Distributional Reinforcement Learning via Sinkhorn Iterations. (arXiv:2202.00769v3 [cs.LG] UPDATED)
    Distributional reinforcement learning (RL) is a class of state-of-the-art algorithms that estimate the entire distribution of the total return rather than only its expectation. The empirical success of distributional RL is determined by the representation of return distributions and the choice of distribution divergence. In this paper, we propose a new class of Sinkhorn distributional RL (SinkhornDRL) algorithms that learn a finite set of statistics, i.e., deterministic samples, from each return distribution and then use Sinkhorn iterations to evaluate the Sinkhorn distance between the current and target Bellman distributions. The Sinkhorn divergence interpolates between the Wasserstein distance and Maximum Mean Discrepancy (MMD). SinkhornDRL finds a sweet spot by taking advantage of the geometry of optimal-transport-based distances and the unbiased gradient estimate property of MMD. Finally, compared to state-of-the-art algorithms, SinkhornDRL's competitive performance is demonstrated on the suite of 55 Atari games.  ( 2 min )
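    The Sinkhorn inner loop used to evaluate the distance between the current and target return distributions is standard entropic optimal transport (a minimal NumPy sketch between two empirical samples with uniform weights; only the deterministic-samples representation is specific to the paper):

        import numpy as np

        def sinkhorn_cost(x, y, eps=0.1, n_iters=100):
            """Entropic OT cost between samples x (n,d) and y (m,d)
            via Sinkhorn iterations on the Gibbs kernel K = exp(-C/eps)."""
            C = ((x[:, None, :] - y[None, :, :]) ** 2).sum(-1)  # pairwise costs
            K = np.exp(-C / eps)
            a = np.full(len(x), 1.0 / len(x))
            b = np.full(len(y), 1.0 / len(y))
            u, v = np.ones_like(a), np.ones_like(b)
            for _ in range(n_iters):                 # alternating scaling updates
                u = a / (K @ v)
                v = b / (K.T @ u)
            P = u[:, None] * K * v[None, :]          # entropic optimal coupling
            return (P * C).sum()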
    Contrastive Unsupervised Learning of World Model with Invariant Causal Features. (arXiv:2209.14932v1 [cs.LG])
    In this paper we present a world model which learns causal features using the invariance principle. In particular, we use contrastive unsupervised learning to learn the invariant causal features, which enforces invariance across augmentations of irrelevant parts or styles of the observation. World-model-based reinforcement learning methods optimize representation learning and the policy independently, so a naive contrastive loss implementation collapses due to a lack of supervisory signals to the representation learning module. We propose an intervention-invariant auxiliary task to mitigate this issue. Specifically, we utilize depth prediction to explicitly enforce the invariance and use data augmentation as style intervention on the RGB observation space. Our design leverages unsupervised representation learning to learn the world model with invariant causal features. Our proposed method significantly outperforms current state-of-the-art model-based and model-free reinforcement learning methods on out-of-distribution point navigation tasks on the iGibson dataset. Moreover, our proposed model excels at the sim-to-real transfer of our perception learning module. Finally, we evaluate our approach on the DeepMind control suite and enforce invariance only implicitly, since depth is not available. Nevertheless, our proposed model performs on par with the state-of-the-art counterpart.  ( 2 min )
    Minimax Optimal Kernel Operator Learning via Multilevel Training. (arXiv:2209.14430v1 [cs.LG])
    Learning mappings between infinite-dimensional function spaces has achieved empirical success in many disciplines of machine learning, including generative modeling, functional data analysis, causal inference, and multi-agent reinforcement learning. In this paper, we study the statistical limit of learning a Hilbert-Schmidt operator between two infinite-dimensional Sobolev reproducing kernel Hilbert spaces. We establish the information-theoretic lower bound in terms of the Sobolev Hilbert-Schmidt norm and show that a regularization that learns the spectral components below the bias contour and ignores the ones that are above the variance contour can achieve the optimal learning rate. At the same time, the spectral components between the bias and variance contours give us flexibility in designing computationally feasible machine learning algorithms. Based on this observation, we develop a multilevel kernel operator learning algorithm that is optimal when learning linear operators between infinite-dimensional function spaces.  ( 2 min )
    Generalized Kernel Regularized Least Squares. (arXiv:2209.14355v1 [stat.ML])
    Kernel Regularized Least Squares (KRLS) is a popular method for flexibly estimating models that may have complex relationships between variables. However, its usefulness to many researchers is limited for two reasons. First, existing approaches are inflexible and do not allow KRLS to be combined with theoretically-motivated extensions such as fixed effects or non-linear outcomes. Second, estimation is extremely computationally intensive for even modestly sized datasets. Our paper addresses both concerns by introducing generalized KRLS (gKRLS). We note that KRLS can be re-formulated as a hierarchical model thereby allowing easy inference and modular model construction. Computationally, we also implement random sketching to dramatically accelerate estimation while incurring a limited penalty in estimation quality. We demonstrate that gKRLS can be fit on datasets with tens of thousands of observations in under one minute. Further, state-of-the-art techniques that require fitting the model over a dozen times (e.g. meta-learners) can be estimated quickly.  ( 2 min )
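    Random sketching for kernel least-squares problems of this kind typically looks as follows (a minimal Nyström-style sketch illustrating the general idea; this is not the gKRLS implementation itself):

        import numpy as np

        def sketched_krls(K, y, lam, m, rng=np.random.default_rng(0)):
            """Solve kernel regularized least squares restricted to the span of
            m randomly sampled landmark columns of the n x n kernel matrix K:
            min_b ||K[:, idx] b - y||^2 + lam * b' K[idx, idx] b."""
            n = K.shape[0]
            idx = rng.choice(n, size=m, replace=False)
            C = K[:, idx]                                  # n x m sketch
            A = C.T @ C + lam * K[np.ix_(idx, idx)]
            b = np.linalg.solve(A, C.T @ y)
            return idx, b                                  # predictions: K[:, idx] @ b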

  • Open

    An AI that generates videos from text! | Make-A-Video Explained
    submitted by /u/OnlyProggingForFun [link] [comments]  ( 102 min )
    AI Dream 96 - DreamBooth surpasses the AEtherFlux
    submitted by /u/LordPewPew777 [link] [comments]  ( 102 min )
    DreamFusion: Text-to-3D using 2D Diffusion
    submitted by /u/walt74 [link] [comments]  ( 103 min )
    Phenaki - A model for generating videos from text
    submitted by /u/walt74 [link] [comments]  ( 103 min )
    Bruce Willis sells rights to deepfake firm
    submitted by /u/redtailboas [link] [comments]  ( 102 min )
    Awesome Art created on starryai!!
    https://create.starryai.com/user/Falconess/creation/712339983/ Like and Follow!! submitted by /u/SkyHighArt [link] [comments]  ( 102 min )
    Find out if the text contains an answer
    Is there a working AI that allows you to find out if the text has the answer to the question? When you type a question into Google, it gives a short answer based on search results. Is there something similar available? submitted by /u/dortezy [link] [comments]  ( 107 min )
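    For what it's worth, extractive question-answering models trained on SQuAD 2.0-style data can flag unanswerable questions. A minimal sketch with the Hugging Face transformers pipeline (the model choice and example text are assumptions; a low score or empty answer suggests the text lacks the answer):
    ```python
    from transformers import pipeline

    # A SQuAD2-style model can also return "no answer" for unanswerable questions.
    qa = pipeline("question-answering", model="deepset/roberta-base-squad2")

    result = qa(
        question="What year was the bridge built?",
        context="The old stone bridge spans the river near the village square.",
        handle_impossible_answer=True,
    )
    # An empty answer or very low score indicates the text does not contain the answer.
    print(result["answer"], result["score"])
    ```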
    CCTV/video & sensor profiling
    Hi, I work in data and am also starting my foray into AI. One use case I'm beginning to explore is using CCTV and sensors to profile travelers (e.g., cars, pedestrians) for counting and profiling purposes, i.e., car type, direction of travel, gender, and devices used, with the goal of understanding users and maybe identifying potential merchant partners for nearby locations or ads. What are the thoughts of the sub on this with respect to approach and methodology? I'd appreciate your opinion on where to start. submitted by /u/saintmichel [link] [comments]  ( 102 min )
    Train an AI on all scientific information?
    No human can catch up with all the papers published on one day in academic journals and no single human has a detailed overview of all scientific findings that have ever been published. So there may already be scientific findings by a lot of different scientists that can be put together to find a solution to a problem that we have. But no single scientist is aware of all of these findings, because the amount of information is just too big. Even if they collaborate and exchange knowledge, that is still only a tiny part of all the research ever published. If we trained an AI on say the whole PubMed database, would it be possible that the AI could come up with ways to treat diseases that we are currently not able to treat? I guess, it is probably not possible, but I was just having this idea so I thought I'd share it submitted by /u/greentea387 [link] [comments]  ( 103 min )
    Are there any good voice cloning AIs?
    I've seen this video and read articles about deepfaking someone's voice. Apparently, the software is already there, and people imitate voices convincingly using just audio samples of someone talking, and I wonder how I can test that myself. I downloaded this GitHub project, and it works relatively well on the provided samples, but it doesn't work on random recordings from YouTube. Is there any free service I can use to clone voices and use them in text to speech? It wouldn't even matter how much data was required; I'd be willing to download hundreds of hours of YouTube videos and feed them to the AI if the results were promising. submitted by /u/empirestateisgreat [link] [comments]  ( 102 min )
    How do I find a certain AI voice?
    I'm rather new to AI voices, so I'm not totally sure what I'm doing. I want to use a very specific voice for videos; I've heard it a lot but don't know its name. It is a British-accented voice used frequently in YouTube videos, but the only place I found it was this website: British English Text to Speech | Play.ht, where it is called "Logan". When I looked up that name I couldn't find any other websites for it. This website was particularly bad for my purposes: it wants me to pay $15 a month for more than 5 bits of AI voice a day. I can afford to lose a little money, but I'm not paying that just for one voice. This website was much nicer: Uberduck - Make cool stuff with AI and text to speech, but I couldn't find any generic voices on it, and not the one I wanted. Does anyone have any suggestions? submitted by /u/TloyCO [link] [comments]  ( 103 min )
    Need help identifying what kind of AI this website uses
    Hi guys, I recently stumbled upon this AI website https://tech-lagoon.com/imagechef/en/image-to-comic.html?reloaded=true I like this feature of turning an image into manga-style black and white/screentone. I have tried many alternatives other than this website, but so far this one is the best I have ever seen. The problem is, the site is kind of buggy and sluggish; the function sometimes works and sometimes doesn't. When I tried to contact the site owner, the contact form was broken, so I was unable to reach the webmaster. My question is: do you have any AI alternative that can produce manga-style images from a picture, just as good as this, that I can run locally on my own machine? I'd really appreciate it if you could point me in the right direction. Best regards submitted by /u/uswin [link] [comments]  ( 102 min )
    My thoughts on DALL·E 2.
    I finally had a chance to try out DALL·E 2 after using Stable Diffusion for free for a while now. DALL·E 2 does have better coherence and slightly better results, but not enough to justify the cost. Hands and eyes are often misplaced, misshapen, or grotesque. I think we're a generation or two away from hands and eyes being resolved. I know that Google has upgraded their systems (more parameters and more tokens) to fix this issue, but due to the pending legal battles over these AI art models they've never released those models to the public. Getting an acceptable result sometimes requires a few generations/variations, which makes OpenAI's pricing model hard to stomach. I am being very picky, as an AI model being able to convert text to images is mind-blowing -- but it doesn't tak…  ( 104 min )
    Hello all, we recently beta tested our platform for evaluating the robustness of AI models against adversarial attacks and natural noises, called GuardAI. Based on the feedback we collected during the first test phase, we updated the platform and added new features.
    (Thank you to everyone who participated!) Some of the added features are: support for dataset poisoning detection for classification models (Spectral Signature Detection); support for several defenses (Gaussian Noise, Gaussian Augmentation, Reverse Sigmoid); support for the KITTI dataset format; attacks and visualization for depth-perception tasks; webhook functionality to enable easy workflow automation; performance improvements; and more. If you haven't tested it so far, you can make an account and test out the updated version. Your feedback is really appreciated. You can sign up here https://www.navinfo.eu/services/cybersecurity/guardai/ and leave your feedback directly through the platform. Thank you! GuardAI We harness the power of AI and Cybersecurity to develop more secure and robust solutions. submitted by /u/GuardAITeam [link] [comments]  ( 103 min )
    An Introduction to Active Learning in Machine Learning
    submitted by /u/encord_team [link] [comments]  ( 102 min )
    OpenAI removes waiting list for DALL·E 2
    OpenAI has now removed the long waitlist, and all interested users can go ahead and start creating AI art. Have a pleasant day. submitted by /u/Vixair-AI [link] [comments]  ( 105 min )
    This guy is using AI to make a movie — and you can help decide what happens next
    submitted by /u/TallAssociation0 [link] [comments]  ( 107 min )
    AI apps
    Please recommend AI apps for art that are free with no limits. submitted by /u/Waakaari [link] [comments]  ( 106 min )
    How Much Data is Enough for Small Dataset-Based Object Detection?
    Hey, I want to share a podcast with you that I found recently. They try to debunk a popular myth about machines only learning from large amounts of data, and share a use case of applying ML with a small dataset. What do you think about it? https://youtu.be/ZVen_YiGcuc submitted by /u/Data-Power [link] [comments]  ( 102 min )
    Growing presence of AI in court rooms raises concerns in report
    submitted by /u/BatCertain1868 [link] [comments]  ( 102 min )
    EU proposes rule changes to make it easier to sue drone makers
    submitted by /u/BatCertain1868 [link] [comments]  ( 102 min )
  • Open

    Neural net computing in water
    submitted by /u/keghn [link] [comments]  ( 102 min )
  • Open

    Day of the year
    Occasionally it’s useful to find the day of the year. For example, today is 272nd day of 2022. How hard would it be to calculate the day of the year in your head? Each month has about 30 days, so the dth day of the mth month is approximately day 30(m – 1) + d […] Day of the year first appeared on John D. Cook.  ( 5 min )
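    A quick sketch of this mental-math rule in Python, checked against the exact value from the standard library (the specific date is the one from the excerpt):
    ```python
    from datetime import date

    def approx_day_of_year(month: int, day: int) -> int:
        """Mental-math rule from the post: treat every month as ~30 days."""
        return 30 * (month - 1) + day

    d = date(2022, 9, 29)                 # the "day 272" example
    exact = d.timetuple().tm_yday         # 272
    approx = approx_day_of_year(9, 29)    # 30 * 8 + 29 = 269
    print(exact, approx, exact - approx)  # the rule is off by only a few days
    ```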
  • Open

    What are your thoughts about L4DC conference?
    Is it worth trying? How about its reputation? https://l4dc.seas.upenn.edu/ Based on its previous proceedings, it seems to be a nice conference. What do you think? submitted by /u/Blasphemer666 [link] [comments]  ( 113 min )
    "Top-down design of protein nanomaterials with reinforcement learning", Lutz et al 2022
    submitted by /u/gwern [link] [comments]  ( 102 min )
    How does having zero advantage help with identifiability in D3QN?
    Sorry for asking another question about Dueling Deep Q-Networks. I think this paper is a tad more confusing than usual. https://ai.stackexchange.com/q/37234/31755 submitted by /u/EffectiveDistinct828 [link] [comments]  ( 103 min )
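    For context on the identifiability issue: the dueling head decomposes Q(s, a) = V(s) + A(s, a), and any constant can be shifted between V and A without changing Q; forcing the advantage to have zero mean (or a zero maximum) over actions pins the split down. A minimal PyTorch sketch of the mean-subtracted aggregation (layer shapes are arbitrary assumptions):
    ```python
    import torch
    import torch.nn as nn

    class DuelingHead(nn.Module):
        """Q(s,a) = V(s) + A(s,a) - mean_a A(s,a); the subtraction removes the
        constant offset that V and A could otherwise trade off arbitrarily."""
        def __init__(self, in_dim: int, n_actions: int):
            super().__init__()
            self.value = nn.Linear(in_dim, 1)
            self.advantage = nn.Linear(in_dim, n_actions)

        def forward(self, features: torch.Tensor) -> torch.Tensor:
            v = self.value(features)                    # (batch, 1)
            a = self.advantage(features)                # (batch, n_actions)
            return v + a - a.mean(dim=1, keepdim=True)  # identifiable Q-values
    ```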
    What to recommend and who to recommend it to if I build a recommender system based off of posts and comments on Reddit?
    So I want to create a small recommender system from Reddit. This website is literally a playground for this sort of thing, since it is updated in real time as users make posts and comments. I've already sort of figured out how I could make an environment out of Reddit: basically, my agent would interact with a stream of data coming from one or more subreddits. PRAW can be used to create the streams. Now the issue here is that I'm not quite sure who I would recommend things to, and what I should recommend. Generally speaking, the easiest thing to do would be to recommend posts to people, but that seems so obvious and boring. I'm a software engineer by trade, I've only recently begun to get into AI, and figuring this out is somewhat complicated for me. I have the idea that I want to build this model and then have it connect to a websocket API in Java and then to some other client, so that it can show its progress in real time and be made interactive. However, I'm sort of lost on what I should or shouldn't recommend to people. I was wondering if anybody could give me suggestions on possible things to use a recommender system for on Reddit. submitted by /u/ThroawayX91 [link] [comments]  ( 113 min )
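    As a starting point, a PRAW submission stream looks roughly like this (the credentials and subreddit choice are placeholders, and the interaction logging is an assumed stub):
    ```python
    import praw

    # Hypothetical credentials; create an app at reddit.com/prefs/apps for real ones.
    reddit = praw.Reddit(
        client_id="CLIENT_ID",
        client_secret="CLIENT_SECRET",
        user_agent="recsys-playground by u/yourname",
    )

    # Stream new posts from several subreddits combined with "+".
    for submission in reddit.subreddit("MachineLearning+artificial").stream.submissions(skip_existing=True):
        # A recommender could log (user, item) interactions here, e.g. author -> post.
        print(submission.author, submission.title, submission.score)
    ```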
    Is it possible to install baselines on M1 Mac?
    Looks like it only supports TF1, but even the oldest version of tensorflow-macos is TF2-based. I'm trying to run this for reference. submitted by /u/killerdrogo [link] [comments]  ( 112 min )
    GAIL without actions?
    I have successfully trained GAIL-PPO with an expert. Now I have some experimental data where the observations are easily captured, but the actions are hard to capture and less accurate. Is it possible to just zero out all (expert and generator) actions before feeding them to the discriminator? What are your thoughts? Has anyone tried it before? I'm using SB3 with Imitation. submitted by /u/Windgineer2 [link] [comments]  ( 102 min )
    Syntax help
    "import gym from gym.spaces import Discrete import random class MyClass(gym.Env): def __int__(self): self.action_space = Discrete(3) ... ... action = env.action_space.sample()" ​ Can someone help me why is it showing me "action = env.action_space.sample(). AttributeError: 'NoneType' object has no attribute 'sample'" error?? Also, I want to use action_space values in an if-else statement so can I? I guess if else do not accept discrete values. submitted by /u/Asleep-Ad4480 [link] [comments]  ( 115 min )
    A new repo is coming! DIgging can be used to dig up better candidates for combinatorial optimization problems and gradient-free search problems.
    submitted by /u/OpenDILab [link] [comments]  ( 102 min )
  • Open

    [D] Why is random cropping necessary in SimCLR?
    The SimCLR paper does unsupervised contrastive learning to create image representations. Two of the most important augmentations are random cropping and color jittering. Color jittering makes sense: you want to make sure the model isn't just looking at the histograms of the colors as a shortcut. But why is random cropping necessary as an augmentation? submitted by /u/vanilla-acc [link] [comments]  ( 104 min )
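    For reference, a minimal sketch of a SimCLR-style augmentation pipeline in torchvision; each image is passed through it twice to produce the two contrastive views (the parameter values follow the commonly cited recipe but are assumptions here):
    ```python
    import torchvision.transforms as T

    # Applying this transform twice to the same image yields the two "views".
    simclr_augment = T.Compose([
        T.RandomResizedCrop(size=224),                               # random cropping + rescale
        T.RandomHorizontalFlip(),
        T.RandomApply([T.ColorJitter(0.8, 0.8, 0.8, 0.2)], p=0.8),   # color jittering
        T.RandomGrayscale(p=0.2),
        T.ToTensor(),
    ])
    ```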
    [P] Calculate salaries
    Hello, I have a study here that lists different examples of salaries: 35 years old, system administrator, 4 years of work experience, ... X thousand euros per year. Then I have another study comparing the states, one for age structure, etc. If I have understood artificial intelligence correctly, it should be possible to make an input form in which I can enter age etc., and then a calculated average value comes out. I have now downloaded the demo version of Tableau and created an account at Microsoft Azure for machine learning... but somehow I can't find a starting point. Does anyone have any experience here? Is there, e.g., something comparable for real estate or used cars or the like that I can copy and change accordingly? Greetings, Dee3Doo submitted by /u/Deedreidoo [link] [comments]  ( 104 min )
    [P] Free ML counting software?
    I am looking for free software which can count items passing through a set boundary and can tell which direction they pass the boundary from. Is there any free software available which can do this? submitted by /u/Necropolizer [link] [comments]  ( 103 min )
    [D] Is neural network really smart or just some advanced level (parametric) regression ?
    Are we on the right path to AGI? The progress in the field of narrow AI is undoubtedly insane, and certainly there will be huge applications in the future where AI will be far more efficient and productive than humans on a particular task. But in the direction we are headed, why do I feel like it won't lead us to superintelligent AGI supremacy? Maybe we left something behind in making neurons that replicate how a living organism learns, adapts, changes, or thinks by itself? Just a random thought; I would love to hear your opinions. submitted by /u/tempting_atom [link] [comments]  ( 119 min )
    [D] Is private research (military, industrial) far ahead of publicly available research?
    Computer vision for drone or military applications is often impressive in terms of accuracy and robustness. The same goes for some industrial technologies. Are these technologies ahead of the public versions? Generally speaking. submitted by /u/adrienrsn [link] [comments]  ( 104 min )
    [D] What is the real scientific % of contribution of your 2nd authors in your papers?
    Hey, I'm a PhD student, and the real contribution of the 2nd author of my papers is roughly reviewing the pre-submission version for grammar. Scientifically it is almost zero. I haven't even mentioned the third or n-th names, which are co-authors only for political/statistical reasons. And the people around me observe the same. Is that common for PhD students/postdocs? submitted by /u/adrienrsn [link] [comments]  ( 121 min )
    [D] What is the best diffusion-based models github for image-to-image translation?
    Could anybody recommend a nice github repository to start a project on image-to-image translation with diffusion-based models? We are starting this new project and ideally the model can handle a translation from 4-channel to 3-channel images of the same size. Thank you so much! submitted by /u/Blutjens [link] [comments]  ( 103 min )
    [R] Meta-FAIR releases Make-a-Video, a model that generates videos from texts or images
    A dog wearing a Superhero outfit with red cape flying through the sky https://makeavideo.studio/ submitted by /u/JClub [link] [comments]  ( 105 min )
    [D] Are there any models that can process existing speech audio and improve its quality?
    Hello. I am looking for a way to improve the sound of my previous recordings, which have poor audio quality. Are there AI models at the moment that can process existing speech audio and improve its quality? We often see ways to improve the quality of old movies, but I wonder if there are such models for improving speech quality as well. submitted by /u/CeFurkan [link] [comments]  ( 117 min )
    [P] Are you having trouble building and managing complex data pipelines?
    After 5+ years of working on data at Airbnb and managing thousands of pipelines for Mage users, we open-sourced our data pipeline tool: https://github.com/mage-ai/mage-ai https://preview.redd.it/kyb4090nktq91.png?width=1080&format=png&auto=webp&s=c9abfbffc7d61577b9576540d411f04e5c68c118 This tool’s core design principles are: 1. Easy developer experience 2. Engineering best practices built-in 3. Scaling is made simple 4. Data is a first-class citizen I’d love your feedback, thoughts, comments, etc. Also, I’m happy to hop on a Zoom call and help you get setup and give you an overview of the tool. Slack: https://mage.ai/chat submitted by /u/tchungry [link] [comments]  ( 106 min )
    [D] Does Rasa X/Enterprise have an individual developer license?
    Hi, For those working with Rasa Framework, you may have noticed that the Rasa X (a CDD tool) community edition has been deprecated. Are there any open source alternatives that provide the same features as it? Also, Are there any Individual Developer Plans for their Rasa Enterprise product? If yes, could you please let me know how to enroll for that submitted by /u/Creative_Jellyfish53 [link] [comments]  ( 103 min )
    [D] Relation between active learning and optimal design of (sequential) experiments?
    For an upcoming project, I'm interested in learning a bit more about methods for adaptive sampling of data (e.g., where we regularly sample new data, with some control over which source the data is coming from). The two areas that deal with that sort of task (to my knowledge, which is very shallow in this area) are active learning and design of sequential experiments. However, after reading about both, I was a little unsure about how two describe the relations between those two areas. So I thought I'd ask the group here - would you say that one is a subset/example of the other? How would you say they are related/different? My *impression* so far is that it wouldn't be unreasonable to call optimal design of sequential experiments a subset of active learning, but I'd like to hear other peoples' takes on that. Also, if anyone has any relevant article/textbook recommendations, I always have room on my bookshelves. ;) Thanks in advance for any thoughts or advice! submitted by /u/malenkydroog [link] [comments]  ( 105 min )
    [R] Introducing Make-A-Video: An AI system that generates videos from text
    Facebook's blog post: https://ai.facebook.com/blog/generative-ai-text-to-video/ Project URL: https://makeavideo.studio/ submitted by /u/hardmaru [link] [comments]  ( 103 min )
    [P] Combining stable diffusion with semantic search to categorise, tag, and query 100k images of hot dogs
    I have been particularly interested in generating synthetic datasets using Stable Diffusion for various machine learning purposes (and I also think this is going to be a big area). However, I started to run into problems trying to manage them, or even know what was in them (since there is a large variance in the outputs for the same prompt). I think one compelling solution to this problem is using a semantic search system to query, store, and categorize the generated images. I did some experimentation (see below) on how this could work on a synthetic dataset of 100k hot dogs. One other thing that dawned on me was that progress in prompt engineering could really impact search and search-query curation, due to the shared models (i.e., CLIP). Anyway, the exploration is below and I would love to hear any feedback! Article: https://github.com/marqo-ai/marqo/blob/mainline/examples/StableDiffusion/hot-dog-100k.md Code: https://github.com/marqo-ai/marqo/blob/mainline/examples/StableDiffusion/hot-dog-100k.py submitted by /u/Jesse_marqo [link] [comments]  ( 106 min )
    [P] Question about machine learning use
    I hope I used the right tag, I am new here I am working on a paper about the use of machine learning in chemistry and have a problem. Sometimes I see authors only testing one model (Support vector machine) without testing other types of machine learning (Tree based, Regression based, etc.) What is the correct way? Could one test multiple types of machine learning models or is one enough? I expected that you always want to try different types of models on your problem, but I might be wrong submitted by /u/Feeling-Mammoth-5867 [link] [comments]  ( 106 min )
    [P] Question Answering/analysing images with text with LLMs
    Text-Generator.io now pulls down and analyses images with text in them (as well as links and other types of images) https://text-generator.io/blog/document-question-answering submitted by /u/leepenkman [link] [comments]  ( 103 min )
    [D]What should be my train data in few shot learning?
    I want to apply few-shot learning with either a Siamese network or a prototypical network to perform image classification. My data consists of 2 classes, good and bad objects; there are 700 samples of the good object and 5 samples of the bad object. I am confused about how I should split my data into train, validation, and test sets for few-shot learning, and how many tasks there should be. submitted by /u/JellyfishPretend447 [link] [comments]  ( 105 min )
    [D] reporting only weighted average as machine learning performance measure?
    How do you feel about reporting only the weighted averages of precision, recall, F1, accuracy, and ROC AUC? Is that a good idea? submitted by /u/javagarbagecollector [link] [comments]  ( 106 min )
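    For concreteness, scikit-learn exposes these weighted averages directly; even a toy sketch shows the caveat that support-weighted averages can hide poor performance on rare classes (the labels below are made up):
    ```python
    from sklearn.metrics import precision_recall_fscore_support, accuracy_score

    y_true = [0, 0, 1, 1, 2, 2, 2]
    y_pred = [0, 1, 1, 1, 2, 2, 0]

    # average="weighted" weights each class's metric by its support, so a
    # rare class that is always misclassified barely moves the number.
    p, r, f1, _ = precision_recall_fscore_support(y_true, y_pred, average="weighted")
    print(p, r, f1, accuracy_score(y_true, y_pred))
    ```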
    [R] Need a face image restoration dataset.
    Hey there, as the title goes, I need a face image restoration dataset for a small research idea I have in mind. Basically one with missing facial features or occluded facial features. Any help regarding where I might get one, or if one doesn't exist, how to generate one is greatly appreciated. Thanks. submitted by /u/StupidlyGenius0 [link] [comments]  ( 103 min )
    [R] CoordConv coordinate regression test shows performance degradation when model contains downsampling
    I have created a repository where I do an alternative test for the regression task described in the Uber CoordConv paper. I just train all pixel positions, so there is no train-test split, since I couldn't get perfect results even on the train set (I haven't tried learning rate scheduling). The results indicate that if the model has downsampling in it, CoordConv degrades performance: learning is slower and the end result is the same. I assumed that the test as described in the blog post also had downsampling in it, but this is not the case. Unfortunately, this makes the test described there uninteresting, hence this alternative test. On the other hand, regular convolution seems to do a lot better than indicated in Uber's results. https://preview.redd.it/bog9spjibpq91.png?width=1700&format=png&auto=webp&s=e82d900f846486feb908170f47488f751a2e7271 Update: at the suggestion of dracheschreck I have added a model to the repository where all convolutions are CoordConvs. It tends to perform the worst. Results: https://preview.redd.it/hdmxcbscxtq91.png?width=1500&format=png&auto=webp&s=cf7a994e90bb336fbdb7a1254616bd7c8ec84f58 ​ submitted by /u/dineNshine [link] [comments]  ( 104 min )
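    For readers unfamiliar with the layer being tested: CoordConv concatenates normalized x/y coordinate channels to the input before an ordinary convolution. A minimal PyTorch sketch (channel counts and the [-1, 1] normalization range are assumptions):
    ```python
    import torch
    import torch.nn as nn

    class CoordConv2d(nn.Module):
        """Conv2d that sees two extra channels holding normalized (x, y) coordinates."""
        def __init__(self, in_ch: int, out_ch: int, **kw):
            super().__init__()
            self.conv = nn.Conv2d(in_ch + 2, out_ch, **kw)

        def forward(self, x: torch.Tensor) -> torch.Tensor:
            b, _, h, w = x.shape
            ys = torch.linspace(-1, 1, h, device=x.device).view(1, 1, h, 1).expand(b, 1, h, w)
            xs = torch.linspace(-1, 1, w, device=x.device).view(1, 1, 1, w).expand(b, 1, h, w)
            return self.conv(torch.cat([x, xs, ys], dim=1))

    layer = CoordConv2d(3, 8, kernel_size=3, padding=1)
    out = layer(torch.randn(2, 3, 64, 64))  # -> (2, 8, 64, 64)
    ```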
    [P] Participating in the Myosuite challenge at NeurIPS2022 on dexterous control? We are releasing a baseline and starter code to help you get started using EvoTorch!
    The Myosuite challenge (https://sites.google.com/view/myochallenge) at NeurIPS2022 tests our ability to build and train policies for contact-rich manipulation skills. EvoTorch (evotorch.ai) makes it straightforward to apply evolutionary reinforcement learning to the challenge. We've included setup help, training and visualisation scripts, a baseline controller trained through the provided script and help for submission to the competition. Simply head to the public GitHub to get started: https://github.com/nnaisense/evotorch-myosuite-starter If you need any more help getting started, come talk to us on our slack Here's a video of the baseline controller that we've included: https://reddit.com/link/xqtyw6/video/10qwrq795pq91/player submitted by /u/NaturalGradient [link] [comments]  ( 89 min )
  • Open

    How Sophos trains a powerful, lightweight PDF malware detector at ultra scale with Amazon SageMaker
    This post is co-authored by Salma Taoufiq and Harini Kannan from Sophos. As a leader in next-generation cybersecurity, Sophos strives to protect more than 500,000 organizations and millions of customers across over 150 countries against evolving threats. Powered by threat intelligence, machine learning (ML), and artificial intelligence from Sophos X-Ops, Sophos delivers a broad and […]  ( 10 min )
  • Open

    The Wheel Deal: ‘Racer RTX’ Demo Revs to Photorealistic Life, Built on NVIDIA Omniverse
    NVIDIA artists ran their engines at full throttle for the stunning Racer RTX demo, which debuted at last week’s GTC keynote, showcasing the power of NVIDIA Omniverse and the new GeForce RTX 4090 GPU. “Our goal was to create something that had never been done before,” said Gabriele Leone, creative director at NVIDIA, who led Read article > The post The Wheel Deal: ‘Racer RTX’ Demo Revs to Photorealistic Life, Built on NVIDIA Omniverse appeared first on NVIDIA Blog.  ( 6 min )
    All This and Mor-a Are Yours With Exclusive ‘Genshin Impact’ GeForce NOW Membership Reward
    It’s good to be a GeForce NOW member. Genshin Impact’s new Version 3.1 update launches this GFN Thursday, just in time for the game’s second anniversary. Even better: GeForce NOW members can get an exclusive starter pack reward, perfect for their first steps in HoYoverse’s open-world adventure, action role-playing game. And don’t forget the nine Read article > The post All This and Mor-a Are Yours With Exclusive ‘Genshin Impact’ GeForce NOW Membership Reward appeared first on NVIDIA Blog.  ( 5 min )
  • Open

    End-to-End Intelligent Framework for Rockfall Detection. (arXiv:2102.06491v1 [cs.LG] CROSS LISTED)
    Rockfall detection is a crucial procedure in the field of geology, which helps to reduce the associated risks. Currently, geologists identify rockfall events almost manually, utilizing point cloud and imagery data obtained from different capture devices such as Terrestrial Laser Scanners or digital cameras. Multi-temporal comparison of the point clouds obtained with these techniques requires a tedious visual inspection to identify rockfall events, which introduces inaccuracies that depend on several factors such as human expertise and the sensitivity of the sensors. This paper addresses this issue and provides an intelligent framework for rockfall event detection for anyone working at the intersection of the geology domain and decision support systems. The development of such an analysis framework poses significant research challenges and justifies intensive experimental analysis. In particular, we propose an intelligent system that utilizes multiple machine learning algorithms to detect rockfall clusters in point cloud data. Due to the extremely imbalanced nature of the problem, a plethora of state-of-the-art resampling techniques, accompanied by multiple models and feature selection procedures, are investigated. Various machine learning pipeline combinations have been benchmarked and compared using well-known metrics to be incorporated into our system. Specifically, we developed statistical and machine learning techniques and applied them to analyze point cloud data extracted from a Terrestrial Laser Scanner in two distinct case studies involving different geological contexts: the basaltic cliff of Castellfollit de la Roca and the conglomerate Montserrat Massif, both located in Spain. Our experimental data suggest that some of the above-mentioned machine learning pipelines can be utilized to detect rockfall incidents on mountain walls with experimentally proven accuracy.  ( 3 min )
    A Survey on Ensemble Learning under the Era of Deep Learning. (arXiv:2101.08387v6 [cs.LG] UPDATED)
    Due to the dominant position of deep learning (mostly deep neural networks) in various artificial intelligence applications, ensemble learning based on deep neural networks (ensemble deep learning) has recently shown significant performance in improving the generalization of learning systems. However, since modern deep neural networks usually have millions to billions of parameters, the time and space overheads for training multiple base deep learners and testing with the ensemble deep learner are far greater than those of traditional ensemble learning. Though several algorithms for fast ensemble deep learning have been proposed to promote its deployment in some applications, further advances are still needed for the many applications in specific fields where development time and computing resources are restricted, or where the data to be processed is of high dimensionality. An urgent problem to be solved is how to retain the significant advantages of ensemble deep learning while reducing the required expense, so that many more applications in specific fields can benefit from it. To alleviate this problem, it is essential to know how ensemble learning has developed in the era of deep learning. Thus, in this article, we present fundamental discussions focusing on data analyses of published works, methodologies, recent advances, and the limitations of traditional ensemble learning and ensemble deep learning. We hope this article will help readers understand the intrinsic problems and technical challenges faced by future developments of ensemble learning in the era of deep learning.  ( 3 min )
    Analyzing Lottery Ticket Hypothesis from PAC-Bayesian Theory Perspective. (arXiv:2205.07320v3 [cs.LG] UPDATED)
    The lottery ticket hypothesis (LTH) has attracted attention because it can explain why over-parameterized models often show high generalization ability. It is known that when we use iterative magnitude pruning (IMP), which is an algorithm to find sparse networks with high generalization ability that can be trained from the initial weights independently, called winning tickets, the initial large learning rate does not work well in deep neural networks such as ResNet. However, since the initial large learning rate generally helps the optimizer to converge to flatter minima, we hypothesize that the winning tickets have relatively sharp minima, which is considered a disadvantage in terms of generalization ability. In this paper, we confirm this hypothesis and show that the PAC-Bayesian theory can provide an explicit understanding of the relationship between LTH and generalization behavior. On the basis of our experimental findings that flatness is useful for improving accuracy and robustness to label noise and that the distance from the initial weights is deeply involved in winning tickets, we offer the PAC-Bayes bound using a spike-and-slab distribution to analyze winning tickets. Finally, we revisit existing algorithms for finding winning tickets from a PAC-Bayesian perspective and provide new insights into these methods.
    Contrastive Learning for Unsupervised Domain Adaptation of Time Series. (arXiv:2206.06243v2 [cs.LG] UPDATED)
    Unsupervised domain adaptation (UDA) aims at learning a machine learning model using a labeled source domain that performs well on a similar yet different, unlabeled target domain. UDA is important in many applications such as medicine, where it is used to adapt risk scores across different patient cohorts. In this paper, we develop a novel framework for UDA of time series data, called CLUDA. Specifically, we propose a contrastive learning framework to learn contextual representations in multivariate time series, so that these preserve label information for the prediction task. In our framework, we further capture the variation in the contextual representations between source and target domain via a custom nearest-neighbor contrastive learning. To the best of our knowledge, ours is the first framework to learn domain-invariant, contextual representation for UDA of time series data. We evaluate our framework using a wide range of time series datasets to demonstrate its effectiveness and show that it achieves state-of-the-art performance for time series UDA.
    Cooperate or Compete: A New Perspective on Training of Generative Networks. (arXiv:2207.02192v6 [cs.LG] UPDATED)
    GANs have two competing modules: the generator module is trained to generate new examples, and the discriminator module is trained to discriminate real examples from generated examples. The training procedure of GAN is modeled as a finitely repeated simultaneous game. Each module tries to increase its performance at every repetition of the base game (at every batch of training data) in a non-cooperative manner. We observed that each module can perform better and learn faster if training is modeled as an infinitely repeated simultaneous game. At every repetition of the base game (at every batch of training data) the stronger module (whose performance is increased or remains the same compared to the previous batch of training data) cooperates with the weaker module (whose performance is decreased compared to the previous batch of training data) and only the weaker module is allowed to increase its performance.
    Differentially Private Covariance Revisited. (arXiv:2205.14324v3 [cs.CR] UPDATED)
    In this paper, we present two new algorithms for covariance estimation under concentrated differential privacy (zCDP). The first algorithm achieves a Frobenius error of $\tilde{O}(d^{1/4}\sqrt{\mathrm{tr}}/\sqrt{n} + \sqrt{d}/n)$, where $\mathrm{tr}$ is the trace of the covariance matrix. By taking $\mathrm{tr}=1$, this also implies a worst-case error bound of $\tilde{O}(d^{1/4}/\sqrt{n})$, which improves the standard Gaussian mechanism's $\tilde{O}(d/n)$ for the regime $d>\widetilde{\Omega}(n^{2/3})$. Our second algorithm offers a tail-sensitive bound that could be much better on skewed data. The corresponding algorithms are also simple and efficient. Experimental results show that they offer significant improvements over prior work.
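    As a point of reference, the standard Gaussian-mechanism baseline that these bounds improve on simply perturbs the empirical covariance with symmetric Gaussian noise. A rough NumPy sketch under rho-zCDP, assuming rows with L2 norm at most 1 (the sensitivity constant and noise calibration are my simplification, not the paper's algorithms):
    ```python
    import numpy as np

    def gaussian_mech_covariance(X: np.ndarray, rho: float) -> np.ndarray:
        """Naive rho-zCDP covariance: empirical covariance plus symmetric Gaussian noise.
        Assumes each row of X has L2 norm <= 1, so replacing one row moves the
        empirical covariance by at most 2/n in Frobenius norm."""
        n, d = X.shape
        cov = X.T @ X / n
        sigma = (2.0 / n) / np.sqrt(2.0 * rho)         # zCDP Gaussian calibration
        noise = np.random.normal(0.0, sigma, size=(d, d))
        noise = np.triu(noise) + np.triu(noise, 1).T   # mirror the upper triangle
        return cov + noise
    ```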
    First-Order Algorithms for Min-Max Optimization in Geodesic Metric Spaces. (arXiv:2206.02041v2 [math.OC] UPDATED)
    From optimal transport to robust dimensionality reduction, a plethora of machine learning applications can be cast as min-max optimization problems over Riemannian manifolds. Though many min-max algorithms have been analyzed in the Euclidean setting, it has proved elusive to translate these results to the Riemannian case. Zhang et al. [2022] have recently shown that geodesically convex-concave Riemannian problems always admit saddle-point solutions. Inspired by this result, we study whether a performance gap between Riemannian and optimal Euclidean-space convex-concave algorithms is necessary. We answer this question in the negative: we prove that the Riemannian corrected extragradient (RCEG) method achieves last-iterate convergence at a linear rate in the geodesically strongly-convex-concave case, matching the Euclidean result. Our results also extend to the stochastic or non-smooth case, where RCEG and Riemannian gradient descent ascent (RGDA) achieve near-optimal convergence rates up to factors depending on the curvature of the manifold.
    SGD and Weight Decay Provably Induce a Low-Rank Bias in Neural Networks. (arXiv:2206.05794v2 [cs.LG] UPDATED)
    We analyze deep ReLU neural networks trained with mini-batch Stochastic Gradient Descent (SGD) and weight decay. We show, both theoretically and empirically, that when training a neural network using SGD with weight decay and small batch size, the resulting weight matrices tend to be of small rank. Our analysis relies on a minimal set of assumptions; the neural networks may be arbitrarily wide or deep and may include residual connections, as well as convolutional layers. The same analysis implies the inherent presence of SGD "noise", defined as the inability of SGD to converge to a stationary point. In particular, we prove that SGD noise must always be present, even asymptotically, as long as we incorporate weight decay and the batch size is smaller than the total number of training samples.
    Randomized K-FACs: Speeding up K-FAC with Randomized Numerical Linear Algebra. (arXiv:2206.15397v2 [cs.LG] UPDATED)
    K-FAC is a successful tractable implementation of Natural Gradient for Deep Learning, which nevertheless suffers from the requirement to compute the inverse of the Kronecker factors (through an eigen-decomposition). This can be very time-consuming (or even prohibitive) when these factors are large. In this paper, we theoretically show that, owing to the exponential-average construction paradigm of the Kronecker factors that is typically used, their eigen-spectrum must decay. We show numerically that in practice this decay is very rapid, leading to the idea that we could save substantial computation by only focusing on the first few eigen-modes when inverting the Kronecker-factors. Randomized Numerical Linear Algebra provides us with the necessary tools to do so. Numerical results show we obtain $\approx2.5\times$ reduction in per-epoch time and $\approx3.3\times$ reduction in time to target accuracy. We compare our proposed K-FAC sped-up versions with a more computationally efficient NG implementation, SENG, and observe we perform on par with it.
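    To make the "first few eigen-modes" idea concrete, a randomized range finder recovers the leading eigenpairs of a symmetric Kronecker factor cheaply. A minimal NumPy sketch of a generic randomized truncated eigendecomposition (the rank and oversampling values are arbitrary assumptions, and this is standard randomized linear algebra, not the paper's exact routine):
    ```python
    import numpy as np

    def randomized_eigh(A: np.ndarray, rank: int, oversample: int = 10):
        """Approximate top-`rank` eigenpairs of a symmetric PSD matrix A via a
        randomized range finder: project A onto a random subspace, orthonormalize,
        then solve the small projected eigenproblem exactly."""
        n = A.shape[0]
        omega = np.random.randn(n, rank + oversample)
        Q, _ = np.linalg.qr(A @ omega)          # orthonormal basis for range(A @ omega)
        small = Q.T @ A @ Q                     # (r+p) x (r+p) projected problem
        vals, vecs = np.linalg.eigh(small)
        idx = np.argsort(vals)[::-1][:rank]     # keep the largest eigenvalues
        return vals[idx], Q @ vecs[:, idx]

    # With eigenpairs (lam, U) and damping c, the damped inverse can be approximated as
    # (A + c*I)^{-1} ~ U diag(1/(lam + c)) U^T + (I - U U^T) / c.
    ```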
    Wave simulation in non-smooth media by PINN with quadratic neural network and PML condition. (arXiv:2208.08276v2 [physics.geo-ph] UPDATED)
    Frequency-domain simulation of seismic waves plays an important role in seismic inversion, but it remains challenging in large models. The recently proposed physics-informed neural network (PINN), as an effective deep learning method, has achieved successful applications in solving a wide range of partial differential equations (PDEs), and there is still room for improvement on this front. For example, PINN can lead to inaccurate solutions when PDE coefficients are non-smooth and describe structurally-complex media. In this paper, we solve the acoustic and visco-acoustic scattered-field wave equation in the frequency domain with PINN instead of the wave equation to remove source singularity. We first illustrate that non-smooth velocity models lead to inaccurate wavefields when no boundary conditions are implemented in the loss function. Then, we add the perfectly matched layer (PML) conditions in the loss function of PINN and design a quadratic neural network to overcome the detrimental effects of non-smooth models in PINN. We show that PML and quadratic neurons improve the results as well as attenuation and discuss the reason for this improvement. We also illustrate that a network trained during a wavefield simulation can be used to pre-train the neural network of another wavefield simulation after PDE-coefficient alteration and improve the convergence speed accordingly. This pre-training strategy should find application in iterative full waveform inversion (FWI) and time-lag target-oriented imaging when the model perturbation between two consecutive iterations or two consecutive experiments can be small.
    Selective Cross-Task Distillation. (arXiv:2204.11526v3 [cs.LG] UPDATED)
    The outpouring of various pre-trained models empowers knowledge distillation by providing abundant teacher resources, but there lacks a developed mechanism to utilize these teachers adequately. With a massive model repository composed of teachers pre-trained on diverse tasks, we must surmount two obstacles when using knowledge distillation to learn a new task. First, given a fixed computing budget, it is not affordable to try each teacher and train the student repeatedly, making it necessary to seek out the most contributive teacher precisely and efficiently. Second, semantic gaps exist between the teachers and the target student since they are trained on different tasks. Thus, we need to extract knowledge from a general label space that may be different from the student's. Faced with these two challenges, we study a new setting named selective cross-task distillation that includes teacher assessment and generalized knowledge reuse. We bridge the teacher's label space and the student's label space through optimal transport. The transportation cost from the teacher's prediction to the student's prediction measures the relatedness between two tasks and acts as an objective for distillation. Our method reuses cross-task knowledge from a distinct label space and efficiently assesses teachers without enumerating the model repository. Experiments demonstrate the effectiveness of our proposed method.
    Topological Data Analysis in Time Series: Temporal Filtration and Application to Single-Cell Genomics. (arXiv:2204.14048v2 [cs.LG] UPDATED)
    The absence of a conventional association between the cell-cell cohabitation and its emergent dynamics into cliques during development has hindered our understanding of how cell populations proliferate, differentiate, and compete, i.e. the cell ecology. With the recent advancement of the single-cell RNA-sequencing (RNA-seq), we can potentially describe such a link by constructing network graphs that characterize the similarity of the gene expression profiles of the cell-specific transcriptional programs, and analyzing these graphs systematically using the summary statistics informed by the algebraic topology. We propose the single-cell topological simplicial analysis (scTSA). Applying this approach to the single-cell gene expression profiles from local networks of cells in different developmental stages with different outcomes reveals a previously unseen topology of cellular ecology. These networks contain an abundance of cliques of single-cell profiles bound into cavities that guide the emergence of more complicated habitation forms. We visualize these ecological patterns with topological simplicial architectures of these networks, compared with the null models. Benchmarked on the single-cell RNA-seq data of zebrafish embryogenesis spanning 38,731 cells, 25 cell types and 12 time steps, our approach highlights the gastrulation as the most critical stage, consistent with consensus in developmental biology. As a nonlinear, model-independent, and unsupervised framework, our approach can also be applied to tracing multi-scale cell lineage, identifying critical stages, or creating pseudo-time series.
    ASTROMER: A transformer-based embedding for the representation of light curves. (arXiv:2205.01677v2 [astro-ph.IM] UPDATED)
    Taking inspiration from natural language embeddings, we present ASTROMER, a transformer-based model to create representations of light curves. ASTROMER was trained on millions of MACHO R-band samples, and it can be easily fine-tuned to match specific domains associated with downstream tasks. As an example, this paper shows the benefits of using pre-trained representations to classify variable stars. In addition, we provide a python library including all functionalities employed in this work. Our library includes the pre-trained models that can be used to enhance the performance of deep learning models, decreasing computational resources while achieving state-of-the-art results.
    Constraint-Based Causal Structure Learning from Undersampled Graphs. (arXiv:2205.09235v3 [stat.ML] UPDATED)
    Graphical structures estimated by causal learning algorithms from time series data can provide highly misleading causal information if the causal timescale of the generating process fails to match the measurement timescale of the data. Although this problem has been recently recognized, practitioners have limited resources to respond to it, and so must continue using models that they know are likely misleading. Existing methods either (a) require that the difference between causal and measurement timescales is known; or (b) can handle only a very small number of random variables when the timescale difference is unknown; or (c) apply only to pairs of variables, though with fewer assumptions about prior knowledge; or (d) return impractically many solutions. This paper addresses all four challenges. We combine constraint programming with both theoretical insights into the problem structure and prior information about admissible causal interactions. The resulting system provides a practical approach that scales to significantly larger sets (>100) of random variables, does not require precise knowledge of the timescale difference, supports edge misidentification and parametric connection strengths, and can provide the optimal choice among many possible solutions. The cumulative impact of these improvements is a gain of multiple orders of magnitude in speed and informativeness.
    Delving into the Devils of Bird's-eye-view Perception: A Review, Evaluation and Recipe. (arXiv:2209.05324v2 [cs.CV] UPDATED)
    Learning powerful representations in bird's-eye-view (BEV) for perception tasks is trending and drawing extensive attention both from industry and academia. Conventional approaches for most autonomous driving algorithms perform detection, segmentation, tracking, etc., in a front or perspective view. As sensor configurations get more complex, integrating multi-source information from different sensors and representing features in a unified view becomes vitally important. BEV perception inherits several advantages, as representing surrounding scenes in BEV is intuitive and fusion-friendly, and representing objects in BEV is most desirable for subsequent modules such as planning and/or control. The core problems for BEV perception lie in (a) how to reconstruct the lost 3D information via view transformation from perspective view to BEV; (b) how to acquire ground-truth annotations in the BEV grid; (c) how to formulate the pipeline to incorporate features from different sources and views; and (d) how to adapt and generalize algorithms as sensor configurations vary across different scenarios. In this survey, we review the most recent work on BEV perception and provide an in-depth analysis of different solutions. Moreover, several systematic designs of the BEV approach from industry are depicted as well. Furthermore, we introduce a practical guidebook for improving the performance of BEV perception tasks, covering camera, LiDAR, and fusion inputs. Finally, we point out future research directions in this area. We hope this report will shed some light on the community and encourage more research effort on BEV perception. We keep an active repository to collect the most recent work and provide a toolbox for a bag of tricks at https://github.com/OpenPerceptionX/BEVPerception-Survey-Recipe.
    SGTM 2.0: Autonomously Untangling Long Cables using Interactive Perception. (arXiv:2209.13706v1 [cs.RO])
    Cables are commonplace in homes, hospitals, and industrial warehouses and are prone to tangling. This paper extends prior work on autonomously untangling long cables by introducing novel uncertainty quantification metrics and actions that interact with the cable to reduce perception uncertainty. We present Sliding and Grasping for Tangle Manipulation 2.0 (SGTM 2.0), a system that autonomously untangles cables approximately 3 meters in length with a bilateral robot, using estimates of uncertainty at each step to inform actions. By interactively reducing uncertainty, SGTM 2.0 reduces the number of state-resetting moves it must take, significantly speeding up run time. Experiments suggest that SGTM 2.0 can achieve 83% untangling success on cables with 1 or 2 overhand and figure-8 knots, and 70% termination detection success across these configurations, outperforming SGTM 1.0 by 43% in untangling accuracy and 200% in full rollout speed. Supplementary material, visualizations, and videos can be found at sites.google.com/view/sgtm2.
    A Novel Nearest Neighbors Algorithm Based on Power Muirhead Mean. (arXiv:2209.01514v2 [cs.LG] UPDATED)
    The K-Nearest Neighbors algorithm is one of the most used classifiers in terms of simplicity and performance. However, KNN does not work well when a dataset has many outliers or when it is small or unbalanced. This paper proposes a novel classifier based on K-Nearest Neighbors that calculates the local means of every class using the Power Muirhead Mean operator to overcome the aforementioned issues. We call our new algorithm Power Muirhead Mean K-Nearest Neighbors (PMM-KNN). Finally, we used five well-known datasets to assess PMM-KNN's performance. The research results demonstrate that PMM-KNN outperformed three state-of-the-art classification methods in all experiments.
    Importance Tempering: Group Robustness for Overparameterized Models. (arXiv:2209.08745v2 [cs.LG] UPDATED)
    Although overparameterized models have shown their success on many machine learning tasks, the accuracy could drop on the testing distribution that is different from the training one. This accuracy drop still limits applying machine learning in the wild. At the same time, importance weighting, a traditional technique to handle distribution shifts, has been demonstrated to have less or even no effect on overparameterized models both empirically and theoretically. In this paper, we propose importance tempering to improve the decision boundary and achieve consistently better results for overparameterized models. Theoretically, we justify that the selection of group temperature can be different under label shift and spurious correlation setting. At the same time, we also prove that properly selected temperatures can extricate the minority collapse for imbalanced classification. Empirically, we achieve state-of-the-art results on worst group classification tasks using importance tempering.
    A General Framework for Analyzing Stochastic Dynamics in Learning Algorithms. (arXiv:2006.06171v3 [math.OC] UPDATED)
    One of the challenges in analyzing learning algorithms is the circular entanglement between the objective value and the stochastic noise. This is also known as the "chicken and egg" phenomenon, and traditionally there is no principled way to tackle this issue. People solve the problem by utilizing the special structure of the dynamics, and hence the analysis is difficult to generalize. In this work, we present a streamlined three-step recipe to tackle the "chicken and egg" problem and give a general framework for analyzing stochastic dynamics in learning algorithms. Our framework composes standard techniques from probability theory, such as stopping times and martingale concentration. We demonstrate the power and flexibility of our framework by giving a unifying analysis for three very different learning problems, with last-iterate and strong uniform high-probability convergence guarantees. The problems are stochastic gradient descent for strongly convex functions, streaming principal component analysis, and linear bandits with stochastic gradient descent updates. We either improve or match the state-of-the-art bounds on all three dynamics.  ( 2 min )
    Error-Correcting Neural Networks for Two-Dimensional Curvature Computation in the Level-Set Method. (arXiv:2201.12342v3 [math.NA] UPDATED)
    We present an error-neural-modeling-based strategy for approximating two-dimensional curvature in the level-set method. Our main contribution is a redesigned hybrid solver [Larios-Cárdenas and Gibou, J. Comput. Phys. (May 2022), 10.1016/j.jcp.2022.111291] that relies on numerical schemes to enable machine-learning operations on demand. In particular, our routine features double predicting to harness curvature symmetry invariance in favor of precision and stability. The core of this solver is a multilayer perceptron trained on circular- and sinusoidal-interface samples. Its role is to quantify the error in numerical curvature approximations and emit corrected estimates for select grid vertices along the free boundary. These corrections arise in response to preprocessed context level-set, curvature, and gradient data. To promote neural capacity, we have adopted sample negative-curvature normalization, reorientation, and reflection-based augmentation. In the same manner, our system incorporates dimensionality reduction, well-balancedness, and regularization to minimize outlying effects. Our training approach is likewise scalable across mesh sizes. For this purpose, we have introduced dimensionless parametrization and probabilistic subsampling during data production. Together, all these elements have improved the accuracy and efficiency of curvature calculations around under-resolved regions. In most experiments, our strategy has outperformed the numerical baseline at twice the number of redistancing steps while requiring only a fraction of the cost.  ( 3 min )
    Imbalanced Graph Classification via Graph-of-Graph Neural Networks. (arXiv:2112.00238v2 [cs.LG] UPDATED)
    Graph Neural Networks (GNNs) have achieved unprecedented success in identifying categorical labels of graphs. However, most existing graph classification problems with GNNs follow the protocol of balanced data splitting, which misaligns with many real-world scenarios in which some classes have much fewer labels than others. Directly training GNNs under this imbalanced scenario may lead to uninformative representations of graphs in minority classes and compromise the overall classification performance, which signifies the importance of developing effective GNNs for handling imbalanced graph classification. Existing methods are either tailored for non-graph-structured data or designed specifically for imbalanced node classification, while few focus on imbalanced graph classification. To this end, we introduce a novel framework, Graph-of-Graph Neural Networks (G$^2$GNN), which alleviates the graph imbalance issue by deriving extra supervision globally from neighboring graphs and locally from stochastic augmentations of graphs. Globally, we construct a graph of graphs (GoG) based on kernel similarity and perform GoG propagation to aggregate neighboring graph representations. Locally, we employ topological augmentation via masking node features or dropping edges, with self-consistency regularization, to generate stochastic augmentations of each graph that improve the model's generalizability. Extensive graph classification experiments conducted on seven benchmark datasets demonstrate that our proposed G$^2$GNN outperforms numerous baselines by roughly 5\% in both F1-macro and F1-micro scores. The implementation of G$^2$GNN is available at https://github.com/YuWVandy/G2GNN  ( 3 min )
    Periodic Residual Learning for Crowd Flow Forecasting. (arXiv:2112.06132v2 [cs.LG] UPDATED)
    Crowd flow forecasting, which aims to predict the crowds entering or leaving certain regions, is a fundamental task in smart cities. One of the key properties of crowd flow data is periodicity: a pattern that occurs at regular time intervals, such as a weekly pattern. To capture such periodicity, existing studies either fuse the periodic hidden states into channels for networks to learn or apply extra periodic strategies to the network architecture. In this paper, we devise a novel periodic residual learning network (PRNet) for better modeling of periodicity in crowd flow data. Unlike existing methods, PRNet frames crowd flow forecasting as a periodic residual learning problem by modeling the variation between the inputs (the previous time period) and the outputs (the future time period). Compared to directly predicting crowd flows, which are highly dynamic, learning this more stationary deviation is much easier, which facilitates model training. Besides, the learned variation enables the network to produce the residual between future conditions and their corresponding weekly observations at each time interval, and therefore contributes to substantially more accurate multi-step-ahead predictions. Extensive experiments show that PRNet can be easily integrated into existing models to enhance their predictive performance.  ( 3 min )
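    A toy sketch of the residual framing in NumPy (the weekly period, time granularity, and array shapes are illustrative assumptions, not details from the paper):
    ```python
    import numpy as np

    PERIOD = 7 * 48  # one week of 30-minute intervals (assumed granularity)

    def to_residual_target(flows: np.ndarray, t: int, horizon: int) -> np.ndarray:
        """Instead of predicting flows[t : t+horizon] directly, predict its deviation
        from the same window one period (week) earlier; the reference window is
        added back at inference time to recover the actual forecast."""
        future = flows[t : t + horizon]
        reference = flows[t - PERIOD : t - PERIOD + horizon]
        return future - reference  # a more stationary learning target
    ```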
    TTOpt: A Maximum Volume Quantized Tensor Train-based Optimization and its Application to Reinforcement Learning. (arXiv:2205.00293v2 [cs.LG] UPDATED)
    We present a novel procedure for optimization based on the combination of an efficient quantized tensor train representation and a generalized maximum matrix volume principle. We demonstrate the applicability of the new Tensor Train Optimizer (TTOpt) method to various tasks, ranging from the minimization of multidimensional functions to reinforcement learning. Our algorithm compares favorably to popular evolutionary-based methods and outperforms them in the number of function evaluations or execution time, often by a significant margin.  ( 2 min )
    Learning Dissipative Dynamics in Chaotic Systems. (arXiv:2106.06898v2 [cs.LG] UPDATED)
    Chaotic systems are notoriously challenging to predict because of their sensitivity to perturbations and errors due to time stepping. Despite this unpredictable behavior, for many dissipative systems the statistics of the long term trajectories are governed by an invariant measure supported on a set, known as the global attractor; for many problems this set is finite dimensional, even if the state space is infinite dimensional. For Markovian systems, the statistical properties of long-term trajectories are uniquely determined by the solution operator that maps the evolution of the system over arbitrary positive time increments. In this work, we propose a machine learning framework to learn the underlying solution operator for dissipative chaotic systems, showing that the resulting learned operator accurately captures short-time trajectories and long-time statistical behavior. Using this framework, we are able to predict various statistics of the invariant measure for the turbulent Kolmogorov Flow dynamics with Reynolds numbers up to 5000.  ( 2 min )
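    As a toy illustration of the setup described above, the sketch below fits a fixed-increment solution operator u_t -> u_{t+dt} on a dissipative chaotic system (Lorenz-63 as a stand-in) and rolls it out to estimate long-time attractor statistics; the small MLP and training budget are our own assumptions, not the paper's operator-learning architecture.

```python
import numpy as np
import torch
import torch.nn as nn

def lorenz_step(u, dt=0.01, s=10., r=28., b=8/3):
    x, y, z = u
    return u + dt * np.array([s*(y-x), x*(r-z)-y, x*y - b*z])

traj = np.empty((20000, 3)); traj[0] = [1., 1., 1.]
for t in range(1, len(traj)):                  # generate training trajectory
    traj[t] = lorenz_step(traj[t-1])

X = torch.tensor(traj[:-1], dtype=torch.float32)
Y = torch.tensor(traj[1:], dtype=torch.float32)
net = nn.Sequential(nn.Linear(3, 128), nn.Tanh(), nn.Linear(128, 3))
opt = torch.optim.Adam(net.parameters(), lr=1e-3)
for _ in range(500):                           # short demo training loop
    opt.zero_grad()
    loss = ((net(X) - Y) ** 2).mean()
    loss.backward(); opt.step()

u = X[-1]                                      # roll out the learned operator
samples = []
with torch.no_grad():
    for _ in range(5000):
        u = net(u); samples.append(u.numpy())
print("learned long-time mean:", np.mean(samples, axis=0))
print("true long-time mean:   ", traj[10000:].mean(axis=0))
```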
    Distance-based Positive and Unlabeled Learning for Ranking. (arXiv:2005.10700v3 [cs.LG] UPDATED)
    Learning to rank -- producing a ranked list of items specific to a query and with respect to a set of supervisory items -- is a problem of general interest. The setting we consider is one in which no analytic description of what constitutes a good ranking is available. Instead, we have a collection of representations and supervisory information consisting of a (target item, interesting items set) pair. We demonstrate analytically, in simulation, and in real data examples that learning to rank via combining representations using an integer linear program is effective when the supervision is as light as "these few items are similar to your item of interest." While this nomination task is quite general, for specificity we present our methodology from the perspective of vertex nomination in graphs. The methodology described herein is model agnostic.  ( 2 min )
    Graph Condensation for Graph Neural Networks. (arXiv:2110.07580v4 [cs.LG] UPDATED)
    Given the prevalence of large-scale graphs in real-world applications, the storage and time for training neural models have raised increasing concerns. To alleviate the concerns, we propose and study the problem of graph condensation for graph neural networks (GNNs). Specifically, we aim to condense the large, original graph into a small, synthetic and highly-informative graph, such that GNNs trained on the small graph and large graph have comparable performance. We approach the condensation problem by imitating the GNN training trajectory on the original graph through the optimization of a gradient matching loss and design a strategy to condense node features and structural information simultaneously. Extensive experiments have demonstrated the effectiveness of the proposed framework in condensing different graph datasets into informative smaller graphs. In particular, we are able to approximate the original test accuracy by 95.3% on Reddit, 99.8% on Flickr and 99.0% on Citeseer, while reducing their graph size by more than 99.9%, and the condensed graphs can be used to train various GNN architectures. Code is released at https://github.com/ChandlerBang/GCond.  ( 3 min )
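    A minimal sketch of a gradient-matching objective of the kind mentioned above: make the model's gradients on learnable synthetic data imitate those on real data. A plain MLP stands in for the GNN, and the cosine distance is one illustrative choice of matching criterion.

```python
import torch
import torch.nn as nn

def grad_match_loss(model, loss_fn, real_batch, syn_batch):
    xr, yr = real_batch; xs, ys = syn_batch
    g_real = torch.autograd.grad(loss_fn(model(xr), yr), model.parameters())
    g_syn = torch.autograd.grad(loss_fn(model(xs), ys), model.parameters(),
                                create_graph=True)  # grads flow to syn data
    return sum(1 - torch.cosine_similarity(a.flatten(), b.flatten(), dim=0)
               for a, b in zip(g_real, g_syn))

model = nn.Sequential(nn.Linear(16, 32), nn.ReLU(), nn.Linear(32, 4))
loss_fn = nn.CrossEntropyLoss()
x_syn = torch.randn(8, 16, requires_grad=True)      # learnable synthetic feats
y_syn = torch.randint(0, 4, (8,))
real = (torch.randn(64, 16), torch.randint(0, 4, (64,)))
loss = grad_match_loss(model, loss_fn, real, (x_syn, y_syn))
loss.backward()                                     # gradient w.r.t. x_syn
print(x_syn.grad.shape)
```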
    On the Implicit Bias Towards Minimal Depth of Deep Neural Networks. (arXiv:2202.09028v9 [cs.LG] UPDATED)
    Recent results in the literature suggest that the penultimate (second-to-last) layer representations of neural networks trained for classification exhibit a clustering property called neural collapse (NC). We study the implicit bias of stochastic gradient descent (SGD) in favor of low-depth solutions when training deep neural networks. We characterize a notion of effective depth that measures the first layer at which sample embeddings are separable using the nearest-class-center classifier. Furthermore, we hypothesize and empirically show that SGD implicitly selects neural networks of small effective depth. Moreover, while neural collapse emerges even when generalization should be impossible, we argue that the \emph{degree of separability} in the intermediate layers is related to generalization. We derive a generalization bound based on comparing the effective depth of the network with the minimal depth required to fit the same dataset with partially corrupted labels. Remarkably, this bound provides non-trivial estimates of the test performance. Finally, we empirically show that the effective depth of a trained neural network monotonically increases when increasing the number of random labels in the data.  ( 3 min )
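    A minimal sketch of the effective-depth probe, assuming per-layer embeddings have already been extracted: run the nearest-class-center rule on each layer and report the first layer whose accuracy reaches a tolerance. Shapes and the tolerance are illustrative.

```python
import numpy as np

def ncc_accuracy(Z, y):
    """Nearest-class-center accuracy of embeddings Z (n, d) with labels y."""
    classes = np.unique(y)
    centers = np.stack([Z[y == c].mean(axis=0) for c in classes])
    d = ((Z[:, None, :] - centers[None]) ** 2).sum(-1)
    return (classes[d.argmin(1)] == y).mean()

def effective_depth(layer_embeddings, y, tol=1.0):
    for depth, Z in enumerate(layer_embeddings, start=1):
        if ncc_accuracy(Z, y) >= tol:
            return depth
    return len(layer_embeddings)          # never fully separable

# Toy check: deeper "layers" are made progressively more separable.
y = np.repeat([0, 1], 50)
layers = [np.random.randn(100, 8) + s * y[:, None] for s in (0.0, 1.0, 6.0)]
print(effective_depth(layers, y))         # expected: 3
```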
    Learning Filter-Based Compressed Blind-Deconvolution. (arXiv:2209.14165v1 [eess.SP])
    The problem of sparse multichannel blind deconvolution (S-MBD) arises frequently in many engineering applications such as radar/sonar/ultrasound imaging. To reduce its computational and implementation cost, we propose a compression method that enables blind recovery from far fewer measurements compared to the full received signal in time. The proposed compression measures the signal through a filter followed by subsampling, allowing for a significant reduction in implementation cost. We derive theoretical guarantees for the identifiability and recovery of a sparse filter from compressed measurements. Our results allow for the design of a wide class of compression filters. We then propose a data-driven unrolled learning framework to learn the compression filter and solve the S-MBD problem. The encoder is a recurrent inference network that maps compressed measurements into an estimate of sparse filters. We demonstrate that our unrolled learning method is more robust to choices of source shapes and has better recovery performance compared to optimization-based methods. Finally, in applications with limited data (few-shot learning), we highlight the superior generalization capability of unrolled learning compared to conventional deep learning.  ( 2 min )
    CausalSim: A Causal Inference Framework for Unbiased Trace-Driven Simulation. (arXiv:2201.01811v3 [cs.LG] UPDATED)
    We present CausalSim, a causal inference framework for unbiased trace-driven simulation. Current trace-driven simulators assume that the interventions being simulated (e.g., a new algorithm) would not affect the validity of the traces. However, real-world traces are often biased by the choices of algorithms made during trace collection, and hence replaying traces under an intervention may lead to incorrect results. CausalSim addresses this challenge by learning a causal model of the system dynamics and latent factors capturing the underlying system conditions during trace collection. It learns these models using an initial randomized control trial (RCT) under a fixed set of algorithms, and then applies them to remove biases from trace data when simulating new algorithms. Key to CausalSim is mapping unbiased trace-driven simulation to a tensor completion problem with extremely sparse observations. By exploiting a basic distributional invariance property present in RCT data, CausalSim enables a novel tensor completion method despite the sparsity of observations. Our extensive evaluation of CausalSim on both real and synthetic datasets, including more than ten months of real data from the Puffer video streaming system, shows that it improves simulation accuracy, reducing errors by 53% and 61% on average compared to expert-designed and supervised learning baselines. Moreover, CausalSim provides markedly different insights about ABR algorithms compared to the biased baseline simulator, which we validate with a real deployment.  ( 3 min )
    Causal Inference Under Unmeasured Confounding With Negative Controls: A Minimax Learning Approach. (arXiv:2103.14029v3 [stat.ML] UPDATED)
    We study the estimation of causal parameters when not all confounders are observed and instead negative controls are available. Recent work has shown how these can enable identification and efficient estimation via two so-called bridge functions. In this paper, we tackle the primary challenge to causal inference using negative controls: the identification and estimation of these bridge functions. Previous work has relied on completeness conditions on these functions to identify the causal parameters and required uniqueness assumptions in estimation, and they also focused on parametric estimation of bridge functions. Instead, we provide a new identification strategy that avoids the completeness condition. And, we provide new estimators for these functions based on minimax learning formulations. These estimators accommodate general function classes such as Reproducing Kernel Hilbert Spaces and neural networks. We study finite-sample convergence results both for estimating bridge functions themselves and for the final estimation of the causal parameter under a variety of combinations of assumptions. We avoid uniqueness conditions on the bridge functions as much as possible.  ( 2 min )
    Multiblock ADMM for nonsmooth nonconvex optimization with nonlinear coupling constraints. (arXiv:2201.07657v2 [math.OC] UPDATED)
    This paper considers a multiblock nonsmooth nonconvex optimization problem with nonlinear coupling constraints. By developing the idea of using the information zone and adaptive regime proposed in [J. Bolte, S. Sabach and M. Teboulle, Nonconvex Lagrangian-based optimization: Monitoring schemes and global convergence, Mathematics of Operations Research, 43: 1210--1232, 2018], we propose a multiblock alternating direction method of multipliers for solving this problem. We specify the update of the primal variables by employing a majorization minimization procedure in each block update. An independent convergence analysis is conducted to prove the subsequential and global convergence of the generated sequence to a critical point of the augmented Lagrangian. We also establish iteration complexity and provide preliminary numerical results for the proposed algorithm.  ( 2 min )
    p-Adic Statistical Field Theory and Deep Belief Networks. (arXiv:2207.13877v2 [math-ph] UPDATED)
    In this work we initiate the study of the correspondence between $p$-adic statistical field theories (SFTs) and neural networks (NNs). In general, quantum field theories over a $p$-adic spacetime can be formulated in a rigorous way. Nowadays, these theories are considered mathematical toy models for understanding problems arising in the true theories. In this work we show that these theories are deeply connected with deep belief networks (DBNs). Hinton et al. constructed DBNs by stacking several restricted Boltzmann machines (RBMs). The purpose of this construction is to obtain a network with a hierarchical structure (a deep learning architecture). An RBM corresponds to a certain spin glass, thus a DBN should correspond to an ultrametric (hierarchical) spin glass. A model of such a system can be easily constructed using $p$-adic numbers. In our approach, a $p$-adic SFT corresponds to a $p$-adic continuous DBN, and a discretization of this theory corresponds to a $p$-adic discrete DBN. We show that these last machines are universal approximators. In the $p$-adic framework, the correspondence between SFTs and NNs is not fully developed. We point out several open problems.  ( 2 min )
    SHiFT: An Efficient, Flexible Search Engine for Transfer Learning. (arXiv:2204.01457v2 [cs.LG] UPDATED)
    Transfer learning can be seen as a data- and compute-efficient alternative to training models from scratch. The emergence of rich model repositories, such as TensorFlow Hub, enables practitioners and researchers to unleash the potential of these models across a wide range of downstream tasks. As these repositories keep growing exponentially, efficiently selecting a good model for the task at hand becomes paramount. By carefully comparing various selection and search strategies, we realize that no single method outperforms the others, and hybrid or mixed strategies can be beneficial. Therefore, we propose SHiFT, the first downstream task-aware, flexible, and efficient model search engine for transfer learning. These properties are enabled by a custom query language SHiFT-QL together with a cost-based decision maker, which we empirically validate. Motivated by the iterative nature of machine learning development, we further support efficient incremental executions of our queries, which requires a careful implementation when jointly used with our optimizations.  ( 2 min )
    Sample-Efficient Safety Assurances using Conformal Prediction. (arXiv:2109.14082v3 [cs.RO] UPDATED)
    When deploying machine learning models in high-stakes robotics applications, the ability to detect unsafe situations is crucial. Early warning systems can provide alerts when an unsafe situation is imminent (in the absence of corrective action). To reliably improve safety, these warning systems should have a provable false negative rate; i.e., of the situations that are unsafe, fewer than a fraction $\epsilon$ should occur without an alert. In this work, we present a framework that combines a statistical inference technique known as conformal prediction with a simulator of robot/environment dynamics, in order to tune warning systems to provably achieve an $\epsilon$ false negative rate using as few as $1/\epsilon$ data points. We apply our framework to a driver warning system and a robotic grasping application, and empirically demonstrate the guaranteed false negative rate while also observing a low false detection (positive) rate.  ( 2 min )
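    A minimal sketch of the kind of conformal calibration described above, under simplifying assumptions (exchangeable calibration episodes, a scalar warning score where higher means more alarming): pick the alert threshold as an order statistic of the scores on unsafe calibration cases, so that at most an epsilon fraction of new unsafe situations is missed.

```python
import numpy as np

def calibrate_threshold(unsafe_scores, epsilon):
    """unsafe_scores: warning scores on n unsafe calibration cases. Alerting
    when score >= tau keeps the false-negative rate below epsilon via the
    standard conformal finite-sample rank correction k/(n+1) <= epsilon."""
    n = len(unsafe_scores)
    k = int(np.floor(epsilon * (n + 1)))     # rank allowed to be missed
    assert k >= 1, "need at least about 1/epsilon calibration points"
    return np.sort(unsafe_scores)[k - 1]     # k-th smallest score

rng = np.random.default_rng(0)
scores = rng.normal(2.0, 1.0, size=200)      # toy scores on unsafe cases
tau = calibrate_threshold(scores, epsilon=0.05)
print(tau, (scores < tau).mean())            # empirical miss rate <= ~0.05
```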
    A hybrid inference system for improved curvature estimation in the level-set method using machine learning. (arXiv:2104.02951v5 [cs.LG] UPDATED)
    We present a novel hybrid strategy based on machine learning to improve curvature estimation in the level-set method. The proposed inference system couples enhanced neural networks with standard numerical schemes to compute curvature more accurately. The core of our hybrid framework is a switching mechanism that relies on well-established numerical techniques to gauge curvature. If the curvature magnitude is larger than a resolution-dependent threshold, it uses a neural network to yield a better approximation. Our networks are multilayer perceptrons fitted to synthetic data sets composed of sinusoidal- and circular-interface samples at various configurations. To reduce data set size and training complexity, we leverage the problem's characteristic symmetry and build our models on just half of the curvature spectrum. These savings lead to a powerful inference system able to outperform either of its numerical or neural components alone. Experiments with stationary, smooth interfaces show that our hybrid solver is notably superior to conventional numerical methods on coarse grids and along steep interface regions. Compared to prior research, we have observed outstanding gains in precision after training the regression model with data pairs from more than a single interface type and transforming data with specialized input preprocessing. In particular, our findings confirm that machine learning is a promising avenue for reducing or removing mass loss in the level-set method.  ( 3 min )
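    A tiny sketch of the switching rule described above: trust the numerical scheme unless the curvature magnitude exceeds a resolution-dependent threshold, then defer to the neural model. Both callables and the threshold constant are illustrative stand-ins.

```python
def hybrid_curvature(stencil, h, kappa_numerical, mlp_correct, c=0.004):
    """Return numerical curvature unless |kappa|*h exceeds the mesh-relative
    threshold c, in which case the neural estimate is used instead."""
    kappa = kappa_numerical(stencil, h)
    if abs(kappa) * h > c:                 # steep region relative to mesh size h
        return mlp_correct(stencil, kappa)
    return kappa

kappa_numerical = lambda s, h: 1.7         # stand-in finite-difference scheme
mlp_correct = lambda s, k: 0.9 * k         # stand-in neural correction
print(hybrid_curvature(stencil=None, h=0.01,
                       kappa_numerical=kappa_numerical,
                       mlp_correct=mlp_correct))
```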
    Data-driven soliton mappings for integrable fractional nonlinear wave equations via deep learning with Fourier neural operator. (arXiv:2209.14291v1 [nlin.SI])
    In this paper, we first extend the Fourier neural operator (FNO) to discover the soliton mapping between two function spaces, where one is the fractional-order index space $\{\epsilon|\epsilon\in (0, 1)\}$ of the fractional integrable nonlinear wave equations while the other is the solitonic solution function space. To be specific, the fractional nonlinear Schr\"{o}dinger (fNLS), fractional Korteweg-de Vries (fKdV), fractional modified Korteweg-de Vries (fmKdV) and fractional sine-Gordon (fsineG) equations proposed recently are studied in this paper. We track the training and evaluation progress by recording the training and test losses. To illustrate the accuracy, the data-driven solitons are also compared to the exact solutions. Moreover, we consider the influence of several critical factors (e.g., activation functions including ReLU$(x)$, Sigmoid$(x)$, Swish$(x)$ and $x\tanh(x)$, and the depth of the fully connected layer) on the performance of the FNO algorithm. We also use a new activation function, namely $x\tanh(x)$, which is not commonly used in deep learning. The results obtained in this paper may be useful for further understanding the neural networks in fractional integrable nonlinear wave systems and the mappings between two spaces.  ( 3 min )
    Multimodal Prediction of Spontaneous Humour: A Novel Dataset and First Results. (arXiv:2209.14272v1 [cs.LG])
    Humour is a substantial element of human affect and cognition. Its automatic understanding can facilitate a more naturalistic human-device interaction and the humanisation of artificial intelligence. Current methods of humour detection are solely based on staged data, making them inadequate for 'real-world' applications. We address this deficiency by introducing the novel Passau-Spontaneous Football Coach Humour (Passau-SFCH) dataset, comprising about 11 hours of recordings. The Passau-SFCH dataset is annotated for the presence of humour and its dimensions (sentiment and direction) as proposed in Martin's Humor Style Questionnaire. We conduct a series of experiments, employing pretrained Transformers, convolutional neural networks, and expert-designed features. The performance of each modality (text, audio, video) for spontaneous humour recognition is analysed and their complementarity is investigated. Our findings suggest that for the automatic analysis of humour and its sentiment, facial expressions are most promising, while humour direction can be best modelled via text-based features. The results reveal considerable differences among various subjects, highlighting the individuality of humour usage and style. Further, we observe that a decision-level fusion yields the best recognition result. Finally, we make our code publicly available at https://www.github.com/EIHW/passau-sfch. The Passau-SFCH dataset is available upon request.  ( 3 min )
    Mobile Edge Computing, Metaverse, 6G Wireless Communications, Artificial Intelligence, and Blockchain: Survey and Their Convergence. (arXiv:2209.14147v1 [cs.DC])
    With the advances of the Internet of Things (IoT) and 5G/6G wireless communications, the paradigms of mobile computing have developed dramatically in recent years, from centralized mobile cloud computing to distributed fog computing and mobile edge computing (MEC). MEC pushes compute-intensive assignments to the edge of the network and brings resources as close to the endpoints as possible, addressing the shortcomings of mobile devices with regard to storage space, resource optimisation, computational performance and efficiency. Compared to cloud computing, as the distributed and closer infrastructure, the convergence of MEC with other emerging technologies, including the Metaverse, 6G wireless communications, artificial intelligence (AI), and blockchain, also addresses the problems of network resource allocation, increased network load, and latency requirements. Accordingly, this paper investigates the computational paradigms used to meet the stringent requirements of modern applications. The application scenarios of MEC in mobile augmented reality (MAR) are provided. Furthermore, this survey presents the motivation of MEC-based Metaverse and introduces the applications of MEC to the Metaverse. Particular emphasis is given to a set of technical fusions mentioned above, e.g., 6G with the MEC paradigm, MEC strengthened by blockchain, etc.  ( 3 min )
    A deep learning approach for the computation of curvature in the level-set method. (arXiv:2002.02804v4 [math.NA] UPDATED)
    We propose a deep learning strategy to estimate the mean curvature of two-dimensional implicit interfaces in the level-set method. Our approach is based on fitting feed-forward neural networks to synthetic data sets constructed from circular interfaces immersed in uniform grids of various resolutions. These multilayer perceptrons process the level-set values from mesh points next to the free boundary and output the dimensionless curvature at their closest locations on the interface. Accuracy analyses involving irregular interfaces, in both uniform and adaptive grids, show that our models are competitive with traditional numerical schemes in the $L^1$ and $L^2$ norms. In particular, our neural networks approximate curvature with comparable precision in coarse resolutions, when the interface features steep curvature regions, and when the number of iterations to reinitialize the level-set function is small. Although the conventional numerical approach is more robust than our framework, our results have unveiled the potential of machine learning for dealing with computational tasks where the level-set method is known to experience difficulties. We also establish that an application-dependent map of local resolutions to neural models can be devised to estimate mean curvature more effectively than a universal neural network.
    Global Weighted Tensor Nuclear Norm for Tensor Robust Principal Component Analysis. (arXiv:2209.14084v1 [cs.LG])
    Tensor Robust Principal Component Analysis (TRPCA), which aims to recover a low-rank tensor corrupted by sparse noise, has attracted much attention in many real applications. This paper develops a new Global Weighted TRPCA method (GWTRPCA), which is the first approach to simultaneously consider the significance of intra-frontal-slice and inter-frontal-slice singular values in the Fourier domain. Exploiting this global information, GWTRPCA penalizes larger singular values less and assigns smaller weights to them. Hence, our method can recover the low-tubal-rank components more exactly. Moreover, we propose an effective adaptive weight learning strategy based on a Modified Cauchy Estimator (MCE), since the weight setting plays a crucial role in the success of GWTRPCA. To implement the GWTRPCA method, we devise an optimization algorithm using the alternating direction method of multipliers (ADMM). Experiments on real-world datasets validate the effectiveness of our proposed method.
    Less is More: Rethinking Few-Shot Learning and Recurrent Neural Nets. (arXiv:2209.14267v1 [cs.LG])
    The statistical supervised learning framework assumes an input-output set with a joint probability distribution that is reliably represented by the training dataset. The learner is then required to output a prediction rule learned from the training dataset's input-output pairs. In this work, we provide meaningful insights into the asymptotic equipartition property (AEP) (Shannon, 1948) in the context of machine learning, and illuminate some of its potential ramifications for few-shot learning. We provide theoretical guarantees for reliable learning under the information-theoretic AEP, and for the generalization error with respect to the sample size. We then focus on a highly efficient recurrent neural net (RNN) framework and propose a reduced-entropy algorithm for few-shot learning. We also propose a mathematical intuition for the RNN as an approximation of a sparse coding solver. We verify the applicability, robustness, and computational efficiency of the proposed approach with image deblurring and optical coherence tomography (OCT) speckle suppression. Our experimental results demonstrate significant potential for improving learning models' sample efficiency, generalization, and time complexity, which can therefore be leveraged for practical real-time applications.
    How to solve a classification problem using a cooperative tiling Multi-Agent System? (arXiv:2209.14239v1 [cs.MA])
    Adaptive Multi-Agent Systems (AMAS) transform dynamic problems into problems of local cooperation between agents. We present smapy, an ensemble-based AMAS implementation for mobility prediction, whose agents are provided with machine learning models in addition to their cooperation rules. With a detailed methodology, we propose a framework to transform a classification problem into a cooperative tiling of the input variable space. We show that it is possible to use linear classifiers for online non-linear classification on three benchmark toy problems, chosen for their different levels of linear separability, if they are integrated in a cooperative multi-agent structure. The results obtained show a significant improvement in the performance of linear classifiers in non-linear contexts, in terms of classification accuracy and decision boundaries, thanks to the cooperative approach.
    Data Augmentation using Feature Generation for Volumetric Medical Images. (arXiv:2209.14097v1 [eess.IV])
    Medical image classification is one of the most critical problems in the image recognition area. One of the major challenges in this field is the scarcity of labelled training data. Additionally, datasets are often class-imbalanced, as some cases are very rare. As a result, accuracy in the classification task is normally low. Deep learning models, in particular, show promising results on image segmentation and classification problems, but they require very large datasets for training. Therefore, there is a need to generate more synthetic samples from the same distribution. Previous work has shown that feature generation is more efficient and leads to better performance than corresponding image generation. We apply this idea in the medical imaging domain. We use transfer learning to train a segmentation model on the small dataset for which gold-standard class annotations are available. We extract the learnt features and use them to generate synthetic features conditioned on class labels, using an Auxiliary Classifier GAN (ACGAN). We test the quality of the generated features in a downstream classification task for brain tumors according to their severity level. Experimental results show promising results regarding the validity of these generated features and their overall contribution to balancing the data and improving class-wise classification accuracy.
    Evaluation of Time-Series Forecasting Models for Chickenpox Cases Estimation in Hungary. (arXiv:2209.14129v1 [cs.AI])
    Time-series forecasting is a powerful data modeling discipline that analyzes historical observations to predict future values of a time-series. It has been utilized in numerous applications, including but not limited to economics, meteorology, and health. In this paper, we use time-series forecasting techniques to model and predict the future incidence of chickenpox. To achieve this, we implement and simulate multiple models and data preprocessing techniques on a dataset collected in Hungary. We demonstrate that the LSTM model outperforms all other models in the vast majority of the experiments in terms of county-level forecasting, whereas the SARIMAX model performs best at the national level. We also demonstrate that the performance of the traditional data preprocessing method is inferior to that of our proposed data preprocessing method.
    Mutual Information and Ensemble Based Feature Recommender for Renal Cancer Stage Classification. (arXiv:2209.13836v1 [cs.LG])
    The kidney is an essential organ in the human body. It maintains homeostasis and removes harmful substances through urine. Renal cell carcinoma (RCC) is the most common form of kidney cancer: around 90\% of all kidney cancers are attributed to RCC. The most harmful type of RCC is clear cell renal cell carcinoma (ccRCC), which makes up about 80\% of all RCC cases. Early and accurate detection of ccRCC is necessary to prevent further spreading of the disease to other organs. In this article, a detailed experimental study is conducted to identify important features which can aid in diagnosing ccRCC at different stages. The ccRCC dataset is obtained from The Cancer Genome Atlas (TCGA). A novel mutual-information- and ensemble-based feature ranking approach is proposed, which considers the order of features obtained from 8 popular feature selection methods. Performance of the proposed method is evaluated by the overall classification accuracy obtained using 2 different classifiers (ANN and SVM). Experimental results show that the proposed feature ranking method attains higher accuracy (96.6\% and 98.6\% using SVM and ANN, respectively) for classifying different stages of ccRCC with a reduced feature set as compared to existing work. Notably, out of the 3 distinguishing features mentioned by the existing TNM system (proposed by the AJCC and UICC), our proposed method selects two of them (tumour size, metastasis status) as the top-most ones. This establishes the efficacy of our proposed approach.
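    A minimal sketch of ensemble rank aggregation of the kind described above, assuming a labeled feature matrix; the paper combines 8 selectors, while three cheap scikit-learn scorers stand in here, aggregated Borda-style by average rank.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import f_classif, mutual_info_classif

X, y = make_classification(n_samples=300, n_features=20, random_state=0)
scores = [
    mutual_info_classif(X, y, random_state=0),           # mutual information
    f_classif(X, y)[0],                                  # ANOVA F-statistic
    RandomForestClassifier(random_state=0).fit(X, y).feature_importances_,
]
# Convert each score vector to ranks (0 = best), then average across methods.
ranks = np.stack([np.argsort(np.argsort(-s)) for s in scores])
consensus = ranks.mean(axis=0)
print("top-5 features by consensus rank:", np.argsort(consensus)[:5])
```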
    Momentum Gradient Descent Federated Learning with Local Differential Privacy. (arXiv:2209.14086v1 [cs.LG])
    Nowadays, information technology is developing rapidly. In the big data era, concerns about the privacy of personal information have become more pronounced. The major challenge is to find a way to guarantee that sensitive personal information is not disclosed while data is published and analyzed. Centralized differential privacy is established on the assumption of a trusted third-party data curator. However, this assumption is not always true in reality. As a new privacy preservation model, local differential privacy has relatively strong privacy guarantees. Although federated learning is a comparatively privacy-preserving approach for distributed learning, it still introduces various privacy concerns. To avoid privacy threats and reduce communication costs, in this article, we propose integrating federated learning and local differential privacy with momentum gradient descent to improve the performance of machine learning models.
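    A minimal sketch of a client-side momentum update with local perturbation, under our own simplifying assumptions: gradients are clipped to bound sensitivity and perturbed with Laplace noise before use. The noise scale is illustrative, not a calibrated privacy budget, and this is not the paper's exact protocol.

```python
import numpy as np

rng = np.random.default_rng(0)

def ldp_momentum_step(w, v, grad, lr=0.1, beta=0.9, clip=1.0, noise_scale=0.5):
    g = grad / max(1.0, np.linalg.norm(grad) / clip)     # clip sensitivity
    g = g + rng.laplace(0.0, noise_scale, size=g.shape)  # local perturbation
    v = beta * v + (1 - beta) * g                        # momentum accumulator
    return w - lr * v, v

w = np.zeros(4); v = np.zeros(4)
target = np.array([1., -2., 0.5, 3.])
for _ in range(100):
    grad = 2 * (w - target)                              # toy quadratic loss
    w, v = ldp_momentum_step(w, v, grad)
print(w)                                                 # noisy approach to target
```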
    Score Modeling for Simulation-based Inference. (arXiv:2209.14249v1 [cs.LG])
    Neural Posterior Estimation methods for simulation-based inference can be ill-suited for dealing with posterior distributions obtained by conditioning on multiple observations, as they may require a large number of simulator calls to yield accurate approximations. Neural Likelihood Estimation methods can naturally handle multiple observations, but require a separate inference step, which may affect their efficiency and performance. We introduce a new method for simulation-based inference that enjoys the benefits of both approaches. We propose to model the scores for the posterior distributions induced by individual observations, and introduce a sampling algorithm that combines the learned scores to approximately sample from the target efficiently.
    A Multi-scale Graph Signature for Persistence Diagrams based on Return Probabilities of Random Walks. (arXiv:2209.14264v1 [cs.LG])
    Persistence diagrams (PDs), often characterized as sets of birth and death times of homology classes, are known to provide a topological representation of a graph structure, which is often useful in machine learning tasks. Prior works rely on a single graph signature to construct PDs. In this paper, we explore the use of a family of multi-scale graph signatures to enhance the robustness of topological features. We propose a deep learning architecture to handle this set input. Experiments on benchmark graph classification datasets demonstrate that our proposed architecture outperforms other persistent-homology-based methods and achieves competitive performance compared to state-of-the-art methods using graph neural networks. In addition, our approach can be easily applied to large input graphs, as it does not suffer from the limited scalability that can be an issue for graph kernel methods.
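    A minimal sketch of a return-probability signature of the kind named in the title: the probability that a random walk returns to its start node after k steps, computed for several k (the scales). The scale choices below are illustrative.

```python
import numpy as np

def return_probabilities(A, scales=(1, 2, 4, 8)):
    """A: (n, n) adjacency matrix. Returns an (n, n_scales) node signature of
    k-step return probabilities diag(P^k) for each k in `scales`."""
    P = A / A.sum(axis=1, keepdims=True)       # row-stochastic transitions
    sig, Pk, k_prev = [], np.eye(len(A)), 0
    for k in scales:
        Pk = Pk @ np.linalg.matrix_power(P, k - k_prev)  # advance to step k
        k_prev = k
        sig.append(np.diag(Pk))                # return probability per node
    return np.stack(sig, axis=1)

A = np.array([[0, 1, 1, 0],
              [1, 0, 1, 0],
              [1, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
print(return_probabilities(A))                 # (4 nodes, 4 scales)
```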
    Knowledge-Aware Bayesian Deep Topic Model. (arXiv:2209.14228v1 [cs.CL])
    We propose a Bayesian generative model for incorporating prior domain knowledge into hierarchical topic modeling. Although embedded topic models (ETMs) and their variants have achieved promising performance in text analysis, they mainly focus on mining word co-occurrence patterns, ignoring potentially easy-to-obtain prior topic hierarchies that could help enhance topic coherence. While several knowledge-based topic models have recently been proposed, they are either only applicable to shallow hierarchies or sensitive to the quality of the provided prior knowledge. To this end, we develop a novel deep ETM that jointly models the documents and the given prior knowledge by embedding the words and topics into the same space. Guided by the provided knowledge, the proposed model tends to discover topic hierarchies that are organized into interpretable taxonomies. Besides, with a technique for adapting a given graph, our extended version allows the provided prior topic structure to be fine-tuned to match the target corpus. Extensive experiments show that our proposed model efficiently integrates the prior knowledge and improves both hierarchical topic discovery and document representation.
    Accuracy, Fairness, and Interpretability of Machine Learning Criminal Recidivism Models. (arXiv:2209.14237v1 [cs.CY])
    Criminal recidivism models are tools that have gained widespread adoption by parole boards across the United States to assist with parole decisions. These models take in large amounts of data about an individual and then predict whether the individual would commit a crime if released on parole. Although such models are not the only or primary factor in making the final parole decision, questions have been raised about their accuracy, fairness, and interpretability. In this paper, various machine learning-based criminal recidivism models are created based on a real-world parole decision dataset from the state of Georgia in the United States. The recidivism models are comparatively evaluated for their accuracy, fairness, and interpretability. It is found that there are notable differences and trade-offs between accuracy, fairness, and being inherently interpretable. Therefore, choosing the best model depends on the desired balance between accuracy, fairness, and interpretability, as no model is perfect or consistently the best across different criteria.
    Identifying Differential Equations to predict Blood Glucose using Sparse Identification of Nonlinear Systems. (arXiv:2209.13852v1 [cs.LG])
    Describing dynamic medical systems using machine learning is a challenging topic with a wide range of applications. In this work, we describe the possibility of modeling the blood glucose level of diabetic patients purely on the basis of measured data. A combination of the influencing variables insulin and calories is used to find an interpretable model. Since the absorption speed of external substances in the human body depends strongly on external influences, time-shifts are added for the influencing variables. The focus is on identifying the best time-shifts that provide robust models with good prediction accuracy and that are independent of other unknown external influences. The modeling is based purely on the measured data, using Sparse Identification of Nonlinear Dynamics. A differential equation is determined which, starting from an initial value, simulates blood glucose dynamics. By applying the best model to test data, we show that it is possible to simulate the long-term blood glucose dynamics using differential equations and a few influencing variables.
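    A minimal sketch of the core of Sparse Identification of Nonlinear Dynamics: sequentially thresholded least squares over a candidate term library, shown on a toy 1-D decay signal. The library, threshold, and signal are illustrative, not the paper's glucose model.

```python
import numpy as np

def stlsq(Theta, dxdt, lam=0.1, iters=10):
    """Sequentially thresholded least squares: fit, zero out small
    coefficients, refit on the surviving candidate terms."""
    xi = np.linalg.lstsq(Theta, dxdt, rcond=None)[0]
    for _ in range(iters):
        small = np.abs(xi) < lam
        xi[small] = 0.0
        big = ~small
        if big.any():
            xi[big] = np.linalg.lstsq(Theta[:, big], dxdt, rcond=None)[0]
    return xi

t = np.linspace(0, 10, 2000)
x = np.exp(-0.5 * t) + 0.3                     # toy decay toward a baseline,
dxdt = np.gradient(x, t)                       # so dx/dt = 0.15 - 0.5 x
Theta = np.column_stack([np.ones_like(x), x, x**2])  # candidates: 1, x, x^2
print(stlsq(Theta, dxdt))                      # expect roughly [0.15, -0.5, 0]
```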
    Exploring the Relationship between Architecture and Adversarially Robust Generalization. (arXiv:2209.14105v1 [cs.LG])
    Adversarial training has been demonstrated to be one of the most effective remedies for defending against adversarial examples, yet it often suffers from a huge robustness generalization gap on unseen testing adversaries, deemed the \emph{adversarially robust generalization problem}. Despite preliminary efforts devoted to understanding adversarially robust generalization, little is known from the architectural perspective. This paper tries to bridge the gap by systematically examining the most representative architectures (e.g., Vision Transformers and CNNs). In particular, we first comprehensively evaluate \emph{20} adversarially trained architectures on the ImageNette and CIFAR-10 datasets against several adversaries (multiple $\ell_p$-norm adversarial attacks), and find that Vision Transformers (e.g., PVT, CoAtNet) often yield better adversarially robust generalization. To further understand what architectural ingredients favor adversarially robust generalization, we delve into several key building blocks and reveal, through the lens of Rademacher complexity, that higher weight sparsity contributes significantly to the better adversarially robust generalization of Vision Transformers, which can often be achieved by attention layers. Our extensive studies discover a close relationship between architectural design and adversarially robust generalization, and instantiate several important insights. We hope our findings can help to better understand the mechanisms behind designing robust deep learning architectures.
    On the Generalization of Deep Reinforcement Learning Methods in the Problem of Local Navigation. (arXiv:2209.14271v1 [cs.RO])
    In this paper, we study the application of DRL algorithms in the context of local navigation problems, in which a robot moves towards a goal location in unknown and cluttered workspaces equipped only with limited-range exteroceptive sensors, such as LiDAR. Collision avoidance policies based on DRL present some advantages, but they are quite susceptible to local minima, since their capacity to learn suitable actions is limited to the sensor range. Since most robots perform tasks in unstructured environments, it is of great interest to seek generalized local navigation policies capable of avoiding local minima, especially in untrained scenarios. To do so, we propose a novel reward function that incorporates map information gained in the training stage, increasing the agent's capacity to deliberate about the best course of action. Also, we use the SAC algorithm for training our ANN, which proves more effective than others in the state-of-the-art literature. A set of sim-to-sim and sim-to-real experiments illustrates that our proposed reward combined with SAC outperforms the compared methods in terms of local minima and collision avoidance.
    A Parameter-free Nonconvex Low-rank Tensor Completion Model for Spatiotemporal Traffic Data Recovery. (arXiv:2209.13786v1 [cs.LG])
    Traffic data chronically suffer from missing values and corruption, leading to accuracy and utility reduction in subsequent Intelligent Transportation System (ITS) applications. Noticing the inherent low-rank property of traffic data, numerous studies have formulated missing traffic data recovery as a low-rank tensor completion (LRTC) problem. Due to the non-convexity and discreteness of the rank minimization in LRTC, existing methods either replaced rank with convex surrogates that are quite far from the rank function or approximated rank with nonconvex surrogates involving many parameters. In this study, we propose a Parameter-Free Non-Convex Tensor Completion model (TC-PFNC) for traffic data recovery, in which a log-based relaxation term is designed to approximate tensor algebraic rank. Moreover, previous studies usually assumed the observations are reliable and free of outliers. Therefore, we extend TC-PFNC to a robust version (RTC-PFNC) by modeling potential traffic data outliers, which can recover missing values from partial and corrupted observations and remove anomalies in the observations. The numerical solutions of TC-PFNC and RTC-PFNC are derived based on the alternating direction method of multipliers (ADMM). Extensive experimental results on four real-world traffic data sets demonstrate that the proposed methods outperform other state-of-the-art methods in both missing and corrupted data recovery. The code used in this paper is available at: https://github.com/YoungHe49/T-ITSPFNC.
    Online Subset Selection using $\alpha$-Core with no Augmented Regret. (arXiv:2209.14222v1 [cs.LG])
    We consider the problem of sequential sparse subset selections in an online learning setup. Assume that the set $[N]$ consists of $N$ distinct elements. On the $t^{\text{th}}$ round, a monotone reward function $f_t: 2^{[N]} \to \mathbb{R}_+,$ which assigns a non-negative reward to each subset of $[N],$ is revealed to a learner. The learner selects (perhaps randomly) a subset $S_t \subseteq [N]$ of $k$ elements before the reward function $f_t$ for that round is revealed $(k \leq N)$. As a consequence of its choice, the learner receives a reward of $f_t(S_t)$ on the $t^{\text{th}}$ round. The learner's goal is to design an online subset selection policy to maximize its expected cumulative reward accrued over a given time horizon. In this connection, we propose an online learning policy called SCore (Subset Selection with Core) that solves the problem for a large class of reward functions. The proposed SCore policy is based on a new concept of $\alpha$-Core, which is a generalization of the notion of Core from the cooperative game theory literature. We establish a learning guarantee for the SCore policy in terms of a new performance metric called $\alpha$-augmented regret. In this new metric, the power of the offline benchmark is suitably augmented compared to the online policy. We give several illustrative examples to show that a broad class of reward functions, including submodular, can be efficiently learned using the SCore policy. We also outline how the SCore policy can be used under a semi-bandit feedback model and conclude the paper with a number of open problems.
    Leveraging machine learning for less developed languages: Progress on Urdu text detection. (arXiv:2209.14022v1 [cs.CV])
    Text detection in natural scene images has applications in autonomous driving and navigation aids for elderly and blind people. However, research on Urdu text detection is usually hindered by a lack of data resources. We have developed a dataset of scene images with Urdu text. We present the use of machine learning methods to detect Urdu text in scene images. We extract text regions using the channel-enhanced Maximally Stable Extremal Region (MSER) method. First, we classify text and noise based on their geometric properties. Next, we use a support vector machine for early discarding of non-text regions. To further remove non-text regions, we train a second SVM classifier on histogram of oriented gradients (HOG) features. This improves the overall performance on text region detection within the scene images. To support research on Urdu text, we aim to make the data freely available for research use. We also aim to highlight the challenges and the research gap for Urdu text detection.
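    A minimal sketch of such a cascade, under our own assumptions about thresholds and window sizes: MSER proposes candidate regions, a crude geometric filter discards implausible ones, and an SVM on HOG features makes the final text/non-text call. The training data below is random placeholder data; real labeled patches are assumed to be available.

```python
import cv2
import numpy as np
from sklearn.svm import SVC

def candidate_regions(gray):
    """Propose candidate boxes with MSER, then filter by simple geometry."""
    mser = cv2.MSER_create()
    regions, boxes = mser.detectRegions(gray)
    keep = []
    for (x, y, w, h) in boxes:
        aspect, area = w / float(h), w * h
        if 0.1 < aspect < 10 and area > 50:   # crude geometric filter
            keep.append((x, y, w, h))
    return keep

def hog_features(gray, box, size=(32, 32)):
    """HOG descriptor of a candidate patch (324-dim for these parameters)."""
    x, y, w, h = box
    patch = cv2.resize(gray[y:y+h, x:x+w], size)
    hog = cv2.HOGDescriptor(size, (16, 16), (8, 8), (8, 8), 9)
    return hog.compute(patch).ravel()

# Placeholder training set; in practice, HOG features of labeled patches.
X_train = np.random.rand(40, 324)
y_train = np.random.randint(0, 2, 40)
clf = SVC(kernel="linear").fit(X_train, y_train)  # second-stage classifier
```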
    Debiasing Graph Neural Networks via Learning Disentangled Causal Substructure. (arXiv:2209.14107v1 [cs.LG])
    Most Graph Neural Networks (GNNs) predict the labels of unseen graphs by learning the correlation between the input graphs and labels. However, by presenting a graph classification investigation on training graphs with severe bias, we surprisingly discover that GNNs always tend to exploit spurious correlations to make decisions, even if the causal correlation always exists. This implies that existing GNNs trained on such biased datasets will suffer from poor generalization capability. By analyzing this problem from a causal view, we find that disentangling and decorrelating the causal and bias latent variables from the biased graphs are both crucial for debiasing. Inspired by this, we propose a general disentangled GNN framework to learn the causal substructure and bias substructure, respectively. Particularly, we design a parameterized edge mask generator to explicitly split the input graph into causal and bias subgraphs. Then two GNN modules, supervised by causal- and bias-aware loss functions respectively, are trained to encode the causal and bias subgraphs into their corresponding representations. With the disentangled representations, we synthesize counterfactual unbiased training samples to further decorrelate the causal and bias variables. Moreover, to better benchmark the severe bias problem, we construct three new graph datasets, which have controllable bias degrees and are easier to visualize and explain. Experimental results demonstrate that our approach achieves superior generalization performance over existing baselines. Furthermore, owing to the learned edge mask, the proposed model has appealing interpretability and transferability. Code and data are available at: https://github.com/googlebaba/DisC.
    Automatic Analysis of Available Source Code of Top Artificial Intelligence Conference Papers. (arXiv:2209.14155v1 [cs.SE])
    Source code is essential for researchers to reproduce the methods and replicate the results of artificial intelligence (AI) papers. Some organizations and researchers manually collect AI papers with available source code to contribute to the AI community. However, manual collection is a labor-intensive and time-consuming task. To address this issue, we propose a method to automatically identify papers with available source code and extract their source code repository URLs. With this method, we find that 20.5% of the regular papers of 10 top AI conferences published from 2010 to 2019 are identified as papers with available source code and that 8.1% of these source code repositories are no longer accessible. We also create the XMU NLP Lab README Dataset, the largest dataset of labeled README files for source code document research. Through this dataset, we have discovered that quite a few README files provide no installation instructions or usage tutorials. Further, a large-scale comprehensive statistical analysis is conducted to give a general picture of the source code of AI conference papers. The proposed solution can also go beyond AI conference papers to analyze other scientific papers from both journals and conferences to shed light on more domains.
    Securing Federated Learning against Overwhelming Collusive Attackers. (arXiv:2209.14093v1 [cs.LG])
    In the era of a data-driven society, with the ubiquity of Internet of Things (IoT) devices storing large amounts of data localized at different places, distributed learning has gained a lot of traction; however, it typically assumes independent and identically distributed (iid) data across the devices. Relaxing this assumption, which rarely holds in reality due to the heterogeneous nature of devices, federated learning (FL) has emerged as a privacy-preserving solution to train a collaborative model over non-iid data distributed across a massive number of devices. However, the appearance of malicious devices (attackers), who intend to corrupt the FL model, is inevitable due to unrestricted participation. In this work, we aim to identify such attackers and mitigate their impact on the model, essentially in a setting of bidirectional label-flipping attacks with collusion. We propose two graph-theoretic algorithms, based on the Minimum Spanning Tree and the k-Densest graph, that leverage correlations between local models. Our FL model can nullify the influence of attackers even when they constitute up to 70% of all the clients, whereas prior works could not afford more than 50% of clients as attackers. The effectiveness of our algorithms is ascertained through experiments on two benchmark datasets, namely MNIST and Fashion-MNIST, with overwhelming attackers. We establish the superiority of our algorithms over the existing ones using accuracy, attack success rate, and early detection round.
    Guiding Safe Exploration with Weakest Preconditions. (arXiv:2209.14148v1 [cs.LG])
    In reinforcement learning for safety-critical settings, it is often desirable for the agent to obey safety constraints at all points in time, including during training. We present a novel neurosymbolic approach called SPICE to solve this safe exploration problem. SPICE uses an online shielding layer based on symbolic weakest preconditions to achieve a more precise safety analysis than existing tools without unduly impacting the training process. We evaluate the approach on a suite of continuous control benchmarks and show that it can achieve comparable performance to existing safe learning techniques while incurring fewer safety violations. Additionally, we present theoretical results showing that SPICE converges to the optimal safe policy under reasonable assumptions.
    Class-Imbalanced Complementary-Label Learning via Weighted Loss. (arXiv:2209.14189v1 [cs.LG])
    Complementary-label learning (CLL) is a common application in the scenario of weak supervision. However, in real-world datasets, CLL encounters class-imbalanced training samples, where the number of samples of one class is significantly lower than those of other classes. Unfortunately, existing CLL approaches have yet to explore the problem of class-imbalanced samples, which reduces prediction accuracy, especially for imbalanced classes. In this paper, we propose a novel problem setting that allows learning from class-imbalanced complementarily labeled samples for multi-class classification. To deal with this novel problem, we propose a new CLL approach, called Weighted Complementary-Label Learning (WCLL). The proposed method models a weighted empirical risk minimization loss by utilizing the class-imbalanced complementarily labeled information, and is also applicable to multi-class imbalanced training samples. Furthermore, an estimation error bound for the proposed method is derived to provide a theoretical guarantee. Finally, we conduct extensive experiments on widely-used benchmark datasets to validate the superiority of our method over existing state-of-the-art methods.
    Falsification before Extrapolation in Causal Effect Estimation. (arXiv:2209.13708v1 [cs.LG])
    Randomized Controlled Trials (RCTs) represent a gold standard when developing policy guidelines. However, RCTs are often narrow, and lack data on broader populations of interest. Causal effects in these populations are often estimated using observational datasets, which may suffer from unobserved confounding and selection bias. Given a set of observational estimates (e.g. from multiple studies), we propose a meta-algorithm that attempts to reject observational estimates that are biased. We do so using validation effects, causal effects that can be inferred from both RCT and observational data. After rejecting estimators that do not pass this test, we generate conservative confidence intervals on the extrapolated causal effects for subgroups not observed in the RCT. Under the assumption that at least one observational estimator is asymptotically normal and consistent for both the validation and extrapolated effects, we provide guarantees on the coverage probability of the intervals output by our algorithm. To facilitate hypothesis testing in settings where causal effect transportation across datasets is necessary, we give conditions under which a doubly-robust estimator of group average treatment effects is asymptotically normal, even when flexible machine learning methods are used for estimation of nuisance parameters. We illustrate the properties of our approach on semi-synthetic and real world datasets, and show that it compares favorably to standard meta-analysis techniques.
    B2B Advertising: Joint Dynamic Scoring of Account and Users. (arXiv:2209.14250v1 [cs.LG])
    When a business sells to another business (B2B), the buying business is represented by a group of individuals, termed an account, who collectively decide whether to buy. The seller advertises to each individual and interacts with them, mostly by digital means. The sales cycle is long, most often over a few months. There is heterogeneity among individuals belonging to an account in seeking information, and hence the seller needs to score the interest of each individual over a long horizon to decide which individuals must be reached and when. Moreover, the buy decision rests with the account and must be scored to project the likelihood of purchase, a decision that is subject to change all the way up to the actual decision, emblematic of group decision making. We score the decisions of the account and its individuals in a dynamic manner. Dynamic scoring allows the opportunity to influence different individual members at different time points over the long horizon. The dataset contains behavior logs of each individual's communication activities with the seller; however, there are no data on the consultations among individuals that result in the decision. Using neural network architectures, we propose several ways to aggregate information from individual members' activities to predict the group's collective decision. Multiple evaluations find strong model performance.
    VREN: Volleyball Rally Dataset with Expression Notation Language. (arXiv:2209.13846v1 [cs.LG])
    This research is intended to accomplish two goals. The first goal is to curate a large and information-rich dataset that contains crucial and succinct summaries of the players' actions and positions and the back-and-forth travel patterns of the volleyball in professional and NCAA Div-I indoor volleyball games. While several prior studies have aimed to create similar datasets for other sports (e.g. badminton and soccer), such a dataset for indoor volleyball has not yet been realized. The second goal is to introduce a volleyball descriptive language to fully describe the rally processes in the games and to apply the language to our dataset. Based on the curated dataset and our descriptive sports language, we introduce three tasks for automated volleyball action and tactic analysis: (1) Volleyball Rally Prediction, aimed at predicting the outcome of a rally and helping players and coaches improve decision-making in practice; (2) Setting Type and Hitting Type Prediction, to help coaches and players prepare more effectively for the game; and (3) Volleyball Tactics and Attacking Zone Statistics, to provide advanced volleyball statistics and help coaches better understand the game and the opponent's tactics. We conducted case studies to show how experimental results can provide insights to the volleyball analysis community. Furthermore, experimental evaluation based on real-world data establishes a baseline for future studies and applications of our dataset and language. This study bridges the gap between the indoor volleyball field and computer science.
    Cyclegan Network for Sheet Metal Welding Drawing Translation. (arXiv:2209.14106v1 [cs.CV])
    In intelligent manufacturing, the quality of automatically translated engineering drawings directly affects manufacturing accuracy. Currently, most drawings are translated manually, greatly reducing production efficiency. This paper proposes an automatic translation method for welded structural engineering drawings based on Cyclic Generative Adversarial Networks (CycleGAN). The CycleGAN model, an unpaired transfer learning approach, is used to learn the feature mapping of real welding engineering drawings and realize automatic translation of engineering drawings. U-Net and PatchGAN are the main networks for the generator and discriminator, respectively. After removing the identity mapping function, a high-dimensional sparse network is proposed to replace the traditional dense network in the CycleGAN generator to improve noise robustness, and the residual-block hidden layers are increased to raise the resolution of the generated images. The improved and fine-tuned network models are experimentally validated by computing the gap between real and generated data. The method meets the welding engineering precision standard and addresses the main problem of low drawing-recognition efficiency in the welding manufacturing process. The results show that, after training with our model, the PSNR, SSIM and MSE of welding engineering drawings reach about 44.89 dB, 99.58% and 2.11, respectively, superior to traditional networks in both training speed and accuracy.
    Deep learning for gradient flows using the Brezis-Ekeland principle. (arXiv:2209.14115v1 [math.NA])
    We propose a deep learning method for the numerical solution of partial differential equations that arise as gradient flows. The method relies on the Brezis--Ekeland principle, which naturally defines an objective function to be minimized, and so is ideally suited for a machine learning approach using deep neural networks. We describe our approach in a general framework and illustrate the method with the help of an example implementation for the heat equation in space dimensions two to seven.
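    For reference, the standard textbook form of the Brezis--Ekeland principle (not quoted from the paper) reads as follows: for the gradient flow $\dot u(t) = -\partial\phi(u(t))$ with $u(0) = u_0$, the functional

```latex
% Brezis--Ekeland functional on [0, T]; \phi^* is the convex conjugate of \phi.
% For the heat equation, \phi(u) = \tfrac12 \int |\nabla u|^2 \, dx.
J(v) = \int_0^T \Big[ \phi(v(t)) + \phi^{*}\!\big(-\dot v(t)\big) \Big]\, dt
     + \tfrac12 \lVert v(T) \rVert^2 - \tfrac12 \lVert u_0 \rVert^2
```

    satisfies $J(v) \ge 0$, with equality exactly at the solution of the flow, which is what makes $J$ a natural training objective for a neural-network ansatz.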
    Reinforcement Learning with Tensor Networks: Application to Dynamical Large Deviations. (arXiv:2209.14089v1 [cond-mat.stat-mech])
    We present a framework to integrate tensor network (TN) methods with reinforcement learning (RL) for solving dynamical optimisation tasks. We consider the RL actor-critic method, a model-free approach for solving RL problems, and introduce TNs as the approximators for its policy and value functions. Our "actor-critic with tensor networks" (ACTeN) method is especially well suited to problems with large and factorisable state and action spaces. As an illustration of the applicability of ACTeN we solve the exponentially hard task of sampling rare trajectories in two paradigmatic stochastic models, the East model of glasses and the asymmetric simple exclusion process (ASEP), the latter being particularly challenging to other methods due to the absence of detailed balance. With substantial potential for further integration with the vast array of existing RL methods, the approach introduced here is promising both for applications in physics and to multi-agent RL problems more generally.
    Recipro-CAM: Gradient-free reciprocal class activation map. (arXiv:2209.14074v1 [cs.CV])
    The convolutional neural network (CNN) has become one of the most popular and prominent deep learning architectures for computer vision, but its black-box nature hides the internal prediction process. For this reason, AI practitioners have turned to explainable AI to provide interpretability of model behavior. In particular, class activation map (CAM) and Grad-CAM based methods have shown promising results, but they suffer from architectural limitations or the burden of gradient computation. To resolve these issues, Score-CAM was suggested as a gradient-free method; however, it requires more execution time than CAM or Grad-CAM based methods. Therefore, we propose a lightweight, gradient-free Reciprocal CAM (Recipro-CAM) that spatially masks the extracted feature maps to exploit the correlation between activation maps and network outputs. With the proposed method, we achieved gains of 1.78-3.72% over Score-CAM in the ResNet family on the Average Drop-Coherence-Complexity (ADCC) metric, excluding VGG-16 (a 1.39% drop). In addition, Recipro-CAM exhibits a saliency map generation rate similar to Grad-CAM and approximately 148 times faster than Score-CAM.
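    A minimal sketch of the spatial-masking idea follows; it uses a naive per-location loop for clarity (a real implementation would batch the masked forward passes), and the split of the network into `features` and `head` is an assumption about where the model is cut.

```python
# Gradient-free saliency by spatial masking: keep one location of the feature
# map at a time, re-run the classifier head, and record the class score.
import torch

def recipro_cam_sketch(features, head, class_idx):
    # features: (1, C, H, W) last conv feature map; head: pooling + classifier
    _, _, H, W = features.shape
    saliency = torch.zeros(H, W)
    with torch.no_grad():
        for i in range(H):
            for j in range(W):
                mask = torch.zeros(1, 1, H, W)
                mask[..., i, j] = 1.0             # keep a single spatial location
                logits = head(features * mask)    # masked forward pass
                saliency[i, j] = logits[0, class_idx]
    return (saliency - saliency.min()) / (saliency.max() - saliency.min() + 1e-8)
```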
    Offensive Language Detection on Twitter. (arXiv:2209.14091v1 [cs.CL])
    Detection of offensive language is one of the key challenges for social media platforms. Researchers have proposed many advanced methods to accomplish this task. In this report, we build on the learnings from these approaches and incorporate our own ideas to improve upon them. We successfully achieve an accuracy of 74% in classifying offensive tweets. We also list upcoming challenges in abusive content detection for social media.
    CSSAM: U-net Network for Application and Segmentation of Welding Engineering Drawings. (arXiv:2209.14102v1 [cs.CV])
    Heavy equipment manufacturing splits specific contours in drawings and cuts sheet metal to scale for welding. Currently, most segmentation and extraction of welding drawing contours is performed manually, which greatly reduces efficiency. Therefore, we propose a U-net-based contour segmentation and extraction method for welding engineering drawings. The contours of the parts required in engineering drawings can be automatically divided and blanked, which significantly improves manufacturing efficiency. U-net comprises an encoder-decoder, which implements end-to-end mapping using the semantic differences and spatial location features between the encoder and decoder. While U-net excels at segmenting medical images, our extensive experiments on the Welding Structural Diagram dataset show that the classic U-net architecture falls short when segmenting welding engineering drawings. Therefore, we design a novel Channel Spatial Sequence Attention Module (CSSAM) that improves on the classic U-net. We also propose vertical max pooling and horizontal average pooling; the pooled features are passed through two equal convolutions into the CSSAM module, and the output is fused with the pre-pooling features by semantic clustering. This replaces the traditional skip connections, effectively narrows the semantic gap between the encoder and the decoder, and thereby improves the segmentation performance on welding engineering drawings. We use VGG16 as the backbone network. Compared with the classic U-net, our network performs well on engineering drawing segmentation.
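    The directional pooling described above might look roughly like the sketch below; the shapes, 1x1 convolutions, and additive fusion are assumptions, since the exact CSSAM design is not spelled out in the abstract.

```python
# Sketch of vertical max pooling and horizontal average pooling, each passed
# through an equal convolution and fused back with the pre-pooling features.
import torch
import torch.nn as nn

class DirectionalPooling(nn.Module):
    def __init__(self, channels: int):
        super().__init__()
        self.conv_v = nn.Conv2d(channels, channels, kernel_size=1)
        self.conv_h = nn.Conv2d(channels, channels, kernel_size=1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (N, C, H, W) encoder features
        v = x.max(dim=2, keepdim=True).values     # vertical max pool -> (N, C, 1, W)
        h = x.mean(dim=3, keepdim=True)           # horizontal avg pool -> (N, C, H, 1)
        v = self.conv_v(v).expand_as(x)           # equal convolutions, broadcast
        h = self.conv_h(h).expand_as(x)           # back to (N, C, H, W)
        return x + v + h                          # fuse with pre-pooling features
```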
    Consensus Knowledge Graph Learning via Multi-view Sparse Low Rank Block Model. (arXiv:2209.13762v1 [stat.ML])
    Network analysis has been a powerful tool to unveil relationships and interactions among a large number of objects. Yet its effectiveness in accurately identifying important node-node interactions is challenged by rapidly growing network sizes, with data collected at an unprecedented granularity and scale. Common wisdom to overcome such high dimensionality is to collapse nodes into smaller groups and conduct connectivity analysis at the group level. Dividing the effort into two phases, however, inevitably opens a gap in consistency and drives down efficiency. Consensus learning has emerged as a new norm for common knowledge discovery when multiple data sources are available. To this end, this paper develops a unified framework for simultaneous grouping and connectivity analysis that combines multiple data sources. The algorithm is also guaranteed to yield a statistically optimal estimator.
    On the Robustness of Ensemble-Based Machine Learning Against Data Poisoning. (arXiv:2209.14013v1 [cs.LG])
    Machine learning is becoming ubiquitous. From finance to medicine, machine learning models are boosting decision-making processes and even outperforming humans in some tasks. This huge progress in prediction quality does not, however, find a counterpart in the security of such models and their predictions, where perturbations of a fraction of the training set (poisoning) can seriously undermine model accuracy. Research on poisoning attacks and defenses even predates the introduction of deep neural networks, leading to several promising solutions. Among them, ensemble-based defenses, in which different models are trained on portions of the training set and their predictions are then aggregated, are receiving significant attention due to their relative simplicity and theoretical and practical guarantees. This paper designs and implements a hash-based ensemble approach for ML robustness and evaluates its applicability and performance on random forests, a machine learning model proven to be more resistant to poisoning attempts on tabular datasets. An extensive experimental evaluation is carried out to assess the robustness of our approach against a variety of attacks and to compare it with a traditional monolithic model based on random forests.
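    The hash-based ensemble idea can be sketched as follows; the hashing function, the number of partitions, and majority-vote aggregation are illustrative assumptions, not the paper's exact design.

```python
# Route each training sample to one sub-model by a hash of its features, so a
# poisoned sample can corrupt at most one ensemble member; aggregate by vote.
import hashlib
import numpy as np
from sklearn.ensemble import RandomForestClassifier

def hash_partition(x: np.ndarray, k: int) -> int:
    return int(hashlib.sha256(x.tobytes()).hexdigest(), 16) % k

def train_hash_ensemble(X, y, k=5):
    parts = np.array([hash_partition(x, k) for x in X])
    return [RandomForestClassifier().fit(X[parts == i], y[parts == i])
            for i in range(k)]

def predict_majority(models, X):
    votes = np.stack([m.predict(X) for m in models])   # (k, n_samples)
    return np.apply_along_axis(
        lambda v: np.bincount(v.astype(int)).argmax(), 0, votes)
```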
    Backward Reachability Analysis of Neural Feedback Loops: Techniques for Linear and Nonlinear Systems. (arXiv:2209.14076v1 [eess.SY])
    The increasing prevalence of neural networks (NNs) in safety-critical applications calls for methods to certify safe behavior. This paper presents a backward reachability approach for safety verification of neural feedback loops (NFLs), i.e., closed-loop systems with NN control policies. While recent works have focused on forward reachability as a strategy for safety certification of NFLs, backward reachability offers advantages over the forward strategy, particularly in obstacle avoidance scenarios. Prior works have developed techniques for backward reachability analysis for systems without NNs, but the presence of NNs in the feedback loop presents a unique set of problems due to the nonlinearities in their activation functions and because NN models are generally not invertible. To overcome these challenges, we use existing forward NN analysis tools to efficiently find an over-approximation of the backprojection (BP) set, i.e., the set of states for which the NN control policy will drive the system to a given target set. We present frameworks for calculating BP over-approximations for both linear and nonlinear systems with control policies represented by feedforward NNs and propose computationally efficient strategies. We use numerical results from a variety of models to showcase the proposed algorithms, including a demonstration of safety certification for a 6D system.
    Efficient block contrastive learning via parameter-free meta-node approximation. (arXiv:2209.14067v1 [cs.LG])
    Contrastive learning has recently achieved remarkable success in many domains, including graphs. However, the contrastive loss, especially for graphs, requires a large number of negative samples, which is unscalable and computationally prohibitive with a quadratic time complexity. Sub-sampling is not optimal, and incorrect negative sampling leads to sampling bias. In this work, we propose a meta-node based approximation technique that can (a) proxy all negative combinations (b) in quadratic cluster-size time complexity, (c) at graph level, not node level, and (d) exploit graph sparsity. By replacing node-pairs with additive cluster-pairs, we compute the negatives in cluster-time at graph level. The resulting Proxy approximated meta-node Contrastive (PamC) loss, based on simple optimized GPU operations, captures the full set of negatives, yet is efficient with a linear time complexity. By avoiding sampling, we effectively eliminate sample bias. We meet the criterion for a larger number of samples, thus achieving block-contrastiveness, which is proven to outperform pair-wise losses. We use learnt soft cluster assignments for the meta-node construction, and avoid possible heterophily and noise added during edge creation. Theoretically, we show that real-world graphs easily satisfy the conditions necessary for our approximation. Empirically, we show promising accuracy gains over state-of-the-art graph clustering on 6 benchmarks. Importantly, we gain substantially in efficiency: up to 3x in training time, 1.8x in inference time, and over 5x in GPU memory reduction.
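    One way to read the cluster-pair replacement is sketched below; the soft-assignment aggregation and the InfoNCE form over cluster pairs are assumptions, not the published PamC loss.

```python
# Aggregate node embeddings into meta-nodes via soft cluster assignments and
# contrast cluster pairs (K^2 terms) instead of node pairs (N^2 terms).
import torch
import torch.nn.functional as F

def meta_node_contrastive(z1, z2, assign, tau=0.5):
    # z1, z2: (N, d) node embeddings from two views; assign: (N, K) soft clusters
    c1 = F.normalize(assign.t() @ z1, dim=1)    # (K, d) meta-node embeddings
    c2 = F.normalize(assign.t() @ z2, dim=1)
    logits = c1 @ c2.t() / tau                  # cluster-pair similarities
    labels = torch.arange(c1.size(0))           # matching clusters are positives
    return F.cross_entropy(logits, labels)
```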
    Forecasting Sensor Values in Waste-To-Fuel Plants: a Case Study. (arXiv:2209.13957v1 [cs.AI])
    In this research, we develop machine learning models to predict future sensor readings of a waste-to-fuel plant, which would enable proactive control of the plant's operations. We developed models that predict sensor readings 30 and 60 minutes into the future. The models were trained using historical data, and predictions were made based on sensor readings taken at a specific time. We compare three types of models: (a) a naïve prediction that considers only the last predicted value, (b) neural networks that make predictions based on past sensor data (we consider different time window sizes for making a prediction), and (c) a gradient boosted tree regressor created with a set of features that we developed. We developed and tested our models on a real-world use case at a waste-to-fuel plant in Canada. We found that approach (c) provided the best results, while approach (b) provided mixed results and was not able to consistently outperform the naïve baseline.
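    The comparison between the naïve baseline (a) and the gradient boosted tree regressor (c) can be reproduced in spirit with the sketch below; the lag-window feature set and the synthetic series are stand-ins, since the plant data and engineered features are not public.

```python
# Compare a last-value (naive) forecast with a gradient boosted tree trained
# on lagged sensor readings for a fixed forecast horizon.
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

def make_lag_features(series, window, horizon):
    X, y = [], []
    for t in range(window, len(series) - horizon):
        X.append(series[t - window:t])         # past `window` readings
        y.append(series[t + horizon])          # reading `horizon` steps ahead
    return np.array(X), np.array(y)

series = np.sin(np.linspace(0, 50, 2000))      # synthetic stand-in for a sensor
X, y = make_lag_features(series, window=12, horizon=6)
model = GradientBoostingRegressor().fit(X[:-200], y[:-200])

naive = X[-200:, -1]                           # (a) repeat the last value
gbt = model.predict(X[-200:])                  # (c) learned forecast
print(np.mean((naive - y[-200:]) ** 2), np.mean((gbt - y[-200:]) ** 2))
```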
    ArNLI: Arabic Natural Language Inference for Entailment and Contradiction Detection. (arXiv:2209.13953v1 [cs.CL])
    Natural Language Inference (NLI) is a hot research topic in natural language processing, and contradiction detection between sentences is a special case of NLI. It is considered a difficult NLP task whose solution matters when added as a component to many NLP applications, such as question answering systems and text summarization. Arabic is one of the most challenging low-resource languages for detecting contradictions due to its rich lexicon and semantic ambiguity. We have created a dataset of more than 12k sentences, named ArNLI, that will be publicly available. Moreover, we have applied a new model inspired by the Stanford contradiction detection solutions proposed for English. We propose an approach to detect contradictions between pairs of sentences in Arabic using a contradiction vector combined with a language model vector as input to a machine learning model. We analyzed the results of different traditional machine learning classifiers and compared their results on our created dataset (ArNLI) and on automatic translations of the PHEME and SICK English datasets. The best results were achieved using the Random Forest classifier, with accuracies of 99%, 60% and 75% on PHEME, SICK and ArNLI, respectively.
    Argumentative Reward Learning: Reasoning About Human Preferences. (arXiv:2209.14010v1 [cs.AI])
    We define a novel neuro-symbolic framework, argumentative reward learning, which combines preference-based argumentation with existing approaches to reinforcement learning from human feedback. Our method improves prior work by generalising human preferences, reducing the burden on the user and increasing the robustness of the reward model. We demonstrate this with a number of experiments.
    Graph Soft-Contrastive Learning via Neighborhood Ranking. (arXiv:2209.13964v1 [cs.LG])
    Graph contrastive learning (GCL) has emerged as a solution for graph self-supervised learning. The core principle of GCL is to reduce the distance between samples in the positive view, while increasing the distance between samples in the negative view. While achieving promising performance, current GCL methods still suffer from two limitations: (1) the uncontrollable validity of augmentation, as graph perturbation may produce views that are invalid with respect to the semantics and feature-topology correspondence of graph data; and (2) unreliable binary contrastive justification, as the positiveness and negativeness of the constructed views are difficult to determine for non-Euclidean graph data. To tackle these limitations, we propose a new contrastive learning paradigm for graphs, Graph Soft-Contrastive Learning (GSCL), which conducts contrastive learning at a finer granularity by ranking neighborhoods, without any augmentations or binary contrastive justification. GSCL is built upon the fundamental assumption of graph proximity that connected neighbors are more similar than far-distant nodes. Specifically, we develop pair-wise and list-wise Gated Ranking infoNCE loss functions to preserve the relative ranking relationships in the neighborhood. Moreover, as the neighborhood size expands exponentially with the number of hops considered, we propose neighborhood sampling strategies to improve learning efficiency. Extensive experimental results show that our proposed GSCL consistently achieves state-of-the-art performance on various public datasets with practical complexity comparable to GCL.
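    A pair-wise objective in the spirit of GSCL might look like the sketch below; the logistic ranking form is an illustrative assumption, not the paper's Gated Ranking infoNCE loss.

```python
# For each anchor node, a sampled 1-hop neighbor should score higher than a
# sampled 2-hop neighbor, preserving the neighborhood ranking.
import torch
import torch.nn.functional as F

def pairwise_rank_loss(z_anchor, z_hop1, z_hop2, tau=0.5):
    # z_*: (N, d) embeddings of anchors and their 1-hop / 2-hop neighbors
    s1 = F.cosine_similarity(z_anchor, z_hop1) / tau   # nearer neighborhood
    s2 = F.cosine_similarity(z_anchor, z_hop2) / tau   # farther neighborhood
    return -torch.log(torch.sigmoid(s1 - s2)).mean()   # rank s1 above s2
```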
    PearNet: A Pearson Correlation-based Graph Attention Network for Sleep Stage Recognition. (arXiv:2209.13645v1 [eess.SP])
    Sleep stage recognition is crucial for assessing sleep and diagnosing chronic diseases. Deep learning models, such as Convolutional Neural Networks and Recurrent Neural Networks, are trained using grid data as input, making them incapable of learning relationships in non-Euclidean spaces. Graph-based deep models have been developed to address this issue when investigating the external relationships of electrode signals across different brain regions. However, these models cannot capture the internal relationships between segments of electrode signals within a specific brain region. In this study, we propose a Pearson correlation-based graph attention network, called PearNet, as a solution to this problem. Graph nodes are generated based on the spatial-temporal features extracted by a hierarchical feature extraction method, and the graph structure is then learned adaptively to build node connections. Based on our experiments on the Sleep-EDF-20 and Sleep-EDF-78 datasets, PearNet performs better than the state-of-the-art baselines.
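    As a rough illustration of correlation-based graph construction (PearNet itself learns the structure adaptively rather than thresholding), building node connections from Pearson correlations might look like this:

```python
# Build an adjacency matrix over signal segments from Pearson correlations;
# the fixed threshold is an illustrative assumption.
import numpy as np

def pearson_adjacency(segments: np.ndarray, threshold: float = 0.5):
    # segments: (n_nodes, segment_len) signal segments from one brain region
    corr = np.corrcoef(segments)              # (n_nodes, n_nodes) Pearson r
    adj = (np.abs(corr) >= threshold).astype(float)
    np.fill_diagonal(adj, 0.0)                # no self-loops
    return adj
```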
    Big data analysis and distributed deep learning for next-generation intrusion detection system optimization. (arXiv:2209.13961v1 [cs.CR])
    With the growing use of information technology in all life domains, hacking has become more damaging than ever before. Moreover, as technologies develop, the number of attacks grows exponentially every few months and the attacks become more sophisticated, so traditional IDS become inefficient at detecting them. This paper proposes a solution that not only detects new threats with a higher detection rate and lower false-positive rate than existing IDS, but also detects collective and contextual security attacks. We achieve these results with a networking chatbot: a deep recurrent neural network, Long Short-Term Memory (LSTM), on top of the Apache Spark framework, whose input is flow traffic and traffic aggregations and whose output is a language of two words, normal or abnormal. We propose merging the concepts of language processing, contextual analysis, distributed deep learning, big data, and flow-analysis anomaly detection. We propose a model that describes the abstract normal behavior of a network from a sequence of millions of packets within their context, and analyzes them in near real time to detect point, collective and contextual anomalies. Experiments on the MAWI dataset show a better detection rate not only than signature-based IDS but also than traditional anomaly-based IDS, with fewer false positives, a higher detection rate and better point-anomaly detection. As proof of contextual and collective anomaly detection, we discuss our claim and the reasoning behind our hypothesis. However, because of hardware limitations, the experiments were run on small random subsets of the dataset, so we share our experiments and our vision for future work in the hope that full validation will be carried out by other interested researchers with better hardware infrastructure than ours.
    Estimators of Entropy and Information via Inference in Probabilistic Models. (arXiv:2202.12363v3 [stat.ML] UPDATED)
    Estimating information-theoretic quantities such as entropy and mutual information is central to many problems in statistics and machine learning, but challenging in high dimensions. This paper presents estimators of entropy via inference (EEVI), which deliver upper and lower bounds on many information quantities for arbitrary variables in a probabilistic generative model. These estimators use importance sampling with proposal distribution families that include amortized variational inference and sequential Monte Carlo, which can be tailored to the target model and used to squeeze true information values with high accuracy. We present several theoretical properties of EEVI and demonstrate scalability and efficacy on two problems from the medical domain: (i) in an expert system for diagnosing liver disorders, we rank medical tests according to how informative they are about latent diseases, given a pattern of observed symptoms and patient attributes; and (ii) in a differential equation model of carbohydrate metabolism, we find optimal times to take blood glucose measurements that maximize information about a diabetic patient's insulin sensitivity, given their meal and medication schedule.
    Toward Certification of Machine-Learning Systems for Low Criticality Airborne Applications. (arXiv:2209.13975v1 [cs.LG])
    The exceptional progress in the field of machine learning (ML) in recent years has attracted a lot of interest in using this technology in aviation. Possible airborne applications of ML include safety-critical functions, which must be developed in compliance with rigorous certification standards of the aviation industry. Current certification standards for the aviation industry were developed prior to the ML renaissance without taking specifics of ML technology into account. There are some fundamental incompatibilities between traditional design assurance approaches and certain aspects of ML-based systems. In this paper, we analyze the current airborne certification standards and show that all objectives of the standards can be achieved for a low-criticality ML-based system if certain assumptions about ML development workflow are applied.
    An Efficient Multitask Learning Architecture for Affective Vocal Burst Analysis. (arXiv:2209.13914v1 [cs.SD])
    Affective speech analysis is an ongoing topic of research. A relatively new problem in this field is the analysis of vocal bursts, which are nonverbal vocalisations such as laughs or sighs. Current state-of-the-art approaches to address affective vocal burst analysis are mostly based on wav2vec2 or HuBERT features. In this paper, we investigate the use of the wav2vec successor data2vec in combination with a multitask learning pipeline to tackle different analysis problems at once. To assess the performance of our efficient multitask learning architecture, we participate in the 2022 ACII Affective Vocal Burst Challenge, showing that our approach substantially outperforms the baseline established there in three different subtasks.
    Training Strategies for Improved Lip-reading. (arXiv:2209.01383v2 [cs.CV] UPDATED)
    Several training strategies and temporal models have recently been proposed for isolated word lip-reading in a series of independent works. However, the potential of combining the best strategies and investigating the impact of each of them has not been explored. In this paper, we systematically investigate the performance of state-of-the-art data augmentation approaches, temporal models and other training strategies, like self-distillation and using word boundary indicators. Our results show that Time Masking (TM) is the most important augmentation, followed by mixup, and that Densely-Connected Temporal Convolutional Networks (DC-TCN) are the best temporal model for lip-reading of isolated words. Using self-distillation and word boundary indicators is also beneficial, but to a lesser extent. A combination of all the above methods results in a classification accuracy of 93.4%, an absolute improvement of 4.6% over the current state-of-the-art performance on the LRW dataset. The performance can be further improved to 94.1% by pre-training on additional datasets. An error analysis of the various training strategies reveals that performance improves by increasing the classification accuracy of hard-to-recognise words.
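    Of the strategies above, mixup has a standard, compact formulation (from Zhang et al.), sketched here; the alpha value and one-hot targets are conventional choices rather than details taken from the paper.

```python
# mixup: train on convex combinations of input pairs and their labels.
import numpy as np
import torch

def mixup(x: torch.Tensor, y: torch.Tensor, alpha: float = 0.4):
    # x: (batch, ...) inputs; y: (batch, num_classes) one-hot targets
    lam = np.random.beta(alpha, alpha)
    perm = torch.randperm(x.size(0))
    x_mix = lam * x + (1 - lam) * x[perm]     # mixed inputs
    y_mix = lam * y + (1 - lam) * y[perm]     # correspondingly mixed labels
    return x_mix, y_mix
```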
    A simple but strong baseline for online continual learning: Repeated Augmented Rehearsal. (arXiv:2209.13917v1 [cs.LG])
    Online continual learning (OCL) aims to train neural networks incrementally from a non-stationary data stream with a single pass through the data. Rehearsal-based methods attempt to approximate the observed input distributions over time with a small memory and revisit them later to avoid forgetting. Despite their strong empirical performance, rehearsal methods still suffer from a poor approximation of the loss landscape of past data with memory samples. This paper revisits the rehearsal dynamics in online settings. We provide theoretical insights on the inherent memory overfitting risk from the viewpoint of biased and dynamic empirical risk minimization, and examine the merits and limits of repeated rehearsal. Inspired by our analysis, a simple and intuitive baseline, Repeated Augmented Rehearsal (RAR), is designed to address the underfitting-overfitting dilemma of online rehearsal. Surprisingly, across four rather different OCL benchmarks, this simple baseline outperforms vanilla rehearsal by 9%-17% and also significantly improves the state-of-the-art rehearsal-based methods MIR, ASER, and SCR. We also demonstrate that RAR successfully achieves an accurate approximation of the loss landscape of past data and high-loss ridge aversion in its learning trajectory. Extensive ablation studies are conducted to study the interplay between repeated and augmented rehearsal, and reinforcement learning (RL) is applied to dynamically adjust the hyperparameters of RAR to balance the stability-plasticity trade-off online.
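    The core RAR step can be sketched as below; the `memory` interface, the augmentation, and the fixed repeat count are assumptions (the paper additionally tunes such hyperparameters online with RL).

```python
# Repeated Augmented Rehearsal: rehearse each incoming batch several times,
# drawing and re-augmenting memory samples on every repeat.
import torch

def rar_step(model, optimizer, loss_fn, batch, memory, augment, repeats=3):
    x_new, y_new = batch
    for _ in range(repeats):                       # repeated rehearsal
        x_mem, y_mem = memory.sample(len(x_new))   # replay-buffer draw (assumed API)
        x = torch.cat([x_new, augment(x_mem)])     # fresh augmentation each repeat
        y = torch.cat([y_new, y_mem])
        optimizer.zero_grad()
        loss_fn(model(x), y).backward()
        optimizer.step()
    memory.add(x_new, y_new)                       # then store the new batch
```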
    Collaboration-Aware Graph Convolutional Network for Recommender Systems. (arXiv:2207.06221v2 [cs.IR] UPDATED)
    Graph Neural Networks (GNNs) have been successfully adopted in recommender systems by virtue of the message-passing that implicitly captures collaborative effect. Nevertheless, most of the existing message-passing mechanisms for recommendation are directly inherited from GNNs without scrutinizing whether the captured collaborative effect would benefit the prediction of user preferences. In this paper, we first analyze how message-passing captures the collaborative effect and propose a recommendation-oriented topological metric, Common Interacted Ratio (CIR), which measures the level of interaction between a specific neighbor of a node with the rest of its neighbors. After demonstrating the benefits of leveraging collaborations from neighbors with higher CIR, we propose a recommendation-tailored GNN, Collaboration-Aware Graph Convolutional Network (CAGCN), that goes beyond the 1-Weisfeiler-Lehman (1-WL) test in distinguishing non-bipartite-subgraph-isomorphic graphs. Experiments on six benchmark datasets show that the best CAGCN variant outperforms the most representative GNN-based recommendation model, LightGCN, by nearly 10% in Recall@20 and also achieves around 80% speedup. Our code is publicly available at https://github.com/YuWVandy/CAGCN.
    Natural Language Processing Methods to Identify Oncology Patients at High Risk for Acute Care with Clinical Notes. (arXiv:2209.13860v1 [cs.CL])
    Clinical notes are an essential component of a health record. This paper evaluates how natural language processing (NLP) can be used to identify the risk of acute care use (ACU) in oncology patients, once chemotherapy starts. Risk prediction using structured health data (SHD) is now standard, but predictions using free-text formats are complex. This paper explores the use of free-text notes for the prediction of ACU instead of SHD. Deep Learning models were compared to manually engineered language features. Results show that SHD models minimally outperform NLP models; an l1-penalised logistic regression with SHD achieved a C-statistic of 0.748 (95%-CI: 0.735, 0.762), while the same model with language features achieved 0.730 (95%-CI: 0.717, 0.745) and a transformer-based model achieved 0.702 (95%-CI: 0.688, 0.717). This paper shows how language models can be used in clinical applications and underlines how risk bias is different for diverse patient groups, even using only free-text data.
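    The l1-penalised logistic regression baseline is standard and easy to sketch; the synthetic features below stand in for the (non-public) structured health data, and the C-statistic is computed as the AUROC.

```python
# l1-penalised logistic regression scored by C-statistic (AUROC).
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in for structured health data.
X, y = make_classification(n_samples=2000, n_features=50, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

clf = LogisticRegression(penalty="l1", solver="liblinear", C=1.0)
clf.fit(X_train, y_train)
proba = clf.predict_proba(X_test)[:, 1]
print("C-statistic:", roc_auc_score(y_test, proba))
```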
    Shape-constrained Symbolic Regression with NSGA-III. (arXiv:2209.13851v1 [cs.LG])
    Shape-constrained symbolic regression (SCSR) makes it possible to include prior knowledge in data-based modeling. This inclusion ensures that certain expected behavior is better reflected by the resulting models. The expected behavior is defined via constraints that refer to the function's shape, e.g., monotonicity, concavity, convexity, or the model's image boundaries. Besides yielding more robust and reliable models by constraining the function's shape, SCSR finds models that are more robust to noise and have better extrapolation behavior. This paper presents a multi-criteria approach that minimizes the approximation error as well as the constraint violations. Specifically, the two algorithms NSGA-II and NSGA-III are implemented and compared against each other in terms of model quality and runtime. Both algorithms can deal with multiple objectives: NSGA-II is a well-established multi-objective approach that performs well on instances with up to 3 objectives, while NSGA-III is an extension of NSGA-II developed to handle problems with "many" objectives (more than 3). Both algorithms are executed on a selected set of benchmark instances from physics textbooks. The results indicate that both algorithms find largely feasible solutions, and NSGA-III provides slight improvements in model quality. Moreover, an improvement in runtime can be observed using the many-objective approach.
    Disentangling Transfer in Continual Reinforcement Learning. (arXiv:2209.13900v1 [cs.LG])
    The ability of continual learning systems to transfer knowledge from previously seen tasks in order to maximize performance on new tasks is a significant challenge for the field, limiting the applicability of continual learning solutions to realistic scenarios. Consequently, this study aims to broaden our understanding of transfer and its driving forces in the specific case of continual reinforcement learning. We adopt SAC as the underlying RL algorithm and Continual World as a suite of continuous control tasks. We systematically study how different components of SAC (the actor and the critic, exploration, and data) affect transfer efficacy, and we provide recommendations regarding various modeling options. The best set of choices, dubbed ClonEx-SAC, is evaluated on the recent Continual World benchmark. ClonEx-SAC achieves 87% final success rate compared to 80% of PackNet, the best method in the benchmark. Moreover, the transfer grows from 0.18 to 0.54 according to the metric provided by Continual World.
    An Embarrassingly Simple Approach to Semi-Supervised Few-Shot Learning. (arXiv:2209.13777v1 [cs.CV])
    Semi-supervised few-shot learning consists in training a classifier to adapt to new tasks with limited labeled data and a fixed quantity of unlabeled data. Many sophisticated methods have been developed to address the challenges this problem comprises. In this paper, we propose a simple but quite effective approach to predict accurate negative pseudo-labels of unlabeled data from an indirect learning perspective, and then augment the extremely label-constrained support set in few-shot classification tasks. Our approach can be implemented in just a few lines of code using only off-the-shelf operations, yet it is able to outperform state-of-the-art methods on four benchmark datasets.
    Variance Tolerance Factors For Interpreting Neural Networks. (arXiv:2209.13858v1 [cs.LG])
    Black box models only provide results for deep learning tasks and lack informative details about how these results were obtained. In this paper, we propose a general theory that defines a variance tolerance factor (VTF) to interpret the neural networks by ranking the importance of features and constructing a novel architecture consisting of a base model and feature model to demonstrate its utility. Two feature importance ranking methods and a feature selection method based on the VTF are created. A thorough evaluation on synthetic, benchmark, and real datasets is provided.
    Supervised Class-pairwise NMF for Data Representation and Classification. (arXiv:2209.13831v1 [cs.LG])
    Various Non-negative Matrix Factorization (NMF) based methods add new terms to the cost function to adapt the model to specific tasks, such as clustering, or to preserve some structural properties in the reduced space (e.g., local invariance). The added term is mainly weighted by a hyper-parameter that controls the balance of the overall formula and guides the optimization process towards the objective. The result is a parameterized NMF method. However, NMF adopts an unsupervised approach to estimate the factorizing matrices, so the ability to perform prediction (e.g., classification) using the newly obtained features is not guaranteed. The objective of this work is to design an evolutionary framework that learns the hyper-parameter of the parameterized NMF and estimates the factorizing matrices in a supervised way, making them more suitable for classification problems. Moreover, we claim that applying NMF-based algorithms separately to different class-pairs, instead of applying them once to the whole dataset, improves the effectiveness of the matrix factorization process. This results in training multiple parameterized NMF algorithms with different balancing parameter values. A cross-validation combination learning framework is adopted, and a Genetic Algorithm is used to identify the optimal set of hyper-parameter values. Experiments conducted on both real and synthetic datasets demonstrate the effectiveness of the proposed approach.
    Revisiting Few-Shot Learning from a Causal Perspective. (arXiv:2209.13816v1 [cs.LG])
    Few-shot learning with the N-way K-shot scheme is an open challenge in machine learning. Many approaches have been proposed to tackle this problem, e.g., Matching Networks and CLIP-Adapter. Although these approaches have shown significant progress, the mechanism behind why these methods succeed has not been well explored. In this paper, we interpret these few-shot learning methods via a causal mechanism. We show that the existing approaches can be viewed as specific forms of front-door adjustment, which removes the effects of confounders. Based on this, we introduce a general causal method for few-shot learning, which considers not only the relationship between examples but also the diversity of representations. Experimental results demonstrate the superiority of our proposed method in few-shot classification on various benchmark datasets. Code is available in the supplementary material.
    Online Policy Optimization for Robust MDP. (arXiv:2209.13841v1 [cs.LG])
    Reinforcement learning (RL) has exceeded human performance in many synthetic settings such as video games and Go. However, real-world deployment of end-to-end RL models is less common, as RL models can be very sensitive to slight perturbation of the environment. The robust Markov decision process (MDP) framework -- in which the transition probabilities belong to an uncertainty set around a nominal model -- provides one way to develop robust models. While previous analysis shows RL algorithms are effective assuming access to a generative model, it remains unclear whether RL can be efficient under a more realistic online setting, which requires a careful balance between exploration and exploitation. In this work, we consider online robust MDP by interacting with an unknown nominal system. We propose a robust optimistic policy optimization algorithm that is provably efficient. To address the additional uncertainty caused by an adversarial environment, our model features a new optimistic update rule derived via Fenchel conjugates. Our analysis establishes the first regret bound for online robust MDPs.
    FedVeca: Federated Vectorized Averaging on Non-IID Data with Adaptive Bi-directional Global Objective. (arXiv:2209.13803v1 [cs.LG])
    Federated Learning (FL) is a distributed machine learning framework that alleviates data silos, in which decentralized clients collaboratively learn a global model without sharing their private data. However, the clients' Non-Independent and Identically Distributed (Non-IID) data negatively affect the trained model, and clients with different numbers of local updates may cause significant gaps among the local gradients in each communication round. In this paper, we propose a Federated Vectorized Averaging (FedVeca) method to address this problem on Non-IID data. Specifically, we set a novel objective for the global model that is related to the local gradients. The local gradient is defined as a bi-directional vector with step size and direction, where the step size is the number of local updates and the direction is divided into positive and negative according to our definition. In FedVeca, the direction is influenced by the step size, so we average the bi-directional vectors to reduce the effect of different step sizes. We then theoretically analyze the relationship between the step sizes and the global objective, and obtain upper bounds on the step sizes per communication round. Based on these upper bounds, we design an algorithm for the server and the clients to adaptively adjust the step sizes so that the objective approaches the optimum. Finally, we conduct experiments on different datasets, models and scenarios by building a prototype system, and the experimental results demonstrate the effectiveness and efficiency of the FedVeca method.
    Label Distribution Learning via Implicit Distribution Representation. (arXiv:2209.13824v1 [cs.LG])
    In contrast to multi-label learning, label distribution learning characterizes the polysemy of examples by a label distribution that represents richer semantics. In label distribution learning, the training data is collected mainly by manual annotation or by label enhancement algorithms that generate label distributions. Unfortunately, the complexity of the manual annotation task or the inaccuracy of the label enhancement algorithm introduces noise and uncertainty into the label distribution training set. To alleviate this problem, we introduce an implicit distribution into the label distribution learning framework to characterize the uncertainty of each label value. Specifically, we use deep implicit representation learning to construct a label distribution matrix with Gaussian prior constraints, where each row component corresponds to the distribution estimate of one label value and is constrained by a prior Gaussian distribution to moderate the noise and uncertainty in the label distribution dataset. Each row component of the label distribution matrix is then transformed into a standard label distribution form using a self-attention algorithm. In addition, several regularization-style techniques are applied in the training phase to improve the performance of the model.
    Towards Regression-Free Neural Networks for Diverse Compute Platforms. (arXiv:2209.13740v1 [cs.CV])
    With the shift towards on-device deep learning, ensuring a consistent behavior of an AI service across diverse compute platforms becomes tremendously important. Our work tackles the emergent problem of reducing predictive inconsistencies arising as negative flips: test samples that are correctly predicted by a less accurate model, but incorrectly by a more accurate one. We introduce REGression constrained Neural Architecture Search (REG-NAS) to design a family of highly accurate models that engender fewer negative flips. REG-NAS consists of two components: (1) A novel architecture constraint that enables a larger model to contain all the weights of the smaller one thus maximizing weight sharing. This idea stems from our observation that larger weight sharing among networks leads to similar sample-wise predictions and results in fewer negative flips; (2) A novel search reward that incorporates both Top-1 accuracy and negative flips in the architecture search metric. We demonstrate that REG-NAS can successfully find desirable architectures with few negative flips in three popular architecture search spaces. Compared to the existing state-of-the-art approach, REG-NAS enables 33-48% relative reduction of negative flips.
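    The negative-flip quantity that enters the search reward has a simple standard definition, sketched below (the reward's exact weighting of accuracy and flips is not specified in the abstract).

```python
# Negative flip rate: fraction of samples the smaller (less accurate) model
# predicts correctly but the larger (more accurate) model gets wrong.
import numpy as np

def negative_flip_rate(y_true, pred_small, pred_large) -> float:
    y_true, pred_small, pred_large = map(np.asarray,
                                         (y_true, pred_small, pred_large))
    flips = (pred_small == y_true) & (pred_large != y_true)
    return float(flips.mean())
```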
    Joint Learning of Linear Time-Invariant Dynamical Systems. (arXiv:2112.10955v4 [stat.ML] UPDATED)
    Linear time-invariant systems are very popular models in system theory and applications. A fundamental problem in system identification that remains rather unaddressed in extant literature is to leverage commonalities amongst related linear systems to estimate their transition matrices more accurately. To address this problem, the current paper investigates methods for jointly estimating the transition matrices of multiple systems. It is assumed that the transition matrices are unknown linear functions of some unknown shared basis matrices. We establish finite-time estimation error rates that fully reflect the roles of trajectory lengths, dimension, and number of systems under consideration. The presented results are fairly general and show the significant gains that can be achieved by pooling data across systems in comparison to learning each system individually. Further, they are shown to be robust against model misspecifications. To obtain the results, we develop novel techniques that are of interest for addressing similar joint-learning problems. They include tightly bounding estimation errors in terms of the eigen-structures of transition matrices, establishing sharp high probability bounds for singular values of dependent random matrices, and capturing effects of misspecified transition matrices as the systems evolve over time.
    A Closer Look at Evaluating the Bit-Flip Attack Against Deep Neural Networks. (arXiv:2209.14243v1 [cs.CR])
    Deep neural network models are massively deployed on a wide variety of hardware platforms. This results in the appearance of new attack vectors that significantly extend the standard attack surface, which has been extensively studied by the adversarial machine learning community. One of the first attacks that aims at drastically dropping the performance of a model by targeting its parameters (weights) stored in memory is the Bit-Flip Attack (BFA). In this work, we point out several evaluation challenges related to the BFA. First of all, the lack of an adversary's budget in the standard threat model is problematic, especially when dealing with physical attacks. Moreover, since the BFA presents critical variability, we discuss the influence of some training parameters and the importance of the model architecture. This work is the first to present the impact of the BFA against fully-connected architectures, which exhibit different behaviors compared to convolutional neural networks. These results highlight the importance of defining robust and sound evaluation methodologies to properly evaluate the dangers of parameter-based attacks and to measure the real level of robustness offered by a defense.
    Conformal Prediction is Robust to Label Noise. (arXiv:2209.14295v1 [cs.LG])
    We study the robustness of conformal prediction, a powerful tool for uncertainty quantification, to label noise. Our analysis tackles both regression and classification problems, characterizing when and how it is possible to construct uncertainty sets that correctly cover the unobserved noiseless ground truth labels. Through stylized theoretical examples and practical experiments, we argue that naive conformal prediction covers the noiseless ground truth label unless the noise distribution is adversarially designed. This leads us to believe that correcting for label noise is unnecessary except for pathological data distributions or noise sources. In such cases, we can also correct for noise of bounded size in the conformal prediction algorithm in order to ensure correct coverage of the ground truth labels without score or data regularity.
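    For context, the split conformal procedure whose noise robustness is being analyzed looks as follows in the regression case (a textbook sketch; label noise would enter through the calibration labels `y_cal`).

```python
# Split conformal prediction for regression: calibrate a residual quantile on
# held-out data, then return prediction intervals with >= 1 - alpha coverage.
import numpy as np

def conformal_interval(model, X_cal, y_cal, X_new, alpha=0.1):
    scores = np.abs(y_cal - model.predict(X_cal))    # calibration residuals
    n = len(scores)
    q = np.quantile(scores, min(1.0, np.ceil((n + 1) * (1 - alpha)) / n))
    pred = model.predict(X_new)
    return pred - q, pred + q
```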
    PTSD in the Wild: A Video Database for Studying Post-Traumatic Stress Disorder Recognition in Unconstrained Environments. (arXiv:2209.14085v1 [cs.HC])
    Post-traumatic stress disorder (PTSD) is a chronic and debilitating mental condition that develops in response to catastrophic life events, such as military combat, sexual assault, and natural disasters. PTSD is characterized by flashbacks of past traumatic events, intrusive thoughts, nightmares, hypervigilance, and sleep disturbance, all of which affect a person's life and lead to considerable social, occupational, and interpersonal dysfunction. PTSD is diagnosed by medical professionals using a self-assessment questionnaire of PTSD symptoms as defined in the Diagnostic and Statistical Manual of Mental Disorders (DSM). In this paper, and for the first time, we collected, annotated, and prepared for public distribution a new video database for automatic PTSD diagnosis, called the PTSD in the wild dataset. The database exhibits "natural" and large variability in acquisition conditions, with different pose, facial expression, lighting, focus, resolution, age, gender, race, occlusions and background. In addition to describing the details of the dataset collection, we provide a benchmark for evaluating computer vision and machine learning based approaches on the PTSD in the wild dataset. We also propose and evaluate a deep learning based approach for PTSD detection with respect to the given benchmark, which shows very promising results. Interested researchers can download a copy of the PTSD-in-the-wild dataset from: this http URL  ( 3 min )
    Machine Beats Machine: Machine Learning Models to Defend Against Adversarial Attacks. (arXiv:2209.13963v1 [cs.LG])
    We propose using a two-layered deployment of machine learning models to prevent adversarial attacks. The first layer determines whether the data have been tampered with, while the second layer solves a domain-specific problem. We explore three sets of features and three dataset variations to train the machine learning models. Our results show that clustering algorithms achieved promising results. In particular, the best results were obtained by applying the DBSCAN algorithm to the structural similarity index measure computed between the images and a white reference image.
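    The best-performing first layer can be sketched as below; the DBSCAN parameters and the grayscale/range conventions are assumptions for illustration.

```python
# Tamper-detection layer: reduce each image to its SSIM against a white
# reference, then let DBSCAN flag outliers as potentially adversarial.
import numpy as np
from skimage.metrics import structural_similarity
from sklearn.cluster import DBSCAN

def flag_tampered(images: np.ndarray, eps: float = 0.05) -> np.ndarray:
    # images: (n, H, W) grayscale in [0, 1]
    white = np.ones_like(images[0])
    feats = np.array([[structural_similarity(img, white, data_range=1.0)]
                      for img in images])
    labels = DBSCAN(eps=eps, min_samples=5).fit_predict(feats)
    return labels == -1          # DBSCAN noise points -> flagged as tampered
```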
    Active Transfer Prototypical Network: An Efficient Labeling Algorithm for Time-Series Data. (arXiv:2209.14199v1 [cs.LG])
    The paucity of labeled data is a typical challenge in the automotive industry. Annotating time-series measurements requires solid domain knowledge and in-depth exploratory data analysis, which implies a high labeling effort. Conventional Active Learning (AL) addresses this issue by actively querying the most informative instances based on the estimated classification probability and retraining the model iteratively. However, the learning efficiency strongly relies on the initial model, resulting in the trade-off between the size of the initial dataset and the query number. This paper proposes a novel Few-Shot Learning (FSL)-based AL framework, which addresses the trade-off problem by incorporating a Prototypical Network (ProtoNet) in the AL iterations. The results show an improvement, on the one hand, in the robustness to the initial model and, on the other hand, in the learning efficiency of the ProtoNet through the active selection of the support set in each iteration. This framework was validated on UCI HAR/HAPT dataset and a real-world braking maneuver dataset. The learning performance significantly surpasses traditional AL algorithms on both datasets, achieving 90% classification accuracy with 10% and 5% labeling effort, respectively.
    SoftTreeMax: Policy Gradient with Tree Search. (arXiv:2209.13966v1 [cs.LG])
    Policy-gradient methods are widely used for learning control policies. They can be easily distributed to multiple workers and reach state-of-the-art results in many domains. Unfortunately, they exhibit large variance and consequently suffer from high sample complexity, since they aggregate gradients over entire trajectories. At the other extreme, planning methods, like tree search, optimize the policy using single-step transitions that consider future lookahead. These approaches have mainly been considered for value-based algorithms. Planning-based algorithms require a forward model and are computationally intensive at each step, but are more sample efficient. In this work, we introduce SoftTreeMax, the first approach that integrates tree search into policy gradient. Traditionally, gradients are computed for single state-action pairs. Instead, our tree-based policy structure leverages all gradients at the tree leaves in each environment step. This allows us to reduce the variance of gradients by three orders of magnitude and to benefit from better sample complexity compared with standard policy gradient. On Atari, SoftTreeMax demonstrates up to 5x better performance with faster run-time compared with distributed PPO.
    Experimental study of time series forecasting methods for groundwater level prediction. (arXiv:2209.13927v1 [cs.LG])
    Groundwater level prediction is an applied time series forecasting task with important social impacts, helping optimize water management and prevent natural disasters such as floods or severe droughts. Machine learning methods have been reported in the literature to achieve this task, but they focus only on forecasting the groundwater level at a single location. A global forecasting method aims to exploit groundwater level time series from a wide range of locations to produce predictions at a single place or at several places at a time. Given the recent success of global forecasting methods in prestigious competitions, it is meaningful to assess them on groundwater level prediction and see how they compare to local methods. In this work, we created a dataset of 1026 groundwater level time series. Each time series consists of daily measurements of groundwater levels and two exogenous variables, rainfall and evapotranspiration. This dataset is made available to the community for reproducibility and further evaluation. To identify the best configuration for effectively predicting groundwater level across the complete set of time series, we compared different predictors, including local and global time series forecasting methods, and assessed the impact of the exogenous variables. Our result analysis shows that the best predictions are obtained by training a global method on past groundwater levels and rainfall data.
    Scalable and Equivariant Spherical CNNs by Discrete-Continuous (DISCO) Convolutions. (arXiv:2209.13603v1 [cs.CV])
    No existing spherical convolutional neural network (CNN) framework is both computationally scalable and rotationally equivariant. Continuous approaches capture rotational equivariance but are often prohibitively computationally demanding. Discrete approaches offer more favorable computational performance but at the cost of equivariance. We develop a hybrid discrete-continuous (DISCO) group convolution that is simultaneously equivariant and computationally scalable to high-resolution. While our framework can be applied to any compact group, we specialize to the sphere. Our DISCO spherical convolutions not only exhibit $\text{SO}(3)$ rotational equivariance but also a form of asymptotic $\text{SO}(3)/\text{SO}(2)$ rotational equivariance, which is more desirable for many applications (where $\text{SO}(n)$ is the special orthogonal group representing rotations in $n$-dimensions). Through a sparse tensor implementation we achieve linear scaling in number of pixels on the sphere for both computational cost and memory usage. For 4k spherical images we realize a saving of $10^9$ in computational cost and $10^4$ in memory usage when compared to the most efficient alternative equivariant spherical convolution. We apply the DISCO spherical CNN framework to a number of benchmark dense-prediction problems on the sphere, such as semantic segmentation and depth estimation, on all of which we achieve the state-of-the-art performance.
    An Overview of the Data-Loader Landscape: Comparative Performance Analysis. (arXiv:2209.13705v1 [cs.DC])
    Dataloaders, in charge of moving data from storage into GPUs while training machine learning models, might hold the key to drastically improving the performance of training jobs. Recent advances have shown promise not only by considerably decreasing training time but also by offering new features such as loading data from remote storage like S3. In this paper, we are the first to distinguish the dataloader as a separate component in the Deep Learning (DL) workflow and to outline its structure and features. Finally, we offer a comprehensive comparison of the different dataloading libraries available, their trade-offs in terms of functionality, usability, and performance and the insights derived from them.
    Deep learning based sferics recognition for AMT data processing in the dead band. (arXiv:2209.13647v1 [eess.SP])
    In audio magnetotellurics (AMT) sounding data processing, the absence of sferic signals in some time ranges typically results in a lack of energy in the AMT dead band, which may cause unreliable resistivity estimates. We propose a deep convolutional neural network (CNN) to automatically recognize sferic signals from redundantly recorded data over a long time range and use them to compensate the resistivity estimation. We train the CNN using field time series data with different signal-to-noise ratios acquired from different regions in mainland China. To solve the potential overfitting problem due to the limited number of sferic labels, we propose a training strategy that randomly generates training samples (with random data augmentations) while optimizing the CNN model parameters, stopping the training process and data generation once the training loss converges. In addition, we use a weighted binary cross-entropy loss function to address the sample imbalance problem and better optimize the network, use multiple reasonable metrics to evaluate network performance, and carry out ablation experiments to optimally choose the model hyperparameters. Extensive field data applications show that our trained CNN can robustly recognize sferic signals in noisy time series for subsequent impedance estimation. The subsequent processing results show that our method can significantly improve the S/N and effectively solve the lack of energy in the dead band. Compared to the traditional processing method without sferic compensation, our method generates smoother and more reasonable apparent resistivity and phase curves and a depolarized phase tensor, corrects the estimation errors of sudden drops in high-frequency apparent resistivity and abnormal phase reversals, and ultimately better recovers the true shallow subsurface resistivity structure.  ( 3 min )
    ButterflyFlow: Building Invertible Layers with Butterfly Matrices. (arXiv:2209.13774v1 [cs.LG])
    Normalizing flows model complex probability distributions using maps obtained by composing invertible layers. Special linear layers such as masked and 1x1 convolutions play a key role in existing architectures because they increase expressive power while having tractable Jacobians and inverses. We propose a new family of invertible linear layers based on butterfly layers, which are known to theoretically capture complex linear structures including permutations and periodicity, yet can be inverted efficiently. This representational power is a key advantage of our approach, as such structures are common in many real-world datasets. Based on our invertible butterfly layers, we construct a new class of normalizing flow models called ButterflyFlow. Empirically, we demonstrate that ButterflyFlows not only achieve strong density estimation results on natural images such as MNIST, CIFAR-10, and ImageNet 32x32, but also obtain significantly better log-likelihoods on structured datasets such as galaxy images and MIMIC-III patient cohorts -- all while being more efficient in terms of memory and computation than relevant baselines.  ( 2 min )
    Reconstruction-guided attention improves the robustness and shape processing of neural networks. (arXiv:2209.13620v1 [cs.CV])
    Many visual phenomena suggest that humans use top-down generative or reconstructive processes to create visual percepts (e.g., imagery, object completion, pareidolia), but little is known about the role reconstruction plays in robust object recognition. We built an iterative encoder-decoder network that generates an object reconstruction and used it as top-down attentional feedback to route the most relevant spatial and feature information to feed-forward object recognition processes. We tested this model using the challenging out-of-distribution digit recognition dataset, MNIST-C, where 15 different types of transformation and corruption are applied to handwritten digit images. Our model showed strong generalization performance against various image perturbations, on average outperforming all other models including feedforward CNNs and adversarially trained networks. Our model is particularly robust to blur, noise, and occlusion corruptions, where shape perception plays an important role. Ablation studies further reveal two complementary roles of spatial and feature-based attention in robust object recognition, with the former largely consistent with spatial masking benefits in the attention literature (the reconstruction serves as a mask) and the latter mainly contributing to the model's inference speed (i.e., number of time steps to reach a certain confidence threshold) by reducing the space of possible object hypotheses. We also observed that the model sometimes hallucinates a non-existing pattern out of noise, leading to highly interpretable human-like errors. Our study shows that modeling reconstruction-based feedback endows AI systems with a powerful attention mechanism, which can help us understand the role of generating perception in human visual processing.  ( 3 min )
    Rethinking Clustering-Based Pseudo-Labeling for Unsupervised Meta-Learning. (arXiv:2209.13635v1 [cs.LG])
    The pioneering method for unsupervised meta-learning, CACTUs, is a clustering-based approach with pseudo-labeling. This approach is model-agnostic and can be combined with supervised algorithms to learn from unlabeled data. However, it often suffers from label inconsistency or limited diversity, which leads to poor performance. In this work, we prove that the core reason for this is the lack of a clustering-friendly property in the embedding space. We address this by minimizing the inter- to intra-class similarity ratio to provide clustering-friendly embedding features, and validate our approach through comprehensive experiments. Note that, despite only utilizing a simple clustering algorithm (k-means) in our embedding space to obtain the pseudo-labels, we achieve significant improvement. Moreover, we adopt a progressive evaluation mechanism to obtain more diverse samples in order to further alleviate the limited diversity problem. Finally, our approach is also model-agnostic and can easily be integrated into existing supervised methods. To demonstrate its generalization ability, we integrate it into two representative algorithms: MAML and EP. The results on three main few-shot benchmarks clearly show that the proposed method achieves significant improvement compared to state-of-the-art models. Notably, our approach also outperforms the corresponding supervised method in two tasks.  ( 2 min )
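    The objective is described as minimizing the ratio of inter-class to intra-class similarity; a minimal numpy sketch of that ratio for one batch of embeddings with (pseudo-)labels, assuming cosine similarity as the similarity measure (the paper's exact estimator may differ):

        import numpy as np

        def similarity_ratio(emb, labels):
            # Cosine similarity matrix of L2-normalized embeddings.
            emb = emb / np.linalg.norm(emb, axis=1, keepdims=True)
            sim = emb @ emb.T
            same = labels[:, None] == labels[None, :]
            off_diag = ~np.eye(len(emb), dtype=bool)
            intra = sim[same & off_diag].mean()   # similarity within a class
            inter = sim[~same].mean()             # similarity across classes
            return inter / intra                  # smaller is more clustering-friendly

        rng = np.random.default_rng(0)
        emb = rng.normal(size=(32, 16))
        labels = rng.integers(0, 4, size=32)
        print(similarity_ratio(emb, labels))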
    Reasoning over Multi-view Knowledge Graphs. (arXiv:2209.13702v1 [cs.AI])
    Recently, knowledge representation learning (KRL) is emerging as the state-of-the-art approach to processing queries over knowledge graphs (KGs), wherein KG entities and the query are embedded into a latent space such that entities that answer the query are embedded close to the query. Yet, despite the intensive research on KRL, most existing studies either focus on homogeneous KGs or assume KG completion tasks (i.e., inference of missing facts), while answering complex logical queries over KGs with multiple aspects (multi-view KGs) remains an open challenge. To bridge this gap, in this paper, we present ROMA, a novel KRL framework for answering logical queries over multi-view KGs. Compared with prior work, ROMA departs in major aspects. (i) It models a multi-view KG as a set of overlaying sub-KGs, each corresponding to one view, which subsumes many types of KGs studied in the literature (e.g., temporal KGs); (ii) it supports complex logical queries with varying relation and view constraints (e.g., with complex topology and/or from multiple views); (iii) it scales up to KGs of large sizes (e.g., millions of facts) and fine-grained views (e.g., dozens of views); (iv) it generalizes to query structures and KG views that are unobserved during training. Extensive empirical evaluation on real-world KGs shows that ROMA significantly outperforms alternative methods.  ( 2 min )
    Hamiltonian Adaptive Importance Sampling. (arXiv:2209.13716v1 [cs.LG])
    Importance sampling (IS) is a powerful Monte Carlo (MC) methodology for approximating integrals, for instance in the context of Bayesian inference. In IS, the samples are simulated from the so-called proposal distribution, and the choice of this proposal is key to achieving high performance. In adaptive IS (AIS) methods, a set of proposals is iteratively improved. AIS is a relevant and timely methodology, although many limitations remain to be overcome, e.g., the curse of dimensionality in high-dimensional and multi-modal problems. Moreover, the Hamiltonian Monte Carlo (HMC) algorithm has become increasingly popular in machine learning and statistics. HMC has several appealing features, such as its exploratory behavior in high-dimensional targets, where other methods suffer. In this paper, we introduce the novel Hamiltonian adaptive importance sampling (HAIS) method. HAIS implements a two-step adaptive process with parallel HMC chains that cooperate at each iteration. The proposed HAIS efficiently adapts a population of proposals, extracting the advantages of HMC. HAIS can be understood as a particular instance of the generic layered AIS family with an additional resampling step. HAIS achieves a significant performance improvement in high-dimensional problems w.r.t. state-of-the-art algorithms. We discuss the statistical properties of HAIS and show its high performance in two challenging examples.  ( 3 min )
    MPC-Pipe: an Efficient Pipeline Scheme for Secure Multi-party Machine Learning Inference. (arXiv:2209.13643v1 [cs.CR])
    Multi-party computing (MPC) has been gaining popularity over the past years as a secure computing model, particularly for machine learning (ML) inference. Compared with its competitors, MPC has fewer overheads than homomorphic encryption (HE) and has a more robust threat model than hardware-based trusted execution environments (TEE) such as Intel SGX. Despite its apparent advantages, MPC protocols still pay substantial performance penalties compared to plaintext when applied to ML algorithms. The overhead is due to added computation and communication costs. For multiplications, which are ubiquitous in ML algorithms, MPC protocols add 32x more computational cost and 1 round of broadcasting among MPC servers. Moreover, ML computations that have trivial costs in plaintext, such as Softmax, ReLU, and other non-linear operations, become very expensive due to added communication. These added overheads make MPC less palatable to deploy in real-time ML inference frameworks, such as speech translation. In this work, we present MPC-Pipe, an MPC pipeline inference technique that uses two ML-specific approaches: 1) an inter-linear-layer pipeline and 2) an inner-layer pipeline. These two techniques shorten the total inference runtime for machine learning models. Our experiments show that MPC-Pipe reduces ML inference latency by up to 12.6% when model weights are private and 14.48% when model weights are public, compared to current MPC protocol implementations.  ( 3 min )
    Deep learning forward and reverse primer design to detect SARS-CoV-2 emerging variants. (arXiv:2209.13591v1 [q-bio.GN])
    Surges observed at different periods in the number of COVID-19 cases are associated with the emergence of multiple SARS-CoV-2 (Severe Acute Respiratory Syndrome Coronavirus 2) variants. The design of methods to support laboratory detection is crucial for the monitoring of these variants. Hence, in this paper, we develop a semi-automated method to design both forward and reverse primer sets to detect SARS-CoV-2 variants. To proceed, we train deep Convolutional Neural Networks (CNNs) to classify labelled SARS-CoV-2 variants and identify partial genomic features needed for the forward and reverse Polymerase Chain Reaction (PCR) primer design. Our proposed approach supplements existing ones while promoting the emerging concept of neural network assisted primer design for PCR. Our CNN model was trained using a database of SARS-CoV-2 full-length genomes from GISAID and tested on a separate dataset from NCBI, with 98% accuracy for the classification of variants. This result is based on the development of three different methods of feature extraction, and the selected primer sequences for each SARS-CoV-2 variant detection (except Omicron) were present in more than 95% of sequences in an independent set of 5000 same-variant sequences, and in below 5% in other independent datasets with 5000 sequences of each variant. In total, we obtain 22 forward and reverse primer pairs with flexible lengths (18-25 base pairs) and an expected amplicon length ranging between 42 and 3322 nucleotides. Beyond feature appearance, in-silico primer checks confirmed that the identified primer pairs are suitable for accurate SARS-CoV-2 variant detection by means of PCR tests.  ( 3 min )
    Analysis and prediction of heart stroke from ejection fraction and serum creatinine using LSTM deep learning approach. (arXiv:2209.13799v1 [cs.CV])
    The combination of big data and deep learning can greatly impact any objective if used properly. With the availability of large health care datasets and progress in deep learning techniques, systems are now well equipped to predict the future trend of many health problems. From the literature survey, we found that SVM had been used to predict heart failure rates without relating objective factors. Leveraging the wealth of historical information in electronic health records (EHR), we built a predictive model using long short-term memory (LSTM) networks to predict the future trend of heart failure from these health records. Hence, the fundamental contribution of this work is to predict heart failure using an LSTM based on the patient's electronic medical information. We analyzed a dataset containing the medical records of 299 heart failure patients collected at the Faisalabad Institute of Cardiology and the Allied Hospital in Faisalabad (Punjab, Pakistan). The patients consisted of 105 women and 194 men, and their ages ranged from 40 to 95 years. The dataset contains 13 features, which report clinical, body, and lifestyle information associated with heart failure. We found an increasing trend in our analysis, which will contribute to advancing knowledge in the field of heart stroke prediction.  ( 3 min )
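    As a rough sketch of the modelling setup described above (13 clinical features per patient, binary heart-failure outcome), here is a minimal PyTorch LSTM classifier; the sequence length, hidden size, and overall architecture are illustrative assumptions, not the paper's exact configuration:

        import torch
        import torch.nn as nn

        class HeartFailureLSTM(nn.Module):
            def __init__(self, n_features=13, hidden=32):
                super().__init__()
                self.lstm = nn.LSTM(n_features, hidden, batch_first=True)
                self.head = nn.Linear(hidden, 1)   # logit for heart-failure risk

            def forward(self, x):                  # x: (batch, time, 13 features)
                _, (h, _) = self.lstm(x)
                return self.head(h[-1])            # classify from last hidden state

        model = HeartFailureLSTM()
        x = torch.randn(4, 10, 13)                 # 4 patients, 10 visits each
        risk_logits = model(x)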
    TRBoost: A Generic Gradient Boosting Machine based on Trust-region Method. (arXiv:2209.13791v1 [cs.LG])
    A generic Gradient Boosting Machine called Trust-region Boosting (TRBoost) is presented for supervised machine learning tasks. Existing Gradient Boosting Machines (GBMs) have achieved state-of-the-art results on many problems, but it is difficult to maintain a balance between performance and generality: first-order algorithms accommodate more general loss functions than second-order algorithms, while their performance is often not as good. TRBoost generalizes GBMs using the trust-region algorithm to suit arbitrary loss functions while matching the performance of second-order algorithms. Several numerical experiments confirm that TRBoost achieves competitive results while offering additional benefits in convergence.  ( 2 min )
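    One way to read "trust region" in a boosting context is that each leaf's Newton step is constrained to a radius, which stays well-defined even when the loss's second derivative is zero or negative; a minimal sketch of such a leaf update under a simple box constraint (the paper's actual subproblem may be formulated differently):

        import numpy as np

        def trust_region_leaf_value(grads, hess, radius):
            """Solve min_w g*w + 0.5*h*w^2 s.t. |w| <= radius for one leaf."""
            g, h = grads.sum(), hess.sum()
            if h > 0:
                w = -g / h                     # unconstrained Newton step
            else:
                w = -np.sign(g) * radius       # non-convex case: go to the boundary
            return float(np.clip(w, -radius, radius))

        # Hypothetical leaf with logistic-loss gradients/Hessians.
        g = np.array([0.4, 0.3, -0.1]); h = np.array([0.2, 0.25, 0.2])
        print(trust_region_leaf_value(g, h, radius=1.0))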
    Deep Learning Based Detection of Enlarged Perivascular Spaces on Brain MRI. (arXiv:2209.13727v1 [eess.IV])
    Deep learning has been demonstrated effective in many neuroimaging applications. However, in many scenarios the number of imaging sequences capturing information related to small vessel disease lesions is insufficient to support data-driven techniques. Additionally, cohort-based studies may not always have the optimal or essential imaging sequences for accurate lesion detection. Therefore, it is necessary to determine which of these imaging sequences are essential for accurate detection. In this study we aimed to find the optimal combination of magnetic resonance imaging (MRI) sequences for deep learning-based detection of enlarged perivascular spaces (ePVS). To this end, we implemented an effective light-weight U-Net adapted for ePVS detection and comprehensively investigated different combinations of information from susceptibility weighted imaging (SWI), fluid-attenuated inversion recovery (FLAIR), T1-weighted (T1w) and T2-weighted (T2w) MRI sequences. We conclude that T2w MRI is the most important for accurate ePVS detection, and that incorporating SWI, FLAIR and T1w MRI into the deep neural network yields only insignificant improvements in accuracy.  ( 2 min )
    CEC-CNN: A Consecutive Expansion-Contraction Convolutional Network for Very Small Resolution Medical Image Classification. (arXiv:2209.13661v1 [cs.CV])
    Deep Convolutional Neural Networks (CNNs) for image classification successively alternate convolutions and downsampling operations, such as pooling layers or strided convolutions, resulting in lower resolution features the deeper the network gets. These downsampling operations save computational resources and provide some translational invariance as well as a larger receptive field at subsequent layers. However, an inherent side-effect of this is that high-level features, produced at the deep end of the network, are always captured in low resolution feature maps. The inverse is also true, as shallow layers always contain small scale features. In biomedical image analysis, engineers are often tasked with classifying very small image patches which carry only a limited amount of information. By their nature, these patches may not even contain objects, with the classification depending instead on the detection of subtle underlying patterns with an unknown scale in the image's texture. In these cases every bit of information is valuable; thus, it is important to extract the maximum number of informative features possible. Driven by these considerations, we introduce a new CNN architecture which preserves multi-scale features from deep, intermediate, and shallow layers by utilizing skip connections along with consecutive contractions and expansions of the feature maps. Using a dataset of very low resolution patches from Pancreatic Ductal Adenocarcinoma (PDAC) CT scans we demonstrate that our network can outperform current state-of-the-art models.  ( 3 min )
    Modeling Polyp Activity of Paragorgia arborea Using Supervised Learning. (arXiv:2209.13644v1 [q-bio.PE])
    While the distribution patterns of cold-water corals, such as Paragorgia arborea, have received increasing attention in recent studies, little is known about their in situ activity patterns. In this paper, we examine polyp activity in P. arborea using machine learning techniques to analyze high-resolution time series data and photographs obtained from an autonomous lander cluster deployed in the Stjernsund, Norway. An interactive illustration of the models derived in this paper is provided online as supplementary material. We find that the best predictor of the degree of extension of the coral polyps is current direction with a lag of three hours. Other variables that are not directly associated with water currents, such as temperature and salinity, offer much less information concerning polyp activity. Interestingly, the degree of polyp extension can be predicted more reliably by sampling the laminar flows in the water column above the measurement site than by sampling the more turbulent flows in the direct vicinity of the corals. Our results show that the activity patterns of the P. arborea polyps are governed by the strong tidal current regime of the Stjernsund. It appears that P. arborea does not react to shorter changes in the ambient current regime but instead adjusts its behavior in accordance with the large-scale pattern of the tidal cycle itself in order to optimize nutrient uptake.  ( 3 min )
    DVGAN: Stabilize Wasserstein GAN training for time-domain Gravitational Wave physics. (arXiv:2209.13592v1 [astro-ph.IM])
    Simulating time-domain observations of gravitational wave (GW) detector environments will allow for a better understanding of GW sources, augment datasets for GW signal detection and help in characterizing the noise of the detectors, leading to better physics. This paper presents a novel approach to simulating fixed-length time-domain signals using a three-player Wasserstein Generative Adversarial Network (WGAN), called DVGAN, that includes an auxiliary discriminator that discriminates on the derivatives of input signals. An ablation study is used to compare the effects of including adversarial feedback from an auxiliary derivative discriminator with a vanilla two-player WGAN. We show that discriminating on derivatives can stabilize the learning of GAN components on 1D continuous signals during their training phase. This results in smoother generated signals that are less distinguishable from real samples and better capture the distributions of the training data. DVGAN is also used to simulate real transient noise events captured in the advanced LIGO GW detector.  ( 2 min )
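    The key architectural idea above is an auxiliary discriminator that sees the first derivative of each signal; a minimal PyTorch sketch of how such a three-player WGAN critic loss could be assembled, with both critics as stand-in modules and a finite difference standing in for whatever derivative operator the paper uses:

        import torch

        def wgan_critic_losses(D, D_deriv, real, fake):
            # Finite-difference derivative along the time axis.
            d_real = real[:, 1:] - real[:, :-1]
            d_fake = fake[:, 1:] - fake[:, :-1]
            # Standard Wasserstein critic loss on the raw signals...
            loss_D = D(fake).mean() - D(real).mean()
            # ...plus the auxiliary critic on signal derivatives.
            loss_D_deriv = D_deriv(d_fake).mean() - D_deriv(d_real).mean()
            return loss_D, loss_D_deriv

        # Stand-in critics: any module mapping (batch, T) -> (batch, 1) works here.
        D = torch.nn.Sequential(torch.nn.Linear(128, 1))
        D_deriv = torch.nn.Sequential(torch.nn.Linear(127, 1))
        real, fake = torch.randn(8, 128), torch.randn(8, 128)
        print(wgan_critic_losses(D, D_deriv, real, fake))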
    Efficiently Learning Recoveries from Failures Under Partial Observability. (arXiv:2209.13605v1 [cs.RO])
    Operating under real world conditions is challenging due to the possibility of a wide range of failures induced by partial observability. In relatively benign settings, such failures can be overcome by retrying or executing one of a small number of hand-engineered recovery strategies. By contrast, contact-rich sequential manipulation tasks, like opening doors and assembling furniture, are not amenable to exhaustive hand-engineering. To address this issue, we present a general approach for robustifying manipulation strategies in a sample-efficient manner. Our approach incrementally improves robustness by first discovering the failure modes of the current strategy via exploration in simulation and then learning additional recovery skills to handle these failures. To ensure efficient learning, we propose an online algorithm Value Upper Confidence Limit (Value-UCL) that selects what failure modes to prioritize and which state to recover to such that the expected performance improves maximally in every training episode. We use our approach to learn recovery skills for door-opening and evaluate them both in simulation and on a real robot with little fine-tuning. Compared to open-loop execution, our experiments show that even a limited amount of recovery learning improves task success substantially from 71% to 92.4% in simulation and from 75% to 90% on a real robot.  ( 2 min )
    FAIR-FATE: Fair Federated Learning with Momentum. (arXiv:2209.13678v1 [cs.LG])
    While fairness-aware machine learning algorithms have been receiving increasing attention, the focus has been on centralized machine learning, leaving decentralized methods underexplored. Federated Learning is a decentralized form of machine learning where clients train local models with a server aggregating them to obtain a shared global model. Data heterogeneity amongst clients is a common characteristic of Federated Learning, which may induce or exacerbate discrimination of unprivileged groups defined by sensitive attributes such as race or gender. In this work we propose FAIR-FATE: a novel FAIR FederATEd Learning algorithm that aims to achieve group fairness while maintaining high utility via a fairness-aware aggregation method that computes the global model by taking into account the fairness of the clients. To achieve that, the global model update is computed by estimating a fair model update using a Momentum term that helps to overcome the oscillations of noisy non-fair gradients. To the best of our knowledge, this is the first approach in machine learning that aims to achieve fairness using a fair Momentum estimate. Experimental results on four real-world datasets demonstrate that FAIR-FATE significantly outperforms state-of-the-art fair Federated Learning algorithms under different levels of data heterogeneity.  ( 2 min )
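    A minimal sketch of the momentum-style aggregation idea: the server estimates a "fair" update by re-weighting client updates according to a fairness score, then smooths it with a momentum term. The weighting function, scores, and hyperparameters here are illustrative assumptions, not the published algorithm:

        import numpy as np

        def fair_momentum_step(global_w, client_updates, fairness_scores,
                               momentum, beta=0.9, lr=1.0):
            # Weight each client's update by its (hypothetical) fairness score.
            w = np.asarray(fairness_scores, dtype=float)
            w /= w.sum()
            fair_update = sum(wi * ui for wi, ui in zip(w, client_updates))
            # Momentum damps oscillations from noisy, non-fair gradients.
            momentum = beta * momentum + (1 - beta) * fair_update
            return global_w + lr * momentum, momentum

        d = 5
        global_w, momentum = np.zeros(d), np.zeros(d)
        updates = [np.random.randn(d) for _ in range(3)]
        scores = [0.9, 0.5, 0.7]   # e.g., statistical-parity-based scores
        global_w, momentum = fair_momentum_step(global_w, updates, scores, momentum)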
    Learn one size to infer all: Exploiting translational symmetries in delay-dynamical and spatio-temporal systems using scalable neural networks. (arXiv:2111.03706v2 [cs.LG] UPDATED)
    We design scalable neural networks adapted to translational symmetries in dynamical systems, capable of inferring untrained high-dimensional dynamics for different system sizes. We train these networks to predict the dynamics of delay-dynamical and spatio-temporal systems for a single size. Then, we drive the networks by their own predictions. We demonstrate that by scaling the size of the trained network, we can predict the complex dynamics for larger or smaller system sizes. Thus, the network learns from a single example and, by exploiting symmetry properties, infers entire bifurcation diagrams.  ( 2 min )
    Double Double Descent: On Generalization Errors in Transfer Learning between Linear Regression Tasks. (arXiv:2006.07002v8 [cs.LG] UPDATED)
    We study the transfer learning process between two linear regression problems. An important and timely special case is when the regressors are overparameterized and perfectly interpolate their training data. We examine a parameter transfer mechanism whereby a subset of the parameters of the target task solution are constrained to the values learned for a related source task. We analytically characterize the generalization error of the target task in terms of the salient factors in the transfer learning architecture, i.e., the number of examples available, the number of (free) parameters in each of the tasks, the number of parameters transferred from the source to target task, and the relation between the two tasks. Our non-asymptotic analysis shows that the generalization error of the target task follows a two-dimensional double descent trend (with respect to the number of free parameters in each of the tasks) that is controlled by the transfer learning factors. Our analysis points to specific cases where the transfer of parameters is beneficial as a substitute for extra overparameterization (i.e., additional free parameters in the target task). Specifically, we show that the usefulness of a transfer learning setting is fragile and depends on a delicate interplay among the set of transferred parameters, the relation between the tasks, and the true solution. We also demonstrate that overparameterized transfer learning is not necessarily more beneficial when the source task is closer or identical to the target task.  ( 3 min )
    On the Limitations of Stochastic Pre-processing Defenses. (arXiv:2206.09491v2 [cs.LG] UPDATED)
    Defending against adversarial examples remains an open problem. A common belief is that randomness at inference increases the cost of finding adversarial inputs. An example of such a defense is to apply a random transformation to inputs prior to feeding them to the model. In this paper, we empirically and theoretically investigate such stochastic pre-processing defenses and demonstrate that they are flawed. First, we show that most stochastic defenses are weaker than previously thought; they lack sufficient randomness to withstand even standard attacks like projected gradient descent. This casts doubt on a long-held assumption that stochastic defenses invalidate attacks designed to evade deterministic defenses and force attackers to integrate the Expectation over Transformation (EOT) concept. Second, we show that stochastic defenses confront a trade-off between adversarial robustness and model invariance; they become less effective as the defended model acquires more invariance to their randomization. Future work will need to decouple these two effects. We also discuss implications and guidance for future research.  ( 2 min )
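    For reference, the Expectation over Transformation concept mentioned above just averages the attack gradient over the defense's randomness; a minimal PyTorch sketch, where the model and the random pre-processing transform are stand-ins rather than any particular defense:

        import torch

        def eot_gradient(model, transform, x, y, n_samples=16):
            """Average the loss gradient over random pre-processing draws."""
            x = x.clone().requires_grad_(True)
            loss = 0.0
            for _ in range(n_samples):
                logits = model(transform(x))   # randomness lives inside transform
                loss = loss + torch.nn.functional.cross_entropy(logits, y)
            (loss / n_samples).backward()
            return x.grad                      # use for PGD ascent steps

        # Stand-ins: a tiny classifier and a random additive-noise "defense".
        model = torch.nn.Linear(10, 3)
        transform = lambda x: x + 0.1 * torch.randn_like(x)
        x, y = torch.randn(4, 10), torch.tensor([0, 1, 2, 0])
        g = eot_gradient(model, transform, x, y)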
    One-Step Abductive Multi-Target Learning with Diverse Noisy Samples and Its Application to Tumour Segmentation for Breast Cancer. (arXiv:2110.10325v7 [cs.LG] UPDATED)
    Recent studies have demonstrated the effectiveness of combining machine learning and logical reasoning, including data-driven logical reasoning, knowledge-driven machine learning and abductive learning, in inventing advanced artificial intelligence technologies. One-step abductive multi-target learning (OSAMTL), an approach inspired by abductive learning that simply combines machine learning and logical reasoning in a one-step balanced way, has likewise shown its effectiveness in handling the complex noisy labels of a single noisy sample in medical histopathology whole slide image analysis (MHWSIA). However, OSAMTL is not suitable for situations where diverse noisy samples (DiNS) are provided for a learning task. In this paper, after defining DiNS, we propose one-step abductive multi-target learning with DiNS (OSAMTL-DiNS) to extend the original OSAMTL to handle the complex noisy labels of DiNS. Applying OSAMTL-DiNS to tumour segmentation for breast cancer in MHWSIA, we show that OSAMTL-DiNS enables various state-of-the-art approaches for learning from noisy labels to achieve more rational predictions.  ( 3 min )
    A Doubly Optimistic Strategy for Safe Linear Bandits. (arXiv:2209.13694v1 [cs.LG])
    We propose a doubly optimistic strategy for the safe-linear-bandit problem, DOSLB. The safe linear bandit problem is to optimise an unknown linear reward whilst satisfying unknown round-wise safety constraints on actions, using stochastic bandit feedback of reward and safety-risks of actions. In contrast to prior work on aggregated resource constraints, our formulation explicitly demands control on roundwise safety risks. Unlike existing optimistic-pessimistic paradigms for safe bandits, DOSLB exercises supreme optimism, using optimistic estimates of reward and safety scores to select actions. Yet, and surprisingly, we show that DOSLB rarely takes risky actions, and obtains $\tilde{O}(d \sqrt{T})$ regret, where our notion of regret accounts for both inefficiency and lack of safety of actions. Specialising to polytopal domains, we first notably show that the $\sqrt{T}$-regret bound cannot be improved even with large gaps, and then identify a slackened notion of regret for which we show tight instance-dependent $O(\log^2 T)$ bounds. We further argue that in such domains, the number of times an overly risky action is played is also bounded as $O(\log^2 T)$.  ( 2 min )
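    A minimal sketch of the "doubly optimistic" selection rule for a finite action set: build optimistic (upper) confidence estimates for reward and optimistic (lower) confidence estimates for safety risk, and play the highest-reward action that looks safe under optimism. The confidence radii here are generic ellipsoidal widths, not the paper's exact construction:

        import numpy as np

        def doslb_action(actions, theta_r, theta_s, G_inv, beta, risk_cap):
            """Pick the optimistic-best action among optimistically-safe ones."""
            widths = np.array([np.sqrt(a @ G_inv @ a) for a in actions])
            ucb_reward = actions @ theta_r + beta * widths   # optimistic reward
            lcb_risk = actions @ theta_s - beta * widths     # optimistic safety
            feasible = lcb_risk <= risk_cap
            ucb_reward[~feasible] = -np.inf
            return int(np.argmax(ucb_reward))

        rng = np.random.default_rng(0)
        actions = rng.normal(size=(10, 3))      # 10 candidate actions in R^3
        theta_r_hat, theta_s_hat = rng.normal(size=3), rng.normal(size=3)
        G_inv = np.eye(3)                        # (regularized design matrix)^-1
        a = doslb_action(actions, theta_r_hat, theta_s_hat, G_inv,
                         beta=1.0, risk_cap=0.5)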
    Learning Deep Representations via Contrastive Learning for Instance Retrieval. (arXiv:2209.13832v1 [cs.CV])
    Instance-level Image Retrieval (IIR), or simply Instance Retrieval, deals with the problem of finding all the images within a dataset that contain a query instance (e.g. an object). This paper makes the first attempt to tackle this problem using instance-discrimination based contrastive learning (CL). While CL has shown impressive performance for many computer vision tasks, similar success has not been achieved in the field of IIR. In this work, we approach this problem by exploring the capability of deriving discriminative representations from pre-trained and fine-tuned CL models. To begin with, we investigate the efficacy of transfer learning in IIR by comparing off-the-shelf features learned by a pre-trained deep neural network (DNN) classifier with features learned by a CL model. These findings inspired us to propose a new training strategy that optimizes CL towards learning IIR-oriented features, by using an Average Precision (AP) loss together with a fine-tuning method to learn contrastive feature representations tailored to IIR. Our empirical evaluation demonstrates significant performance enhancement over the off-the-shelf features learned from a pre-trained DNN classifier on the challenging Oxford and Paris datasets.  ( 2 min )
    FLOWGEN: Fast and slow graph generation. (arXiv:2207.07656v4 [cs.LG] UPDATED)
    Machine learning systems typically apply the same model to both easy and tough cases. This is in stark contrast with humans, who tend to evoke either fast (instinctive) or slow (analytical) thinking depending on the problem difficulty, a property called the dual-process theory of mind. We present FLOWGEN, a graph-generation model inspired by the dual-process theory of mind that generates large graphs incrementally. Depending on the difficulty of completing the graph at the current step, graph generation is routed to either a fast (weaker) or a slow (stronger) model. The fast and slow models have identical architectures, but vary in the number of parameters and consequently in strength. Experiments on real-world graphs show that FLOWGEN can successfully generate graphs similar to those generated by a single large model, in a fraction of the time.  ( 2 min )
    Obstacle Identification and Ellipsoidal Decomposition for Fast Motion Planning in Unknown Dynamic Environments. (arXiv:2209.14233v1 [cs.RO])
    Collision avoidance in the presence of dynamic obstacles in unknown environments is one of the most critical challenges for unmanned systems. In this paper, we present a method that identifies obstacles in terms of ellipsoids to estimate their linear and angular velocities. Our proposed method is based on the idea that any object can be approximately represented by ellipsoids. To achieve this, we propose a method based on variational Bayesian estimation of a Gaussian mixture model, the Khachiyan algorithm, and a refinement algorithm. Our proposed method does not require knowledge of the number of clusters and can operate in real-time, unlike existing optimization-based methods. In addition, we define an ellipsoid-based feature vector to match obstacles between two temporally close point frames. Our method can be applied to any environment with static and dynamic obstacles, including ones with rotating obstacles. We compare our algorithm with other clustering methods and show that, when coupled with a trajectory planner, the overall system can efficiently traverse unknown environments in the presence of dynamic obstacles.  ( 2 min )
    Anomaly detection optimization using big data and deep learning to reduce false-positive. (arXiv:2209.13965v1 [cs.AI])
    Anomaly-based Intrusion Detection Systems (IDS) have been a hot research topic because of their ability to detect new threats rather than only the memorized signatures of signature-based IDS, especially after the availability of advanced technologies that increase the number of hacking tools and the risk impact of an attack. The problem of any anomaly-based model is its high false-positive rate, which is the reason anomaly IDS are not commonly applied in practice: anomaly-based models classify an unseen pattern as a threat even though it may be normal but simply not included in the training dataset. This type of problem is called overfitting, where the model is not able to generalize. Optimizing anomaly-based models with a training dataset big enough to include all possible normal cases may be an ideal solution but cannot be applied in practice. Although we can increase the number of training samples to include many more normal cases, we still need a model with more ability to generalize. In this research paper, we propose applying a deep model instead of traditional models because it has more ability to generalize, thereby obtaining fewer false positives by using big data and a deep model. We compare machine learning and deep learning algorithms for optimizing anomaly-based IDS by decreasing the false-positive rate. We experiment on the NSL-KDD benchmark and compare our results with one of the best-performing traditional classifiers used in IDS optimization. The experiment shows a 10% lower false-positive rate when using deep learning instead of traditional learning.  ( 3 min )
    Spectral Diffusion Processes. (arXiv:2209.14125v1 [stat.ML])
    Score-based generative modelling (SGM) has proven to be a very effective method for modelling densities on finite-dimensional spaces. In this work we propose to extend this methodology to learn generative models over functional spaces. To do so, we represent functional data in spectral space to dissociate the stochastic part of the processes from their space-time part. Using dimensionality reduction techniques we then sample from their stochastic component using finite dimensional SGM. We demonstrate our method's effectiveness for modelling various multimodal datasets.  ( 2 min )
    Factual and Informative Review Generation for Explainable Recommendation. (arXiv:2209.12613v2 [cs.CL] UPDATED)
    Recent models can generate fluent and grammatical synthetic reviews while accurately predicting user ratings. The generated reviews, expressing users' estimated opinions towards related products, are often viewed as natural language 'rationales' for the jointly predicted rating. However, previous studies found that existing models often generate repetitive, universally applicable, and generic explanations, resulting in uninformative rationales. Further, our analysis shows that previous models' generated content often contains factual hallucinations. These issues call for novel solutions that can generate both informative and factually grounded explanations. Inspired by recent success in using retrieved content in addition to parametric knowledge for generation, we propose to augment the generator with a personalized retriever, where the retriever's output serves as external knowledge for enhancing the generator. Experiments on the Yelp, TripAdvisor, and Amazon Movie Reviews datasets show that our model generates explanations that more reliably entail existing reviews, are more diverse, and are rated more informative by human evaluators.  ( 2 min )
    Attacking Compressed Vision Transformers. (arXiv:2209.13785v1 [cs.LG])
    Vision Transformers are increasingly embedded in industrial systems due to their superior performance, but their memory and power requirements make deploying them to edge devices a challenging task. Hence, model compression techniques are now widely used to deploy models on edge devices, as they decrease resource requirements and make model inference fast and efficient. But reliability and robustness from a security perspective is another major issue in safety-critical applications. Adversarial attacks are like optical illusions for ML algorithms and can severely impact the accuracy and reliability of models. In this work we investigate the transferability of adversarial samples across SOTA Vision Transformer models and 3 SOTA compressed versions of them, and infer the effects different compression techniques have on adversarial attacks.  ( 2 min )
    LL-GNN: Low Latency Graph Neural Networks on FPGAs for Particle Detectors. (arXiv:2209.14065v1 [cs.AR])
    This work proposes a novel reconfigurable architecture for low latency Graph Neural Network (GNN) design specifically for particle detectors. Accelerating GNNs for particle detectors is challenging since it requires sub-microsecond latency to deploy the networks for online event selection in the Level-1 triggers at the CERN Large Hadron Collider experiments. This paper proposes a custom code transformation with strength reduction for the matrix multiplication operations in the interaction-network based GNNs with fully connected graphs, which avoids the costly multiplication. It exploits sparsity patterns as well as binary adjacency matrices, and avoids irregular memory access, leading to a reduction in latency and improvement in hardware efficiency. In addition, we introduce an outer-product based matrix multiplication approach which is enhanced by the strength reduction for low latency design. Also, a fusion step is introduced to further reduce the design latency. Furthermore, a GNN-specific algorithm-hardware co-design approach is presented which not only finds a design with much lower latency but also finds a high-accuracy design under a given latency constraint. Finally, a customizable template for this low latency GNN hardware architecture has been designed and open-sourced, which enables the generation of low-latency FPGA designs with efficient resource utilization using a high-level synthesis tool. Evaluation results show that our FPGA implementation is up to 24 times faster and consumes up to 45 times less power than a GPU implementation. Compared to our previous FPGA implementations, this work achieves 6.51 to 16.7 times lower latency. Moreover, the latency of our FPGA design is sufficiently low to enable deployment of GNNs in a sub-microsecond, real-time collider trigger system, enabling it to benefit from improved accuracy.  ( 3 min )
    Multilingual Search with Subword TF-IDF. (arXiv:2209.14281v1 [cs.CL])
    Multilingual search can be achieved with subword tokenization. The accuracy of traditional TF-IDF approaches depends on manually curated tokenization, stop words and stemming rules, whereas subword TF-IDF (STF-IDF) can offer higher accuracy without such heuristics. Moreover, multilingual support can be incorporated inherently as part of the subword tokenization model training. XQuAD evaluation demonstrates the advantages of STF-IDF: superior information retrieval accuracy of 85.4% for English and over 80% for 10 other languages without any heuristics-based preprocessing. The software to reproduce these results is open-sourced as part of Text2Text: https://github.com/artitw/text2text  ( 2 min )
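    A minimal sketch of the subword TF-IDF idea using scikit-learn, with character n-grams standing in for a trained subword tokenizer (the paper trains an actual subword model; the corpus and query below are made up):

        from sklearn.feature_extraction.text import TfidfVectorizer
        from sklearn.metrics.pairwise import cosine_similarity

        docs = ["the cat sat on the mat",
                "dogs and cats are pets",
                "el gato se sentó en la alfombra"]   # multilingual corpus

        # char_wb n-grams approximate subword units without language-specific
        # tokenization, stop words, or stemming rules.
        vec = TfidfVectorizer(analyzer="char_wb", ngram_range=(2, 4))
        doc_vecs = vec.fit_transform(docs)

        query_vec = vec.transform(["gato"])
        scores = cosine_similarity(query_vec, doc_vecs)[0]
        print(scores.argmax())                        # index of best-matching doc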
    Importance of Kernel Bandwidth in Quantum Machine Learning. (arXiv:2111.05451v4 [quant-ph] UPDATED)
    Quantum kernel methods are considered a promising avenue for applying quantum computers to machine learning problems. Identifying hyperparameters controlling the inductive bias of quantum machine learning models is expected to be crucial given the central role hyperparameters play in determining the performance of classical machine learning methods. In this work we introduce the hyperparameter controlling the bandwidth of a quantum kernel and show that it controls the expressivity of the resulting model. We use extensive numerical experiments with multiple quantum kernels and classical datasets to show consistent change in the model behavior from underfitting (bandwidth too large) to overfitting (bandwidth too small), with optimal generalization in between. We draw a connection between the bandwidth of classical and quantum kernels and show analogous behavior in both cases. Furthermore, we show that optimizing the bandwidth can help mitigate the exponential decay of kernel values with qubit count, which is the cause behind recent observations that the performance of quantum kernel methods decreases with qubit count. We reproduce these negative results and show that if the kernel bandwidth is optimized, the performance instead improves with growing qubit count and becomes competitive with the best classical methods.  ( 3 min )
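    The classical analogue of the bandwidth effect is easy to demonstrate: scaling inputs by a bandwidth factor c moves a Gaussian kernel's Gram matrix between the all-ones matrix (underfitting) and the identity (overfitting). A small numpy illustration of that classical analogy, not the paper's quantum kernels:

        import numpy as np

        def rbf_gram(X, c):
            """Gaussian kernel Gram matrix with bandwidth-scaled inputs c*X."""
            Z = c * X
            sq = ((Z[:, None, :] - Z[None, :, :]) ** 2).sum(-1)
            return np.exp(-sq)

        X = np.random.default_rng(0).normal(size=(5, 20))
        for c in (1e-3, 0.1, 10.0):
            K = rbf_gram(X, c)
            # Off-diagonal mean near 1 means underfitting; near 0, overfitting.
            off = K[~np.eye(5, dtype=bool)].mean()
            print(f"c={c:g}  mean off-diagonal={off:.3f}")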
  • Open

    SGD and Weight Decay Provably Induce a Low-Rank Bias in Neural Networks. (arXiv:2206.05794v2 [cs.LG] UPDATED)
    We analyze deep ReLU neural networks trained with mini-batch Stochastic Gradient Descent (SGD) and weight decay. We show, both theoretically and empirically, that when training a neural network using SGD with weight decay and small batch size, the resulting weight matrices tend to be of small rank. Our analysis relies on a minimal set of assumptions; the neural networks may be arbitrarily wide or deep and may include residual connections, as well as convolutional layers. The same analysis implies the inherent presence of SGD "noise", defined as the inability of SGD to converge to a stationary point. In particular, we prove that SGD noise must always be present, even asymptotically, as long as we incorporate weight decay and the batch size is smaller than the total number of training samples.  ( 2 min )
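    A quick way to probe this low-rank bias empirically is to compute an effective rank of each trained weight matrix from its singular value spectrum; the sketch below uses the entropy-based effective rank, one common choice among several:

        import numpy as np

        def effective_rank(W, eps=1e-12):
            """exp(entropy) of the normalized singular value distribution."""
            s = np.linalg.svd(W, compute_uv=False)
            p = s / (s.sum() + eps)
            return float(np.exp(-(p * np.log(p + eps)).sum()))

        rng = np.random.default_rng(0)
        W_random = rng.normal(size=(256, 256))             # full-rank-ish baseline
        W_lowrank = rng.normal(size=(256, 8)) @ rng.normal(size=(8, 256))
        print(effective_rank(W_random), effective_rank(W_lowrank))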
    Analyzing Lottery Ticket Hypothesis from PAC-Bayesian Theory Perspective. (arXiv:2205.07320v3 [cs.LG] UPDATED)
    The lottery ticket hypothesis (LTH) has attracted attention because it can explain why over-parameterized models often show high generalization ability. It is known that a large initial learning rate does not work well in deep neural networks such as ResNet when we use iterative magnitude pruning (IMP), an algorithm that finds winning tickets, i.e., sparse networks with high generalization ability that can be trained independently from the initial weights. However, since a large initial learning rate generally helps the optimizer converge to flatter minima, we hypothesize that winning tickets have relatively sharp minima, which is considered a disadvantage in terms of generalization ability. In this paper, we confirm this hypothesis and show that PAC-Bayesian theory can provide an explicit understanding of the relationship between LTH and generalization behavior. On the basis of our experimental findings that flatness is useful for improving accuracy and robustness to label noise and that the distance from the initial weights is deeply involved in winning tickets, we offer a PAC-Bayes bound using a spike-and-slab distribution to analyze winning tickets. Finally, we revisit existing algorithms for finding winning tickets from a PAC-Bayesian perspective and provide new insights into these methods.  ( 3 min )
    CausalSim: A Causal Inference Framework for Unbiased Trace-Driven Simulation. (arXiv:2201.01811v3 [cs.LG] UPDATED)
    We present CausalSim, a causal inference framework for unbiased trace-driven simulation. Current trace-driven simulators assume that the interventions being simulated (e.g., a new algorithm) would not affect the validity of the traces. However, real-world traces are often biased by the choices of algorithms made during trace collection, and hence replaying traces under an intervention may lead to incorrect results. CausalSim addresses this challenge by learning a causal model of the system dynamics and latent factors capturing the underlying system conditions during trace collection. It learns these models using an initial randomized control trial (RCT) under a fixed set of algorithms, and then applies them to remove biases from trace data when simulating new algorithms. Key to CausalSim is mapping unbiased trace-driven simulation to a tensor completion problem with extremely sparse observations. By exploiting a basic distributional invariance property present in RCT data, CausalSim enables a novel tensor completion method despite the sparsity of observations. Our extensive evaluation of CausalSim on both real and synthetic datasets, including more than ten months of real data from the Puffer video streaming system show it improves simulation accuracy, reducing errors by 53% and 61% on average compared to expert-designed and supervised learning baselines. Moreover, CausalSim provides markedly different insights about ABR algorithms compared to the biased baseline simulator, which we validate with a real deployment.  ( 3 min )
    Constraint-Based Causal Structure Learning from Undersampled Graphs. (arXiv:2205.09235v3 [stat.ML] UPDATED)
    Graphical structures estimated by causal learning algorithms from time series data can provide highly misleading causal information if the causal timescale of the generating process fails to match the measurement timescale of the data. Although this problem has been recently recognized, practitioners have limited resources to respond to it, and so must continue using models that they know are likely misleading. Existing methods either (a) require that the difference between causal and measurement timescales is known; or (b) can handle only a very small number of random variables when the timescale difference is unknown; or (c) apply only to pairs of variables, though with fewer assumptions about prior knowledge; or (d) return impractically many solutions. This paper addresses all four challenges. We combine constraint programming with both theoretical insights into the problem structure and prior information about admissible causal interactions. The resulting system provides a practical approach that scales to significantly larger sets (>100) of random variables, does not require precise knowledge of the timescale difference, supports edge misidentification and parametric connection strengths, and can provide the optimal choice among many possible solutions. The cumulative impact of these improvements is a gain of multiple orders of magnitude in speed and informativeness.  ( 3 min )
    Randomized K-FACs: Speeding up K-FAC with Randomized Numerical Linear Algebra. (arXiv:2206.15397v2 [cs.LG] UPDATED)
    K-FAC is a successful tractable implementation of Natural Gradient for Deep Learning, which nevertheless suffers from the requirement to compute the inverse of the Kronecker factors (through an eigen-decomposition). This can be very time-consuming (or even prohibitive) when these factors are large. In this paper, we theoretically show that, owing to the exponential-average construction paradigm of the Kronecker factors that is typically used, their eigen-spectrum must decay. We show numerically that in practice this decay is very rapid, leading to the idea that we could save substantial computation by only focusing on the first few eigen-modes when inverting the Kronecker-factors. Randomized Numerical Linear Algebra provides us with the necessary tools to do so. Numerical results show we obtain $\approx2.5\times$ reduction in per-epoch time and $\approx3.3\times$ reduction in time to target accuracy. We compare our proposed K-FAC sped-up versions with a more computationally efficient NG implementation, SENG, and observe we perform on par with it.  ( 2 min )
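    The speed-up rests on inverting a damped Kronecker factor using only its top eigenmodes; a minimal sketch with scikit-learn's randomized SVD, assuming a symmetric positive semi-definite factor A and damping lam, with the orthogonal complement handled analytically as 1/lam (the paper's exact construction may differ):

        import numpy as np
        from sklearn.utils.extmath import randomized_svd

        def approx_damped_inverse_apply(A, v, rank, lam):
            """Approximate (A + lam*I)^{-1} v using the top eigenmodes of A."""
            U, s, _ = randomized_svd(A, n_components=rank, random_state=0)
            # For symmetric PSD A, randomized SVD recovers the top eigenpairs.
            coeffs = U.T @ v
            top = U @ (coeffs / (s + lam))    # exact on the captured subspace
            rest = (v - U @ coeffs) / lam     # complement acts as damped identity
            return top + rest

        rng = np.random.default_rng(0)
        B = rng.normal(size=(200, 20))
        A = B @ B.T                            # rapidly decaying spectrum
        v = rng.normal(size=200)
        x = approx_damped_inverse_apply(A, v, rank=20, lam=1e-2)
        print(np.linalg.norm((A + 1e-2 * np.eye(200)) @ x - v))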
    Estimators of Entropy and Information via Inference in Probabilistic Models. (arXiv:2202.12363v3 [stat.ML] UPDATED)
    Estimating information-theoretic quantities such as entropy and mutual information is central to many problems in statistics and machine learning, but challenging in high dimensions. This paper presents estimators of entropy via inference (EEVI), which deliver upper and lower bounds on many information quantities for arbitrary variables in a probabilistic generative model. These estimators use importance sampling with proposal distribution families that include amortized variational inference and sequential Monte Carlo, which can be tailored to the target model and used to squeeze true information values with high accuracy. We present several theoretical properties of EEVI and demonstrate scalability and efficacy on two problems from the medical domain: (i) in an expert system for diagnosing liver disorders, we rank medical tests according to how informative they are about latent diseases, given a pattern of observed symptoms and patient attributes; and (ii) in a differential equation model of carbohydrate metabolism, we find optimal times to take blood glucose measurements that maximize information about a diabetic patient's insulin sensitivity, given their meal and medication schedule.  ( 3 min )
    Joint Learning of Linear Time-Invariant Dynamical Systems. (arXiv:2112.10955v4 [stat.ML] UPDATED)
    Linear time-invariant systems are very popular models in system theory and applications. A fundamental problem in system identification that remains rather unaddressed in extant literature is to leverage commonalities amongst related linear systems to estimate their transition matrices more accurately. To address this problem, the current paper investigates methods for jointly estimating the transition matrices of multiple systems. It is assumed that the transition matrices are unknown linear functions of some unknown shared basis matrices. We establish finite-time estimation error rates that fully reflect the roles of trajectory lengths, dimension, and number of systems under consideration. The presented results are fairly general and show the significant gains that can be achieved by pooling data across systems in comparison to learning each system individually. Further, they are shown to be robust against model misspecifications. To obtain the results, we develop novel techniques that are of interest for addressing similar joint-learning problems. They include tightly bounding estimation errors in terms of the eigen-structures of transition matrices, establishing sharp high probability bounds for singular values of dependent random matrices, and capturing effects of misspecified transition matrices as the systems evolve over time.  ( 3 min )
    Solar Flare Index Prediction Using SDO/HMI Vector Magnetic Data Products with Statistical and Machine Learning Methods. (arXiv:2209.13779v1 [astro-ph.SR])
    Solar flares, especially the M- and X-class flares, are often associated with coronal mass ejections (CMEs). They are the most important sources of space weather effects, which can severely impact the near-Earth environment. Thus it is essential to forecast flares (especially the M- and X-class ones) to mitigate their destructive and hazardous consequences. Here, we introduce several statistical and machine learning approaches to the prediction of an AR's Flare Index (FI), which quantifies the flare productivity of an AR by taking into account the numbers of different-class flares within a certain time interval. Specifically, our sample includes 563 ARs that appeared on the solar disk from May 2010 to Dec 2017. The 25 magnetic parameters, provided by the Space-weather HMI Active Region Patches (SHARP) from the Helioseismic and Magnetic Imager (HMI) on board the Solar Dynamics Observatory (SDO), characterize coronal magnetic energy stored in ARs by proxy and are used as the predictors. We investigate the relationship between these SHARP parameters and the FI of ARs with a machine learning algorithm (spline regression) and a resampling method (Synthetic Minority Over-Sampling Technique for Regression with Gaussian Noise, SMOGN for short). Based on the established relationship, we are able to predict the value of FI for a given AR within the next 1-day period. Compared with 4 other popular machine learning algorithms, our methods improve the accuracy of FI prediction, especially for large FI. In addition, we rank the importance of the SHARP parameters using the Borda count method, computed from the ranks produced by 9 different machine learning methods.  ( 3 min )
    Learning Asynchronous and Error-prone Longitudinal Data via Functional Calibration. (arXiv:2209.13807v1 [stat.ME])
    In many longitudinal settings, time-varying covariates may not be measured at the same time as responses and are often prone to measurement error. Naive last-observation-carried-forward methods incur estimation biases, and existing kernel-based methods suffer from slow convergence rates and large variations. To address these challenges, we propose a new functional calibration approach to efficiently learn longitudinal covariate processes based on sparse functional data with measurement error. Our approach, stemming from functional principal component analysis, calibrates the unobserved synchronized covariate values from the observed asynchronous and error-prone covariate values, and is broadly applicable to asynchronous longitudinal regression with time-invariant or time-varying coefficients. For regression with time-invariant coefficients, our estimator is asymptotically unbiased, root-n consistent, and asymptotically normal; for time-varying coefficient models, our estimator has the optimal varying coefficient model convergence rate with inflated asymptotic variance from the calibration. In both cases, our estimators present asymptotic properties superior to the existing methods. The feasibility and usability of the proposed methods are verified by simulations and an application to the Study of Women's Health Across the Nation, a large-scale multi-site longitudinal study on women's health during mid-life.  ( 2 min )
    Conformal Prediction is Robust to Label Noise. (arXiv:2209.14295v1 [cs.LG])
    We study the robustness of conformal prediction, a powerful tool for uncertainty quantification, to label noise. Our analysis tackles both regression and classification problems, characterizing when and how it is possible to construct uncertainty sets that correctly cover the unobserved noiseless ground truth labels. Through stylized theoretical examples and practical experiments, we argue that naive conformal prediction covers the noiseless ground truth label unless the noise distribution is adversarially designed. This leads us to believe that correcting for label noise is unnecessary except for pathological data distributions or noise sources. In such cases, we can also correct for noise of bounded size in the conformal prediction algorithm in order to ensure correct coverage of the ground truth labels without score or data regularity.  ( 2 min )
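    For concreteness, the object under study is ordinary split conformal prediction calibrated on noisy labels; a minimal regression sketch that measures coverage of the clean labels, with the data and noise model synthetic and the "pretrained model" a stand-in:

        import numpy as np

        rng = np.random.default_rng(0)
        n, alpha = 2000, 0.1
        x = rng.uniform(-2, 2, size=n)
        y_clean = np.sin(x) + 0.1 * rng.normal(size=n)
        y_noisy = y_clean + 0.3 * rng.normal(size=n)   # label noise at calibration

        predict = np.sin                                # pretrained model stand-in

        # Calibrate on noisy labels: conformal quantile of absolute residuals.
        cal, test = slice(0, 1000), slice(1000, None)
        scores = np.abs(y_noisy[cal] - predict(x[cal]))
        q = np.quantile(scores,
                        np.ceil((1 - alpha) * (len(scores) + 1)) / len(scores))

        # Check coverage against the *clean* test labels.
        covered = np.abs(y_clean[test] - predict(x[test])) <= q
        print(covered.mean())   # typically >= 1 - alpha: noise widens intervals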
    A Doubly Optimistic Strategy for Safe Linear Bandits. (arXiv:2209.13694v1 [cs.LG])
    We propose a \underline{d}oubly \underline{o}ptimistic strategy for the \underline{s}afe-\underline{l}inear-\underline{b}andit problem, DOSLB. The safe linear bandit problem is to optimise an unknown linear reward whilst satisfying unknown round-wise safety constraints on actions, using stochastic bandit feedback of reward and safety-risks of actions. In contrast to prior work on aggregated resource constraints, our formulation explicitly demands control on roundwise safety risks. Unlike existing optimistic-pessimistic paradigms for safe bandits, DOSLB exercises supreme optimism, using optimistic estimates of reward and safety scores to select actions. Yet, and surprisingly, we show that DOSLB rarely takes risky actions, and obtains $\tilde{O}(d \sqrt{T})$ regret, where our notion of regret accounts for both inefficiency and lack of safety of actions. Specialising to polytopal domains, we first notably show that the $\sqrt{T}$-regret bound cannot be improved even with large gaps, and then identify a slackened notion of regret for which we show tight instance-dependent $O(\log^2 T)$ bounds. We further argue that in such domains, the number of times an overly risky action is played is also bounded as $O(\log^2T)$.  ( 2 min )
    Statistical limits of correlation detection in trees. (arXiv:2209.13723v1 [math.ST])
    In this paper we address the problem of testing whether two observed trees $(t,t')$ are sampled either independently or from a joint distribution under which they are correlated. This problem, which we refer to as correlation detection in trees, plays a key role in the study of graph alignment for two correlated random graphs. Motivated by graph alignment, we investigate the conditions of existence of one-sided tests, i.e. tests which have vanishing type I error and non-vanishing power in the limit of large tree depth. For the correlated Galton-Watson model with Poisson offspring of mean $\lambda>0$ and correlation parameter $s \in (0,1)$, we identify a phase transition in the limit of large degrees at $s = \sqrt{\alpha}$, where $\alpha \sim 0.3383$ is Otter's constant. Namely, we prove that no such test exists for $s \leq \sqrt{\alpha}$, and that such a test exists whenever $s > \sqrt{\alpha}$, for $\lambda$ large enough. This result sheds new light on the graph alignment problem in the sparse regime (with $O(1)$ average node degrees) and on the performance of the MPAlign method studied in Ganassali et al. (2021), Piccioli et al. (2021), proving in particular the conjecture of Piccioli et al. (2021) that MPAlign succeeds in the partial recovery task for correlation parameter $s>\sqrt{\alpha}$ provided the average node degree $\lambda$ is large enough.  ( 3 min )
    Score Modeling for Simulation-based Inference. (arXiv:2209.14249v1 [cs.LG])
    Neural Posterior Estimation methods for simulation-based inference can be ill-suited for dealing with posterior distributions obtained by conditioning on multiple observations, as they may require a large number of simulator calls to yield accurate approximations. Neural Likelihood Estimation methods can naturally handle multiple observations, but require a separate inference step, which may affect their efficiency and performance. We introduce a new method for simulation-based inference that enjoys the benefits of both approaches. We propose to model the scores for the posterior distributions induced by individual observations, and introduce a sampling algorithm that combines the learned scores to approximately sample from the target efficiently.  ( 2 min )
    A deep learning approach for the computation of curvature in the level-set method. (arXiv:2002.02804v4 [math.NA] UPDATED)
    We propose a deep learning strategy to estimate the mean curvature of two-dimensional implicit interfaces in the level-set method. Our approach is based on fitting feed-forward neural networks to synthetic data sets constructed from circular interfaces immersed in uniform grids of various resolutions. These multilayer perceptrons process the level-set values from mesh points next to the free boundary and output the dimensionless curvature at their closest locations on the interface. Accuracy analyses involving irregular interfaces, in both uniform and adaptive grids, show that our models are competitive with traditional numerical schemes in the $L^1$ and $L^2$ norms. In particular, our neural networks approximate curvature with comparable precision in coarse resolutions, when the interface features steep curvature regions, and when the number of iterations to reinitialize the level-set function is small. Although the conventional numerical approach is more robust than our framework, our results have unveiled the potential of machine learning for dealing with computational tasks where the level-set method is known to experience difficulties. We also establish that an application-dependent map of local resolutions to neural models can be devised to estimate mean curvature more effectively than a universal neural network.  ( 3 min )
    Causal Inference Under Unmeasured Confounding With Negative Controls: A Minimax Learning Approach. (arXiv:2103.14029v3 [stat.ML] UPDATED)
    We study the estimation of causal parameters when not all confounders are observed and instead negative controls are available. Recent work has shown how these can enable identification and efficient estimation via two so-called bridge functions. In this paper, we tackle the primary challenge to causal inference using negative controls: the identification and estimation of these bridge functions. Previous work has relied on completeness conditions on these functions to identify the causal parameters and required uniqueness assumptions in estimation, and they also focused on parametric estimation of bridge functions. Instead, we provide a new identification strategy that avoids the completeness condition. And, we provide new estimators for these functions based on minimax learning formulations. These estimators accommodate general function classes such as Reproducing Kernel Hilbert Spaces and neural networks. We study finite-sample convergence results both for estimating bridge functions themselves and for the final estimation of the causal parameter under a variety of combinations of assumptions. We avoid uniqueness conditions on the bridge functions as much as possible.  ( 2 min )
    Consensus Knowledge Graph Learning via Multi-view Sparse Low Rank Block Model. (arXiv:2209.13762v1 [stat.ML])
    Network analysis has been a powerful tool to unveil relationships and interactions among a large number of objects. Yet its effectiveness in accurately identifying important node-node interactions is challenged by rapidly growing network sizes, with data being collected at an unprecedented granularity and scale. The common wisdom for overcoming such high dimensionality is to collapse nodes into smaller groups and conduct connectivity analysis at the group level. Dividing the effort into two phases, however, inevitably opens a gap in consistency and drives down efficiency. Consensus learning is emerging as a new normal for common knowledge discovery when multiple data sources are available. To this end, this paper develops a unified framework for simultaneous grouping and connectivity analysis that combines multiple data sources. The algorithm is also shown to yield a statistically optimal estimator.  ( 2 min )
    Online Policy Optimization for Robust MDP. (arXiv:2209.13841v1 [cs.LG])
    Reinforcement learning (RL) has exceeded human performance in many synthetic settings such as video games and Go. However, real-world deployment of end-to-end RL models is less common, as RL models can be very sensitive to slight perturbation of the environment. The robust Markov decision process (MDP) framework -- in which the transition probabilities belong to an uncertainty set around a nominal model -- provides one way to develop robust models. While previous analysis shows RL algorithms are effective assuming access to a generative model, it remains unclear whether RL can be efficient under a more realistic online setting, which requires a careful balance between exploration and exploitation. In this work, we consider online robust MDP by interacting with an unknown nominal system. We propose a robust optimistic policy optimization algorithm that is provably efficient. To address the additional uncertainty caused by an adversarial environment, our model features a new optimistic update rule derived via Fenchel conjugates. Our analysis establishes the first regret bound for online robust MDPs.  ( 2 min )
    Distance-based Positive and Unlabeled Learning for Ranking. (arXiv:2005.10700v3 [cs.LG] UPDATED)
    Learning to rank -- producing a ranked list of items specific to a query and with respect to a set of supervisory items -- is a problem of general interest. The setting we consider is one in which no analytic description of what constitutes a good ranking is available. Instead, we have a collection of representations and supervisory information consisting of a (target item, interesting items set) pair. We demonstrate analytically, in simulation, and in real data examples that learning to rank via combining representations using an integer linear program is effective when the supervision is as light as "these few items are similar to your item of interest." While this nomination task is quite general, for specificity we present our methodology from the perspective of vertex nomination in graphs. The methodology described herein is model agnostic.  ( 2 min )

  • Open

    [D] Learning Distinct Filters in CNNs
    I was training a 1D CNN to classify a time series. I tried using 2 different filters because I’m classifying time series into 2 classes. But I noticed that the 2 filters ended up being very similar. I’ve seen visualizations of feature maps/filters in 2D CNNs and they always look pretty distinct. I was wondering if anyone has ever tried adding the absolute value of the dot product of each filter with every other filter to the loss function, to penalize similarity and hopefully produce orthogonal feature maps/filters? I guess this wouldn’t scale to networks with a lot of filters. Are there any techniques for ensuring filters are distinct? Or is it even important that filters are distinct? submitted by /u/LiquidDinosaurs69 [link] [comments]  ( 89 min )
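    For what it's worth, the penalty described in the post can be written in a few lines; a PyTorch sketch (the 0.1 weight and the pooling classifier head are arbitrary illustrative choices):

        import torch
        import torch.nn.functional as F

        def filter_similarity_penalty(weight):
            # weight: (out_channels, in_channels, kernel_size) for Conv1d
            f = F.normalize(weight.flatten(start_dim=1), dim=1)
            gram = f @ f.t()                     # pairwise cosine similarities
            off_diag = gram - torch.diag(torch.diag(gram))
            return off_diag.abs().sum()          # zero iff filters are orthogonal

        conv = torch.nn.Conv1d(1, 2, kernel_size=7)
        x, y = torch.randn(8, 1, 128), torch.randint(0, 2, (8,))
        logits = F.adaptive_avg_pool1d(conv(x), 1).squeeze(-1)
        loss = F.cross_entropy(logits, y) + 0.1 * filter_similarity_penalty(conv.weight)
        loss.backward()

    This is essentially a soft orthogonality regularizer; as the post notes, the Gram matrix grows quadratically in the number of filters.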
    [R] 1nn with subsampling is infinity-nn with a specific set of weights
    I'm a data scientist with a regression problem that we're solving via k-nearest neighbors. We're concerned with the accuracy of ~100 estimates added together rather than the accuracy of any single estimate, so bias was important to us: as long as we didn't systematically over- or under-predict, we would have a very good estimate in the end overall. I discovered recently that 1nn had worse precision than knn but significantly better bias. So I then thought to incorporate subsampling to see if that could improve the precision without hurting the bias much. I figured bootstrapping didn't make sense, since a repeated item doesn't mean much in the context of 1nn! So I went with subsampling without replacement, but with the bootstrap selection rate of 1-1/e ~ .632. Subsampling is…  ( 90 min )
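    A minimal sketch of the ensemble described (1nn regressors fit on random ~63.2% subsamples drawn without replacement, predictions averaged; the data here is synthetic):

        import numpy as np
        from sklearn.neighbors import KNeighborsRegressor

        rng = np.random.default_rng(0)
        X = rng.uniform(-3, 3, size=(500, 2))
        y = np.sin(X[:, 0]) + 0.1 * rng.normal(size=500)

        rate = 1 - 1 / np.e                      # ~0.632 subsample rate
        n_members, n_sub = 50, int(rate * len(X))

        members = []
        for _ in range(n_members):
            idx = rng.choice(len(X), size=n_sub, replace=False)
            members.append(KNeighborsRegressor(n_neighbors=1).fit(X[idx], y[idx]))

        X_test = rng.uniform(-3, 3, size=(100, 2))
        pred = np.mean([m.predict(X_test) for m in members], axis=0)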
    [D] How good of a language model could you fit into 10GB? 100GB?
    I know the big boys like GPT have billions of parameters so they must be huge. But how coherent a language model could you make that fits in a client application to process and return output on user input? Is keeping it to a reasonable size on disk feasible or no? submitted by /u/SnakeBladeStyle [link] [comments]  ( 89 min )
    [P] How to fine tune stable diffusion: how we made the text-to-pokemon model at Lambda
    Here's a guide Justin Pinkney ( u/Buntworthy ) put together on how he trained the text-to-pokemon model that you've been seeing all over Twitter. The first samples look like normal images, then start to take on a Pokemon style, and eventually diverge from the original prompts as training continues. The post: https://lambdalabs.com/blog/how-to-fine-tune-stable-diffusion-how-we-made-the-text-to-pokemon-model-at-lambda/ The model, code, and dataset are all available: Lambda Diffusers, the captioned Pokémon dataset, the model weights in Diffusers format, a hosted demo (https://replicate.com/lambdal/text-to-pokemon), the original model weights, and the training code. You can start with the GitHub repo, which contains the code: https://github.com/justinpinkney/stable-diffusion And then follow this post on how to run it: https://lambdalabs.com/blog/how-to-fine-tune-stable-diffusion-how-we-made-the-text-to-pokemon-model-at-lambda/ submitted by /u/sabalaba [link] [comments]  ( 89 min )
    [D] Journal recommendation
    I have a paper that has been rejected a few times now. When we wrote it a year ago I believe it was really cutting edge, as it showed a solution to a problem that no one had managed to get right. In the meantime new papers came out and the results (although still good) are not state of the art any more. We updated the references in a few revisions over the past year so that new work is included. We are looking for a journal in Q3 or Q4 that has a short review time, so that we don't have to wait even longer to get published. The paper deals with digital forensics, specifically applying machine learning to file fragment classification. Can anyone recommend a journal? Not looking for top tier. submitted by /u/sillyscienceguy [link] [comments]  ( 89 min )
    [R]- Any open-source project for voice conversations?
    Hi Everyone, Is there any open source project which can allow a user to converse with the bot as with any other human being? submitted by /u/black_loop [link] [comments]  ( 88 min )
    [Project] Pubmedflow: One stop ML tool to simplify
    For many NLP tasks in the medical domain, like data collection and training unsupervised models, PubMed articles are a great resource. With Pubmedflow, a researcher's effort on all such tasks gets simplified into a few lines of Python. Currently supported tasks: unsupervised model training; question answering on the downloaded text; summarisation of each article; entity extraction on each article. Github URL: https://github.com/nfflow/pubmedflow submitted by /u/metalvendetta [link] [comments]  ( 88 min )
    [D] Can T4 be faster than P100?
    Can T4 be faster than P100? On Colab Pro+, I usually get a P100. Today I got T4. To my amazement, it turned out ~25% faster with OpenAI Jukebox. Is that possible, or are some other factors likely at play? submitted by /u/vzakharov [link] [comments]  ( 88 min )
    [D] DALL·E Now Available Without Waitlist
    https://openai.com/blog/dall-e-now-available-without-waitlist/ It appears to work as advertised, not any special workflow. (as a bonus, it does work with organizations too, with credits shared) submitted by /u/minimaxir [link] [comments]  ( 93 min )
    [D] 7/4/4/4/2 Neurips reviews and desk reject from AAAI what to do?
    As the title states, I have a paper that has now been rejected from both Neurips and AAAI and I am uncertain of where to go from here. I have considered submitting to the Image and Vision Computing journal due to its fast turnaround time and being a top-20 journal, but maybe there is a conference I am missing. In general, reviewers praised the vast experimentation (3 datasets) and the paper's organization (in the strengths and weaknesses sections). However, in the evaluation criteria section they ranked clarity as Fair in the phase 1 reviews at AAAI (a tad inconsistent, I would say). Critiques centre on the work not being novel enough (but some of the comments seem to miss important distinctions, which leads the reviewers to say it's a trivial extension). Any advice on where to go from here would be appreciated. submitted by /u/AbjectDrink3276 [link] [comments]  ( 91 min )
    [D] Not able to understand the inequality in ERM
    I am reading this blog and I am not able to understand the following inequality. https://preview.redd.it/xp8yotyuyjq91.png?width=742&format=png&auto=webp&s=9da6eda1a4903f97a1db0dc48098a5ff6d75e489 Here we first form the set of all hypotheses whose true error is greater than epsilon. Then from that we take a separate set M of hypotheses with empirical error 0. Now how come S is a subset of M? And how do we get this inequality? submitted by /u/Adventurous-Ad742 [link] [comments]  ( 121 min )
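    Assuming the blog follows the standard realizable-setting PAC argument (notation here may differ from the blog's): S is the event that ERM outputs a hypothesis whose true error exceeds $\epsilon$. Since ERM always returns a hypothesis with zero empirical error in this setting, S can only occur if at least one high-true-error hypothesis happens to have zero empirical error, which is exactly the event M; hence $S \subseteq M$ and $\Pr[S] \le \Pr[M]$. The displayed bound is then a union bound:
    \[
    \Pr[M] \;=\; \Pr\big[\exists\, h:\ R(h) > \epsilon,\ \hat{R}(h) = 0\big]
    \;\le\; \sum_{h:\, R(h) > \epsilon} \Pr\big[\hat{R}(h) = 0\big]
    \;\le\; |\mathcal{H}|\,(1-\epsilon)^m \;\le\; |\mathcal{H}|\, e^{-\epsilon m},
    \]
    since each of the $m$ i.i.d. samples is correctly labeled by a fixed bad hypothesis with probability at most $1-\epsilon$.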
    [D] Any Python Code available for Visualizing Named-Entities and Relations?
    I'm looking to visualize named entities and relations over a very large set of documents and save the result in some external document (like a PDF or HTML file). Specifically, given that I have the document locations corresponding to the entities and relations, I am looking for something to label every entity with a specific color corresponding to its entity type, with colored arrows corresponding to the relation type between related entities. Is there any Python code available to do this? I tried generating a PDF using PyFPDF but it won't easily let me color individual words. For now, I am giving up on PyFPDF and trying to use Python to generate HTML and CSS. UPDATE: Turns out I can use PyFPDF to color individual words (using the write function rather than the cell/multicell functions). However, I am still trying to figure out how to draw arrows between words. Any suggestions for libraries that can do this easily would be appreciated! submitted by /u/newperson77777777 [link] [comments]  ( 91 min )
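    One library worth trying is spaCy's displaCy, which renders colored entity spans (and dependency-style arrows) to standalone HTML; a minimal sketch, assuming the stock English pipeline is installed (custom entity/relation spans can also be supplied via displaCy's manual mode):

        import spacy
        from spacy import displacy

        nlp = spacy.load("en_core_web_sm")
        doc = nlp("Apple is looking at buying a U.K. startup for $1 billion.")

        html = displacy.render(doc, style="ent", page=True)  # colored entity spans
        with open("entities.html", "w", encoding="utf-8") as f:
            f.write(html)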
    [P] Prompt-based personalized virtual fashion try-on
    Hi everyone, I made a prompt-based virtual fashion try-on app, using an off-the-shelf virtual try-on model. The results are not half bad! Any ideas on how I can improve the generation of the cloth? Any feedback is much appreciated! [Tweet] [GitHub] submitted by /u/o_v_shake [link] [comments]  ( 88 min )
    [D] How often do matrix factorization with new data?
    I am reviewing how recommendation systems get built, and one approach for collaborative recommendations is to have an N x M (user x movie) matrix, where each element represents whether the i_th user liked the j_th movie. To reduce the sparsity of this and to identify 'features', it is recommended to perform matrix factorization to get (N x m) and (m x M) matrices, where m is a number of latent features. If a new user or a new movie is added, that would create a new row or column. My question is, does such a system get used in practice, and practically speaking, does this matrix re-factorization need to happen at a regular cadence (e.g. every week, offline) as new data becomes available? Thx. EDIT: Also wanted to pose one of my follow-up questions from the discussion below. What's the advantage/disadvantage of using matrix factorization or this "grid" approach, versus feeding the various user features and movie features into a neural network with a [user [dot] movie - label] cost function, so that the "per user" and "per movie" embeddings get trained in the hidden layers? Presumably this approach is more computationally intensive, but does the NN approach have advantages over the straight-up matrix factorization approach? submitted by /u/Phoeniyx [link] [comments]  ( 93 min )
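    To make the "grid" approach concrete, a minimal numpy sketch of factorizing the observed entries by SGD (sizes and hyperparameters are arbitrary illustrative choices). Folding a new user in means adding a row to U and fitting it against that user's observed entries while holding V fixed, so a full re-factorization is not needed on every update:

        import numpy as np

        rng = np.random.default_rng(0)
        N, M, m = 100, 80, 8                     # users, movies, latent features
        obs = [(rng.integers(N), rng.integers(M), float(rng.integers(0, 2)))
               for _ in range(2000)]             # (user, movie, liked) triples

        U = 0.1 * rng.normal(size=(N, m))
        V = 0.1 * rng.normal(size=(M, m))
        lr, reg = 0.05, 0.01

        for epoch in range(20):
            for i, j, r in obs:
                err = r - U[i] @ V[j]
                u_old = U[i].copy()
                U[i] += lr * (err * V[j] - reg * U[i])
                V[j] += lr * (err * u_old - reg * V[j])

        scores = U[0] @ V.T                      # user 0's predicted affinity for all movies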
    [D] Bad GPA Mechanical Background, want to pursue PhD in ML
    I am highly interested in pursuing a PhD in ML. Undergraduate in mechanical engineering from an old IIT; bad GPA (7.7/10); various research internships; a research assistantship at an Ivy League ML lab; 1 NeurIPS paper, 1 good journal paper, 1 paper through MLRC. My research interests are deep learning and computer vision. I wanted to know my chances of getting accepted into a PhD program (anywhere in the world). Also, suggestions on which universities and labs I should target. Any help would be appreciated!! submitted by /u/asdas1505 [link] [comments]  ( 88 min )
  • Open

    Can AI detect what is a rhetorical question and what is not?
    submitted by /u/ConsiderationFunny [link] [comments]  ( 90 min )
    What is the best Text-to-Speech option?
    I have tons of use cases for text-to-speech in my hands, like creating audio versions of reports and books in different languages. Is there any groundbreaking text-to-speech model (like GPT-3 is to t2t)? What is the best one in your opinion? submitted by /u/vowtz_ [link] [comments]  ( 87 min )
    OpenAI gravity and repulsion simulation
    https://www.youtube.com/watch?v=OkJT9f2IB40 I got OpenAI to write me a gravity and repulsion simulation. I think it's safe to say that AI is good at programming. Here is the code (for a language called Processing; some comparison operators and stretches of the listing were eaten by the formatting, so gaps are marked with ...):

        Particle[] particles;

        void setup() {
          fullScreen();
          particles = new Particle[500];
          for (int i = 0; i < ...
        }

        ...
        if (distance > 0) {
          float strength = (G * mass * p.mass) / (distance * distance);
          PVector dir = PVector.sub(p.position, position);
          //dir.normalize();
          if (distance < ...
        }
        ...
        if (position.x > width) {
          position.x = width;
          velocity.x *= -1;
        } else if (position.x < 0) {
          position.x = 0;
          velocity.x *= -1;
        }
        if (position.y > height) {
          velocity.y *= -1;
          position.y = height;
        } else if (position.y < 0) {
          velocity.y *= -1;
          position.y = 0;
        }

    submitted by /u/fmurph22 [link] [comments]  ( 88 min )
    Microsoft researcher describes two new deepfake methods and their risks
    submitted by /u/Number_5_alive [link] [comments]  ( 94 min )
    Cybernetic
    submitted by /u/widgia [link] [comments]  ( 86 min )
    "Prompt Explorer" - a GPT-3 powered google sheet that lets you explore the "narrative neighbourhood" of any prompt
    submitted by /u/walt74 [link] [comments]  ( 87 min )
    AI assistant that summarizes reviews for restaurants/businesses
    submitted by /u/SudoSharma [link] [comments]  ( 88 min )
    EU draft rules to make it easier to sue drone makers, AI systems
    submitted by /u/vernes1978 [link] [comments]  ( 87 min )
    If AI is essentially artificial perception, then why isn't human perception ultimate?
    submitted by /u/mavavilj [link] [comments]  ( 87 min )
    Data-Centric AI Vs. Model-Centric AI - Everything You Need Know
    The data-centric approach to building AI models is about focusing as diligently on the data as AI engineers usually do on the models and algorithms. Read more about it here - https://www.artiba.org/blog/data-centric-ai-vs-model-centric-ai-everything-you-need-know submitted by /u/Emily-joe [link] [comments]  ( 87 min )
    [Repost] Research on EARLY RISK PREDICTION ON THE INTERNET
    Help us!! We are a team of academic researchers interested in psychology and natural language use. We are currently interested in gathering some data from people with no psychological disorders. More information: https://erisk.irlab.org/ We would greatly appreciate it if you could fill out the questionnaire attached. It takes 2 minutes :) It is a standard inventory of questions used by psychologists. Note that the questionnaire contains a field in which the respondent has to provide his/her Reddit username. This would help us to link word use (as extracted from your Reddit public submissions) with your responses to the questionnaire. Of course, we will treat the information you provide with the utmost confidentiality and privacy. All information we extract from Reddit will be anonymized. Link to the questionnaire: https://forms.gle/PkWyB64aAu6BQTqi6 Best regards, David E. Losada, Univ. Santiago de Compostela, Spain (david.losada@usc.es); Fabio Crestani, Univ. della Svizzera Italiana, Switzerland (fabio.crestani@usi.ch); Javier Parapar, Univ. A Coruña, Spain (javierparapar@udc.es); Patricia Martin-Rodilla, Univ. A Coruña, Spain (patricia.martin.rodilla@udc.es) submitted by /u/pamroda [link] [comments]  ( 88 min )
    Do you think AI Writer can help you write better content
    Have you ever wondered why it takes writers so long to write a book? Or why it's so hard for them to produce content? Well, the answer is simple: an AI writer can help you do your job faster and better than ever before. AI writers also provide feedback and suggestions based on their analysis of what works best for each individual client's needs, so writers can improve their skills even further without putting much effort into it themselves! submitted by /u/ai-writer1 [link] [comments]  ( 87 min )
    AI Dream 70 - The Most Amazing AI Galaxy Nebula
    submitted by /u/LordPewPew777 [link] [comments]  ( 87 min )
    [OC] Check out my site HaikNews.com - generates haikus on demand from the daily news headlines
    and they actually make sense because it only uses consecutive words within the same sentence. That said, the syllable count is sometimes slightly off and obviously not all of them have some deep or specific meaning. They're haikus after all. Example from today:

        The Dryes Earn Praise
        converted pistols and shirt
        asteroid in first

    which was generated from the following news headlines (which it also shows each time): WATCH: The Dryes Earn Praise & A Spot On Team Blake Shelton On 'The Voice' With Cover Of Iconic Country Duet - Music Mayhem Magazine (retrieved 2022-09-27 01:15:03) 17 dead, 24 wounded in Russia school shooting by gunman with converted pistols and a shirt with "Nazi symbols" - CBS News (2022-09-27 01:15:05) NASA's DART spacecraft hits target asteroid in first planetary defense test - Reuters (2022-09-27 01:15:03) Obviously this isn't _really_ a use of AI, just a fun little site I made that I thought others might enjoy playing around with. submitted by /u/throwaway17880 [link] [comments]  ( 88 min )
    Why does AI struggle to make straight lines or non-wiggly lines?
    For context, I've so far only used DALL·E 2 and RealESRGAN as my basis. DALL·E seems to struggle (a bit) with image clarity and straight lines (non-wiggly lines). When I told it to depict a teddy bear in Tokyo, the city seemed unrecognizable upon scrutiny. On the same note, I used RealESRGAN to upscale some images, but again, under scrutiny, I could see some lines trying to deviate from the original image's path. Why is this? submitted by /u/typcalthowawayacount [link] [comments]  ( 92 min )
    Can't DALL·E 2 base images on characters from franchises?
    I've told it to generate images of Patrick from Spongebob SquarePants and Eula from Genshin Impact. None of the results are the respective characters from their franchises. submitted by /u/typcalthowawayacount [link] [comments]  ( 87 min )
    How to make Talking AI Faces for Stable Diffusion Midjourney Dall-E Or a...
    submitted by /u/prfitofthesngularity [link] [comments]  ( 87 min )
  • Open

    An interactive map for an RL wiki up to DQN. A bit like a skill tree to visualise progress. Succinct explanations of each concept on each node. What do you think?
    submitted by /u/Quackerooney [link] [comments]  ( 88 min )
    Agent learns to strafe-jump in Quake 3.
    submitted by /u/tendaikon [link] [comments]  ( 105 min )
    Hello all, We recently beta tested our platform for evaluating the robustness of AI models against adversarial attacks and natural noises, called GuardAI. Based on the feedback we collected during the first test phase, we updated the platform and added new features.
    Thank you to everyone who participated! Some of the added features are: support for dataset poisoning detection for classification models (Spectral Signature Detection); support for several defenses (Gaussian Noise, Gaussian Augmentation, Reverse Sigmoid); support for the Kitti dataset format; attacks and visualization for depth perception tasks; webhook functionality to enable easy workflow automation; performance improvements; and more. If you haven't tested it so far, you can make an account and test out the updated version. Your feedback is really appreciated. You can sign up here https://www.navinfo.eu/services/cybersecurity/guardai/ and leave your feedback directly through the platform. Thank you! GuardAI We harness the power of AI and Cybersecurity to develop more secure and robust solutions. submitted by /u/GuardAITeam [link] [comments]  ( 88 min )
    Can anyone please explain model-free and model-based reinforcement learning with a good example?
    I keep getting confused on this topic. If there is an example solved by both methods, it would help me to understand it very well. submitted by /u/Massive_Cup_4458 [link] [comments]  ( 92 min )
  • Open

    But what is a Gaussian process? (An intuition for dummies)
    Several machine learning models, such as neural networks, are very popular in the data science community due to their scalability and capacity… Continue reading on Becoming Human: Artificial Intelligence Magazine »  ( 11 min )
  • Open

    Build an AI-powered virtual agent for Genesys Cloud using QnABot and Amazon Lex
    The rise of artificial intelligence technologies enables organizations to adopt and improve self-service capabilities in contact center operations to create a more proactive, timely, and effective customer experience. Voice bots, or conversational interactive voice response systems (IVR), use natural language processing (NLP) to understand customers’ questions and provide relevant answers. Businesses can automate responses to […]  ( 8 min )
    Set up enterprise-level cost allocation for ML environments and workloads using resource tagging in Amazon SageMaker
    As businesses and IT leaders look to accelerate the adoption of machine learning (ML), there is a growing need to understand spend and cost allocation for your ML environment to meet enterprise requirements. Without proper cost management and governance, your ML spend may lead to surprises in your monthly AWS bill. Amazon SageMaker is a […]  ( 10 min )
  • Open

    DALL·E Now Available Without Waitlist
    New users can start creating straight away. Lessons learned from deployment and improvements to our safety systems make wider availability possible. Sign up Starting today, we are removing the waitlist for the DALL·E beta so users can sign up and start using it immediately. More than 1.5M  ( 3 min )
  • Open

    The original Room square
    A few days ago I wrote about Room squares, squares named after Thomas Room. This post will be about Room’s original square. You could think of a Room square as a tournament design in which the rows represent locations and the columns represent rounds (or vice versa). Every team plays every other team exactly once, […] The original Room square first appeared on John D. Cook.  ( 5 min )
  • Open

    Video Virtuoso Sabour Amirazodi Shares AI-Powered Editing Tips This Week ‘In the NVIDIA Studio’
    NVIDIA artist Sabour Amirazodi demonstrates his video editing workflows featuring AI this week in a special edition of In the NVIDIA Studio. The post Video Virtuoso Sabour Amirazodi Shares AI-Powered Editing Tips This Week ‘In the NVIDIA Studio’ appeared first on NVIDIA Blog.  ( 7 min )
  • Open

    7 convincing reasons why your coworking needs a CRM system
    Source: Unsplash If you're a coworking space owner, you know that keeping track of your members can be daunting. From juggling monthly membership payments to tracking who's been using the printers lately, it's easy for things to slip through the cracks. That's where a CRM system comes in handy! Here are… Read More »7 convincing reasons why your coworking needs a CRM system The post 7 convincing reasons why your coworking needs a CRM system appeared first on Data Science Central.  ( 20 min )
    Data Erasure: How to Remove your Information from the Internet?
    Today, data has evolved into one of the most crucial resources in the world. Unlike tangible resources like wood and fuel, the same data set can be used repeatedly and for different applications. Tons of user information gets observed or generated by tech and algorithms to facilitate the personalization we see today. The post Data Erasure: How to Remove your Information from the Internet? appeared first on Data Science Central.  ( 22 min )
  • Open

    Fair Machine Learning Under Partial Compliance. (arXiv:2011.03654v4 [cs.CY] UPDATED)
    Typically, fair machine learning research focuses on a single decisionmaker and assumes that the underlying population is stationary. However, many of the critical domains motivating this work are characterized by competitive marketplaces with many decisionmakers. Realistically, we might expect only a subset of them to adopt any non-compulsory fairness-conscious policy, a situation that political philosophers call partial compliance. This possibility raises important questions: how does the strategic behavior of decision subjects in partial compliance settings affect the allocation outcomes? If k% of employers were to voluntarily adopt a fairness-promoting intervention, should we expect k% progress (in aggregate) towards the benefits of universal adoption, or will the dynamics of partial compliance wash out the hoped-for benefits? How might adopting a global (versus local) perspective impact the conclusions of an auditor? In this paper, we propose a simple model of an employment market, leveraging simulation as a tool to explore the impact of both interaction effects and incentive effects on outcomes and auditing metrics. Our key findings are that at equilibrium: (1) partial compliance (k% of employers) can result in far less than proportional (k%) progress towards the full compliance outcomes; (2) the gap is more severe when fair employers match global (vs local) statistics; (3) choices of local vs global statistics can paint dramatically different pictures of the performance vis-a-vis fairness desiderata of compliant versus non-compliant employers; and (4) partial compliance to local parity measures can induce extreme segregation.  ( 3 min )
    An Overview and Prospective Outlook on Robust Training and Certification of Machine Learning Models. (arXiv:2208.07464v2 [cs.LG] UPDATED)
    In this discussion paper, we survey recent research surrounding robustness of machine learning models. As learning algorithms become increasingly more popular in data-driven control systems, their robustness to data uncertainty must be ensured in order to maintain reliable safety-critical operations. We begin by reviewing common formalisms for such robustness, and then move on to discuss popular and state-of-the-art techniques for training robust machine learning models as well as methods for provably certifying such robustness. From this unification of robust machine learning, we identify and discuss pressing directions for future research in the area.  ( 2 min )
    Learning with Subset Stacking. (arXiv:2112.06251v2 [cs.LG] UPDATED)
    We propose a new regression algorithm that learns from a set of input-output pairs. Our algorithm is designed for populations where the relation between the input variables and the output variable exhibits a heterogeneous behavior across the predictor space. The algorithm starts with generating subsets that are concentrated around random points in the input space. This is followed by training a local predictor for each subset. Those predictors are then combined in a novel way to yield an overall predictor. We call this algorithm ``LEarning with Subset Stacking'' or LESS, due to its resemblance to the method of stacking regressors. We compare the testing performance of LESS with state-of-the-art methods on several datasets. Our comparison shows that LESS is a competitive supervised learning method. Moreover, we observe that LESS is also efficient in terms of computation time and it allows a straightforward parallel implementation.  ( 2 min )
    Defining and Characterizing Reward Hacking. (arXiv:2209.13085v1 [cs.LG])
    We provide the first formal definition of reward hacking, a phenomenon where optimizing an imperfect proxy reward function, $\mathcal{\tilde{R}}$, leads to poor performance according to the true reward function, $\mathcal{R}$. We say that a proxy is unhackable if increasing the expected proxy return can never decrease the expected true return. Intuitively, it might be possible to create an unhackable proxy by leaving some terms out of the reward function (making it "narrower") or overlooking fine-grained distinctions between roughly equivalent outcomes, but we show this is usually not the case. A key insight is that the linearity of reward (in state-action visit counts) makes unhackability a very strong condition. In particular, for the set of all stochastic policies, two reward functions can only be unhackable if one of them is constant. We thus turn our attention to deterministic policies and finite sets of stochastic policies, where non-trivial unhackable pairs always exist, and establish necessary and sufficient conditions for the existence of simplifications, an important special case of unhackability. Our results reveal a tension between using reward functions to specify narrow tasks and aligning AI systems with human values.  ( 2 min )
    Active Linear Regression for $\ell_p$ Norms and Beyond. (arXiv:2111.04888v4 [cs.LG] UPDATED)
    We study active sampling algorithms for linear regression, which aim to query only a few entries of a target vector $b\in\mathbb R^n$ and output a near minimizer to $\min_{x\in\mathbb R^d} \|Ax-b\|$, for a design matrix $A\in\mathbb R^{n \times d}$ and loss $\|\cdot\|$. For $p$ norm regression for any $0<p<\infty$, we give an algorithm based on Lewis weight sampling outputting a $(1+\epsilon)$-approximate solution using just $\tilde O(d/\epsilon^2)$ queries to $b$ for $p\in(0,1)$, $\tilde{O}(d/\epsilon)$ queries for $1<p<2$, and $\tilde{O}(d^{p/2}/\epsilon^p)$ queries for $2<p<\infty$. For $0<p<2$, our bounds are optimal up to log factors, settling the query complexity for this range. For $2<p<\infty$, our dependence on $d$ is optimal, while our dependence on $\epsilon$ is off by at most $\epsilon$, up to log factors. Our result resolves an open question of [CD21], who gave near optimal bounds for the $1$ norm, but required $d^2/\epsilon^2$ samples for $\ell_p$ regression with $1<p<2$, and gave no bounds for $2<p<\infty$ or $0<p<1$. We also give the first total sensitivity bound of $O(d^{\max\{1,p/2\}}\log^2n)$ for loss functions of degree $p$ polynomial growth, improving a result of [TMF20]. By combining this with our techniques for $\ell_p$ regression, we obtain an active regression algorithm making $\tilde O(d^{1+\max\{1,p/2\}}/\mathrm{poly}(\epsilon))$ queries for such loss functions, including the Tukey and Huber losses, answering another question of [CD21]. For the Huber loss, we further improve our bound to $\tilde O(d^{4-2\sqrt2}/\mathrm{poly}(\epsilon))$ samples. Our sensitivity bounds also have many applications, including Orlicz norm subspace embeddings, robust subspace approximation, and dimension reduction for smoothed $p$-norms. Finally, our active sampling results give the first sublinear time algorithms for Kronecker product regression under every $p$ norm.  ( 3 min )
    Exploring Low Rank Training of Deep Neural Networks. (arXiv:2209.13569v1 [cs.LG])
    Training deep neural networks in low rank, i.e. with factorised layers, is of particular interest to the community: it offers efficiency over unfactorised training in terms of both memory consumption and training time. Prior work has focused on low rank approximations of pre-trained networks and training in low rank space with additional objectives, offering various ad hoc explanations for chosen practice. We analyse techniques that work well in practice, and through extensive ablations on models such as GPT2 we provide evidence falsifying common beliefs in the field, hinting in the process at exciting research opportunities that still need answering.  ( 2 min )
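    A factorised layer of the kind studied here is a two-line change; a PyTorch sketch (dimensions and rank are illustrative):

        import torch.nn as nn

        class LowRankLinear(nn.Module):
            # Replaces a (d_out x d_in) weight W with U @ V of inner rank r.
            def __init__(self, d_in, d_out, rank):
                super().__init__()
                self.V = nn.Linear(d_in, rank, bias=False)   # d_in -> r
                self.U = nn.Linear(rank, d_out)              # r -> d_out

            def forward(self, x):
                return self.U(self.V(x))

        lowrank = LowRankLinear(1024, 1024, rank=64)
        full = nn.Linear(1024, 1024)
        print(sum(p.numel() for p in lowrank.parameters()))  # ~132K parameters
        print(sum(p.numel() for p in full.parameters()))     # ~1.05M parameters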
    A Novel Sequential Coreset Method for Gradient Descent Algorithms. (arXiv:2112.02504v2 [cs.LG] UPDATED)
    A wide range of optimization problems arising in machine learning can be solved by gradient descent algorithms, and a central question in this area is how to efficiently compress a large-scale dataset so as to reduce the computational complexity. {\em Coreset} is a popular data compression technique that has been extensively studied before. However, most of existing coreset methods are problem-dependent and cannot be used as a general tool for a broader range of applications. A key obstacle is that they often rely on the pseudo-dimension and total sensitivity bound that can be very high or hard to obtain. In this paper, based on the ''locality'' property of gradient descent algorithms, we propose a new framework, termed ''sequential coreset'', which effectively avoids these obstacles. Moreover, our method is particularly suitable for sparse optimization whence the coreset size can be further reduced to be only poly-logarithmically dependent on the dimension. In practice, the experimental results suggest that our method can save a large amount of running time compared with the baseline algorithms.  ( 2 min )
    Project and Forget: Solving Large-Scale Metric Constrained Problems. (arXiv:2005.03853v2 [cs.LG] UPDATED)
    Given a set of dissimilarity measurements amongst data points, determining what metric representation is most "consistent" with the input measurements or the metric that best captures the relevant geometric features of the data is a key step in many machine learning algorithms. Existing methods are restricted to specific kinds of metrics or small problem sizes because of the large number of metric constraints in such problems. In this paper, we provide an active set algorithm, Project and Forget, that uses Bregman projections to solve metric constrained problems with many (possibly exponentially many) inequality constraints. We provide a theoretical analysis of Project and Forget and prove that our algorithm converges to the global optimal solution and that the $L_2$ distance of the current iterate to the optimal solution decays asymptotically at an exponential rate. We demonstrate that using our method we can solve large problem instances of three types of metric constrained problems: general weight correlation clustering, metric nearness, and metric learning; in each case, out-performing the state-of-the-art methods with respect to CPU times and problem sizes.  ( 2 min )
    Controlling mean exit time of stochastic dynamical systems based on quasipotential and machine learning. (arXiv:2209.13098v1 [stat.ML])
    The mean exit time for escaping a basin of attraction in the presence of white noise is of practical importance in various scientific fields. In this work, we propose a strategy to control the mean exit time of general stochastic dynamical systems so that it achieves a desired value, based on the quasipotential concept and machine learning. Specifically, we develop a neural network architecture to compute the global quasipotential function. Then we design a systematic iterative numerical algorithm to calculate the controller for a given mean exit time. Moreover, we identify the most probable path between metastable attractors with the help of an effective Hamilton-Jacobi scheme and the trained neural network. Numerical experiments demonstrate that our control strategy is effective and sufficiently accurate.
    Leveraging Local Variation in Data: Sampling and Weighting Schemes for Supervised Deep Learning. (arXiv:2101.07561v3 [stat.ML] UPDATED)
    In the context of supervised learning of a function by a neural network, we claim and empirically verify that the neural network yields better results when the distribution of the data set focuses on regions where the function to learn is steep. We first translate this assumption into a mathematically workable form using Taylor expansion, and emphasize a new training distribution based on the derivatives of the function to learn. Theoretical derivations then allow us to construct a methodology that we call Variance Based Samples Weighting (VBSW). VBSW uses the local variance of the labels to weight the training points. This methodology is general, scalable, cost-effective, and significantly increases the performance of a large class of neural networks for various classification and regression tasks on image, text, and multivariate data. We highlight its benefits with experiments involving neural networks from linear models to ResNet and BERT.  ( 2 min )
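    A rough sketch of the weighting idea (not the paper's exact recipe): estimate each training point's local label variance from its k nearest neighbours and use it as a sample weight, so steep regions of the target get emphasised:

        import numpy as np
        from sklearn.neighbors import NearestNeighbors

        rng = np.random.default_rng(0)
        X = rng.uniform(-1, 1, size=(1000, 1))
        y = np.tanh(20 * X[:, 0]) + 0.05 * rng.normal(size=1000)  # steep near 0

        k = 10
        _, idx = NearestNeighbors(n_neighbors=k).fit(X).kneighbors(X)
        w = y[idx].var(axis=1)                   # local label variance per point
        w /= w.mean()                            # pass as per-sample loss weights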
    Scaling Laws For Deep Learning Based Image Reconstruction. (arXiv:2209.13435v1 [eess.IV])
    Deep neural networks trained end-to-end to map a measurement of a (noisy) image to a clean image perform excellent for a variety of linear inverse problems. Current methods are only trained on a few hundreds or thousands of images as opposed to the millions of examples deep networks are trained on in other domains. In this work, we study whether major performance gains are expected from scaling up the training set size. We consider image denoising, accelerated magnetic resonance imaging, and super-resolution and empirically determine the reconstruction quality as a function of training set size, while optimally scaling the network size. For all three tasks we find that an initially steep power-law scaling slows significantly already at moderate training set sizes. Interpolating those scaling laws suggests that even training on millions of images would not significantly improve performance. To understand the expected behavior, we analytically characterize the performance of a linear estimator learned with early stopped gradient descent. The result formalizes the intuition that once the error induced by learning the signal model is small relative to the error floor, more training examples do not improve performance.  ( 2 min )
    Optimization of Annealed Importance Sampling Hyperparameters. (arXiv:2209.13226v1 [stat.ML])
    Annealed Importance Sampling (AIS) is a popular algorithm used to estimate the intractable marginal likelihood of deep generative models. Although AIS is guaranteed to provide an unbiased estimate for any set of hyperparameters, common implementations rely on simple heuristics, such as the geometric average bridging distributions between the initial and the target distribution, which affect estimation performance when the computation budget is limited. Optimization of fully parametric AIS remains challenging due to the use of Metropolis-Hastings (MH) correction steps in Markov transitions. We present a parametric AIS process with flexible intermediary distributions and optimize the bridging distributions to use fewer steps for sampling. We use a reparameterization method that allows us to optimize the distribution sequence and the parameters of Markov transitions, and that is applicable to a large class of Markov kernels with MH correction. We assess the performance of our optimized AIS for marginal likelihood estimation of deep generative models and compare it to other estimators.  ( 2 min )
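    For reference, a minimal AIS sketch with the geometric bridging path mentioned above, estimating the normalizer of an unnormalized 1-D Gaussian (the target, schedule, and random-walk MH kernel are illustrative choices):

        import numpy as np

        rng = np.random.default_rng(0)

        def log_p0(x):                           # normalized N(0, 2^2) initial dist.
            return -0.5 * (x / 2.0) ** 2 - np.log(2.0 * np.sqrt(2 * np.pi))

        def log_p1(x):                           # unnormalized N(1, 0.5^2) target
            return -0.5 * ((x - 1.0) / 0.5) ** 2

        n_chains, n_steps = 2000, 100
        betas = np.linspace(0.0, 1.0, n_steps + 1)
        x = rng.normal(0.0, 2.0, size=n_chains)  # exact draws from p0
        log_w = np.zeros(n_chains)

        for b_prev, b in zip(betas[:-1], betas[1:]):
            log_w += (b - b_prev) * (log_p1(x) - log_p0(x))
            log_g = lambda z: (1 - b) * log_p0(z) + b * log_p1(z)
            prop = x + 0.5 * rng.normal(size=n_chains)   # one MH step at bridge b
            accept = np.log(rng.uniform(size=n_chains)) < log_g(prop) - log_g(x)
            x = np.where(accept, prop, x)

        log_Z = np.logaddexp.reduce(log_w) - np.log(n_chains)
        print(log_Z, np.log(np.sqrt(2 * np.pi) * 0.5))   # estimate vs truth

    Optimizing the schedule betas (and the kernel parameters) is exactly where the MH correction steps make things difficult, which is the challenge the paper's reparameterization addresses.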
    Sparse Bayesian Learning for Complex-Valued Rational Approximations. (arXiv:2206.02523v2 [stat.ML] UPDATED)
    Surrogate models are used to alleviate the computational burden in engineering tasks, which require the repeated evaluation of computationally demanding models of physical systems, such as the efficient propagation of uncertainties. For models that show a strongly non-linear dependence on their input parameters, standard surrogate techniques, such as polynomial chaos expansion, are not sufficient to obtain an accurate representation of the original model response. Through applying a rational approximation instead, the approximation error can be efficiently reduced for models whose non-linearity is accurately described through a rational function. Specifically, our aim is to approximate complex-valued models. A common approach to obtain the coefficients in the surrogate is to minimize the sample-based error between model and surrogate in the least-square sense. In order to obtain an accurate representation of the original model and to avoid overfitting, the sample set has to be two to three times the number of polynomial terms in the expansion. For models that require a high polynomial degree or are high-dimensional in terms of their input parameters, this number often exceeds the affordable computational cost. To overcome this issue, we apply a sparse Bayesian learning approach to the rational approximation. Through a specific prior distribution structure, sparsity is induced in the coefficients of the surrogate model. The denominator polynomial coefficients as well as the hyperparameters of the problem are determined through a type-II-maximum likelihood approach. We apply a quasi-Newton gradient-descent algorithm in order to find the optimal denominator coefficients and derive the required gradients through application of $\mathbb{CR}$-calculus.  ( 3 min )
    On Sharp Stochastic Zeroth Order Hessian Estimators over Riemannian Manifolds. (arXiv:2201.10780v3 [stat.ML] UPDATED)
    We study Hessian estimators for functions defined over an $n$-dimensional complete analytic Riemannian manifold. We introduce new stochastic zeroth-order Hessian estimators using $O (1)$ function evaluations. We show that, for an analytic real-valued function $f$, our estimator achieves a bias bound of order $ O \left( \gamma \delta^2 \right) $, where $ \gamma $ depends on both the Levi-Civita connection and function $f$, and $\delta$ is the finite difference step size. To the best of our knowledge, our results provide the first bias bound for Hessian estimators that explicitly depends on the geometry of the underlying Riemannian manifold. We also study downstream computations based on our Hessian estimators. The supremacy of our method is evidenced by empirical evaluations.
    MolGAN: An implicit generative model for small molecular graphs. (arXiv:1805.11973v2 [stat.ML] UPDATED)
    Deep generative models for graph-structured data offer a new angle on the problem of chemical synthesis: by optimizing differentiable models that directly generate molecular graphs, it is possible to side-step expensive search procedures in the discrete and vast space of chemical structures. We introduce MolGAN, an implicit, likelihood-free generative model for small molecular graphs that circumvents the need for expensive graph matching procedures or node ordering heuristics of previous likelihood-based methods. Our method adapts generative adversarial networks (GANs) to operate directly on graph-structured data. We combine our approach with a reinforcement learning objective to encourage the generation of molecules with specific desired chemical properties. In experiments on the QM9 chemical database, we demonstrate that our model is capable of generating close to 100% valid compounds. MolGAN compares favorably both to recent proposals that use string-based (SMILES) representations of molecules and to a likelihood-based method that directly generates graphs, albeit being susceptible to mode collapse. Code at https://github.com/nicola-decao/MolGAN
    Data-driven Efficient Solvers for Langevin Dynamics on Manifold in High Dimensions. (arXiv:2005.12787v3 [math.NA] UPDATED)
    We study the Langevin dynamics of a physical system with manifold structure $\mathcal{M}\subset\mathbb{R}^p$ based on collected sample points $\{\mathsf{x}_i\}_{i=1}^n \subset \mathcal{M}$ that probe the unknown manifold $\mathcal{M}$. Through the diffusion map, we first learn the reaction coordinates $\{\mathsf{y}_i\}_{i=1}^n\subset \mathcal{N}$ corresponding to $\{\mathsf{x}_i\}_{i=1}^n$, where $\mathcal{N}$ is a manifold diffeomorphic to $\mathcal{M}$ and isometrically embedded in $\mathbb{R}^\ell$ with $\ell \ll p$. The induced Langevin dynamics on $\mathcal{N}$ in terms of the reaction coordinates captures the slow time scale dynamics such as conformational changes in biochemical reactions. To construct an efficient and stable approximation for the Langevin dynamics on $\mathcal{N}$, we leverage the corresponding Fokker-Planck equation on the manifold $\mathcal{N}$ in terms of the reaction coordinates $\mathsf{y}$. We propose an implementable, unconditionally stable, data-driven finite volume scheme for this Fokker-Planck equation, which automatically incorporates the manifold structure of $\mathcal{N}$. Furthermore, we provide a weighted $L^2$ convergence analysis of the finite volume scheme to the Fokker-Planck equation on $\mathcal{N}$. The proposed finite volume scheme leads to a Markov chain on $\{\mathsf{y}_i\}_{i=1}^n$ with an approximated transition probability and jump rate between the nearest neighbor points. After an unconditionally stable explicit time discretization, the data-driven finite volume scheme gives an approximated Markov process for the Langevin dynamics on $\mathcal{N}$ and the approximated Markov process enjoys detailed balance, ergodicity, and other good properties.
    Superiority of GNN over NN in generalizing bandlimited functions. (arXiv:2206.05904v2 [cs.LG] UPDATED)
    We constructively show, via rigorous mathematical arguments, that GNN architectures outperform those of NN in approximating bandlimited functions on compact $d$-dimensional Euclidean grids. We show that the former only need $\mathcal{M}$ sampled functional values in order to achieve a uniform approximation error of $O_{d}(2^{-\mathcal{M}^{1/d}})$ and that this error rate is optimal, in the sense that NNs might achieve worse.
    Learning and Decision-Making with Data: Optimal Formulations and Phase Transitions. (arXiv:2109.06911v2 [stat.ML] UPDATED)
    We study the problem of designing optimal learning and decision-making formulations when only historical data is available. Prior work typically commits to a particular class of data-driven formulation and subsequently tries to establish out-of-sample performance guarantees. We take here the opposite approach. We first define a sensible yardstick with which to measure the quality of any data-driven formulation and subsequently seek to find an optimal such formulation. Informally, any data-driven formulation can be seen to balance a measure of proximity of the estimated cost to the actual cost while guaranteeing a level of out-of-sample performance. Given an acceptable level of out-of-sample performance, we construct explicitly a data-driven formulation that is uniformly closer to the true cost than any other formulation enjoying the same out-of-sample performance. We show the existence of three distinct out-of-sample performance regimes (a superexponential regime, an exponential regime and a subexponential regime) between which the nature of the optimal data-driven formulation experiences a phase transition. The optimal data-driven formulations can be interpreted as a classically robust formulation in the superexponential regime, an entropic distributionally robust formulation in the exponential regime and finally a variance penalized formulation in the subexponential regime. This final observation unveils a surprising connection between these three, at first glance seemingly unrelated, data-driven formulations which until now remained hidden.
    Making Sense of Dependence: Efficient Black-box Explanations Using Dependence Measure. (arXiv:2206.06219v3 [cs.CV] UPDATED)
    This paper presents a new efficient black-box attribution method based on Hilbert-Schmidt Independence Criterion (HSIC), a dependence measure based on Reproducing Kernel Hilbert Spaces (RKHS). HSIC measures the dependence between regions of an input image and the output of a model based on kernel embeddings of distributions. It thus provides explanations enriched by RKHS representation capabilities. HSIC can be estimated very efficiently, significantly reducing the computational cost compared to other black-box attribution methods. Our experiments show that HSIC is up to 8 times faster than the previous best black-box attribution methods while being as faithful. Indeed, we improve or match the state-of-the-art of both black-box and white-box attribution methods for several fidelity metrics on Imagenet with various recent model architectures. Importantly, we show that these advances can be transposed to efficiently and faithfully explain object detection models such as YOLOv4. Finally, we extend the traditional attribution methods by proposing a new kernel enabling an ANOVA-like orthogonal decomposition of importance scores based on HSIC, allowing us to evaluate not only the importance of each image patch but also the importance of their pairwise interactions. Our implementation is available at https://github.com/paulnovello/HSIC-Attribution-Method.
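    For intuition, the biased empirical HSIC estimator with Gaussian kernels is essentially a one-liner, $\widehat{\mathrm{HSIC}} = \mathrm{tr}(KHLH)/(n-1)^2$ with centering matrix $H = I - \tfrac{1}{n}\mathbf{1}\mathbf{1}^\top$; a small sketch (the paper additionally estimates it between image-region perturbations and model outputs, which is not shown here):

        import numpy as np

        def gaussian_gram(Z, sigma):
            sq = np.sum((Z[:, None, :] - Z[None, :, :]) ** 2, axis=-1)
            return np.exp(-sq / (2 * sigma ** 2))

        def hsic(X, Y, sigma_x=1.0, sigma_y=1.0):
            n = len(X)
            H = np.eye(n) - np.ones((n, n)) / n          # centering matrix
            K, L = gaussian_gram(X, sigma_x), gaussian_gram(Y, sigma_y)
            return np.trace(K @ H @ L @ H) / (n - 1) ** 2

        rng = np.random.default_rng(0)
        X = rng.normal(size=(200, 1))
        print(hsic(X, X ** 2))                           # dependent: large value
        print(hsic(X, rng.normal(size=(200, 1))))        # independent: near zero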
    Off-policy estimation of linear functionals: Non-asymptotic theory for semi-parametric efficiency. (arXiv:2209.13075v1 [math.ST])
    The problem of estimating a linear functional based on observational data is canonical in both the causal inference and bandit literatures. We analyze a broad class of two-stage procedures that first estimate the treatment effect function, and then use this quantity to estimate the linear functional. We prove non-asymptotic upper bounds on the mean-squared error of such procedures: these bounds reveal that in order to obtain non-asymptotically optimal procedures, the error in estimating the treatment effect should be minimized in a certain weighted $L^2$-norm. We analyze a two-stage procedure based on constrained regression in this weighted norm, and establish its instance-dependent optimality in finite samples via matching non-asymptotic local minimax lower bounds. These results show that the optimal non-asymptotic risk, in addition to depending on the asymptotically efficient variance, depends on the weighted norm distance between the true outcome function and its approximation by the richest function class supported by the sample size.
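    To make the two-stage template concrete, the sketch below fits an outcome model by ordinary least squares and plugs it into a linear functional (here, a population mean); the linear model class and all names are illustrative assumptions, and the paper's weighted-norm constrained regression is not implemented here.

    import numpy as np

    rng = np.random.default_rng(1)
    n, d = 500, 5
    X = rng.normal(size=(n, d))
    theta_true = rng.normal(size=d)
    y = X @ theta_true + rng.normal(size=n)  # observed outcomes

    # Stage 1: estimate the outcome function (here by plain least squares).
    theta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)

    # Stage 2: plug the estimate into the linear functional of interest.
    estimate = X.mean(axis=0) @ theta_hat
    truth = X.mean(axis=0) @ theta_true
    print(estimate, truth)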
    DBGSL: Dynamic Brain Graph Structure Learning. (arXiv:2209.13513v1 [cs.LG])
    Functional connectivity (FC) between regions of the brain is commonly estimated through statistical dependency measures applied to functional magnetic resonance imaging (fMRI) data. The resulting functional connectivity matrix (FCM) is often taken to represent the adjacency matrix of a brain graph. Recently, graph neural networks (GNNs) have been successfully applied to FCMs to learn brain graph representations. A common limitation of existing GNN approaches, however, is that they require the graph adjacency matrix to be known prior to model training. As such, it is implicitly assumed the ground-truth dependency structure of the data is known. Unfortunately, for fMRI this is not the case as the choice of which statistical measure best represents the dependency structure of the data is non-trivial. Also, most GNN applications to fMRI assume FC is static over time, which is at odds with neuroscientific evidence that functional brain networks are time-varying and dynamic. These compounded issues can have a detrimental effect on the capacity of GNNs to learn representations of brain graphs. As a solution, we propose Dynamic Brain Graph Structure Learning (DBGSL), a supervised method for learning the optimal time-varying dependency structure of fMRI data. Specifically, DBGSL learns a dynamic graph from fMRI timeseries via spatial-temporal attention applied to brain region embeddings. The resulting graph is then fed to a spatial-temporal GNN to learn a graph representation for classification. Experiments on large resting-state as well as task fMRI datasets for the task of gender classification demonstrate that DBGSL achieves state-of-the-art performance. Moreover, analysis of the learnt dynamic graphs highlights prediction-related brain regions which align with findings from existing neuroscience literature.
    The Curse of Unrolling: Rate of Differentiating Through Optimization. (arXiv:2209.13271v1 [math.OC])
    Computing the Jacobian of the solution of an optimization problem is a central problem in machine learning, with applications in hyperparameter optimization, meta-learning, optimization as a layer, and dataset distillation, to name a few. Unrolled differentiation is a popular heuristic that approximates the solution using an iterative solver and differentiates it through the computational path. This work provides a non-asymptotic convergence-rate analysis of this approach on quadratic objectives for gradient descent and the Chebyshev method. We show that to ensure convergence of the Jacobian, we can either 1) choose a large learning rate leading to a fast asymptotic convergence but accept that the algorithm may have an arbitrarily long burn-in phase or 2) choose a smaller learning rate leading to an immediate but slower convergence. We refer to this phenomenon as the curse of unrolling. Finally, we discuss open problems relative to this approach, such as deriving a practical update rule for the optimal unrolling strategy and making novel connections with the field of Sobolev orthogonal polynomials.
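    A minimal sketch of the phenomenon, assuming a quadratic objective $f(x) = \tfrac{1}{2}x^\top A x - b^\top x$: unrolling gradient descent and applying the chain rule at each step gives an iterate Jacobian $\partial x_T/\partial b$ that approaches the exact solution Jacobian $A^{-1}$ at a rate governed by the step size.

    import numpy as np

    A = np.diag([1.0, 10.0])            # conditioning set by the eigenvalues
    exact_jac = np.linalg.inv(A)        # d x*/d b for the minimizer x* = A^{-1} b

    for lr in (0.02, 0.19):             # small step vs. near-maximal step (2/L = 0.2)
        J = np.zeros((2, 2))
        for t in range(200):
            # Differentiate x_{t+1} = x_t - lr * (A x_t - b) with respect to b.
            J = J - lr * (A @ J - np.eye(2))
        print(lr, np.linalg.norm(J - exact_jac))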
    Group-Invariant Quantum Machine Learning. (arXiv:2205.02261v2 [quant-ph] UPDATED)
    Quantum Machine Learning (QML) models are aimed at learning from data encoded in quantum states. Recently, it has been shown that models with little to no inductive biases (i.e., with no assumptions about the problem embedded in the model) are likely to have trainability and generalization issues, especially for large problem sizes. As such, it is fundamental to develop schemes that encode as much information as available about the problem at hand. In this work we present a simple, yet powerful, framework where the underlying invariances in the data are used to build QML models that, by construction, respect those symmetries. These so-called group-invariant models produce outputs that remain invariant under the action of any element of the symmetry group $\mathfrak{G}$ associated with the dataset. We present theoretical results underpinning the design of $\mathfrak{G}$-invariant models, and exemplify their application through several paradigmatic QML classification tasks including cases when $\mathfrak{G}$ is a continuous Lie group and also when it is a discrete symmetry group. Notably, our framework allows us to recover, in an elegant way, several well-known algorithms from the literature, as well as to discover new ones. Taken together, we expect that our results will help pave the way towards a more geometric and group-theoretic approach to QML model design.  ( 3 min )
    Efficient Non-Parametric Optimizer Search for Diverse Tasks. (arXiv:2209.13575v1 [cs.LG])
    Efficient and automated design of optimizers plays a crucial role in full-stack AutoML systems. However, prior methods in optimizer search are often limited by their scalability, generalizability, or sample efficiency. With the goal of democratizing research and application of optimizer search, we present the first efficient, scalable and generalizable framework that can directly search on the tasks of interest. We first observe that optimizer updates are fundamentally mathematical expressions applied to the gradient. Inspired by the innate tree structure of the underlying math expressions, we re-arrange the space of optimizers into a super-tree, where each path encodes an optimizer. This way, optimizer search can be naturally formulated as a path-finding problem, allowing a variety of well-established tree traversal methods to be used as the search algorithm. We adopt an adaptation of the Monte Carlo method to tree search, equipped with rejection sampling and equivalent-form detection that leverage the characteristics of optimizer update rules to further boost the sample efficiency. We provide a diverse set of tasks to benchmark our algorithm and demonstrate that, with only 128 evaluations, the proposed framework can discover optimizers that surpass both human-designed counterparts and prior optimizer search methods.
    Accelerating hypersonic reentry simulations using deep learning-based hybridization (with guarantees). (arXiv:2209.13434v1 [stat.ML])
    In this paper, we are interested in the acceleration of numerical simulations. We focus on a hypersonic planetary reentry problem whose simulation involves coupling fluid dynamics and chemical reactions. Simulating chemical reactions takes most of the computational time but, on the other hand, cannot be avoided to obtain accurate predictions. We face a trade-off between cost-efficiency and accuracy: the simulation code has to be sufficiently efficient to be used in an operational context but accurate enough to predict the phenomenon faithfully. To tackle this trade-off, we design a hybrid simulation code coupling a traditional fluid dynamic solver with a neural network approximating the chemical reactions. We rely on their power in terms of accuracy and dimension reduction when applied in a big data context and on their efficiency stemming from their matrix-vector structure to achieve important acceleration factors ($\times 10$ to $\times 18.6$). This paper aims to explain how we design such cost-effective hybrid simulation codes in practice. Above all, we describe methodologies to ensure accuracy guarantees, allowing us to go beyond traditional surrogate modeling and to use these codes as references.
    Question Answering by Reasoning Across Documents with Graph Convolutional Networks. (arXiv:1808.09920v4 [cs.CL] UPDATED)
    Most research in reading comprehension has focused on answering questions based on individual documents or even single paragraphs. We introduce a neural model which integrates and reasons relying on information spread within documents and across multiple documents. We frame it as an inference problem on a graph. Mentions of entities are nodes of this graph while edges encode relations between different mentions (e.g., within- and cross-document co-reference). Graph convolutional networks (GCNs) are applied to these graphs and trained to perform multi-step reasoning. Our Entity-GCN method is scalable and compact, and it achieves state-of-the-art results on a multi-document question answering dataset, WikiHop (Welbl et al., 2018).
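    For readers unfamiliar with the underlying operator, one standard GCN layer over mention nodes amounts to a normalized-adjacency product followed by a nonlinearity, as in the hedged sketch below; the graph, features, and sizes are synthetic stand-ins for the paper's entity graphs.

    import numpy as np

    rng = np.random.default_rng(0)
    A = (rng.random((6, 6)) < 0.4).astype(float)    # mention-mention edges
    A = np.maximum(A, A.T)                          # make the graph undirected
    np.fill_diagonal(A, 1.0)                        # add self-loops
    d_inv_sqrt = np.diag(1.0 / np.sqrt(A.sum(axis=1)))
    A_hat = d_inv_sqrt @ A @ d_inv_sqrt             # symmetric normalization

    H = rng.normal(size=(6, 16))                    # mention embeddings
    W = 0.1 * rng.normal(size=(16, 16))             # layer weights
    H_next = np.maximum(A_hat @ H @ W, 0.0)         # one GCN layer with ReLU
    print(H_next.shape)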
    Hyperspherical Variational Auto-Encoders. (arXiv:1804.00891v3 [stat.ML] UPDATED)
    The Variational Auto-Encoder (VAE) is one of the most used unsupervised machine learning models. But although the default choice of a Gaussian distribution for both the prior and posterior represents a mathematically convenient distribution often leading to competitive results, we show that this parameterization fails to model data with a latent hyperspherical structure. To address this issue we propose using a von Mises-Fisher (vMF) distribution instead, leading to a hyperspherical latent space. Through a series of experiments we show how such a hyperspherical VAE, or $\mathcal{S}$-VAE, is more suitable for capturing data with a hyperspherical latent structure, while outperforming a normal, $\mathcal{N}$-VAE, in low dimensions on other data types. Code at this http URL and https://github.com/nicola-decao/s-vae-pytorch
    Why neural networks find simple solutions: the many regularizers of geometric complexity. (arXiv:2209.13083v1 [cs.LG])
    In many contexts, simpler models are preferable to more complex models and the control of this model complexity is the goal for many methods in machine learning such as regularization, hyperparameter tuning and architecture design. In deep learning, it has been difficult to understand the underlying mechanisms of complexity control, since many traditional measures are not naturally suitable for deep neural networks. Here we develop the notion of geometric complexity, which is a measure of the variability of the model function, computed using a discrete Dirichlet energy. Using a combination of theoretical arguments and empirical results, we show that many common training heuristics such as parameter norm regularization, spectral norm regularization, flatness regularization, implicit gradient regularization, noise regularization and the choice of parameter initialization all act to control geometric complexity, providing a unifying framework in which to characterize the behavior of deep learning models.
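    A hedged sketch of the quantity itself: on a data sample, the discrete Dirichlet energy reduces to the mean squared norm of the model's input gradient, approximated below by central finite differences for a toy function standing in for a trained network.

    import numpy as np

    def model(x):  # stand-in for a trained network
        return np.tanh(x @ np.array([1.5, -2.0]))

    def geometric_complexity(f, X, eps=1e-4):
        n, d = X.shape
        total = 0.0
        for x in X:
            g = np.zeros(d)
            for i in range(d):  # central finite differences per coordinate
                e = np.zeros(d)
                e[i] = eps
                g[i] = (f(x + e) - f(x - e)) / (2.0 * eps)
            total += g @ g      # squared gradient norm at x
        return total / n

    X = np.random.default_rng(0).normal(size=(100, 2))
    print(geometric_complexity(model, X))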
    Reinforcement Learning with Non-Exponential Discounting. (arXiv:2209.13413v1 [cs.LG])
    Commonly in reinforcement learning (RL), rewards are discounted over time using an exponential function to model time preference, thereby bounding the expected long-term reward. In contrast, in economics and psychology, it has been shown that humans often adopt a hyperbolic discounting scheme, which is optimal when a specific task termination time distribution is assumed. In this work, we propose a theory for continuous-time model-based reinforcement learning generalized to arbitrary discount functions. This formulation covers the case in which there is a non-exponential random termination time. We derive a Hamilton-Jacobi-Bellman (HJB) equation characterizing the optimal policy and describe how it can be solved using a collocation method, which uses deep learning for function approximation. Further, we show how the inverse RL problem can be approached, in which one tries to recover properties of the discount function given decision data. We validate the applicability of our proposed approach on two simulated problems. Our approach opens the way for the analysis of human discounting in sequential decision-making tasks.
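    The contrast at the heart of this work is easy to see numerically; assuming the common hyperbolic form $1/(1+kt)$, a constant reward stream is discounted far more slowly than under the exponential scheme:

    import numpy as np

    t = np.arange(100)                          # discrete time steps
    reward = np.ones_like(t, dtype=float)       # constant reward of 1 per step

    gamma, k = 0.95, 0.5
    exp_return = np.sum(gamma**t * reward)      # exponential: geometric tail
    hyp_return = np.sum(reward / (1 + k * t))   # hyperbolic: much heavier tail
    print(exp_return, hyp_return)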
    SetGAN: Improving the stability and diversity of generative models through a permutation invariant architecture. (arXiv:1907.00109v3 [cs.LG] UPDATED)
    Generative adversarial networks (GANs) have proven effective in modeling distributions of high-dimensional data. However, their training instability is a well-known hindrance to convergence, which results in practical challenges in their applications to novel data. Furthermore, even when convergence is reached, GANs can be affected by mode collapse, a phenomenon for which the generator learns to model only a small part of the target distribution, disregarding the vast majority of the data manifold or distribution. This paper addresses these challenges by introducing SetGAN, an adversarial architecture that processes sets of generated and real samples, and discriminates between the origins of these sets (i.e., training versus generated data) in a flexible, permutation invariant manner. We also propose a new metric to quantitatively evaluate GANs that does not require previous knowledge of the application, apart from the data itself. Using the new metric, in conjunction with the state-of-the-art evaluation methods, we show that the proposed architecture, when compared with GAN variants stemming from similar strategies, produces more accurate models of the input data in a way that is also less sensitive to hyperparameter settings.
    Graph-aware Modeling of Brain Connectivity Networks. (arXiv:1903.02129v4 [stat.AP] UPDATED)
    Functional connections in the brain are frequently represented by weighted networks, with nodes representing locations in the brain, and edges representing the strength of connectivity between these locations. One challenge in analyzing such data is that inference at the individual edge level is not particularly biologically meaningful; interpretation is more useful at the level of so-called functional regions, or groups of nodes and connections between them; this is often called "graph-aware" inference in the neuroimaging literature. However, pooling over functional regions leads to significant loss of information and lower accuracy. Another challenge is correlation among edge weights within a subject, which makes inference based on independence assumptions unreliable. We address both these challenges with a linear mixed effects model, which accounts for functional regions and for edge dependence, while still modeling individual edge weights to avoid loss of information. The model allows for comparing two populations, such as patients and healthy controls, both at the functional regions level and at individual edge level, leading to biologically meaningful interpretations. We fit this model to resting-state fMRI data on schizophrenia patients and healthy controls, obtaining interpretable results consistent with the schizophrenia literature.
    Hierarchical Sliced Wasserstein Distance. (arXiv:2209.13570v1 [stat.ML])
    Sliced Wasserstein (SW) distance has been widely used in different application scenarios since it can be scaled to a large number of supports without suffering from the curse of dimensionality. The value of sliced Wasserstein distance is the average of transportation cost between one-dimensional representations (projections) of original measures that are obtained by Radon Transform (RT). Despite its efficiency in the number of supports, estimating the sliced Wasserstein requires a relatively large number of projections in high-dimensional settings. Therefore, for applications where the number of supports is relatively small compared with the dimension, e.g., several deep learning applications where the mini-batch approaches are utilized, the complexities from matrix multiplication of Radon Transform become the main computational bottleneck. To address this issue, we propose to derive projections by linearly and randomly combining a smaller number of projections which are named bottleneck projections. We explain the usage of these projections by introducing Hierarchical Radon Transform (HRT) which is constructed by applying Radon Transform variants recursively. We then formulate the approach into a new metric between measures, named Hierarchical Sliced Wasserstein (HSW) distance. By proving the injectivity of HRT, we derive the metricity of HSW. Moreover, we investigate the theoretical properties of HSW including its connection to SW variants and its computational and sample complexities. Finally, we compare the computational cost and generative quality of HSW with the conventional SW on the task of deep generative modeling using various benchmark datasets including CIFAR10, CelebA, and Tiny ImageNet.
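    As a point of reference for the hierarchical variant, the plain sliced Wasserstein distance between two point clouds can be Monte Carlo estimated by projecting onto random directions and sorting, as in the sketch below; HSW would instead build its projections from linear combinations of a smaller set of bottleneck projections.

    import numpy as np

    def sliced_wasserstein(X, Y, n_proj=500, seed=0):
        # Monte Carlo estimate of SW_2 between equal-size point clouds.
        rng = np.random.default_rng(seed)
        total = 0.0
        for _ in range(n_proj):
            theta = rng.normal(size=X.shape[1])
            theta /= np.linalg.norm(theta)        # direction on the unit sphere
            px, py = np.sort(X @ theta), np.sort(Y @ theta)
            total += np.mean((px - py) ** 2)      # 1D W_2^2 via sorted samples
        return np.sqrt(total / n_proj)

    rng = np.random.default_rng(1)
    X = rng.normal(size=(256, 32))
    Y = rng.normal(loc=0.5, size=(256, 32))
    print(sliced_wasserstein(X, Y))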
    On Kernel Regression with Data-Dependent Kernels. (arXiv:2209.01691v2 [cs.LG] UPDATED)
    The primary hyperparameter in kernel regression (KR) is the choice of kernel. In most theoretical studies of KR, one assumes the kernel is fixed before seeing the training data. Under this assumption, it is known that the optimal kernel is equal to the prior covariance of the target function. In this note, we consider KR in which the kernel may be updated after seeing the training data. We point out that an analogous choice of kernel using the posterior of the target function is optimal in this setting. Connections to the view of deep neural networks as data-dependent kernel learners are discussed.  ( 2 min )
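    For concreteness, the fixed-kernel baseline discussed here is ordinary kernel ridge regression, sketched below with an RBF kernel; the data-dependent variant would update the kernel after seeing $(X, y)$ rather than fixing it in advance.

    import numpy as np

    def rbf(A, B, sigma=1.0):
        d2 = np.sum(A**2, 1)[:, None] + np.sum(B**2, 1)[None, :] - 2.0 * A @ B.T
        return np.exp(-d2 / (2.0 * sigma**2))

    rng = np.random.default_rng(0)
    X = rng.uniform(-3, 3, size=(80, 1))
    y = np.sin(X[:, 0]) + 0.1 * rng.normal(size=80)

    lam = 1e-2                                                 # ridge regularizer
    alpha = np.linalg.solve(rbf(X, X) + lam * np.eye(80), y)   # fit
    X_test = np.linspace(-3, 3, 5).reshape(-1, 1)
    print(rbf(X_test, X) @ alpha)                              # predict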
    On the inability of Gaussian process regression to optimally learn compositional functions. (arXiv:2205.07764v2 [stat.ML] UPDATED)
    We rigorously prove that deep Gaussian process priors can outperform Gaussian process priors if the target function has a compositional structure. To this end, we study information-theoretic lower bounds for posterior contraction rates for Gaussian process regression in a continuous regression model. We show that if the true function is a generalized additive function, then the posterior based on any mean-zero Gaussian process can only recover the truth at a rate that is strictly slower than the minimax rate by a factor that is polynomially suboptimal in the sample size $n$.  ( 2 min )
    FedShuffle: Recipes for Better Use of Local Work in Federated Learning. (arXiv:2204.13169v3 [cs.LG] UPDATED)
    The practice of applying several local updates before aggregation across clients has been empirically shown to be a successful approach to overcoming the communication bottleneck in Federated Learning (FL). Such methods are usually implemented by having clients perform one or more epochs of local training per round while randomly reshuffling their finite dataset in each epoch. Data imbalance, where clients have different numbers of local training samples, is ubiquitous in FL applications, resulting in different clients performing different numbers of local updates in each round. In this work, we propose a general recipe, FedShuffle, that better utilizes the local updates in FL, especially in this regime encompassing random reshuffling and heterogeneity. FedShuffle is the first local update method with theoretical convergence guarantees that incorporates random reshuffling, data imbalance, and client sampling - features that are essential in large-scale cross-device FL. We present a comprehensive theoretical analysis of FedShuffle and show, both theoretically and empirically, that it does not suffer from the objective function mismatch that is present in FL methods that assume homogeneous updates in heterogeneous FL setups, such as FedAvg (McMahan et al., 2017). In addition, by combining the ingredients above, FedShuffle improves upon FedNova (Wang et al., 2020), which was previously proposed to solve this mismatch. Similar to Mime (Karimireddy et al., 2020), we show that FedShuffle with momentum variance reduction (Cutkosky & Orabona, 2019) improves upon non-local methods under a Hessian similarity assumption.  ( 3 min )
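    The regime in question is easy to picture with a toy round of federated least squares: each client runs local epochs with random reshuffling over a differently sized dataset, and the server aggregates the resulting updates. The weighting and step sizes below are illustrative choices, not FedShuffle's exact recipe.

    import numpy as np

    rng = np.random.default_rng(0)
    w_true = np.array([1.0, -2.0, 0.5])
    datasets = []
    for n in (20, 60, 120):                              # imbalanced clients
        A = rng.normal(size=(n, 3))
        datasets.append((A, A @ w_true + 0.1 * rng.normal(size=n)))

    w, lr = np.zeros(3), 0.01
    updates, sizes = [], []
    for A, y in datasets:
        w_loc = w.copy()
        for _ in range(2):                               # local epochs
            for i in rng.permutation(len(y)):            # random reshuffling
                w_loc -= lr * (A[i] @ w_loc - y[i]) * A[i]
        updates.append(w_loc - w)
        sizes.append(len(y))

    # Server step: aggregate local updates weighted by client data size.
    w += sum(s * u for s, u in zip(sizes, updates)) / sum(sizes)
    print(w)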
    Graph clustering with Boltzmann machines. (arXiv:2203.02471v3 [cs.LG] UPDATED)
    Graph clustering is the process of grouping vertices into densely connected sets called clusters. We tailor two mathematical programming formulations from the literature to this problem. In doing so, we obtain a heuristic approximation to the intra-cluster density maximization problem. We use two variations of a Boltzmann machine heuristic to obtain numerical solutions. For benchmarking purposes, we compare solution quality and computational performances to those obtained using a commercial solver, Gurobi. We also compare clustering quality to the clusters obtained using the popular Louvain modularity maximization method. Our initial results clearly demonstrate the superiority of our problem formulations. They also establish the superiority of the Boltzmann machine over the traditional exact solver. In the case of smaller less complex graphs, Boltzmann machines provide the same solutions as Gurobi, but with solution times that are orders of magnitude lower. In the case of larger and more complex graphs, Gurobi fails to return meaningful results within a reasonable time frame. Finally, we also note that both our clustering formulations, the distance minimization and $K$-medoids, yield clusters of superior quality to those obtained with the Louvain algorithm.  ( 3 min )
  • Open

    AdaFocusV3: On Unified Spatial-temporal Dynamic Video Recognition. (arXiv:2209.13465v1 [cs.CV])
    Recent research has revealed that reducing the temporal and spatial redundancy are both effective approaches towards efficient video recognition, e.g., allocating the majority of computation to a task-relevant subset of frames or the most valuable image regions of each frame. However, in most existing works, either type of redundancy is typically modeled while the other is neglected. This paper explores the unified formulation of spatial-temporal dynamic computation on top of the recently proposed AdaFocusV2 algorithm, contributing to an improved AdaFocusV3 framework. Our method reduces the computational cost by activating the expensive high-capacity network only on some small but informative 3D video cubes. These cubes are cropped from the space formed by frame height, width, and video duration, while their locations are adaptively determined with a lightweight policy network on a per-sample basis. At test time, the number of the cubes corresponding to each video is dynamically configured, i.e., video cubes are processed sequentially until a sufficiently reliable prediction is produced. Notably, AdaFocusV3 can be effectively trained by approximating the non-differentiable cropping operation with the interpolation of deep features. Extensive empirical results on six benchmark datasets (i.e., ActivityNet, FCVID, Mini-Kinetics, Something-Something V1&V2 and Diving48) demonstrate that our model is considerably more efficient than competitive baselines.  ( 2 min )
    WeightedSHAP: analyzing and improving Shapley based feature attributions. (arXiv:2209.13429v1 [cs.LG])
    Shapley value is a popular approach for measuring the influence of individual features. While Shapley feature attribution is built upon desiderata from game theory, some of its constraints may be less natural in certain machine learning settings, leading to unintuitive model interpretation. In particular, the Shapley value uses the same weight for all marginal contributions -- i.e. it gives the same importance when a large number of other features are given versus when a small number of other features are given. This property can be problematic if larger feature sets are more or less informative than smaller feature sets. Our work performs a rigorous analysis of the potential limitations of Shapley feature attribution. We identify simple settings where the Shapley value is mathematically suboptimal by assigning larger attributions for less influential features. Motivated by this observation, we propose WeightedSHAP, which generalizes the Shapley value and learns which marginal contributions to focus directly from data. On several real-world datasets, we demonstrate that the influential features identified by WeightedSHAP are better able to recapitulate the model's predictions compared to the features identified by the Shapley value.  ( 2 min )
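    The reweighting idea can be made exact on a tiny example: both the Shapley value and a reweighted attribution are averages of marginal contributions $v(S \cup \{i\}) - v(S)$, differing only in the weight placed on each coalition size. The toy value function and the small-coalition weighting below are illustrative; WeightedSHAP learns its weights from data.

    from itertools import combinations
    from math import factorial

    features = (0, 1, 2)
    value = lambda S: len(set(S) & {0, 1}) ** 2   # toy coalition value function

    def attribution(i, size_weight):
        # Weighted average of marginal contributions of feature i.
        rest = [j for j in features if j != i]
        total = 0.0
        for r in range(len(rest) + 1):
            for S in combinations(rest, r):
                total += size_weight(r) * (value(S + (i,)) - value(S))
        return total

    n = len(features)
    shapley_w = lambda r: factorial(r) * factorial(n - r - 1) / factorial(n)
    small_w = lambda r: 1.0 if r == 0 else 0.0    # weight only empty coalitions

    for i in features:
        print(i, attribution(i, shapley_w), attribution(i, small_w))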
    A Survey on Graph Neural Networks and Graph Transformers in Computer Vision: A Task-Oriented Perspective. (arXiv:2209.13232v1 [cs.CV])
    Graph Neural Networks (GNNs) have gained momentum in graph representation learning and boosted the state of the art in a variety of areas, such as data mining (\emph{e.g.,} social network analysis and recommender systems), computer vision (\emph{e.g.,} object detection and point cloud learning), and natural language processing (\emph{e.g.,} relation extraction and sequence learning), to name a few. With the emergence of Transformers in natural language processing and computer vision, graph Transformers embed a graph structure into the Transformer architecture to overcome the limitations of local neighborhood aggregation while avoiding strict structural inductive biases. In this paper, we present a comprehensive review of GNNs and graph Transformers in computer vision from a task-oriented perspective. Specifically, we divide their applications in computer vision into five categories according to the modality of input data, \emph{i.e.,} 2D natural images, videos, 3D data, vision + language, and medical images. In each category, we further divide the applications according to a set of vision tasks. Such a task-oriented taxonomy allows us to examine how each task is tackled by different GNN-based approaches and how well these approaches perform. Based on the necessary preliminaries, we provide the definitions and challenges of the tasks, in-depth coverage of the representative approaches, as well as discussions regarding insights, limitations, and future directions.  ( 3 min )
    Conditional Antibody Design as 3D Equivariant Graph Translation. (arXiv:2208.06073v2 [q-bio.BM] UPDATED)
    Antibody design is valuable for therapeutic usage and biological research. Existing deep-learning-based methods encounter several key issues: 1) incomplete context for Complementarity-Determining Regions (CDRs) generation; 2) incapable of capturing the entire 3D geometry of the input structure; 3) inefficient prediction of the CDR sequences in an autoregressive manner. In this paper, we propose Multi-channel Equivariant Attention Network (MEAN), an end-to-end model that is able to co-design 1D sequences and 3D structures of CDRs. To be specific, MEAN formulates antibody design as a conditional graph translation problem by importing extra components including the target antigen and the light chain of the antibody. Then, MEAN resorts to E(3)-equivariant message passing along with a proposed attention mechanism to better capture the geometrical correlation between different components. Finally, it outputs both the 1D sequences and 3D structure via a multi-round progressive full-shot scheme, which enjoys more efficiency against previous autoregressive approaches. Our method significantly surpasses state-of-the-art models in sequence and structure modeling, antigen-binding antibody design, and binding affinity optimization. Specifically, the relative improvement to baselines is about 23% in antigen-binding CDR design and 34% for affinity optimization.  ( 2 min )
    DeepFusion: A Robust and Modular 3D Object Detector for Lidars, Cameras and Radars. (arXiv:2209.12729v2 [cs.CV] UPDATED)
    We propose DeepFusion, a modular multi-modal architecture to fuse lidars, cameras and radars in different combinations for 3D object detection. Specialized feature extractors take advantage of each modality and can be exchanged easily, making the approach simple and flexible. Extracted features are transformed into bird's-eye-view as a common representation for fusion. Spatial and semantic alignment is performed prior to fusing modalities in the feature space. Finally, a detection head exploits rich multi-modal features for improved 3D detection performance. Experimental results for lidar-camera, lidar-camera-radar and camera-radar fusion show the flexibility and effectiveness of our fusion approach. In the process, we study the largely unexplored task of faraway car detection up to 225 meters, showing the benefits of our lidar-camera fusion. Furthermore, we investigate the required density of lidar points for 3D object detection and illustrate implications at the example of robustness against adverse weather conditions. Moreover, ablation studies on our camera-radar fusion highlight the importance of accurate depth estimation.  ( 2 min )
    Analyzing Dynamic Adversarial Training Data in the Limit. (arXiv:2110.08514v2 [cs.CL] UPDATED)
    To create models that are robust across a wide range of test inputs, training datasets should include diverse examples that span numerous phenomena. Dynamic adversarial data collection (DADC), where annotators craft examples that challenge continually improving models, holds promise as an approach for generating such diverse training sets. Prior work has shown that running DADC over 1-3 rounds can help models fix some error types, but it does not necessarily lead to better generalization beyond adversarial test data. We argue that running DADC over many rounds maximizes its training-time benefits, as the different rounds can together cover many of the task-relevant phenomena. We present the first study of longer-term DADC, where we collect 20 rounds of NLI examples for a small set of premise paragraphs, with both adversarial and non-adversarial approaches. Models trained on DADC examples make 26% fewer errors on our expert-curated test set compared to models trained on non-adversarial data. Our analysis shows that DADC yields examples that are more difficult, more lexically and syntactically diverse, and contain fewer annotation artifacts compared to non-adversarial examples.  ( 2 min )
    Safeguarded Learned Convex Optimization. (arXiv:2003.01880v3 [math.OC] UPDATED)
    Applications abound in which optimization problems must be repeatedly solved, each time with new (but similar) data. Analytic optimization algorithms can be hand-designed to provably solve these problems in an iterative fashion. On one hand, data-driven algorithms can "learn to optimize" (L2O) with far fewer iterations and a similar cost per iteration to general-purpose optimization algorithms. On the other hand, unfortunately, many L2O algorithms lack convergence guarantees. To fuse the advantages of these approaches, we present a Safe-L2O framework. Safe-L2O updates incorporate a safeguard to guarantee convergence for convex problems with proximal and/or gradient oracles. The safeguard is simple and computationally cheap to implement, and it is activated only when the data-driven L2O updates would perform poorly or appear to diverge. This yields the numerical benefits of employing machine learning to create rapid L2O algorithms while still guaranteeing convergence. Our numerical examples show convergence of Safe-L2O algorithms, even when the provided data is not from the distribution of training data.  ( 2 min )
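    A minimal sketch of the safeguard logic on a quadratic, where a placeholder "learned" update stands in for an L2O network: the learned step is kept only when it does at least as well as a provably convergent gradient step, which is taken otherwise.

    import numpy as np

    A = np.diag([1.0, 5.0])
    b = np.array([1.0, -1.0])
    f = lambda x: 0.5 * x @ A @ x - b @ x
    grad = lambda x: A @ x - b

    def learned_update(x):
        # Placeholder for an L2O model; deliberately overshoots here.
        return x - 0.5 * grad(x)

    x, lr = np.array([3.0, 3.0]), 1.0 / 5.0   # lr = 1/L is the safe step size
    for _ in range(50):
        x_l2o, x_safe = learned_update(x), x - lr * grad(x)
        x = x_l2o if f(x_l2o) <= f(x_safe) else x_safe   # the safeguard
    print(x, f(x))   # converges to A^{-1} b = [1.0, -0.2]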
    Active Linear Regression for $\ell_p$ Norms and Beyond. (arXiv:2111.04888v4 [cs.LG] UPDATED)
    We study active sampling algorithms for linear regression, which aim to query only a few entries of a target vector $b\in\mathbb R^n$ and output a near minimizer to $\min_{x\in\mathbb R^d} \|Ax-b\|$, for a design matrix $A\in\mathbb R^{n \times d}$ and loss $\|\cdot\|$. For $p$ norm regression for any $0<p<\infty$, we give an algorithm based on Lewis weight sampling outputting a $(1+\epsilon)$-approximate solution using just $\tilde O(d/\epsilon^2)$ queries to $b$ for $p\in(0,1)$, $\tilde{O}(d/\epsilon)$ queries for $1<p<2$, and $\tilde{O}(d^{p/2}/\epsilon^p)$ queries for $2<p<\infty$. For $0<p<2$, our bounds are optimal up to log factors, settling the query complexity for this range. For $2<p<\infty$, our dependence on $d$ is optimal, while our dependence on $\epsilon$ is off by at most $\epsilon$, up to log factors. Our result resolves an open question of [CD21], who gave near optimal bounds for the $1$ norm, but required $d^2/\epsilon^2$ samples for $\ell_p$ regression with $1<p<2$, and gave no bounds for $2<p<\infty$ or $0<p<1$. We also give the first total sensitivity bound of $O(d^{\max\{1,p/2\}}\log^2n)$ for loss functions of degree $p$ polynomial growth, improving a result of [TMF20]. By combining this with our techniques for $\ell_p$ regression, we obtain an active regression algorithm making $\tilde O(d^{1+\max\{1,p/2\}}/\mathrm{poly}(\epsilon))$ queries for such loss functions, including the Tukey and Huber losses, answering another question of [CD21]. For the Huber loss, we further improve our bound to $\tilde O(d^{4-2\sqrt2}/\mathrm{poly}(\epsilon))$ samples. Our sensitivity bounds also have many applications, including Orlicz norm subspace embeddings, robust subspace approximation, and dimension reduction for smoothed $p$-norms. Finally, our active sampling results give the first sublinear time algorithms for Kronecker product regression under every $p$ norm.  ( 3 min )
    Resource Allocation for Mobile Metaverse with the Internet of Vehicles over 6G Wireless Communications: A Deep Reinforcement Learning Approach. (arXiv:2209.13425v1 [cs.NI])
    Improving the interactivity and interconnectivity between people is one of the highlights of the Metaverse. The Metaverse relies on a core approach, digital twinning, which is a means to replicate physical world objects, people, actions and scenes onto the virtual world. Being able to access scenes and information associated with the physical world, in the Metaverse in real-time and under mobility, is essential in developing a highly accessible, interactive and interconnective experience for all users. This development allows users from other locations to access high-quality real-world and up-to-date information about events happening in another location, and socialize with others hyper-interactively. Nevertheless, receiving continual, smooth updates generated by others from the Metaverse is a challenging task due to the large data size of the virtual world graphics and the need for low latency transmission. With the development of Mobile Augmented Reality (MAR), users can interact via the Metaverse in a highly interactive manner, even under mobility. Hence in our work, we considered an environment with users in moving Internet of Vehicles (IoV), downloading real-time virtual world updates from Metaverse Service Provider Cell Stations (MSPCSs) via wireless communications. We design an environment with multiple cell stations, where there will be a handover of users' virtual world graphic download tasks between cell stations. As transmission latency is the primary concern in receiving virtual world updates under mobility, our work aims to allocate system resources to minimize the total time taken for users in vehicles to download their virtual world scenes from the cell stations. We utilize deep reinforcement learning and evaluate the performance of the algorithms under different environmental configurations. Our work provides a use case of the Metaverse over AI-enabled 6G communications.  ( 3 min )
    Neural parameter calibration for large-scale multi-agent models. (arXiv:2209.13565v1 [math.OC])
    Computational models have become a powerful tool in the quantitative sciences to understand the behaviour of complex systems that evolve in time. However, they often contain a potentially large number of free parameters whose values cannot be obtained from theory but need to be inferred from data. This is especially the case for models in the social sciences, economics, or computational epidemiology. Yet many current parameter estimation methods are mathematically involved and computationally slow to run. In this paper we present a computationally simple and fast method to retrieve accurate probability densities for model parameters using neural differential equations. We present a pipeline comprising multi-agent models acting as forward solvers for systems of ordinary or stochastic differential equations, and a neural network to then extract parameters from the data generated by the model. The two combined create a powerful tool that can quickly estimate densities on model parameters, even for very large systems. We demonstrate the method on synthetic time series data of the SIR model of the spread of infection, and perform an in-depth analysis of the Harris-Wilson model of economic activity on a network, representing a non-convex problem. For the latter, we apply our method both to synthetic data and to data of economic activity across Greater London. We find that our method calibrates the model orders of magnitude more accurately than a previous study of the same dataset using classical techniques, while running between 195 and 390 times faster.  ( 3 min )
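    A stripped-down, non-neural sketch of the calibration setting: forward-simulate the SIR model over a grid of candidate infection rates and score each against noisy observations. The paper replaces this brute-force search with a neural network that maps simulated trajectories to parameter densities; the model and constants below are illustrative.

    import numpy as np

    def sir(beta, gamma=0.1, days=60, N=1.0, I0=0.01):
        # Forward-Euler simulation of the SIR infection curve.
        S, I, traj = N - I0, I0, []
        for _ in range(days):
            dS = -beta * S * I / N
            dI = beta * S * I / N - gamma * I
            S, I = S + dS, I + dI
            traj.append(I)
        return np.array(traj)

    data = sir(0.35) + 0.002 * np.random.default_rng(0).normal(size=60)

    betas = np.linspace(0.1, 0.6, 51)
    losses = [np.mean((sir(b) - data) ** 2) for b in betas]
    print(betas[int(np.argmin(losses))])   # recovers beta close to 0.35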
    Denoising Diffusion Error Correction Codes. (arXiv:2209.13533v1 [cs.IT])
    Error correction code (ECC) is an integral part of the physical communication layer, ensuring reliable data transfer over noisy channels. Recently, neural decoders have demonstrated their advantage over classical decoding techniques. However, recent state-of-the-art neural decoders suffer from high complexity and lack the important iterative scheme characteristic of many legacy decoders. In this work, we propose to employ denoising diffusion models for the soft decoding of linear codes at arbitrary block lengths. Our framework models the forward channel corruption as a series of diffusion steps that can be reversed iteratively. Three contributions are made: (i) a diffusion process suitable for the decoding setting is introduced, (ii) the neural diffusion decoder is conditioned on the number of parity errors, which indicates the level of corruption at a given step, (iii) a line search procedure based on the code's syndrome obtains the optimal reverse diffusion step size. The proposed approach demonstrates the power of diffusion models for ECC and is able to achieve state of the art accuracy, outperforming the other neural decoders by sizable margins, even for a single reverse diffusion step.  ( 2 min )
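    The conditioning signal mentioned in (ii) is the code's syndrome: for a linear code with parity-check matrix $H$, the syndrome of a received word $y$ is $Hy \bmod 2$, and it is zero exactly when $y$ is a codeword. A Hamming(7,4) example, for illustration only:

    import numpy as np

    # Parity-check matrix of the Hamming(7,4) code (columns are 1..7 in binary).
    H = np.array([[1, 0, 1, 0, 1, 0, 1],
                  [0, 1, 1, 0, 0, 1, 1],
                  [0, 0, 0, 1, 1, 1, 1]])

    codeword = np.zeros(7, dtype=int)     # the all-zero word is a codeword
    noisy = codeword.copy()
    noisy[4] ^= 1                         # flip one bit (position 5)

    print(H @ codeword % 2)   # [0 0 0]: zero syndrome, valid codeword
    print(H @ noisy % 2)      # [1 0 1]: nonzero syndrome flags corruption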
    Fluid Batching: Exit-Aware Preemptive Serving of Early-Exit Neural Networks on Edge NPUs. (arXiv:2209.13443v1 [cs.LG])
    With deep neural networks (DNNs) emerging as the backbone in a multitude of computer vision tasks, their adoption in real-world consumer applications broadens continuously. Given the abundance and omnipresence of smart devices, "smart ecosystems" are being formed where sensing happens simultaneously rather than standalone. This is shifting the on-device inference paradigm towards deploying centralised neural processing units (NPUs) at the edge, where multiple devices (e.g. in smart homes or autonomous vehicles) can stream their data for processing with dynamic rates. While this provides enhanced potential for input batching, naive solutions can lead to subpar performance and quality of experience, especially under spiking loads. At the same time, the deployment of dynamic DNNs, comprising stochastic computation graphs (e.g. early-exit (EE) models), introduces a new dimension of dynamic behaviour in such systems. In this work, we propose a novel early-exit-aware scheduling algorithm that allows sample preemption at run time, to account for the dynamicity introduced both by the arrival and early-exiting processes. At the same time, we introduce two novel dimensions to the design space of the NPU hardware architecture, namely Fluid Batching and Stackable Processing Elements, that enable run-time adaptability to different batch sizes and significantly improve the NPU utilisation even at small batch sizes. Our evaluation shows that our system achieves an average 1.97x and 6.7x improvement over state-of-the-art DNN streaming systems in terms of average latency and tail latency SLO satisfaction, respectively.  ( 3 min )
    Simulation-Informed Revenue Extrapolation with Confidence Estimate for Scaleup Companies Using Scarce Time-Series Data. (arXiv:2208.10375v3 [cs.CE] UPDATED)
    Investment professionals rely on extrapolating company revenue into the future (i.e. revenue forecast) to approximate the valuation of scaleups (private companies in a high-growth stage) and inform their investment decision. This task is manual and empirical, leaving the forecast quality heavily dependent on the investment professionals' experiences and insights. Furthermore, financial data on scaleups is typically proprietary, costly and scarce, ruling out the wide adoption of data-driven approaches. To this end, we propose a simulation-informed revenue extrapolation (SiRE) algorithm that generates fine-grained long-term revenue predictions on small datasets and short time-series. SiRE models the revenue dynamics as a linear dynamical system (LDS), which is solved using the EM algorithm. The main innovation lies in how the noisy revenue measurements are obtained during training and inferencing. SiRE works for scaleups that operate in various sectors and provides confidence estimates. The quantitative experiments on two practical tasks show that SiRE significantly surpasses the baseline methods by a large margin. We also observe high performance when SiRE extrapolates long-term predictions from short time-series. The performance-efficiency balance and result explainability of SiRE are also validated empirically. Evaluated from the perspective of investment professionals, SiRE can precisely locate the scaleups that have a great potential return in 2 to 5 years. Furthermore, our qualitative inspection illustrates some advantageous attributes of the SiRE revenue forecasts.  ( 3 min )
    Regularized Contrastive Learning of Semantic Search. (arXiv:2209.13241v1 [cs.LG])
    Semantic search is an important task whose objective is to find the relevant index from a database for a query. It requires a retrieval model that can properly learn the semantics of sentences. Transformer-based models are widely used as retrieval models due to their excellent ability to learn semantic representations, and many regularization methods suitable for them have also been proposed. In this paper, we propose a new regularization method, Regularized Contrastive Learning, which helps transformer-based models learn better sentence representations. It first augments several different semantic representations for every sentence, then takes them into the contrastive objective as regularizers. These contrastive regularizers can overcome overfitting issues and alleviate the anisotropy problem. We first evaluate our approach on 7 semantic search benchmarks with the high-performing pre-trained model SRoBERTA. The results show that our method is more effective for learning a superior sentence representation. Then we evaluate our approach on 2 challenging FAQ datasets, Cough and Faqir, which have long queries and indices. The results of our experiments demonstrate that our method outperforms baseline methods.  ( 2 min )
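    One plausible form for such a contrastive regularizer is an InfoNCE-style objective over augmented representations, sketched below with random vectors standing in for the encoder's sentence embeddings; the encoder, augmentations, and temperature are assumptions for illustration.

    import numpy as np

    def info_nce(Z1, Z2, tau=0.1):
        # Rows of Z1 and Z2 are two views of the same sentences.
        Z1 = Z1 / np.linalg.norm(Z1, axis=1, keepdims=True)
        Z2 = Z2 / np.linalg.norm(Z2, axis=1, keepdims=True)
        logits = Z1 @ Z2.T / tau                   # scaled cosine similarities
        logits -= logits.max(axis=1, keepdims=True)
        probs = np.exp(logits) / np.exp(logits).sum(axis=1, keepdims=True)
        return -np.mean(np.log(np.diag(probs)))    # diagonal = positive pairs

    rng = np.random.default_rng(0)
    Z = rng.normal(size=(8, 32))                   # stand-in sentence embeddings
    Z_aug = Z + 0.05 * rng.normal(size=Z.shape)    # augmented views
    print(info_nce(Z, Z_aug))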
    Integrated multimodal artificial intelligence framework for healthcare applications. (arXiv:2202.12998v4 [cs.LG] UPDATED)
    Artificial intelligence (AI) systems hold great promise to improve healthcare over the next decades. Specifically, AI systems leveraging multiple data sources and input modalities are poised to become a viable method to deliver more accurate results and deployable pipelines across a wide range of applications. In this work, we propose and evaluate a unified Holistic AI in Medicine (HAIM) framework to facilitate the generation and testing of AI systems that leverage multimodal inputs. Our approach uses generalizable data pre-processing and machine learning modeling stages that can be readily adapted for research and deployment in healthcare environments. We evaluate our HAIM framework by training and characterizing 14,324 independent models based on HAIM-MIMIC-MM, a multimodal clinical database (N=34,537 samples) containing 7,279 unique hospitalizations and 6,485 patients, spanning all possible input combinations of 4 data modalities (i.e., tabular, time-series, text, and images), 11 unique data sources and 12 predictive tasks. We show that this framework can consistently and robustly produce models that outperform similar single-source approaches across various healthcare demonstrations (by 6-33%), including 10 distinct chest pathology diagnoses, along with length-of-stay and 48-hour mortality predictions. We also quantify the contribution of each modality and data source using Shapley values, which demonstrates the heterogeneity in data modality importance and the necessity of multimodal inputs across different healthcare-relevant tasks. The generalizable properties and flexibility of our Holistic AI in Medicine (HAIM) framework could offer a promising pathway for future multimodal predictive systems in clinical and operational healthcare settings.  ( 3 min )
    Multi-Spatio-temporal Fusion Graph Recurrent Network for Traffic forecasting. (arXiv:2205.01480v2 [cs.LG] UPDATED)
    Traffic forecasting is essential for the construction of smart cities in the new era. However, the complex spatial and temporal dependencies of traffic data make traffic forecasting extremely challenging. Most existing traffic forecasting methods rely on a predefined adjacency matrix to model the spatio-temporal dependencies. Nevertheless, the road traffic state is highly real-time, so the adjacency matrix should change dynamically with time. This article presents a new Multi-Spatio-temporal Fusion Graph Recurrent Network (MSTFGRN) to address the issues above. The network proposes a data-driven weighted adjacency matrix generation method to compensate for real-time spatial dependencies not reflected by the predefined adjacency matrix. It also efficiently learns hidden spatio-temporal dependencies by performing a new two-way spatio-temporal fusion operation on parallel spatio-temporal relations at different moments. Finally, global spatio-temporal dependencies are captured simultaneously by integrating a global attention mechanism into the spatio-temporal fusion module. Extensive trials on four large-scale, real-world traffic datasets demonstrate that our method achieves state-of-the-art performance compared to alternative baselines.  ( 2 min )
    Fair Machine Learning Under Partial Compliance. (arXiv:2011.03654v4 [cs.CY] UPDATED)
    Typically, fair machine learning research focuses on a single decisionmaker and assumes that the underlying population is stationary. However, many of the critical domains motivating this work are characterized by competitive marketplaces with many decisionmakers. Realistically, we might expect only a subset of them to adopt any non-compulsory fairness-conscious policy, a situation that political philosophers call partial compliance. This possibility raises important questions: how does the strategic behavior of decision subjects in partial compliance settings affect the allocation outcomes? If k% of employers were to voluntarily adopt a fairness-promoting intervention, should we expect k% progress (in aggregate) towards the benefits of universal adoption, or will the dynamics of partial compliance wash out the hoped-for benefits? How might adopting a global (versus local) perspective impact the conclusions of an auditor? In this paper, we propose a simple model of an employment market, leveraging simulation as a tool to explore the impact of both interaction effects and incentive effects on outcomes and auditing metrics. Our key findings are that at equilibrium: (1) partial compliance (k% of employers) can result in far less than proportional (k%) progress towards the full compliance outcomes; (2) the gap is more severe when fair employers match global (vs local) statistics; (3) choices of local vs global statistics can paint dramatically different pictures of the performance vis-a-vis fairness desiderata of compliant versus non-compliant employers; and (4) partial compliance to local parity measures can induce extreme segregation.  ( 3 min )
    An Overview and Prospective Outlook on Robust Training and Certification of Machine Learning Models. (arXiv:2208.07464v2 [cs.LG] UPDATED)
    In this discussion paper, we survey recent research surrounding robustness of machine learning models. As learning algorithms become increasingly more popular in data-driven control systems, their robustness to data uncertainty must be ensured in order to maintain reliable safety-critical operations. We begin by reviewing common formalisms for such robustness, and then move on to discuss popular and state-of-the-art techniques for training robust machine learning models as well as methods for provably certifying such robustness. From this unification of robust machine learning, we identify and discuss pressing directions for future research in the area.  ( 2 min )
    Provably efficient machine learning for quantum many-body problems. (arXiv:2106.12627v4 [quant-ph] UPDATED)
    Classical machine learning (ML) provides a potentially powerful approach to solving challenging quantum many-body problems in physics and chemistry. However, the advantages of ML over more traditional methods have not been firmly established. In this work, we prove that classical ML algorithms can efficiently predict ground state properties of gapped Hamiltonians in finite spatial dimensions, after learning from data obtained by measuring other Hamiltonians in the same quantum phase of matter. In contrast, under widely accepted complexity theory assumptions, classical algorithms that do not learn from data cannot achieve the same guarantee. We also prove that classical ML algorithms can efficiently classify a wide range of quantum phases of matter. Our arguments are based on the concept of a classical shadow, a succinct classical description of a many-body quantum state that can be constructed in feasible quantum experiments and be used to predict many properties of the state. Extensive numerical experiments corroborate our theoretical results in a variety of scenarios, including Rydberg atom systems, 2D random Heisenberg models, symmetry-protected topological phases, and topologically ordered phases.  ( 3 min )
    Machine learning in front of statistical methods for prediction spread SARS-CoV-2 in Colombia. (arXiv:2208.05910v3 [physics.soc-ph] UPDATED)
    An analytical study of the COVID-19 disease in Colombia was carried out using mathematical models such as the Susceptible-Exposed-Infectious-Removed (SEIR) model, Logistic Regression (LR), and a machine learning method, the Polynomial Regression Method. The analysis covers the daily numbers of cases, deaths, infected people, and people exposed to the virus over a timeline of 550 days. Moreover, the spread of infection was fitted, identifying the most efficient and optimal methods, those with the lowest propagation error, and the presence of statistical biases. Finally, four different prevention scenarios were proposed to evaluate each of the parameters related to the disease.  ( 3 min )
    GOOD: A Graph Out-of-Distribution Benchmark. (arXiv:2206.08452v2 [cs.LG] UPDATED)
    Out-of-distribution (OOD) learning deals with scenarios in which training and test data follow different distributions. Although general OOD problems have been intensively studied in machine learning, graph OOD is only an emerging area of research. Currently, a systematic benchmark tailored to evaluating graph OOD methods is lacking. In this work, we aim at developing an OOD benchmark, known as GOOD, for graphs specifically. We explicitly make distinctions between covariate and concept shifts and design data splits that accurately reflect different shifts. We consider both graph and node prediction tasks as there are key differences in designing shifts. Overall, GOOD contains 11 datasets with 17 domain selections. When combined with covariate, concept, and no shifts, we obtain 51 different splits. We provide performance results on 10 commonly used baseline methods with 10 random runs. This results in 510 dataset-model combinations in total. Our results show significant performance gaps between in-distribution and OOD settings. Our results also shed light on different performance trends between covariate and concept shifts by different methods. Our GOOD benchmark is a growing project and expects to expand in both quantity and variety of resources as the area develops. The GOOD benchmark can be accessed via https://github.com/divelab/GOOD/.  ( 2 min )
    A model-agnostic approach for generating Saliency Maps to explain inferred decisions of Deep Learning Models. (arXiv:2209.08906v2 [cs.CV] UPDATED)
    The widespread use of black-box AI models has raised the need for algorithms and methods that explain the decisions made by these models. In recent years, the AI research community is increasingly interested in models' explainability since black-box models take over more and more complicated and challenging tasks. Explainability becomes critical considering the dominance of deep learning techniques for a wide range of applications, including but not limited to computer vision. In the direction of understanding the inference process of deep learning models, many methods that provide human-comprehensible evidence for the decisions of AI models have been developed, with the vast majority relying on access to the internal architecture and parameters of these models (e.g., the weights of neural networks). We propose a model-agnostic method for generating saliency maps that has access only to the output of the model and does not require additional information such as gradients. We use Differential Evolution (DE) to identify which image pixels are the most influential in a model's decision-making process and produce class activation maps (CAMs) whose quality is comparable to the quality of CAMs created with model-specific algorithms. DE-CAM achieves good performance without requiring access to the internal details of the model's architecture at the cost of more computational complexity.  ( 3 min )
    Learning When to Advise Human Decision Makers. (arXiv:2209.13578v1 [cs.AI])
    Artificial intelligence (AI) systems are increasingly used for providing advice to facilitate human decision making. While a large body of work has explored how AI systems can be optimized to produce accurate and fair advice and how algorithmic advice should be presented to human decision makers, in this work we ask a different basic question: When should algorithms provide advice? Motivated by limitations of the current practice of constantly providing algorithmic advice, we propose the design of AI systems that interact with the human user in a two-sided manner and provide advice only when it is likely to be beneficial to the human in making their decision. Our AI systems learn advising policies using past human decisions. Then, for new cases, the learned policies utilize input from the human to identify cases where algorithmic advice would be useful, as well as those where the human is better off deciding alone. We conduct a large-scale experiment to evaluate our approach by using data from the US criminal justice system on pretrial-release decisions. In our experiment, participants were asked to assess the risk of defendants to violate their release terms if released and were advised by different advising approaches. The results show that our interactive-advising approach manages to provide advice at times of need and to significantly improve human decision making compared to fixed, non-interactive advising approaches. Our approach has additional advantages in facilitating human learning, preserving complementary strengths of human decision makers, and leading to more positive responsiveness to the advice.  ( 3 min )
    Extracting Weighted Finite Automata from Recurrent Neural Networks for Natural Languages. (arXiv:2206.14621v2 [cs.CL] UPDATED)
    Recurrent Neural Networks (RNNs) have achieved tremendous success in sequential data processing. However, it is quite challenging to interpret and verify RNNs' behaviors directly. To this end, many efforts have been made to extract finite automata from RNNs. Existing approaches such as exact learning are effective in extracting finite-state models to characterize the state dynamics of RNNs for formal languages, but are limited in their scalability to natural languages. Compositional approaches that are scalable to natural languages fall short in extraction precision. In this paper, we identify the transition sparsity problem that heavily impacts the extraction precision. To address this problem, we propose a transition rule extraction approach, which is scalable to natural language processing models and effective in improving extraction precision. Specifically, we propose an empirical method to complement the missing rules in the transition diagram. In addition, we further adjust the transition matrices to enhance the context-aware ability of the extracted weighted finite automaton (WFA). Finally, we propose two data augmentation tactics to track more dynamic behaviors of the target RNN. Experiments on two popular natural language datasets show that our method can extract WFA from RNN for natural language processing with better precision than existing approaches. Our code is available at https://github.com/weizeming/Extract_WFA_from_RNN_for_NL.
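    The extracted object has simple semantics worth stating: a WFA scores a word by multiplying per-symbol transition matrices between an initial and a final weight vector. A minimal sketch with an assumed two-state automaton:

```python
# WFA semantics: weight(w) = alpha^T . A_{w1} . ... . A_{wn} . beta.
# The two-state automaton below is an arbitrary illustrative example.
import numpy as np

alpha = np.array([1.0, 0.0])                 # initial weights
beta = np.array([0.0, 1.0])                  # final weights
A = {                                        # one transition matrix per symbol
    "a": np.array([[0.5, 0.5], [0.0, 1.0]]),
    "b": np.array([[1.0, 0.0], [0.3, 0.7]]),
}

def wfa_weight(word):
    v = alpha
    for sym in word:
        v = v @ A[sym]
    return float(v @ beta)

print(wfa_weight("ab"))                      # weight assigned to "ab"
```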
    Characterizing Uncertainty in the Visual Text Analysis Pipeline. (arXiv:2209.13498v1 [cs.HC])
    Current visual text analysis approaches rely on sophisticated processing pipelines. Each step of such a pipeline potentially amplifies any uncertainties from the previous step. To ensure the comprehensibility and interpretability of the results, it is of paramount importance to clearly communicate the uncertainty not only of the output but also within the pipeline. In this paper, we characterize the sources of uncertainty along the visual text analysis pipeline. Within its three phases of labeling, modeling, and analysis, we identify six sources, discuss the type of uncertainty they create, and how they propagate.  ( 2 min )
    Sparse Bayesian Learning for Complex-Valued Rational Approximations. (arXiv:2206.02523v2 [stat.ML] UPDATED)
    Surrogate models are used to alleviate the computational burden in engineering tasks, which require the repeated evaluation of computationally demanding models of physical systems, such as the efficient propagation of uncertainties. For models that show a strongly non-linear dependence on their input parameters, standard surrogate techniques, such as polynomial chaos expansion, are not sufficient to obtain an accurate representation of the original model response. Through applying a rational approximation instead, the approximation error can be efficiently reduced for models whose non-linearity is accurately described through a rational function. Specifically, our aim is to approximate complex-valued models. A common approach to obtain the coefficients in the surrogate is to minimize the sample-based error between model and surrogate in the least-square sense. In order to obtain an accurate representation of the original model and to avoid overfitting, the size of the sample set has to be two to three times the number of polynomial terms in the expansion. For models that require a high polynomial degree or are high-dimensional in terms of their input parameters, this number often exceeds the affordable computational cost. To overcome this issue, we apply a sparse Bayesian learning approach to the rational approximation. Through a specific prior distribution structure, sparsity is induced in the coefficients of the surrogate model. The denominator polynomial coefficients as well as the hyperparameters of the problem are determined through a type-II-maximum likelihood approach. We apply a quasi-Newton gradient-descent algorithm in order to find the optimal denominator coefficients and derive the required gradients through application of $\mathbb{CR}$-calculus.
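    The least-squares baseline that the sparse Bayesian approach improves upon can be sketched compactly: fit numerator and denominator polynomials by linearizing Q(x)y = P(x). The 1-D real-valued example and degrees below are illustrative assumptions; the paper's setting is complex-valued and sparsity-regularized.

```python
# Linearized least-squares fit of a rational surrogate P(x)/Q(x),
# with Q's constant term fixed to 1 to avoid the trivial zero solution.
import numpy as np

x = np.linspace(-1, 1, 40)
y = 1.0 / (1.0 + 25 * x**2)                  # strongly non-linear target

p_deg, q_deg = 4, 4
P = np.vander(x, p_deg + 1)                  # numerator basis
Qb = np.vander(x, q_deg + 1)[:, :-1]         # denominator basis minus constant

# Q(x) y = P(x)  =>  P @ p - (Qb * y) @ q = y, linear in (p, q).
A = np.hstack([P, -(Qb * y[:, None])])
coef, *_ = np.linalg.lstsq(A, y, rcond=None)
p, q = coef[:p_deg + 1], coef[p_deg + 1:]

surrogate = np.polyval(p, x) / (1.0 + Qb @ q)
print(np.max(np.abs(surrogate - y)))         # maximum approximation error
```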
    Causal Balancing for Domain Generalization. (arXiv:2206.05263v3 [cs.LG] UPDATED)
    While machine learning models rapidly advance the state-of-the-art on various real-world tasks, out-of-domain (OOD) generalization remains a challenging problem given the vulnerability of these models to spurious correlations. We propose a balanced mini-batch sampling strategy to transform a biased data distribution into a spurious-free balanced distribution, based on the invariance of the underlying causal mechanisms for the data generation process. We argue that the Bayes optimal classifiers trained on such a balanced distribution are minimax optimal across a diverse enough environment space. We also provide an identifiability guarantee of the latent variable model of the proposed data generation process, when utilizing enough training environments. Experiments are conducted on DomainBed, demonstrating empirically that our method obtains the best performance across 20 baselines reported on the benchmark.
    VDDB: a comprehensive resource and machine learning platform for antiviral drug discovery. (arXiv:2209.13521v1 [q-bio.BM])
    Virus infection is one of the major diseases that seriously threaten human health. To meet the growing demand for mining and sharing data resources related to antiviral drugs and to accelerate the design and discovery of new antiviral drugs, we present an open-access antiviral drug resource and machine learning platform (VDDB), which, to the best of our knowledge, is the first comprehensive dedicated resource for experimentally verified potential drugs/molecules based on manually curated data. Currently, VDDB highlights 848 clinical vaccines, 199 clinical antibodies, as well as over 710,000 small molecules targeting 39 medically important viruses including SARS-CoV-2. Furthermore, VDDB stores approximately 3 million records of pharmacological data for these collected potential antiviral drugs/molecules, involving 314 cell infection-based phenotypic and 234 target-based genotypic assays. Based on these annotated pharmacological data, VDDB allows users to browse, search and download reliable information about these collections for various viruses of interest. In particular, VDDB also integrates 57 cell infection-based and 117 target-based high-accuracy machine learning models to support various antivirals identification-related tasks, such as compound activity prediction, virtual screening, drug repositioning and target fishing. VDDB is freely accessible at this http URL
    Genetic Programming-Based Evolutionary Deep Learning for Data-Efficient Image Classification. (arXiv:2209.13233v1 [cs.NE])
    Data-efficient image classification is a challenging task that aims to solve image classification using small training data. Neural network-based deep learning methods are effective for image classification, but they typically require large-scale training data and have major limitations such as requiring expertise to design network architectures and having poor interpretability. Evolutionary deep learning is a recent hot topic that combines evolutionary computation with deep learning. However, most evolutionary deep learning methods focus on evolving architectures of neural networks, which still suffer from limitations such as poor interpretability. To address these limitations, this paper proposes a new genetic programming-based evolutionary deep learning approach to data-efficient image classification. The new approach can automatically evolve variable-length models using many important operators from both image and classification domains. It can learn different types of image features from colour or gray-scale images, and construct effective and diverse ensembles for image classification. A flexible multi-layer representation enables the new approach to automatically construct shallow or deep models/trees for different tasks and perform effective transformations on the input data via multiple internal nodes. The new approach is applied to solve five image classification tasks with different training set sizes. The results show that it achieves better performance in most cases than deep learning methods for data-efficient image classification. A deep analysis shows that the new approach has good convergence and evolves models with high interpretability, different lengths/sizes/shapes, and good transferability.
    MammoDL: Mammographic Breast Density Estimation using Federated Learning. (arXiv:2206.05575v2 [eess.IV] UPDATED)
    Assessing breast cancer risk from imaging remains a subjective process, in which radiologists employ simple computer aided detection (CAD) systems or qualitative visual assessment to estimate breast percent density (PD). Machine learning (ML) models have become the most promising way to quantify breast cancer risk for early, accurate, and equitable diagnoses, but training such models in medical research is often restricted to small, single-institution data. Since patient demographics and imaging characteristics may vary considerably across imaging sites, models trained on single-institution data tend not to generalize well. In response to this problem, MammoDL is proposed, an open-source software tool that leverages a U-Net architecture to accurately estimate breast PD and complexity from mammography. With the Open Federated Learning (OpenFL) library, this solution enables secure training on datasets across multiple institutions. MammoDL is a leaner, more flexible model than its predecessors, boasting improved generalization due to federation-enabled training on larger, more representative datasets.
    On Kernel Regression with Data-Dependent Kernels. (arXiv:2209.01691v2 [cs.LG] UPDATED)
    The primary hyperparameter in kernel regression (KR) is the choice of kernel. In most theoretical studies of KR, one assumes the kernel is fixed before seeing the training data. Under this assumption, it is known that the optimal kernel is equal to the prior covariance of the target function. In this note, we consider KR in which the kernel may be updated after seeing the training data. We point out that an analogous choice of kernel using the posterior of the target function is optimal in this setting. Connections to the view of deep neural networks as data-dependent kernel learners are discussed.  ( 2 min )
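    Concretely, the data-dependent analogue of the prior-covariance choice is the standard Gaussian process posterior covariance; a sketch with assumed notation, for training inputs $X$ and noise level $\sigma^2$:

```latex
% Posterior covariance of a GP after observing data at X; this is the
% kind of data-dependent kernel choice argued to be optimal here
% (notation assumed).
\[
k_{\mathrm{post}}(x, x') \;=\; k(x, x')
  \;-\; k(x, X)\,\bigl(k(X, X) + \sigma^2 I\bigr)^{-1} k(X, x').
\]
```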
    DAMO-NLP at NLPCC-2022 Task 2: Knowledge Enhanced Robust NER for Speech Entity Linking. (arXiv:2209.13187v1 [cs.CL])
    Speech Entity Linking aims to recognize and disambiguate named entities in spoken languages. Conventional methods suffer gravely from the unfettered speech styles and the noisy transcripts generated by ASR systems. In this paper, we propose a novel approach called Knowledge Enhanced Named Entity Recognition (KENER), which focuses on improving robustness through painlessly incorporating proper knowledge in the entity recognition stage and thus improving the overall performance of entity linking. KENER first retrieves candidate entities for a sentence without mentions, and then utilizes the entity descriptions as extra information to help recognize mentions. The candidate entities retrieved by a dense retrieval module are especially useful when the input is short or noisy. Moreover, we investigate various data sampling strategies and design effective loss functions, in order to improve the quality of retrieved entities in both recognition and disambiguation stages. Lastly, a linking with filtering module is applied as the final safeguard, making it possible to filter out wrongly-recognized mentions. Our system achieves 1st place in Track 1 and 2nd place in Track 2 of NLPCC-2022 Shared Task 2.
    FedShuffle: Recipes for Better Use of Local Work in Federated Learning. (arXiv:2204.13169v3 [cs.LG] UPDATED)
    The practice of applying several local updates before aggregation across clients has been empirically shown to be a successful approach to overcoming the communication bottleneck in Federated Learning (FL). Such methods are usually implemented by having clients perform one or more epochs of local training per round while randomly reshuffling their finite dataset in each epoch. Data imbalance, where clients have different numbers of local training samples, is ubiquitous in FL applications, resulting in different clients performing different numbers of local updates in each round. In this work, we propose a general recipe, FedShuffle, that better utilizes the local updates in FL, especially in this regime encompassing random reshuffling and heterogeneity. FedShuffle is the first local update method with theoretical convergence guarantees that incorporates random reshuffling, data imbalance, and client sampling - features that are essential in large-scale cross-device FL. We present a comprehensive theoretical analysis of FedShuffle and show, both theoretically and empirically, that it does not suffer from the objective function mismatch that is present in FL methods that assume homogeneous updates in heterogeneous FL setups, such as FedAvg (McMahan et al., 2017). In addition, by combining the ingredients above, FedShuffle improves upon FedNova (Wang et al., 2020), which was previously proposed to solve this mismatch. Similar to Mime (Karimireddy et al., 2020), we show that FedShuffle with momentum variance reduction (Cutkosky & Orabona, 2019) improves upon non-local methods under a Hessian similarity assumption.
    A comprehensive survey on computational learning methods for analysis of gene expression data. (arXiv:2202.02958v5 [q-bio.GN] UPDATED)
    Computational analysis methods including machine learning have a significant impact in the fields of genomics and medicine. High-throughput gene expression analysis methods such as microarray technology and RNA sequencing produce enormous amounts of data. Traditionally, statistical methods are used for comparative analysis of gene expression data. However, more complex analysis for classification of sample observations, or discovery of feature genes requires sophisticated computational approaches. In this review, we compile various statistical and computational tools used in analysis of expression microarray data. Even though the methods are discussed in the context of expression microarrays, they can also be applied for the analysis of RNA sequencing and quantitative proteomics datasets. We discuss the types of missing values, and the methods and approaches usually employed in their imputation. We also discuss methods of data normalization, feature selection, and feature extraction. Lastly, methods of classification and class discovery along with their evaluation parameters are described in detail. We believe that this detailed review will help the users to select appropriate methods for preprocessing and analysis of their data based on the expected outcome.
    EgoSpeed-Net: Forecasting Speed-Control in Driver Behavior from Egocentric Video Data. (arXiv:2209.13459v1 [cs.CV])
    Speed-control forecasting, a challenging problem in driver behavior analysis, aims to predict the future actions of a driver in controlling vehicle speed, such as braking or acceleration. In this paper, we try to address this challenge solely using egocentric video data, in contrast to the majority of works in the literature using either third-person view data or extra vehicle sensor data such as GPS, or both. To this end, we propose a novel graph convolutional network (GCN) based network, namely, EgoSpeed-Net. We are motivated by the fact that the position changes of objects over time can provide very useful clues for forecasting speed changes in the future. We first model the spatial relations among the objects from each class, frame by frame, using fully-connected graphs, on top of which GCNs are applied for feature extraction. Then we utilize a long short-term memory network to fuse such features per class over time into a vector, concatenate such vectors and forecast a speed-control action using a multilayer perceptron classifier. We conduct extensive experiments on the Honda Research Institute Driving Dataset and demonstrate the superior performance of EgoSpeed-Net.
    MARS: A Motif-based Autoregressive Model for Retrosynthesis Prediction. (arXiv:2209.13178v1 [cs.LG])
    Retrosynthesis is a major task for drug discovery. It is formulated as a graph-generating problem by many existing approaches. Specifically, these methods first identify the reaction center and break the target molecule accordingly to generate synthons. Reactants are generated by either adding atoms sequentially to synthon graphs or directly adding proper leaving groups. However, both strategies suffer: adding atoms results in a long prediction sequence, which increases generation difficulty, while adding leaving groups can only consider those seen in the training set, which results in poor generalization. In this paper, we propose a novel end-to-end graph generation model for retrosynthesis prediction, which sequentially identifies the reaction center, generates the synthons, and adds motifs to the synthons to generate reactants. Since chemically meaningful motifs are bigger than atoms and smaller than leaving groups, our method enjoys lower prediction complexity than adding atoms and better generalization than adding leaving groups. Experiments on a benchmark dataset show that the proposed model significantly outperforms previous state-of-the-art algorithms.
    RADio -- Rank-Aware Divergence Metrics to Measure Normative Diversity in News Recommendations. (arXiv:2209.13520v1 [cs.IR])
    In traditional recommender system literature, diversity is often seen as the opposite of similarity, and typically defined as the distance between identified topics, categories or word models. However, this is not expressive of the social-science interpretation of diversity, which accounts for a news organization's norms and values and which we here refer to as normative diversity. We introduce RADio, a versatile metrics framework to evaluate recommendations according to these normative goals. RADio introduces a rank-aware Jensen Shannon (JS) divergence. This combination accounts for (i) a user's decreasing propensity to observe items further down a list and (ii) full distributional shifts as opposed to point estimates. We evaluate RADio's ability to reflect five normative concepts in news recommendations on the Microsoft News Dataset and six (neural) recommendation algorithms, with the help of our metadata enrichment pipeline. We find that RADio provides insightful estimates that can potentially be used to inform news recommender system design.
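    The core computation is easy to state in code. Below is a minimal sketch of a rank-aware divergence in RADio's spirit: items further down the ranked list contribute less mass to the recommendation distribution, which is then compared to a target (e.g., normative) distribution with JS divergence. The logarithmic discount is an illustrative choice, not necessarily the paper's.

```python
# Rank-aware Jensen-Shannon divergence sketch (illustrative discount).
import numpy as np

def rank_discounted_distribution(item_category, ranked_items, n_cats):
    # Mass of each category, discounted by 1/log2(rank + 1).
    probs = np.zeros(n_cats)
    for rank, item in enumerate(ranked_items, start=1):
        probs[item_category[item]] += 1.0 / np.log2(rank + 1)
    return probs / probs.sum()

def js_divergence(p, q, eps=1e-12):
    p, q = p + eps, q + eps
    m = 0.5 * (p + q)
    kl = lambda a, b: np.sum(a * np.log2(a / b))
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)   # in [0, 1] with log base 2

# Hypothetical usage: compare a recommendation list to a target mix.
item_category = {0: 0, 1: 2, 2: 1, 3: 0, 4: 2}
rec = rank_discounted_distribution(item_category, [1, 4, 0, 2], n_cats=3)
target = np.array([0.4, 0.3, 0.3])
print(js_divergence(rec, target))
```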
    Reward Learning using Structural Motifs in Inverse Reinforcement Learning. (arXiv:2209.13489v1 [cs.LG])
    The Inverse Reinforcement Learning (\textit{IRL}) problem has seen rapid evolution in the past few years, with important applications in domains like robotics, cognition, and health. In this work, we explore the inefficacy of current IRL methods in learning an agent's reward function from expert trajectories depicting long-horizon, complex sequential tasks. We hypothesize that imbuing IRL models with structural motifs capturing underlying tasks can enable and enhance their performance. Subsequently, we propose a novel IRL method, SMIRL, that first learns the (approximate) structure of a task as a finite-state-automaton (FSA), then uses the structural motif to solve the IRL problem. We test our model on both discrete grid world and high-dimensional continuous domain environments. We empirically show that our proposed approach successfully learns all four complex tasks, where two foundational IRL baselines fail. Our model also outperforms the baselines in sample efficiency on a simpler toy task. We further show promising test results in a modified continuous domain on tasks with compositional reward functions.
    On Extending Amdahl's law to Learn Computer Performance. (arXiv:2110.07822v2 [cs.LG] UPDATED)
    The problem of learning parallel computer performance is investigated in the context of multicore processors. Given a fixed workload, the effect of varying system configuration on performance is sought. Conventionally, the performance speedup due to a single resource enhancement is formulated using Amdahl's law. However, in case of multiple configurable resources the conventional formulation results in several disconnected speedup equations that cannot be combined together to determine the overall speedup. To solve this problem, we propose to (1) extend Amdahl's law to accommodate multiple configurable resources into the overall speedup equation, and (2) transform the speedup equation into a multivariable regression problem suitable for machine learning. Using experimental data from fifty-eight tests spanning two benchmarks (SPECCPU 2017 and PCMark 10) and four hardware platforms (Intel Xeon 8180M, AMD EPYC 7702P, Intel CoffeeLake 8700K, and AMD Ryzen 3900X), analytical models are developed and cross-validated. Findings indicate that in most cases, the models result in an average cross-validated accuracy higher than 95%, thereby validating the proposed extension of Amdahl's law. The proposed methodology enables rapid generation of multivariable analytical models to support future industrial development, optimization, and simulation needs.
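    The key transformation is that the extended speedup law becomes linear in the reciprocal enhancement factors, so standard regression applies. A minimal sketch under assumed resources and made-up measurements:

```python
# Extended Amdahl's law: 1/S = (1 - sum_i f_i) + sum_i f_i / k_i,
# linear in the regressors 1/k_i. Resources and data are illustrative.
import numpy as np

# Enhancement factors for (cores, frequency, memory bandwidth) across
# four measured configurations, and the observed speedups.
K = np.array([[2, 1, 1], [4, 1, 1], [2, 2, 1], [4, 2, 2]], dtype=float)
speedup = np.array([1.6, 2.1, 2.4, 3.5])

X = np.column_stack([np.ones(len(K)), 1.0 / K])
coef, *_ = np.linalg.lstsq(X, 1.0 / speedup, rcond=None)
serial_term, fractions = coef[0], coef[1:]
print(fractions)   # estimated enhanceable fraction per resource
```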
    Predicting Swarm Equatorial Plasma Bubbles Via Supervised Machine Learning. (arXiv:2209.13482v1 [physics.space-ph])
    Equatorial Plasma Bubbles (EPBs) are plumes of low density plasma that rise up from the bottomside of the F layer towards the exosphere. EPBs are known causes of radio wave scintillations which can degrade communications with spacecraft. We build a random forest regressor to predict and forecast the probability of an EPB [0-1] detected by the IBI processor on-board the SWARM spacecraft. We use 8 years of Swarm data from 2014 to 2021 and transform the data from a time series into a 5-dimensional space consisting of latitude, longitude, magnetic local time (MLT), year, and day-of-the-year. We also add Kp, F10.7cm and solar wind speed. The observations of EPBs with respect to geolocation, local time, season and solar activity mostly agree with existing work, whilst the link to geomagnetic activity is less clear. The prediction has an accuracy of 88% and performs well across the EPB specific spatiotemporal scales. This proves that the XGBoost method is able to successfully capture the climatological and daily variability of SWARM EPBs. Capturing the daily variance has long evaded researchers because of local and stochastic features within the ionosphere. We take advantage of Shapley Values to explain the model and to gain insight into the physics of EPBs. We find that as the solar wind speed increases the probability of an EPB decreases. We also identify a spike in EPB probability around the Earth-Sun perihelion. Both of these insights were derived directly from the XGBoost and Shapley technique.
    Graph-Based Active Machine Learning Method for Diverse and Novel Antimicrobial Peptides Generation and Selection. (arXiv:2209.13518v1 [q-bio.BM])
    As antibiotic-resistant bacterial strains are rapidly spreading worldwide, infections caused by these strains are emerging as a global crisis causing the death of millions of people every year. Antimicrobial Peptides (AMPs) are one of the candidates to tackle this problem because of their potential diversity, and ability to favorably modulate the host immune response. However, large-scale screening of new AMP candidates is expensive, time-consuming, and not affordable in developing countries, which need the treatments the most. In this work, we propose a novel active machine learning-based framework that statistically minimizes the number of wet-lab experiments needed to design new AMPs, while ensuring a high diversity and novelty of generated AMPs sequences, in multi-rounds of wet-lab AMP screening settings. Combining recurrent neural network models and a graph-based filter (GraphCC), our proposed approach delivers novel and diverse candidates and demonstrates better performance according to our defined metrics.
    On Sharp Stochastic Zeroth Order Hessian Estimators over Riemannian Manifolds. (arXiv:2201.10780v3 [stat.ML] UPDATED)
    We study Hessian estimators for functions defined over an $n$-dimensional complete analytic Riemannian manifold. We introduce new stochastic zeroth-order Hessian estimators using $O (1)$ function evaluations. We show that, for an analytic real-valued function $f$, our estimator achieves a bias bound of order $ O \left( \gamma \delta^2 \right) $, where $ \gamma $ depends on both the Levi-Civita connection and function $f$, and $\delta$ is the finite difference step size. To the best of our knowledge, our results provide the first bias bound for Hessian estimators that explicitly depends on the geometry of the underlying Riemannian manifold. We also study downstream computations based on our Hessian estimators. The supremacy of our method is evidenced by empirical evaluations.
    Reinforcement Learning with Non-Exponential Discounting. (arXiv:2209.13413v1 [cs.LG])
    Commonly in reinforcement learning (RL), rewards are discounted over time using an exponential function to model time preference, thereby bounding the expected long-term reward. In contrast, in economics and psychology, it has been shown that humans often adopt a hyperbolic discounting scheme, which is optimal when a specific task termination time distribution is assumed. In this work, we propose a theory for continuous-time model-based reinforcement learning generalized to arbitrary discount functions. This formulation covers the case in which there is a non-exponential random termination time. We derive a Hamilton-Jacobi-Bellman (HJB) equation characterizing the optimal policy and describe how it can be solved using a collocation method, which uses deep learning for function approximation. Further, we show how the inverse RL problem can be approached, in which one tries to recover properties of the discount function given decision data. We validate the applicability of our proposed approach on two simulated problems. Our approach opens the way for the analysis of human discounting in sequential decision-making tasks.
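    Schematically, the generalization replaces the exponential weight in the return with an arbitrary discount function (notation assumed here); exponential and hyperbolic discounting are the two special cases mentioned:

```latex
\[
V(s_0) \;=\; \mathbb{E}\!\left[\int_0^{\infty} d(t)\, r(s_t, a_t)\,\mathrm{d}t\right],
\qquad
d_{\exp}(t) = e^{-\rho t},
\qquad
d_{\mathrm{hyp}}(t) = \frac{1}{1 + k t}.
\]
```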
    Semi-Blind Source Separation with Learned Constraints. (arXiv:2209.13585v1 [eess.SP])
    Blind source separation (BSS) algorithms are unsupervised methods, which are the cornerstone of hyperspectral data analysis by allowing for physically meaningful data decompositions. BSS problems being ill-posed, the resolution requires efficient regularization schemes to better distinguish between the sources and yield interpretable solutions. For that purpose, we investigate a semi-supervised source separation approach in which we combine a projected alternating least-square algorithm with a learning-based regularization scheme. In this article, we focus on constraining the mixing matrix to belong to a learned manifold by making use of generative models. Altogether, we show that this allows for an innovative BSS algorithm, with improved accuracy, which provides physically interpretable solutions. The proposed method, coined sGMCA, is tested on realistic hyperspectral astrophysical data in challenging scenarios involving strong noise, highly correlated spectra and unbalanced sources. The results highlight the significant benefit of the learned prior to reduce the leakages between the sources, which allows an overall better disentanglement.
    MolGAN: An implicit generative model for small molecular graphs. (arXiv:1805.11973v2 [stat.ML] UPDATED)
    Deep generative models for graph-structured data offer a new angle on the problem of chemical synthesis: by optimizing differentiable models that directly generate molecular graphs, it is possible to side-step expensive search procedures in the discrete and vast space of chemical structures. We introduce MolGAN, an implicit, likelihood-free generative model for small molecular graphs that circumvents the need for expensive graph matching procedures or node ordering heuristics of previous likelihood-based methods. Our method adapts generative adversarial networks (GANs) to operate directly on graph-structured data. We combine our approach with a reinforcement learning objective to encourage the generation of molecules with specific desired chemical properties. In experiments on the QM9 chemical database, we demonstrate that our model is capable of generating close to 100% valid compounds. MolGAN compares favorably both to recent proposals that use string-based (SMILES) representations of molecules and to a likelihood-based method that directly generates graphs, albeit being susceptible to mode collapse. Code at https://github.com/nicola-decao/MolGAN
    DBGSL: Dynamic Brain Graph Structure Learning. (arXiv:2209.13513v1 [cs.LG])
    Functional connectivity (FC) between regions of the brain is commonly estimated through statistical dependency measures applied to functional magnetic resonance imaging (fMRI) data. The resulting functional connectivity matrix (FCM) is often taken to represent the adjacency matrix of a brain graph. Recently, graph neural networks (GNNs) have been successfully applied to FCMs to learn brain graph representations. A common limitation of existing GNN approaches, however, is that they require the graph adjacency matrix to be known prior to model training. As such, it is implicitly assumed the ground-truth dependency structure of the data is known. Unfortunately, for fMRI this is not the case as the choice of which statistical measure best represents the dependency structure of the data is non-trivial. Also, most GNN applications to fMRI assume FC is static over time, which is at odds with neuroscientific evidence that functional brain networks are time-varying and dynamic. These compounded issues can have a detrimental effect on the capacity of GNNs to learn representations of brain graphs. As a solution, we propose Dynamic Brain Graph Structure Learning (DBGSL), a supervised method for learning the optimal time-varying dependency structure of fMRI data. Specifically, DBGSL learns a dynamic graph from fMRI timeseries via spatial-temporal attention applied to brain region embeddings. The resulting graph is then fed to a spatial-temporal GNN to learn a graph representation for classification. Experiments on large resting-state as well as task fMRI datasets for the task of gender classification demonstrate that DBGSL achieves state-of-the-art performance. Moreover, analysis of the learnt dynamic graphs highlights prediction-related brain regions which align with findings from existing neuroscience literature.
    On the inability of Gaussian process regression to optimally learn compositional functions. (arXiv:2205.07764v2 [stat.ML] UPDATED)
    We rigorously prove that deep Gaussian process priors can outperform Gaussian process priors if the target function has a compositional structure. To this end, we study information-theoretic lower bounds for posterior contraction rates for Gaussian process regression in a continuous regression model. We show that if the true function is a generalized additive function, then the posterior based on any mean-zero Gaussian process can only recover the truth at a rate that is strictly slower than the minimax rate by a factor that is polynomially suboptimal in the sample size $n$.
    EVE: Environmental Adaptive Neural Network Models for Low-power Energy Harvesting System. (arXiv:2207.09258v2 [cs.LG] UPDATED)
    IoT devices are increasingly being implemented with neural network models to enable smart applications. Energy harvesting (EH) technology that harvests energy from the ambient environment is a promising alternative to batteries for powering those devices due to the low maintenance cost and wide availability of the energy sources. However, the power provided by the energy harvester is low and has an intrinsic drawback of instability since it varies with the ambient environment. This paper proposes EVE, an automated machine learning (autoML) co-exploration framework to search for desired multi-models with shared weights for energy harvesting IoT devices. Those shared models incur a significantly reduced memory footprint with different levels of model sparsity, latency, and accuracy to adapt to the environmental changes. An efficient on-device implementation architecture is further developed to efficiently execute each model on device. A run-time model extraction algorithm is proposed that retrieves an individual model with negligible overhead when a specific model mode is triggered. Experimental results show that the neural network models generated by EVE are on average 2.5X faster than the baseline models without pruning and shared weights.
    Learning Variational Models with Unrolling and Bilevel Optimization. (arXiv:2209.12651v2 [stat.ML] UPDATED)
    In this paper we consider the problem of learning variational models in the context of supervised learning via risk minimization. Our goal is to provide a deeper understanding of the two approaches to learning variational models: bilevel optimization and algorithm unrolling. The former considers the variational model as a lower-level optimization problem below the risk minimization problem, while the latter replaces the lower-level optimization problem by an algorithm that solves said problem approximately. Both approaches are used in practice, but unrolling is much simpler from a computational point of view. To analyze and compare the two approaches, we consider a simple toy model, and compute all risks and the respective estimators explicitly. We show that unrolling can be better than the bilevel optimization approach, but also that the performance of unrolling can depend significantly on further parameters, sometimes in unexpected ways: while the stepsize of the unrolled algorithm matters a lot, the number of unrolled iterations matters mainly through whether it is even or odd, and these two cases differ notably.
    Adaptive Optimizers with Sparse Group Lasso for Neural Networks in CTR Prediction. (arXiv:2107.14432v3 [cs.LG] UPDATED)
    We develop a novel framework that adds the regularizers of the sparse group lasso to a family of adaptive optimizers in deep learning, such as Momentum, Adagrad, Adam, AMSGrad, AdaHessian, and create a new class of optimizers, which are named Group Momentum, Group Adagrad, Group Adam, Group AMSGrad and Group AdaHessian, etc., accordingly. We establish theoretically proven convergence guarantees in the stochastic convex settings, based on primal-dual methods. We evaluate the regularized effect of our new optimizers on three large-scale real-world ad click datasets with state-of-the-art deep learning models. The experimental results reveal that compared with the original optimizers with the post-processing procedure which uses the magnitude pruning method, the performance of the models can be significantly improved on the same sparsity level. Furthermore, in comparison to the cases without magnitude pruning, our methods can achieve extremely high sparsity with significantly better or highly competitive performance.
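    As a sketch of the mechanism such optimizers rely on, the group-lasso regularizer admits a closed-form proximal step (block soft-thresholding) that can be interleaved with any gradient update; grouping weights by embedding row below is an illustrative assumption, not necessarily the paper's grouping.

```python
# Group-lasso proximal step: shrink each group's norm; groups whose norm
# falls below lr * lam are zeroed out entirely, giving group sparsity.
import numpy as np

def group_lasso_prox(W, lam, lr):
    norms = np.linalg.norm(W, axis=1, keepdims=True)
    scale = np.maximum(0.0, 1.0 - lr * lam / np.maximum(norms, 1e-12))
    return W * scale

# Hypothetical usage after a plain gradient step on an embedding table.
rng = np.random.default_rng(0)
W = rng.normal(0, 0.01, size=(1000, 16))
grad = rng.normal(0, 0.01, size=W.shape)
W = group_lasso_prox(W - 0.1 * grad, lam=0.01, lr=0.1)
print((np.linalg.norm(W, axis=1) == 0).mean())   # fraction of pruned rows
```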
    DiffWire: Inductive Graph Rewiring via the Lov\'asz Bound. (arXiv:2206.07369v2 [cs.LG] UPDATED)
    Graph Neural Networks (GNNs) have been shown to achieve competitive results to tackle graph-related tasks, such as node and graph classification, link prediction and node and graph clustering in a variety of domains. Most GNNs use a message passing framework and hence are called MPNNs. Despite their promising results, MPNNs have been reported to suffer from over-smoothing, over-squashing and under-reaching. Graph rewiring and graph pooling have been proposed in the literature as solutions to address these limitations. However, most state-of-the-art graph rewiring methods fail to preserve the global topology of the graph, are neither differentiable nor inductive, and require the tuning of hyper-parameters. In this paper, we propose DiffWire, a novel framework for graph rewiring in MPNNs that is principled, fully differentiable and parameter-free by leveraging the Lov\'asz bound. Our approach provides a unified theory for graph rewiring by proposing two new, complementary layers in MPNNs: CT-Layer, a layer that learns the commute times and uses them as a relevance function for edge re-weighting; and GAP-Layer, a layer to optimize the spectral gap, depending on the nature of the network and the task at hand. We empirically validate the value of each of these layers separately with benchmark datasets for graph classification. DiffWire brings together the learnability of commute times to related definitions of curvature, opening the door to creating more expressive MPNNs.
    Frame Interpolation for Dynamic Scenes with Implicit Flow Encoding. (arXiv:2209.13284v1 [cs.CV])
    In this paper, we propose an algorithm to interpolate between a pair of images of a dynamic scene. While in the past years significant progress in frame interpolation has been made, current approaches are not able to handle images with brightness and illumination changes, which are common even when the images are captured shortly apart. We propose to address this problem by taking advantage of the existing optical flow methods that are highly robust to the variations in the illumination. Specifically, using the bidirectional flows estimated using an existing pre-trained flow network, we predict the flows from an intermediate frame to the two input images. To do this, we propose to encode the bidirectional flows into a coordinate-based network, powered by a hypernetwork, to obtain a continuous representation of the flow across time. Once we obtain the estimated flows, we use them within an existing blending network to obtain the final intermediate frame. Through extensive experiments, we demonstrate that our approach is able to produce significantly better results than state-of-the-art frame interpolation algorithms.
    Inflation of test accuracy due to data leakage in deep learning-based classification of OCT images. (arXiv:2202.12267v2 [eess.IV] UPDATED)
    In the application of deep learning on optical coherence tomography (OCT) data, it is common to train classification networks using 2D images originating from volumetric data. Given the micrometer resolution of OCT systems, consecutive images are often very similar in both visible structures and noise. Thus, an inappropriate data split can result in overlap between the training and testing sets, with a large portion of the literature overlooking this aspect. In this study, the effect of improper dataset splitting on model evaluation is demonstrated for three classification tasks using three extensively used open-access OCT datasets: Kermany's and Srinivasan's ophthalmology datasets and the AIIMS breast tissue dataset. Results show that the classification performance is inflated by 0.07 up to 0.43 in terms of Matthews Correlation Coefficient (accuracy: 5% to 30%) for models tested on datasets with improper splitting, highlighting the considerable effect of dataset handling on model evaluation. This study intends to raise awareness of the importance of dataset splitting given the increased research interest in implementing deep learning on OCT data.
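    The remedy the study implies is to split at the patient/volume level rather than the image level, so near-identical neighbouring slices cannot land in both train and test. A minimal sketch with assumed data shapes:

```python
# Patient-level split to avoid OCT slice leakage between train and test.
import numpy as np
from sklearn.model_selection import GroupShuffleSplit

images = np.random.rand(500, 64, 64)        # 2D slices (placeholder data)
labels = np.random.randint(0, 3, size=500)
patient_id = np.repeat(np.arange(50), 10)   # 10 consecutive slices each

splitter = GroupShuffleSplit(n_splits=1, test_size=0.2, random_state=0)
train_idx, test_idx = next(splitter.split(images, labels, groups=patient_id))

# No patient contributes slices to both sets.
assert set(patient_id[train_idx]).isdisjoint(patient_id[test_idx])
```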
    Design Perspectives of Multitask Deep Learning Models and Applications. (arXiv:2209.13444v1 [cs.LG])
    In recent years, multi-task learning has turned out to be of great success in various applications. Though single-model training has promised great results throughout these years, it ignores valuable information that might help us estimate a metric better. Under learning-related tasks, multi-task learning has been able to generalize the models even better. We try to enhance the feature mapping of the multi-tasking models by sharing features among related tasks and inductive transfer learning. Also, our interest is in learning the task relationships among various tasks for acquiring better benefits from multi-task learning. In this chapter, our objective is to visualize the existing multi-tasking models, compare their performances and the methods used to evaluate them, discuss the problems faced during the design and implementation of these models in various domains, and review the advantages and milestones achieved by them.
    Beyond Real-world Benchmark Datasets: An Empirical Study of Node Classification with GNNs. (arXiv:2206.09144v2 [cs.LG] UPDATED)
    Graph Neural Networks (GNNs) have achieved great success on a node classification task. Despite the broad interest in developing and evaluating GNNs, they have been assessed with limited benchmark datasets. As a result, the existing evaluation of GNNs lacks fine-grained analysis from various characteristics of graphs. Motivated by this, we conduct extensive experiments with a synthetic graph generator that can generate graphs having controlled characteristics for fine-grained analysis. Our empirical studies clarify the strengths and weaknesses of GNNs from four major characteristics of real-world graphs with class labels of nodes, i.e., 1) class size distributions (balanced vs. imbalanced), 2) edge connection proportions between classes (homophilic vs. heterophilic), 3) attribute values (biased vs. random), and 4) graph sizes (small vs. large). In addition, to foster future research on GNNs, we publicly release our codebase that allows users to evaluate various GNNs with various graphs. We hope this work offers interesting insights for future research.
    Accelerating the Genetic Algorithm for Large-scale Traveling Salesman Problems by Cooperative Coevolutionary Pointer Network with Reinforcement Learning. (arXiv:2209.13077v1 [cs.NE])
    In this paper, we propose a two-stage optimization strategy, named CCPNRL-GA, for solving Large-scale Traveling Salesman Problems (LSTSPs). First, we hypothesize that the participation of a well-performing individual as an elite can accelerate the convergence of optimization. Based on this hypothesis, in the first stage, we cluster the cities and decompose the LSTSPs into multiple subcomponents, and each subcomponent is optimized with a reusable Pointer Network (PtrNet). After subcomponent optimization, we combine all sub-tours to form a valid solution, which then enters the second stage of optimization with the GA. We validate the performance of our proposal on 10 LSTSPs and compare it with traditional EAs. Experimental results show that the participation of an elite individual can greatly accelerate the optimization of LSTSPs, and our proposal has broad prospects for dealing with LSTSPs.
    Group-Invariant Quantum Machine Learning. (arXiv:2205.02261v2 [quant-ph] UPDATED)
    Quantum Machine Learning (QML) models are aimed at learning from data encoded in quantum states. Recently, it has been shown that models with little to no inductive biases (i.e., with no assumptions about the problem embedded in the model) are likely to have trainability and generalization issues, especially for large problem sizes. As such, it is fundamental to develop schemes that encode as much information as available about the problem at hand. In this work we present a simple, yet powerful, framework where the underlying invariances in the data are used to build QML models that, by construction, respect those symmetries. These so-called group-invariant models produce outputs that remain invariant under the action of any element of the symmetry group $\mathfrak{G}$ associated to the dataset. We present theoretical results underpinning the design of $\mathfrak{G}$-invariant models, and exemplify their application through several paradigmatic QML classification tasks including cases when $\mathfrak{G}$ is a continuous Lie group and also when it is a discrete symmetry group. Notably, our framework allows us to recover, in an elegant way, several well known algorithms from the literature, as well as to discover new ones. Taken together, we expect that our results will help pave the way towards a more geometric and group-theoretic approach to QML model design.
    A Derivation of Feedforward Neural Network Gradients Using Fr\'echet Calculus. (arXiv:2209.13234v1 [cs.LG])
    We present a derivation of the gradients of feedforward neural networks using Fr\'echet calculus which is arguably more compact than the ones usually presented in the literature. We first derive the gradients for ordinary neural networks working on vectorial data and show how these derived formulas can be used to derive a simple and efficient algorithm for calculating a neural network's gradients. Subsequently we show how our analysis generalizes to more general neural network architectures including, but not limited to, convolutional networks.
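    As a sketch of the objects involved (notation assumed): for a layer $f_i(x) = \sigma(W_i x + b_i)$, the Fr\'echet derivative is the linear map

```latex
\[
Df_i(x)[h] \;=\; \sigma'(W_i x + b_i) \odot (W_i h),
\]
```

and the chain rule for the composition $F = f_L \circ \cdots \circ f_1$ composes these linear maps; applying their adjoints to the loss gradient from the output layer backwards recovers the usual backpropagation recursion.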
    Reinforcement Learning for Cognitive Delay/Disruption Tolerant Network Node Management in an LEO-based Satellite Constellation. (arXiv:2209.13237v1 [cs.AI])
    In recent years, with the large-scale deployment of spacecraft and the increase of satellite onboard capabilities, delay/disruption tolerant network (DTN) emerged as a more robust communication protocol than TCP/IP in the case of excessive network dynamics. DTN node buffer management is still an active area of research, as the current implementation of the DTN core protocol still relies on the assumption that there is always enough memory available in different network nodes to store and forward bundles. In addition, the classical queuing theory does not apply to the dynamic management of DTN node buffers. Therefore, this paper proposes a centralized approach to automatically manage cognitive DTN nodes in low earth orbit (LEO) satellite constellation scenarios based on the advanced reinforcement learning (RL) strategy advantage actor-critic (A2C). The method aims to explore training a geosynchronous earth orbit intelligent agent to manage all DTN nodes in an LEO satellite constellation scenario. The goal of the A2C agent is to maximize delivery success rate and minimize network resource consumption cost while considering node memory utilization. The intelligent agent can dynamically adjust the radio data rate and perform drop operations based on bundle priority. In order to measure the effectiveness of applying A2C technology to DTN node management issues in LEO satellite constellation scenarios, this paper compares the trained intelligent agent strategy with two other non-RL policies, namely random and standard policies. Experiments show that the A2C strategy balances delivery success rate and cost, and provides the highest reward and the lowest node memory utilization.
    Theoretical Exploration of Solutions of Feedforward ReLU Networks. (arXiv:2202.01919v7 [cs.LG] UPDATED)
    This paper aims to interpret the mechanism of feedforward ReLU networks by exploring their solutions for piecewise linear functions, through deduction from basic rules. The constructed solution should be universal enough to explain some network architectures used in engineering; to that end, several ways are provided to enhance the universality of the solution. Some of the consequences of our theories include: under an affine-geometry background, the solutions of both three-layer networks and deep-layer networks are given, particularly for those architectures applied in practice, such as multilayer feedforward neural networks and decoders; we give clear and intuitive interpretations of each component of network architectures; the parameter-sharing mechanism for multiple outputs is investigated; we provide an explanation of overparameterization solutions in terms of affine transforms; and under our framework, an advantage of deep layers compared to shallower ones follows naturally. Some intermediate results provide basic knowledge for the modeling or understanding of neural networks, such as the classification of data embedded in a higher-dimensional space, the generalization of affine transforms, the probabilistic model of matrix ranks, and the concepts of distinguishable data sets as well as interference among hyperplanes.
    Accelerating hypersonic reentry simulations using deep learning-based hybridization (with guarantees). (arXiv:2209.13434v1 [stat.ML])
    In this paper, we are interested in the acceleration of numerical simulations. We focus on a hypersonic planetary reentry problem whose simulation involves coupling fluid dynamics and chemical reactions. Simulating chemical reactions takes most of the computational time but, on the other hand, cannot be avoided to obtain accurate predictions. We face a trade-off between cost-efficiency and accuracy: the simulation code has to be sufficiently efficient to be used in an operational context but accurate enough to predict the phenomenon faithfully. To tackle this trade-off, we design a hybrid simulation code coupling a traditional fluid dynamic solver with a neural network approximating the chemical reactions. We rely on their power in terms of accuracy and dimension reduction when applied in a big data context and on their efficiency stemming from their matrix-vector structure to achieve important acceleration factors ($\times 10$ to $\times 18.6$). This paper aims to explain how we design such cost-effective hybrid simulation codes in practice. Above all, we describe methodologies to ensure accuracy guarantees, allowing us to go beyond traditional surrogate modeling and to use these codes as references.
    Lossy compression of matrices by black-box optimisation of mixed integer nonlinear programming. (arXiv:2204.10579v2 [cs.LG] UPDATED)
    In edge computing, suppressing data size is a challenge for machine learning models that perform complex tasks such as autonomous driving, in which computational resources (speed, memory size and power) are limited. Efficient lossy compression of matrix data has been introduced by decomposing it into the product of an integer matrix and a real matrix. However, its optimisation is difficult as it requires simultaneous optimisation of integer and real variables. In this paper, we improve this optimisation by utilising recently developed black-box optimisation (BBO) algorithms with an Ising solver for integer variables. In addition, the algorithm can be used to solve mixed-integer programming problems that are linear and non-linear in terms of real and integer variables, respectively. The differences between the choice of Ising solvers (simulated annealing, quantum annealing and simulated quenching) and the strategies of the BBO algorithms (BOCS, FMQA and their variations) are discussed for further development of the BBO techniques.
    Continuous approximation by convolutional neural networks with a sigmoidal function. (arXiv:2209.13332v1 [cs.LG])
    In this paper we present a class of convolutional neural networks (CNNs) called non-overlapping CNNs in the study of approximation capabilities of CNNs. We prove that such networks with a sigmoidal activation function are capable of approximating arbitrary continuous functions defined on compact input sets to any desired degree of accuracy. This result extends existing results in which only multilayer feedforward networks were considered as approximators. Evaluations illustrate the accuracy and efficiency of our result and indicate that the proposed non-overlapping CNNs are less sensitive to noise.
    Survey Descent: A Multipoint Generalization of Gradient Descent for Nonsmooth Optimization. (arXiv:2111.15645v5 [math.OC] UPDATED)
    For strongly convex objectives that are smooth, the classical theory of gradient descent ensures linear convergence relative to the number of gradient evaluations. An analogous nonsmooth theory is challenging. Even when the objective is smooth at every iterate, the corresponding local models are unstable and the number of cutting planes invoked by traditional remedies is difficult to bound, leading to convergence guarantees that are sublinear relative to the cumulative number of gradient evaluations. We instead propose a multipoint generalization of the gradient descent iteration for local optimization. While designed with general objectives in mind, we are motivated by a ``max-of-smooth'' model that captures the subdifferential dimension at optimality. We prove linear convergence when the objective is itself max-of-smooth, and experiments suggest a more general phenomenon.
    Statistical Analysis of Time-Frequency Features Based On Multivariate Synchrosqueezing Transform for Hand Gesture Classification. (arXiv:2209.13350v1 [cs.CV])
    In this study, the four joint time-frequency (TF) moments (mean, variance, skewness, and kurtosis) of the TF matrix obtained from the Multivariate Synchrosqueezing Transform (MSST) are proposed as features for hand gesture recognition. A publicly available dataset containing surface EMG (sEMG) signals of 40 subjects performing 10 hand gestures was used. The distinguishing power of the feature variables for the tested gestures was evaluated according to their p-values obtained from the Kruskal-Wallis (KW) test. It is concluded that the mean, variance, skewness, and kurtosis of TF matrices can be candidate feature sets for the recognition of hand gestures.
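    Computing these four features is straightforward once a TF matrix is available. In the sketch below SciPy's STFT stands in for the Multivariate Synchrosqueezing Transform (an assumption for illustration), and the sEMG channel is simulated:

```python
# Four joint TF moments of a time-frequency magnitude matrix.
import numpy as np
from scipy.signal import stft
from scipy.stats import skew, kurtosis

fs = 2000                                   # assumed sEMG sampling rate
signal = np.random.randn(4 * fs)            # stand-in for one sEMG channel

_, _, Z = stft(signal, fs=fs, nperseg=256)
tf = np.abs(Z).ravel()                      # flatten the TF matrix

features = np.array([tf.mean(), tf.var(), skew(tf), kurtosis(tf)])
print(features)                             # mean, variance, skewness, kurtosis
```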
    UniCLIP: Unified Framework for Contrastive Language-Image Pre-training. (arXiv:2209.13430v1 [cs.CV])
    Pre-training vision-language models with contrastive objectives has shown promising results that are both scalable to large uncurated datasets and transferable to many downstream applications. Some follow-up works have aimed to improve data efficiency by adding self-supervision terms, but the inter-domain (image-text) contrastive loss and intra-domain (image-image) contrastive loss are defined on separate spaces in those works, so many feasible combinations of supervision are overlooked. To overcome this issue, we propose UniCLIP, a Unified framework for Contrastive Language-Image Pre-training. UniCLIP integrates the contrastive loss of both inter-domain pairs and intra-domain pairs into a single universal space. The discrepancies that occur when integrating contrastive loss between different domains are resolved by the three key components of UniCLIP: (1) augmentation-aware feature embedding, (2) MP-NCE loss, and (3) domain dependent similarity measure. UniCLIP outperforms previous vision-language pre-training methods on various single- and multi-modality downstream tasks. In our experiments, we show that each component that comprises UniCLIP contributes well to the final performance.
    Black-box Error Diagnosis in Deep Neural Networks for Computer Vision: a Survey of Tools. (arXiv:2201.06444v3 [cs.LG] UPDATED)
    The application of Deep Neural Networks (DNNs) to a broad variety of tasks demands methods for coping with the complex and opaque nature of these architectures. When a gold standard is available, performance assessment treats the DNN as a black box and computes standard metrics based on the comparison of the predictions with the ground truth. A deeper understanding of performances requires going beyond such evaluation metrics to diagnose the model behavior and the prediction errors. This goal can be pursued in two complementary ways. On one side, model interpretation techniques "open the box" and assess the relationship between the input, the inner layers and the output, so as to identify the architecture modules most likely to cause the performance loss. On the other hand, black-box error diagnosis techniques study the correlation between the model response and some properties of the input not used for training, so as to identify the features of the inputs that make the model fail. Both approaches give hints on how to improve the architecture and/or the training process. This paper focuses on the application of DNNs to Computer Vision (CV) tasks and presents a survey of the tools that support the black-box performance diagnosis paradigm. It illustrates the features and gaps of the current proposals, discusses the relevant research directions and provides a brief overview of the diagnosis tools in sectors other than CV.
    Seamless lightning nowcasting with recurrent-convolutional deep learning. (arXiv:2203.10114v3 [physics.ao-ph] UPDATED)
    A deep learning model is presented to nowcast the occurrence of lightning at a five-minute time resolution 60 minutes into the future. The model is based on a recurrent-convolutional architecture that allows it to recognize and predict the spatiotemporal development of convection, including the motion, growth and decay of thunderstorm cells. The predictions are performed on a stationary grid, without the use of storm object detection and tracking. The input data, collected from an area in and surrounding Switzerland, comprise ground-based radar data, visible/infrared satellite data and derived cloud products, lightning detection, numerical weather prediction and digital elevation model data. We analyze several alternative loss functions, class weighting strategies and model features, providing guidelines for future studies on how to select loss functions and properly calibrate the probabilistic predictions of their models. Based on these analyses, we use focal loss in this study, but conclude that it provides only a small benefit over cross entropy, which remains a viable option if recalibration of the model is not practical. The model achieves a pixel-wise critical success index (CSI) of 0.45 for predicting lightning occurrence within 8 km over the 60-min nowcast period, ranging from a CSI of 0.75 at a 5-min lead time to a CSI of 0.32 at a 60-min lead time.
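    For readers weighing the focal-loss-versus-cross-entropy choice discussed above, a minimal pixel-wise binary focal loss in PyTorch looks like the following; the tensor shapes and the gamma value are assumptions, not settings taken from the paper:

    ```python
    import torch
    import torch.nn.functional as F

    def focal_loss(logits, targets, gamma=2.0):
        """Binary focal loss for pixel-wise occurrence prediction (sketch).

        logits, targets: float tensors of shape (batch, H, W), targets in {0., 1.}.
        With gamma = 0 this reduces to ordinary cross entropy.
        """
        bce = F.binary_cross_entropy_with_logits(logits, targets, reduction="none")
        p_t = torch.exp(-bce)  # model probability assigned to the true class
        return ((1.0 - p_t) ** gamma * bce).mean()
    ```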
    Graph clustering with Boltzmann machines. (arXiv:2203.02471v3 [cs.LG] UPDATED)
    Graph clustering is the process of grouping vertices into densely connected sets called clusters. We tailor two mathematical programming formulations from the literature to this problem. In doing so, we obtain a heuristic approximation to the intra-cluster density maximization problem. We use two variations of a Boltzmann machine heuristic to obtain numerical solutions. For benchmarking purposes, we compare solution quality and computational performance to those obtained using a commercial solver, Gurobi. We also compare clustering quality to the clusters obtained using the popular Louvain modularity maximization method. Our initial results clearly demonstrate the superiority of our problem formulations. They also establish the superiority of the Boltzmann machine over the traditional exact solver. In the case of smaller, less complex graphs, Boltzmann machines provide the same solutions as Gurobi, but with solution times that are orders of magnitude lower. In the case of larger and more complex graphs, Gurobi fails to return meaningful results within a reasonable time frame. Finally, we also note that both our clustering formulations, the distance minimization and $K$-medoids, yield clusters of superior quality to those obtained with the Louvain algorithm.
    STCGAT: A Spatio-temporal Causal Graph Attention Network for traffic flow prediction. (arXiv:2203.10749v2 [cs.LG] UPDATED)
    Traffic flow prediction, as an essential part of the intelligent transportation system, has received critical attention from researchers. However, the complex spatial and temporal dependencies between traffic roads make traffic flow prediction challenging. Existing methods are usually based on graph neural networks that use predefined spatial adjacency graphs of traffic networks to model spatial dependencies, ignoring the dynamic correlation of relationships between road nodes. In addition, they usually use independent spatio-temporal components to capture spatio-temporal dependencies and do not effectively model global spatio-temporal dependencies. This paper proposes a new Spatio-temporal Causal Graph Attention Network (STCGAT) for traffic prediction to address the above challenges. In STCGAT, we use a node embedding approach that adaptively generates spatial adjacency subgraphs at each time step without a priori geographic knowledge, enabling fine-grained modeling of the topology of the dynamically generated graphs at different time steps. Meanwhile, we propose an efficient causal temporal correlation component that contains node adaptive learning, graph convolution, and local and global causal temporal convolution modules to jointly learn local and global spatio-temporal dependencies. Extensive experiments on four real, large traffic datasets show that our model consistently outperforms all baseline models.
    Differentiable Invariant Causal Discovery. (arXiv:2205.15638v3 [cs.LG] UPDATED)
    Learning causal structure from observational data is a fundamental challenge in machine learning. However, the majority of commonly used differentiable causal discovery methods are non-identifiable, turning this problem into a continuous optimization task prone to data biases. In many real-life situations, data is collected from different environments in which the functional relations remain consistent across environments while the distribution of additive noises may vary. This paper proposes Differentiable Invariant Causal Discovery (DICD), which utilizes multi-environment information within a differentiable framework to avoid learning spurious edges and wrong causal directions. Specifically, DICD aims to discover the environment-invariant causation while removing the environment-dependent correlation. We further formulate a constraint that enforces the target structural equation model to remain optimal across environments. Theoretical guarantees for the identifiability of the proposed DICD are provided under mild conditions with enough environments. Extensive experiments on synthetic and real-world datasets verify that DICD outperforms state-of-the-art causal discovery methods by up to 36% in SHD. Our code will be open-sourced.
    Optimization of Annealed Importance Sampling Hyperparameters. (arXiv:2209.13226v1 [stat.ML])
    Annealed Importance Sampling (AIS) is a popular algorithm used to estimate the intractable marginal likelihood of deep generative models. Although AIS is guaranteed to provide an unbiased estimate for any set of hyperparameters, common implementations rely on simple heuristics, such as geometrically averaging the initial and target distributions to form the bridging distributions, which affect estimation performance when the computational budget is limited. Optimizing fully parametric AIS remains challenging due to the use of Metropolis-Hastings (MH) correction steps in the Markov transitions. We present a parametric AIS process with flexible intermediate distributions and optimize the bridging distributions so that fewer sampling steps are needed. We use a reparameterization method that allows us to optimize the distribution sequence and the parameters of the Markov transitions, and that is applicable to a large class of Markov kernels with MH correction. We assess the performance of our optimized AIS for marginal likelihood estimation of deep generative models and compare it to other estimators.
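    For context, the fixed geometric-average path that common AIS implementations rely on, and which this paper replaces with learned parametric bridging distributions, can be sketched as follows (function names are illustrative):

    ```python
    import numpy as np

    def geometric_bridge(log_p0, log_p1, n_steps):
        """Standard geometric-average AIS path between two log densities (sketch).

        log_p0: log density of the tractable initial distribution
        log_p1: unnormalized log density of the target distribution
        Returns a list of callables log_pi_t interpolating from log_p0 to log_p1.
        """
        betas = np.linspace(0.0, 1.0, n_steps + 1)
        # log pi_t(x) = (1 - beta_t) * log_p0(x) + beta_t * log_p1(x)
        return [lambda x, b=b: (1.0 - b) * log_p0(x) + b * log_p1(x) for b in betas]
    ```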
    Graph Neural Network Expressivity and Meta-Learning for Molecular Property Regression. (arXiv:2209.13410v1 [cs.LG])
    We demonstrate the applicability of model-agnostic meta-learning algorithms, specifically Reptile, to GNN models in molecular regression tasks. Using meta-learning, we are able to learn new chemical prediction tasks with only a few model updates, as compared to randomly initialized GNNs, which require learning each regression task from scratch. We experimentally show that GNN layer expressivity is correlated with improved meta-learning. Additionally, we experiment with GNN ensembles, which yield the best performance and rapid convergence for k-shot learning.
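    Since Reptile is model-agnostic, one meta-update applies to a GNN exactly as to any other PyTorch model; a minimal sketch, with learning rates and step counts as placeholders rather than the paper's settings:

    ```python
    import copy
    import torch

    def reptile_step(model, task_loader, loss_fn,
                     inner_lr=1e-3, meta_lr=0.1, inner_steps=5):
        """One Reptile meta-update on a single regression task (sketch)."""
        # Clone the model and adapt it to the task with a few SGD steps.
        adapted = copy.deepcopy(model)
        opt = torch.optim.SGD(adapted.parameters(), lr=inner_lr)
        for _ in range(inner_steps):
            x, y = next(iter(task_loader))
            opt.zero_grad()
            loss_fn(adapted(x), y).backward()
            opt.step()
        # Meta-update: move meta-parameters a fraction toward the adapted weights.
        with torch.no_grad():
            for p, p_adapted in zip(model.parameters(), adapted.parameters()):
                p.add_(meta_lr * (p_adapted - p))
        return model
    ```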
    Spiking GATs: Learning Graph Attentions via Spiking Neural Network. (arXiv:2209.13539v1 [cs.NE])
    Graph Attention Networks (GATs) have been intensively studied and widely used in graph data learning tasks. Existing GATs generally adopt the self-attention mechanism to conduct graph edge attention learning, which requires expensive computation. It is known that Spiking Neural Networks (SNNs) can perform inexpensive computation by transmitting the input signal data into discrete spike trains and can also return sparse outputs. Inspired by the merits of SNNs, in this work we propose a novel Graph Spiking Attention Network (GSAT) for graph data representation and learning. In contrast to the self-attention mechanism in existing GATs, the proposed GSAT adopts an SNN module architecture, which is clearly more energy-efficient. Moreover, GSAT naturally returns sparse attention coefficients and can thus perform feature aggregation on selected neighbors, which makes GSAT robust w.r.t. graph edge noise. Experimental results on several datasets demonstrate the effectiveness, energy efficiency and robustness of the proposed GSAT model.
    Transmit Power Control for Indoor Small Cells: A Method Based on Federated Reinforcement Learning. (arXiv:2209.13536v1 [cs.NI])
    Setting the transmit power of 5G cells has been a long-term topic of discussion, as optimized power settings can help reduce interference and improve the quality of service to users. Recently, machine learning (ML)-based, and especially reinforcement learning (RL)-based, control methods have received much attention. However, there is little discussion about the generalisation ability of the trained RL models. This paper points out that an RL agent trained in a specific indoor environment is room-dependent and cannot directly serve new heterogeneous environments. Therefore, in the context of Open Radio Access Network (O-RAN), this paper proposes a distributed cell power-control scheme based on Federated Reinforcement Learning (FRL). Models trained in different indoor environments are aggregated into a global model during the training process, and the central server then broadcasts the updated model back to each client. The model is also used as the base model for adaptive training in new environments. The simulation results show that the FRL model has similar performance to a single RL agent, and that both are better than the random power allocation method and the exhaustive search method. The results of the generalisation test show that using the FRL model as the base model improves the convergence speed of the model in a new environment.
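    The aggregation step in such an FRL scheme is typically plain federated averaging; a minimal sketch over PyTorch state dicts, assuming equally weighted clients, is:

    ```python
    import torch

    def federated_average(client_state_dicts):
        """FedAvg aggregation of per-environment agent weights (sketch)."""
        global_state = {}
        for name in client_state_dicts[0]:
            # Average each parameter tensor element-wise across clients.
            global_state[name] = torch.stack(
                [sd[name].float() for sd in client_state_dicts]
            ).mean(dim=0)
        return global_state  # broadcast back to clients as the new base model
    ```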
    Exploring Low Rank Training of Deep Neural Networks. (arXiv:2209.13569v1 [cs.LG])
    Training deep neural networks in low rank, i.e. with factorised layers, is of particular interest to the community: it offers efficiency over unfactorised training in terms of both memory consumption and training time. Prior work has focused on low rank approximations of pre-trained networks and on training in low rank space with additional objectives, offering various ad hoc explanations for the chosen practices. We analyse techniques that work well in practice, and through extensive ablations on models such as GPT2 we provide evidence falsifying common beliefs in the field, hinting in the process at exciting research questions that still need answering.
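    The basic object of study, a factorised (low-rank) layer, can be written in a few lines; this is the generic construction rather than any specific scheme analysed in the paper:

    ```python
    import torch.nn as nn

    class LowRankLinear(nn.Module):
        """A d_in -> d_out linear map factorised through rank r (sketch).

        Parameter count drops from d_in * d_out to roughly r * (d_in + d_out),
        which is the memory and compute saving that low-rank training targets.
        """
        def __init__(self, d_in, d_out, rank):
            super().__init__()
            self.V = nn.Linear(d_in, rank, bias=False)  # d_in -> r
            self.U = nn.Linear(rank, d_out)             # r -> d_out

        def forward(self, x):
            return self.U(self.V(x))
    ```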
    Neural Network Panning: Screening the Optimal Sparse Network Before Training. (arXiv:2209.13378v1 [cs.LG])
    Pruning neural networks before training not only compresses the original models but also accelerates the network training phase, which has substantial application value. Current work focuses on fine-grained pruning, which uses metrics to calculate weight scores for weight screening, and extends from the initial single-order pruning to iterative pruning. Through these works, we argue that network pruning can be summarized as an expressive force transfer process of weights, in which the retained weights take on the expressive force of the removed ones in order to maintain the performance of the original networks. To achieve optimal expressive force scheduling, we propose a pruning-before-training scheme called Neural Network Panning, which guides expressive force transfer through multi-index and multi-process steps, and we design a panning agent based on reinforcement learning to automate the process. Experimental results show that Panning performs better than various existing pruning-before-training methods.
    Deep Cross-Modality and Resolution Graph Integration for Universal Brain Connectivity Mapping and Augmentation. (arXiv:2209.13529v1 [q-bio.NC])
    The connectional brain template (CBT) captures the shared traits across all individuals of a given population of brain connectomes, thereby acting as a fingerprint. Estimating a CBT from a population whose brain graphs are derived from diverse neuroimaging modalities (e.g., functional and structural) and at different resolutions (i.e., numbers of nodes) remains a formidable challenge. Such a network integration task allows for learning a rich and universal representation of brain connectivity across varying modalities and resolutions. The resulting CBT can then be used to generate entirely new multimodal brain connectomes, which can boost the learning of downstream tasks such as brain state classification. Here, we propose the Multimodal Multiresolution Brain Graph Integrator Network (i.e., M2GraphIntegrator), the first multimodal multiresolution graph integration framework that maps a given connectomic population into a well-centered CBT. M2GraphIntegrator first unifies brain graph resolutions by utilizing resolution-specific graph autoencoders. Next, it integrates the resulting fixed-size brain graphs into a universal CBT lying at the center of its population. To preserve population diversity, we further design a novel clustering-based training sample selection strategy which leverages the most heterogeneous training samples. To ensure the biological soundness of the learned CBT, we propose a topological loss that minimizes the topological gap between the ground-truth brain graphs and the learned CBT. Our experiments show that from a single CBT, one can generate realistic connectomic datasets including brain graphs of varying resolutions and modalities. We further demonstrate that our framework significantly outperforms benchmarks in reconstruction quality, augmentation tasks, centeredness and topological soundness.
    Efficient Non-Parametric Optimizer Search for Diverse Tasks. (arXiv:2209.13575v1 [cs.LG])
    Efficient and automated design of optimizers plays a crucial role in full-stack AutoML systems. However, prior methods in optimizer search are often limited in their scalability, generalizability, or sample efficiency. With the goal of democratizing research and application of optimizer search, we present the first efficient, scalable and generalizable framework that can directly search on the tasks of interest. We first observe that optimizer updates are fundamentally mathematical expressions applied to the gradient. Inspired by the innate tree structure of the underlying math expressions, we re-arrange the space of optimizers into a super-tree, where each path encodes an optimizer. This way, optimizer search can be naturally formulated as a path-finding problem, allowing a variety of well-established tree traversal methods to be used as the search algorithm. We adopt an adaptation of the Monte Carlo method to tree search, equipped with rejection sampling and equivalent-form detection, which leverage the characteristics of optimizer update rules to further boost sample efficiency. We provide a diverse set of tasks to benchmark our algorithm and demonstrate that, with only 128 evaluations, the proposed framework can discover optimizers that surpass both human-designed counterparts and prior optimizer search methods.
    Phy-Taylor: Physics-Model-Based Deep Neural Networks. (arXiv:2209.13511v1 [cs.LG])
    Purely data-driven deep neural networks (DNNs) applied to physical engineering systems can infer relations that violate physics laws, thus leading to unexpected consequences. To address this challenge, we propose a physics-model-based DNN framework, called Phy-Taylor, that accelerates learning compliant representations with physical knowledge. The Phy-Taylor framework makes two key contributions: it introduces a new architectural Physics-compatible Neural Network (PhN), and it features a novel compliance mechanism that we call Physics-guided Neural Network Editing. The PhN aims to directly capture nonlinearities inspired by physical quantities, such as kinetic energy, potential energy, electrical power, and aerodynamic drag force. To do so, the PhN augments neural network layers with two key components: (i) monomials of the Taylor series expansion of nonlinear functions capturing physical knowledge, and (ii) a suppressor for mitigating the influence of noise. The neural-network editing mechanism further modifies network links and activation functions consistently with physical knowledge. As an extension, we also propose a self-correcting Phy-Taylor framework that introduces two additional capabilities: (i) physics-model-based safety relationship learning, and (ii) automatic output correction when safety violations occur. Through experiments, we show that (by expressing hard-to-learn nonlinearities directly and by constraining dependencies) Phy-Taylor features considerably fewer parameters and a remarkably accelerated training process, while offering enhanced model robustness and accuracy.
    Retrieval Based Time Series Forecasting. (arXiv:2209.13525v1 [cs.AI])
    Time series data appears in a variety of applications such as smart transportation and environmental monitoring. One of the fundamental problems in time series analysis is time series forecasting. Despite the success of recent deep time series forecasting methods, they require sufficient observation of historical values to make accurate forecasts. In other words, the ratio of the output length (or forecasting horizon) to the sum of the input and output lengths should be low enough (e.g., 0.3). As the ratio increases (e.g., to 0.8), the uncertainty of the forecast increases significantly. In this paper, we show both theoretically and empirically that this uncertainty can be effectively reduced by retrieving relevant time series as references. In the theoretical analysis, we first quantify the uncertainty and show its connection to the Mean Squared Error (MSE). We then prove that models with references are easier to learn than models without references, since the retrieved references reduce the uncertainty. To empirically demonstrate the effectiveness of retrieval-based time series forecasting models, we introduce a simple yet effective two-stage method called ReTime, consisting of a relational retrieval stage and a content synthesis stage. We also show that ReTime can be easily adapted to the spatial-temporal time series and time series imputation settings. Finally, we evaluate ReTime on real-world datasets to demonstrate its effectiveness.
    SetGAN: Improving the stability and diversity of generative models through a permutation invariant architecture. (arXiv:1907.00109v3 [cs.LG] UPDATED)
    Generative adversarial networks (GANs) have proven effective in modeling distributions of high-dimensional data. However, their training instability is a well-known hindrance to convergence, which results in practical challenges in their applications to novel data. Furthermore, even when convergence is reached, GANs can be affected by mode collapse, a phenomenon for which the generator learns to model only a small part of the target distribution, disregarding the vast majority of the data manifold or distribution. This paper addresses these challenges by introducing SetGAN, an adversarial architecture that processes sets of generated and real samples, and discriminates between the origins of these sets (i.e., training versus generated data) in a flexible, permutation invariant manner. We also propose a new metric to quantitatively evaluate GANs that does not require previous knowledge of the application, apart from the data itself. Using the new metric, in conjunction with the state-of-the-art evaluation methods, we show that the proposed architecture, when compared with GAN variants stemming from similar strategies, produces more accurate models of the input data in a way that is also less sensitive to hyperparameter settings.
    Leveraging Local Variation in Data: Sampling and Weighting Schemes for Supervised Deep Learning. (arXiv:2101.07561v3 [stat.ML] UPDATED)
    In the context of supervised learning of a function by a neural network, we claim and empirically verify that the neural network yields better results when the distribution of the data set focuses on regions where the function to learn is steep. We first translate this assumption into a mathematically workable form using Taylor expansion, and we derive a new training distribution based on the derivatives of the function to learn. Theoretical derivations then allow us to construct a methodology that we call Variance Based Samples Weighting (VBSW). VBSW uses the local variance of the labels to weight the training points. This methodology is general, scalable, cost-effective, and significantly increases the performance of a large class of neural networks on various classification and regression tasks on image, text, and multivariate data. We highlight its benefits with experiments involving neural networks ranging from linear models to ResNet and BERT.
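    A minimal sketch of the VBSW idea, assuming a k-nearest-neighbor estimate of the local label variance; the exact weighting formula below is an assumption, not the paper's:

    ```python
    import numpy as np
    from sklearn.neighbors import NearestNeighbors

    def vbsw_weights(X, y, k=10):
        """Weight each training point by the local variance of its labels (sketch)."""
        nn = NearestNeighbors(n_neighbors=k).fit(X)
        _, idx = nn.kneighbors(X)                 # k nearest inputs per point
        local_var = np.var(y[idx], axis=1)        # label variance per neighborhood
        # Steeper regions (higher local variance) receive larger weights.
        return 1.0 + local_var / (local_var.mean() + 1e-12)
    ```

    The returned weights would then be passed as per-sample weights to the training loss.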
    Regularized Soft Actor-Critic for Behavior Transfer Learning. (arXiv:2209.13224v1 [cs.LG])
    Existing imitation learning methods mainly focus on making an agent effectively mimic a demonstrated behavior, but do not address the potential contradiction between the behavior style and the objective of a task. There is a general lack of efficient methods that allow an agent to partially imitate a demonstrated behavior to varying degrees while completing the main objective of a task. In this paper we propose a method called Regularized Soft Actor-Critic, which formulates the main task and the imitation task under the Constrained Markov Decision Process (CMDP) framework. The main task is defined as the maximum entropy objective used in Soft Actor-Critic (SAC), and the imitation task is defined as a constraint. We evaluate our method on continuous control tasks relevant to video game applications.
    Approximate Secular Equations for the Cubic Regularization Subproblem. (arXiv:2209.13268v1 [math.OC])
    The cubic regularization method (CR) is a popular algorithm for unconstrained non-convex optimization. At each iteration, CR solves a cubically regularized quadratic problem, called the cubic regularization subproblem (CRS). One way to solve the CRS relies on solving the secular equation, whose computational bottleneck lies in the computation of all eigenvalues of the Hessian matrix. In this paper, we propose and analyze a novel CRS solver based on an approximate secular equation, which requires only some of the Hessian eigenvalues and is therefore much more efficient. Two approximate secular equations (ASEs) are developed. For both ASEs, we first study the existence and uniqueness of their roots and then establish an upper bound on the gap between each root and that of the standard secular equation. This upper bound can in turn be used to bound the distance from the approximate CRS solution based on the ASEs to the true CRS solution, thus offering a theoretical guarantee for our CRS solver. A desirable feature of our CRS solver is that it requires only matrix-vector multiplications and no matrix inversion, which makes it particularly suitable for high-dimensional applications of unconstrained non-convex optimization, such as low-rank recovery and deep learning. Numerical experiments with synthetic and real datasets are conducted to investigate the practical performance of the proposed CRS solver. Experimental results show that the proposed solver outperforms two state-of-the-art methods.
    FG-UAP: Feature-Gathering Universal Adversarial Perturbation. (arXiv:2209.13113v1 [cs.CV])
    Deep Neural Networks (DNNs) are susceptible to elaborately designed perturbations, whether such perturbations depend on the input images or not. The latter kind, called Universal Adversarial Perturbation (UAP), is very attractive for model robustness analysis, since its independence from the input reveals the intrinsic characteristics of the model. Relatedly, another interesting observation is Neural Collapse (NC), the phenomenon whereby feature variability may collapse during the terminal phase of training. Motivated by this, we propose to generate UAPs by attacking the layer where the NC phenomenon happens. Because of NC, the proposed attack gathers all the natural images' features around itself, and is hence called Feature-Gathering UAP (FG-UAP). We evaluate the effectiveness of our proposed algorithm through extensive experiments, including untargeted and targeted universal attacks, attacks with limited data, and transfer-based black-box attacks across architectures, including Vision Transformers, which are believed to be more robust. Furthermore, we investigate FG-UAP from the perspective of NC by analyzing the labels and extracted features of adversarial examples, finding that the collapse phenomenon becomes stronger after the model is corrupted. The code will be released when the paper is accepted.
    Learning to Counter: Stochastic Feature-based Learning for Diverse Counterfactual Explanations. (arXiv:2209.13446v1 [cs.AI])
    Interpretable machine learning seeks to understand the reasoning process of complex black-box systems that are long notorious for their lack of explainability. One growing interpretation approach is counterfactual explanations, which go beyond why a system arrives at a certain decision to further provide suggestions on what a user can do to alter the outcome. A counterfactual example must be able to counter the original prediction from the black-box classifier, while also satisfying various constraints for practical applications. These constraints involve trade-offs with one another, presenting serious challenges to existing works. To this end, we propose a stochastic learning-based framework that effectively balances the counterfactual trade-offs. The framework consists of a generation module and a feature selection module with complementary roles: the former aims to model the distribution of valid counterfactuals, whereas the latter serves to enforce additional constraints in a way that allows for differentiable training and amortized optimization. We demonstrate the effectiveness of our method in generating counterfactuals that are actionable, plausible, more diverse than those of existing methods, and produced in a notably more efficient manner than counterparts of the same capacity.
    A Novel Sequential Coreset Method for Gradient Descent Algorithms. (arXiv:2112.02504v2 [cs.LG] UPDATED)
    A wide range of optimization problems arising in machine learning can be solved by gradient descent algorithms, and a central question in this area is how to efficiently compress a large-scale dataset so as to reduce the computational complexity. Coreset construction is a popular data compression technique that has been extensively studied. However, most existing coreset methods are problem-dependent and cannot be used as a general tool for a broader range of applications. A key obstacle is that they often rely on the pseudo-dimension and total sensitivity bounds, which can be very high or hard to obtain. In this paper, based on the "locality" property of gradient descent algorithms, we propose a new framework, termed "sequential coreset", which effectively avoids these obstacles. Moreover, our method is particularly suitable for sparse optimization, where the coreset size can be further reduced to depend only poly-logarithmically on the dimension. In practice, experimental results suggest that our method can save a large amount of running time compared with baseline algorithms.
    Deep Ensembles for Graphs with Higher-order Dependencies. (arXiv:2205.13988v2 [cs.LG] UPDATED)
    Graph neural networks (GNNs) continue to achieve state-of-the-art performance on many graph learning tasks, but rely on the assumption that a given graph is a sufficient approximation of the true neighborhood structure. When a system contains higher-order sequential dependencies, we show that the tendency of traditional graph representations to underfit each node's neighborhood causes existing GNNs to generalize poorly. To address this, we propose a novel Deep Graph Ensemble (DGE), which captures neighborhood variance by training an ensemble of GNNs on different neighborhood subspaces of the same node within a higher-order network structure. We show that DGE consistently outperforms existing GNNs on semi-supervised and supervised tasks on six real-world datasets with known higher-order dependencies, even under a similar parameter budget. We demonstrate that learning diverse and accurate base classifiers is central to DGE's success, and discuss the implications of these findings for future work on ensembles of GNNs.
    FORLORN: A Framework for Comparing Offline Methods and Reinforcement Learning for Optimization of RAN Parameters. (arXiv:2209.13540v1 [cs.NI])
    The growing complexity and capacity demands for mobile networks necessitate innovative techniques for optimizing resource usage. Meanwhile, recent breakthroughs have brought Reinforcement Learning (RL) into the domain of continuous control of real-world systems. As a step towards RL-based network control, this paper introduces a new framework for benchmarking the performance of an RL agent in network environments simulated with ns-3. Within this framework, we demonstrate that an RL agent without domain-specific knowledge can learn how to efficiently adjust Radio Access Network (RAN) parameters to match offline optimization in static scenarios, while also adapting on the fly in dynamic scenarios, in order to improve the overall user experience. Our proposed framework may serve as a foundation for further work in developing workflows for designing RL-based RAN control algorithms.
    Hierarchical Interdisciplinary Topic Detection Model for Research Proposal Classification. (arXiv:2209.13519v1 [cs.IR])
    The peer merit review of research proposals has been the major mechanism for deciding grant awards. However, research proposals have become increasingly interdisciplinary. It has been a longstanding challenge to assign interdisciplinary proposals to appropriate reviewers so that proposals are fairly evaluated. One of the critical steps in reviewer assignment is to generate accurate interdisciplinary topic labels for proposal-reviewer matching. Existing systems mainly collect topic labels manually provided by principal investigators. However, such human-reported labels can be inaccurate and incomplete, and collecting them is labor-intensive and time-consuming. What role can AI play in developing a fair and precise proposal reviewer assignment system? In this study, we collaborate with the National Science Foundation of China to address the task of automated interdisciplinary topic path detection. For this purpose, we develop a deep Hierarchical Interdisciplinary Research Proposal Classification Network (HIRPCN). Specifically, we first propose a hierarchical transformer to extract the textual semantic information of proposals. We then design an interdisciplinary graph and leverage GNNs to learn representations of each discipline in order to extract interdisciplinary knowledge. After extracting the semantic and interdisciplinary knowledge, we design a level-wise prediction component to fuse the two types of knowledge representations and detect interdisciplinary topic paths for each proposal. We conduct extensive experiments and expert evaluations on three real-world datasets to demonstrate the effectiveness of our proposed model.
    Deep Unfolding of the DBFB Algorithm with Application to ROI CT Imaging with Limited Angular Density. (arXiv:2209.13264v1 [eess.IV])
    This paper addresses the problem of image reconstruction for region-of-interest (ROI) computed tomography (CT). While model-based iterative methods can be used for such a problem, their practicability is often limited due to tedious parameterization and slow convergence. In addition, inadequate solutions can be obtained when the retained priors do not perfectly fit the solution space. Deep learning methods offer an alternative approach that is fast, leverages information from large data sets, and thus can reach high reconstruction quality. However, these methods usually rely on black boxes not accounting for the physics of the imaging system, and their lack of interpretability is often deplored. At the crossroads of both methods, unfolded deep learning techniques have recently been proposed. They incorporate the physics of the model and iterative optimization algorithms into a neural network design, leading to superior performance in various applications. This paper introduces a novel, unfolded deep learning approach called U-RDBFB designed for ROI CT reconstruction from limited data. Few-view truncated data are efficiently handled thanks to a robust non-convex data fidelity function combined with sparsity-inducing regularization functions. Iterations of a dual block forward-backward (DBFB) algorithm, embedded in an iterative reweighted scheme, are then unrolled over a neural network architecture, allowing the learning of various parameters in a supervised manner. Our experiments show an improvement over various state-of-the-art methods, including model-based iterative schemes, deep learning architectures, and deep unfolding methods.
    Machine learning-accelerated chemistry modeling of protoplanetary disks. (arXiv:2209.13336v1 [astro-ph.EP])
    Aims. With the large amount of molecular emission data from (sub)millimeter observatories and incoming James Webb Space Telescope infrared spectroscopy, access to fast forward models of the chemical composition of protoplanetary disks is of paramount importance. Methods. We used a thermo-chemical modeling code to generate a diverse population of protoplanetary disk models. We trained a K-nearest neighbors (KNN) regressor to instantly predict the chemistry of other disk models. Results. We show that it is possible to accurately reproduce the chemistry using just a small subset of physical conditions, thanks to correlations between the local physical conditions in the adopted protoplanetary disk models. We discuss the uncertainties and limitations of this method. Conclusions. The proposed method can be used for Bayesian fitting of line emission data to retrieve disk properties from observations. We present a pipeline for applying the same approach to other sets of disk chemical models.
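    The core of such an emulator is an off-the-shelf KNN regressor mapping local physical conditions to chemical abundances; a schematic version with placeholder data and illustrative feature choices (not the paper's exact inputs) is:

    ```python
    import numpy as np
    from sklearn.neighbors import KNeighborsRegressor

    # Placeholder arrays standing in for a grid of thermo-chemical disk models:
    # X holds log physical conditions (e.g. density, temperature, UV field),
    # Y holds the corresponding log chemical abundances per species.
    rng = np.random.default_rng(0)
    X_train = rng.normal(size=(10_000, 3))
    Y_train = rng.normal(size=(10_000, 50))

    knn = KNeighborsRegressor(n_neighbors=5, weights="distance")
    knn.fit(X_train, Y_train)
    Y_pred = knn.predict(rng.normal(size=(100, 3)))  # near-instant chemistry lookup
    ```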
    Analysis of Reinforcement Learning for determining task replication in workflows. (arXiv:2209.13531v1 [cs.PF])
    Executing workflows on volunteer computing resources, where individual tasks may be forced to relinquish their resource for the resource's primary use, leads to unpredictability and often significantly increases execution time. Task replication is one approach that can ameliorate this challenge. This comes at the expense of a potentially significant increase in system load and energy consumption. We propose the use of Reinforcement Learning (RL) so that a system may 'learn' the 'best' number of replicas to run, increasing the number of workflows which complete promptly whilst minimising the additional workload on the system when replicas are not beneficial. We show, through simulation, that RL can save 34% of the energy consumption compared to a fixed number of replicas, with only a 4% decrease in workflows meeting a pre-defined overhead bound.
    Semi-supervised machine learning model for analysis of nanowire morphologies from transmission electron microscopy images. (arXiv:2203.13875v2 [cond-mat.mtrl-sci] UPDATED)
    In the field of materials science, microscopy is the first and often only accessible method for structural characterization. There is a growing interest in the development of machine learning methods that can automate the analysis and interpretation of microscopy images. Typically, training machine learning models requires large numbers of images with associated structural labels; however, manual labeling of images requires domain knowledge and is prone to human error and subjectivity. To overcome these limitations, we present a semi-supervised transfer learning approach that uses a small number of labeled microscopy images for training and performs as effectively as methods trained on significantly larger image datasets. Specifically, we train an image encoder with unlabeled images using self-supervised learning methods and use that encoder for transfer learning on different downstream image tasks (classification and segmentation) with a minimal number of labeled images for training. We test the transfer learning ability of two self-supervised learning methods, SimCLR and Barlow Twins, on transmission electron microscopy (TEM) images. We demonstrate in detail how this machine learning workflow applied to TEM images of protein nanowires enables automated classification of nanowire morphologies (e.g., single nanowires, nanowire bundles, phase separated) as well as segmentation tasks that can serve as groundwork for quantification of nanowire domain sizes and shape analysis. We also extend the application of the machine learning workflow to classification of nanoparticle morphologies and identification of different types of viruses from TEM images.  ( 3 min )
    Representation and Invariance in Reinforcement Learning. (arXiv:2112.07752v2 [cs.AI] UPDATED)
    If we changed the rules, would the wise become fools? Different groups formalize reinforcement learning (RL) in different ways. If an agent in one RL framework is to run within another RL framework's environments, the agent must first be converted, or mapped, into that other framework. Whether or not this is possible depends on the RL frameworks in question and on how intelligence is measured. In this paper, we lay foundations for studying relative-intelligence-preserving mappability between RL frameworks.  ( 2 min )
    Meta-RegGNN: Predicting Verbal and Full-Scale Intelligence Scores using Graph Neural Networks and Meta-Learning. (arXiv:2209.13530v1 [q-bio.NC])
    Decrypting intelligence from the human brain construct is vital for detecting particular neurological disorders. Recently, functional brain connectomes have been used successfully to predict behavioral scores. However, state-of-the-art methods, on the one hand, neglect the topological properties of the connectomes and, on the other hand, fail to address the high inter-subject brain heterogeneity. To overcome these limitations, we propose Meta-RegGNN, a novel regression graph neural network trained through meta-learning for predicting behavioral scores from brain connectomes. The parameters of our proposed regression GNN are explicitly trained so that a small number of gradient steps combined with a small amount of training data produces good generalization to unseen brain connectomes. Our results on verbal and full-scale intelligence quotient (IQ) prediction outperform existing methods in both neurotypical and autism spectrum disorder cohorts. Furthermore, we show that our proposed approach ensures generalizability, particularly for autistic subjects. Our Meta-RegGNN source code is available at https://github.com/basiralab/Meta-RegGNN.
    PARSE: Pairwise Alignment of Representations in Semi-Supervised EEG Learning for Emotion Recognition. (arXiv:2202.05400v2 [cs.LG] UPDATED)
    We propose PARSE, a novel semi-supervised architecture for learning strong EEG representations for emotion recognition. To reduce the potential distribution mismatch between the large amounts of unlabeled data and the limited amount of labeled data, PARSE uses pairwise representation alignment. First, our model performs data augmentation followed by label guessing for large amounts of original and augmented unlabeled data. This is followed by sharpening of the guessed labels and convex combinations of the unlabeled and labeled data. Finally, representation alignment and emotion classification are performed. To rigorously test our model, we compare PARSE to several state-of-the-art semi-supervised approaches which we implement and adapt for EEG learning. We perform these experiments on four public EEG-based emotion recognition datasets: SEED, SEED-IV, SEED-V and AMIGOS (valence and arousal). The experiments show that our proposed framework achieves the overall best results with varying amounts of limited labeled samples on SEED, SEED-IV and AMIGOS (valence), while approaching the overall best result (reaching second-best) on SEED-V and AMIGOS (arousal). The analysis shows that our pairwise representation alignment considerably improves performance by reducing the distribution mismatch between unlabeled and labeled data, especially when only 1 sample per class is labeled.  ( 3 min )
    Learning with Subset Stacking. (arXiv:2112.06251v2 [cs.LG] UPDATED)
    We propose a new regression algorithm that learns from a set of input-output pairs. Our algorithm is designed for populations where the relation between the input variables and the output variable exhibits heterogeneous behavior across the predictor space. The algorithm starts by generating subsets that are concentrated around random points in the input space. This is followed by training a local predictor for each subset. Those predictors are then combined in a novel way to yield an overall predictor. We call this algorithm "LEarning with Subset Stacking", or LESS, due to its resemblance to the method of stacking regressors. We compare the testing performance of LESS with state-of-the-art methods on several datasets. Our comparison shows that LESS is a competitive supervised learning method. Moreover, we observe that LESS is also efficient in terms of computation time and allows a straightforward parallel implementation.  ( 2 min )
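    A stripped-down sketch of the LESS recipe (random anchors, local predictors, proximity-weighted combination); the paper's actual combination rule is more elaborate, and the Ridge regressors here are an arbitrary choice of local predictor:

    ```python
    import numpy as np
    from sklearn.linear_model import Ridge

    def fit_less(X, y, n_subsets=20, subset_frac=0.2, seed=0):
        """LESS-style regressor: local models around random anchors (sketch)."""
        rng = np.random.default_rng(seed)
        k = max(2, int(subset_frac * len(X)))
        anchors, models = [], []
        for i in rng.choice(len(X), size=n_subsets, replace=False):
            d = np.linalg.norm(X - X[i], axis=1)
            idx = np.argsort(d)[:k]              # neighborhood around the anchor
            anchors.append(X[i])
            models.append(Ridge().fit(X[idx], y[idx]))

        def predict(Xq):
            D = np.stack([np.linalg.norm(Xq - a, axis=1) for a in anchors], axis=1)
            W = np.exp(-D)
            W /= W.sum(axis=1, keepdims=True) + 1e-12    # proximity weights
            P = np.stack([m.predict(Xq) for m in models], axis=1)
            return (W * P).sum(axis=1)

        return predict
    ```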
    Global Convergence and Stability of Stochastic Gradient Descent. (arXiv:2110.01663v2 [cs.LG] UPDATED)
    In machine learning, stochastic gradient descent (SGD) is widely deployed to train models using highly non-convex objectives with equally complex noise models. Unfortunately, SGD theory often makes restrictive assumptions that fail to capture the non-convexity of real problems and almost entirely ignores the complex noise models that exist in practice. In this work, we make substantial progress toward addressing this shortcoming. First, we establish that SGD's iterates will either globally converge to a stationary point or diverge under nearly arbitrary non-convexity and noise models. Under a slightly more restrictive assumption on the joint behavior of the non-convexity and noise model, one that generalizes current assumptions in the literature, we show that the objective function cannot diverge even if the iterates diverge. As a consequence of our results, SGD can be applied to a greater range of stochastic optimization problems with confidence about its global convergence behavior and stability.
    Hyperspherical Variational Auto-Encoders. (arXiv:1804.00891v3 [stat.ML] UPDATED)
    The Variational Auto-Encoder (VAE) is one of the most widely used unsupervised machine learning models. Although the default choice of a Gaussian distribution for both the prior and the posterior is mathematically convenient and often leads to competitive results, we show that this parameterization fails to model data with a latent hyperspherical structure. To address this issue, we propose using a von Mises-Fisher (vMF) distribution instead, leading to a hyperspherical latent space. Through a series of experiments we show how such a hyperspherical VAE, or $\mathcal{S}$-VAE, is better suited for capturing data with a hyperspherical latent structure, while outperforming a normal, $\mathcal{N}$-VAE, in low dimensions on other data types. Code at this http URL and https://github.com/nicola-decao/s-vae-pytorch
    Toward Safe and Accelerated Deep Reinforcement Learning for Next-Generation Wireless Networks. (arXiv:2209.13532v1 [cs.NI])
    Deep reinforcement learning (DRL) algorithms have recently gained wide attention in the wireless networks domain. They are considered promising approaches for solving dynamic radio resource management (RRM) problems in next-generation networks. Given their capabilities to build an approximate and continuously updated model of the wireless network environments, DRL algorithms can deal with the multifaceted complexity of such environments. Nevertheless, several challenges hinder the practical adoption of DRL in commercial networks. In this article, we first discuss two key practical challenges that are faced but rarely tackled when developing DRL-based RRM solutions. We argue that it is inevitable to address these DRL-related challenges for DRL to find its way to RRM commercial solutions. In particular, we discuss the need to have safe and accelerated DRL-based RRM solutions that mitigate the slow convergence and performance instability exhibited by DRL algorithms. We then review and categorize the main approaches used in the RRM domain to develop safe and accelerated DRL-based solutions. Finally, a case study is conducted to demonstrate the importance of having safe and accelerated DRL-based RRM solutions. We employ multiple variants of transfer learning (TL) techniques to accelerate the convergence of intelligent radio access network (RAN) slicing DRL-based controllers. We also propose a hybrid TL-based approach and sigmoid function-based rewards as examples of safe exploration in DRL-based RAN slicing.  ( 3 min )
    Taking a Respite from Representation Learning for Molecular Property Prediction. (arXiv:2209.13492v1 [q-bio.QM])
    Artificial intelligence (AI) has been widely applied in drug discovery, with molecular property prediction as a major task. Despite the boom of AI techniques in molecular representation learning, some key aspects underlying molecular property prediction have not yet been carefully examined. In this study, we conducted a systematic comparison of three representative models, random forest, MolBERT and GROVER, which utilize three major molecular representations: extended-connectivity fingerprints, SMILES strings and molecular graphs, respectively. Notably, MolBERT and GROVER are pretrained on large-scale unlabelled molecule corpora in a self-supervised manner. In addition to the commonly used MoleculeNet benchmark datasets, we also assembled a suite of opioids-related datasets for downstream prediction evaluation. We first conducted dataset profiling on label distributions and structural analyses, and examined the activity cliffs issue in the opioids-related datasets. Then, we trained 4,320 predictive models and evaluated the usefulness of the learned representations. Furthermore, we explored model evaluation by studying the effect of statistical tests, evaluation metrics and task settings. Finally, we dissected chemical space generalization into inter-scaffold and intra-scaffold generalization and measured prediction performance to evaluate model generalizability under both settings. By taking this respite, we reflected on the key aspects underlying molecular property prediction, the awareness of which can, hopefully, bring better AI techniques to this field.
    Watch What You Pretrain For: Targeted, Transferable Adversarial Examples on Self-Supervised Speech Recognition models. (arXiv:2209.13523v1 [cs.LG])
    Targeted adversarial attacks against Automatic Speech Recognition (ASR) are thought to require white-box access to the targeted model to be effective, which mitigates the threat that they pose. We show that the recent line of Transformer ASR models pretrained with Self-Supervised Learning (SSL) are much more at risk: adversarial examples generated against them are transferable, making these models vulnerable to targeted, zero-knowledge attacks. We release an adversarial dataset that partially fools most publicly released SSL-pretrained ASR models (Wav2Vec2, HuBERT, WavLM, etc.). With low-level additive noise achieving a 30 dB signal-to-noise ratio, we can force these models to predict our target sentences with up to 80% accuracy, instead of their original transcription. With an ablation study, we show that self-supervised pretraining is the main cause of this vulnerability. We also propose an explanation for this curious phenomenon, which increases the threat posed by adversarial attacks on state-of-the-art ASR models.
    Molecular Design Based on Integer Programming and Quadratic Descriptors in a Two-layered Model. (arXiv:2209.13527v1 [q-bio.BM])
    A novel framework has recently been proposed for designing the molecular structure of chemical compounds with a desired chemical property, where the design of novel drugs is an important topic in bioinformatics and chemo-informatics. The framework infers a desired chemical graph by solving a mixed integer linear program (MILP) that simulates the computation process of a feature function defined by a two-layered model on chemical graphs and of a prediction function constructed by a machine learning method. A set of graph-theoretical descriptors in the feature function plays a key role in deriving a compact formulation of such an MILP. To improve the learning performance of the prediction functions in the framework while maintaining the compactness of the MILP, this paper utilizes the product of two of those descriptors as a new descriptor and then designs a method for reducing the number of descriptors. The results of our computational experiments suggest that the proposed method improves the learning performance for many chemical properties and can infer chemical structures with up to 50 non-hydrogen atoms.
    Mine yOur owN Anatomy: Revisiting Medical Image Segmentation with Extremely Limited Labels. (arXiv:2209.13476v1 [eess.IV])
    Recent studies on contrastive learning have achieved remarkable performance solely by leveraging few labels in the context of medical image segmentation. Existing methods mainly focus on instance discrimination and invariant mapping. However, they face three common pitfalls: (1) tailness: medical image data usually follows an implicit long-tail class distribution, so blindly leveraging all pixels in training can lead to data imbalance issues and degraded performance; (2) consistency: it remains unclear whether a segmentation model has learned meaningful and consistent anatomical features, given the intra-class variations between different anatomical features; and (3) diversity: the intra-slice correlations within the entire dataset have received significantly less attention. This motivates us to seek a principled approach for strategically making use of the dataset itself to discover similar yet distinct samples from different anatomical views. In this paper, we introduce a novel semi-supervised medical image segmentation framework termed Mine yOur owN Anatomy (MONA), and make three contributions. First, prior work argues that every pixel matters equally to model training; we observe empirically that this alone is unlikely to define meaningful anatomical features, mainly due to the lack of a supervision signal. We show two simple solutions towards learning invariances: the use of stronger data augmentations and of nearest neighbors. Second, we construct a set of objectives that encourage the model to be capable of decomposing medical images into a collection of anatomical features in an unsupervised manner. Lastly, our extensive results on three benchmark datasets with different labeled settings validate the effectiveness of our proposed MONA, which achieves a new state-of-the-art under different labeled settings.
    Reservoir Computing Approach for Gray Images Segmentation. (arXiv:2107.11077v2 [cs.CV] UPDATED)
    The paper proposes a novel approach for grayscale image segmentation. It is based on extracting multiple features from a single feature per image pixel, namely its intensity value, using an Echo State Network. The newly extracted features, the reservoir equilibrium states, reveal hidden image characteristics that improve segmentation via a clustering algorithm. Moreover, we demonstrate that intrinsic plasticity tuning of the reservoir fits its equilibrium states to the original image intensity distribution, thus allowing for better segmentation. The proposed approach is tested on the benchmark image Lena.
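    A toy version of the feature-expansion step, assuming a standard ESN state update iterated to an approximate equilibrium for each pixel intensity (reservoir size and scalings are placeholders):

    ```python
    import numpy as np

    def esn_pixel_features(intensities, n_reservoir=30, rho=0.9, seed=0):
        """Expand each scalar pixel intensity into reservoir features (sketch)."""
        rng = np.random.default_rng(seed)
        W_in = rng.uniform(-0.5, 0.5, size=n_reservoir)
        W = rng.uniform(-0.5, 0.5, size=(n_reservoir, n_reservoir))
        W *= rho / np.max(np.abs(np.linalg.eigvals(W)))  # set spectral radius
        feats = np.zeros((len(intensities), n_reservoir))
        for i, u in enumerate(intensities):
            x = np.zeros(n_reservoir)
            for _ in range(50):                  # iterate toward equilibrium state
                x = np.tanh(W_in * u + W @ x)
            feats[i] = x
        return feats  # feed to a clustering algorithm such as k-means
    ```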
    Adapting Brain-Like Neural Networks for Modeling Cortical Visual Prostheses. (arXiv:2209.13561v1 [q-bio.NC])
    Cortical prostheses are devices implanted in the visual cortex that attempt to restore lost vision by electrically stimulating neurons. Currently, the vision provided by these devices is limited, and accurately predicting the visual percepts resulting from stimulation is an open challenge. We propose to address this challenge by utilizing 'brain-like' convolutional neural networks (CNNs), which have emerged as promising models of the visual system. To investigate the feasibility of adapting brain-like CNNs for modeling visual prostheses, we developed a proof-of-concept model to predict the percepts resulting from electrical stimulation. We show that a neurologically-inspired decoding of CNN activations produces qualitatively accurate phosphenes, comparable to phosphenes reported by real patients. Overall, this is an essential first step towards building brain-like models of electrical stimulation, which may not only improve the quality of vision provided by cortical prostheses but could also further our understanding of the neural code of vision.
    Project and Forget: Solving Large-Scale Metric Constrained Problems. (arXiv:2005.03853v2 [cs.LG] UPDATED)
    Given a set of dissimilarity measurements amongst data points, determining what metric representation is most "consistent" with the input measurements, or what metric best captures the relevant geometric features of the data, is a key step in many machine learning algorithms. Existing methods are restricted to specific kinds of metrics or to small problem sizes because of the large number of metric constraints in such problems. In this paper, we provide an active set algorithm, Project and Forget, that uses Bregman projections to solve metric constrained problems with many (possibly exponentially many) inequality constraints. We provide a theoretical analysis of Project and Forget and prove that our algorithm converges to the global optimal solution and that the $L_2$ distance of the current iterate to the optimal solution decays asymptotically at an exponential rate. We demonstrate that using our method we can solve large instances of three types of metric constrained problems: general weight correlation clustering, metric nearness, and metric learning; in each case, outperforming state-of-the-art methods with respect to CPU times and problem sizes.
    Hierarchical Sliced Wasserstein Distance. (arXiv:2209.13570v1 [stat.ML])
    The Sliced Wasserstein (SW) distance has been widely used in different application scenarios, since it can be scaled to a large number of supports without suffering from the curse of dimensionality. The value of the sliced Wasserstein distance is the average transportation cost between one-dimensional representations (projections) of the original measures, obtained via the Radon Transform (RT). Despite its efficiency in the number of supports, estimating the sliced Wasserstein distance requires a relatively large number of projections in high-dimensional settings. Therefore, for applications where the number of supports is relatively small compared with the dimension, e.g., several deep learning applications where mini-batch approaches are utilized, the complexity of the matrix multiplication in the Radon Transform becomes the main computational bottleneck. To address this issue, we propose to derive projections by linearly and randomly combining a smaller number of projections, which we name bottleneck projections. We explain the usage of these projections by introducing the Hierarchical Radon Transform (HRT), which is constructed by applying Radon Transform variants recursively. We then formulate the approach into a new metric between measures, named the Hierarchical Sliced Wasserstein (HSW) distance. By proving the injectivity of HRT, we derive the metricity of HSW. Moreover, we investigate the theoretical properties of HSW, including its connection to SW variants and its computational and sample complexities. Finally, we compare the computational cost and generative quality of HSW with the conventional SW on the task of deep generative modeling using various benchmark datasets, including CIFAR10, CelebA, and Tiny ImageNet.
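    For reference, the conventional Monte Carlo estimator of SW that HSW builds upon can be sketched as follows; HSW would replace the full random projection matrix below with random linear combinations of a smaller set of bottleneck projections:

    ```python
    import torch

    def sliced_wasserstein(X, Y, n_projections=100):
        """Monte Carlo estimate of the sliced Wasserstein-2 distance (sketch).

        X, Y: (n, d) support points of two empirical measures with equal size n.
        """
        d = X.shape[1]
        theta = torch.randn(d, n_projections)
        theta = theta / theta.norm(dim=0, keepdim=True)  # directions on the sphere
        # Project onto each direction, sort, and compare 1D quantiles.
        Xp, _ = torch.sort(X @ theta, dim=0)
        Yp, _ = torch.sort(Y @ theta, dim=0)
        return ((Xp - Yp) ** 2).mean().sqrt()
    ```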
    Law Informs Code: A Legal Informatics Approach to Aligning Artificial Intelligence with Humans. (arXiv:2209.13020v1 [cs.CY])
    We are currently unable to specify human goals and societal values in a way that reliably directs AI behavior. Law is a computational engine that converts opaque human values into legible and enforceable directives. Law Informs Code is the research agenda attempting to capture that complex computational process of human law, and embed it in AI. Similar to how parties to a legal contract cannot foresee every potential contingency of their future relationship, and legislators cannot predict all the circumstances under which their proposed bills will be applied, we cannot ex ante specify rules that provably direct good AI behavior. Legal theory and practice have developed arrays of tools to address these specification problems. For instance, legal standards allow humans to develop shared understandings and adapt them to novel situations. In contrast to more prosaic uses of the law (e.g., as a deterrent of bad behavior through the threat of sanction), the law can also be leveraged as an expression of how humans communicate their goals and of what society values; in this sense, Law Informs Code. We describe how the data generated by legal processes and the theoretical constructs and practices of law (methods of law-making, statutory interpretation, contract drafting, applications of standards, legal reasoning, etc.) can facilitate the robust specification of inherently vague human goals for AI. This helps with human-AI alignment and the local usefulness of AI. Toward society-AI alignment, we present a framework for understanding law as the applied philosophy of multi-agent alignment. Although law is partly a reflection of historically contingent political power - and thus not a perfect aggregation of citizen preferences - if properly parsed, its distillation offers a legitimate computational comprehension of societal values.
    Measuring Overfitting in Convolutional Neural Networks using Adversarial Perturbations and Label Noise. (arXiv:2209.13382v1 [cs.LG])
    Although numerous methods to reduce the overfitting of convolutional neural networks (CNNs) exist, it is still not clear how to confidently measure the degree of overfitting. A metric reflecting the overfitting level could, however, be extremely helpful for the comparison of different architectures and for the evaluation of various techniques to tackle overfitting. Motivated by the fact that overfitted neural networks tend to memorize noise in the training data rather than generalize to unseen data, we examine how the training accuracy changes in the presence of increasing data perturbations and study the connection to overfitting. While previous work focused on label noise only, we examine a spectrum of techniques to inject noise into the training data, including adversarial perturbations and input corruptions. Based on this, we define two new metrics that can confidently distinguish between correct and overfitted models. For the evaluation, we derive a pool of models for which the overfitting behavior is known beforehand. To test the effect of various factors, we introduce several anti-overfitting measures in architectures based on VGG and ResNet and study their impact, including regularization techniques, training set size, and the number of parameters. Finally, we assess the applicability of the proposed metrics by measuring the overfitting degree of several CNN architectures outside of our model pool.
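    The underlying probe is easy to reproduce in miniature. The hedged sketch below (not the paper's exact metrics; toy data and a generic scikit-learn model) flips an increasing fraction of training labels and records training accuracy: a heavily over-parameterized model keeps fitting the corrupted labels, while a well-regularized one does not.

        import numpy as np
        from sklearn.datasets import make_classification
        from sklearn.neural_network import MLPClassifier

        X, y = make_classification(n_samples=1000, n_features=20, random_state=0)
        rng = np.random.default_rng(0)
        for noise in [0.0, 0.2, 0.4, 0.6]:
            y_noisy = y.copy()
            flip = rng.random(len(y)) < noise
            y_noisy[flip] = 1 - y_noisy[flip]  # flip a fraction of the binary labels
            clf = MLPClassifier(hidden_layer_sizes=(64,), max_iter=500, random_state=0)
            clf.fit(X, y_noisy)
            # training accuracy on the corrupted labels: the flatter this curve
            # stays as noise grows, the more the model memorizes
            print(noise, clf.score(X, y_noisy))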
    Experimental validation of machine-learning based spectral-spatial power evolution shaping using Raman amplifiers. (arXiv:2209.13401v1 [cs.ET])
    We experimentally validate a real-time machine learning framework, capable of controlling the pump power values of Raman amplifiers to shape the signal power evolution in two dimensions (2D): frequency and fiber distance. In our setup, power values of four first-order counter-propagating pumps are optimized to achieve the desired 2D power profile. The pump power optimization framework includes a convolutional neural network (CNN) followed by a differential evolution (DE) technique, applied online to the amplifier setup to automatically achieve the target 2D power profiles. The results on achievable 2D profiles show that the framework is able to guarantee very low maximum absolute error (MAE) (<0.5 dB) between the obtained and the target 2D profiles. Moreover, the framework is tested in a multi-objective design scenario where the goal is to achieve the 2D profiles with flat gain levels at the end of the span, jointly with minimum spectral excursion over the entire fiber length. In this case, the experimental results show that for 2D profiles with the target flat gain levels, the DE obtains less than 1 dB maximum gain deviation, when the setup is not physically limited in the pump power values. The simulation results also show that with enough pump power available, better gain deviation (less than 0.6 dB) for higher target gain levels is achievable.  ( 3 min )
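    As a rough illustration of the optimization loop, the sketch below runs SciPy's differential evolution against a stand-in forward model (toy_profile is purely hypothetical; in the paper the forward model is the trained CNN plus the physical amplifier setup) to minimize the maximum absolute error against a target 2D profile.

        import numpy as np
        from scipy.optimize import differential_evolution

        def toy_profile(pumps):
            # hypothetical stand-in for the mapping from four pump powers
            # to a (frequency x distance) signal power grid
            freq = np.linspace(0, 1, 8)[:, None]
            dist = np.linspace(0, 1, 16)[None, :]
            return sum(p * np.exp(-(i + 1) * dist) * np.cos((i + 1) * freq)
                       for i, p in enumerate(pumps))

        target = toy_profile([0.4, 0.3, 0.2, 0.1])  # desired 2D power profile

        def max_abs_error(pumps):
            return np.max(np.abs(toy_profile(pumps) - target))

        res = differential_evolution(max_abs_error, bounds=[(0.0, 1.0)] * 4, seed=0)
        print(res.x, res.fun)  # recovered pump powers and residual maximum absolute error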
    EditEval: An Instruction-Based Benchmark for Text Improvements. (arXiv:2209.13331v1 [cs.CL])
    Evaluation of text generation to date has primarily focused on content created sequentially, rather than improvements on a piece of text. Writing, however, is naturally an iterative and incremental process that requires expertise in different modular skills such as fixing outdated information or making the style more consistent. Even so, comprehensive evaluation of a model's capacity to perform these skills and the ability to edit remains sparse. This work presents EditEval: an instruction-based benchmark and evaluation suite that leverages high-quality existing and new datasets for the automatic evaluation of editing capabilities, such as making text more cohesive and paraphrasing. We evaluate several pre-trained models, which shows that InstructGPT and PEER perform best, but that most baselines fall below the supervised SOTA, particularly when neutralizing and updating information. Our analysis also shows that commonly used metrics for editing tasks do not always correlate well, and that optimizing for the prompts with the highest performance does not necessarily entail the strongest robustness across models. Through the release of this benchmark and a publicly available leaderboard challenge, we hope to unlock future research in developing models capable of iterative and more controllable editing.  ( 2 min )
    When Handcrafted Features and Deep Features Meet Mismatched Training and Test Sets for Deepfake Detection. (arXiv:2209.13289v1 [cs.CV])
    The accelerated growth in synthetic visual media generation and manipulation has now reached the point of raising significant concerns and posing serious threats to society. To contend with this threat, there is an imperative need for automatic networks that detect false digital content and prevent the spread of dangerous misinformation. In this paper, we utilize and compare two kinds of handcrafted features (SIFT and HoG) and two kinds of deep features (Xception and CNN+RNN) for the deepfake detection task. We also check the performance of these features when there are mismatches between training sets and test sets. Evaluation is performed on the well-known FaceForensics++ dataset, which contains four sub-datasets: Deepfakes, Face2Face, FaceSwap and NeuralTextures. The best results are from Xception, where the accuracy surpasses 99% when the training and test sets are both from the same sub-dataset. In comparison, the results drop dramatically when the training set mismatches the test set. This phenomenon reveals the challenge of creating a universal deepfake detection system.  ( 2 min )
    Scaling Laws For Deep Learning Based Image Reconstruction. (arXiv:2209.13435v1 [eess.IV])
    Deep neural networks trained end-to-end to map a measurement of a (noisy) image to a clean image perform excellently for a variety of linear inverse problems. Current methods are trained on only a few hundred or thousand images, as opposed to the millions of examples deep networks are trained on in other domains. In this work, we study whether major performance gains are expected from scaling up the training set size. We consider image denoising, accelerated magnetic resonance imaging, and super-resolution, and empirically determine the reconstruction quality as a function of training set size, while optimally scaling the network size. For all three tasks we find that an initially steep power-law scaling slows significantly already at moderate training set sizes. Interpolating those scaling laws suggests that even training on millions of images would not significantly improve performance. To understand the expected behavior, we analytically characterize the performance of a linear estimator learned with early stopped gradient descent. The result formalizes the intuition that once the error induced by learning the signal model is small relative to the error floor, more training examples do not improve performance.  ( 2 min )
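    The slowing power law is visible directly in log-log space. A hedged sketch with made-up numbers (real curves require training a network at each training-set size):

        import numpy as np

        # hypothetical (training set size, reconstruction error) pairs
        n = np.array([100, 300, 1000, 3000, 10000, 30000])
        err = np.array([0.120, 0.095, 0.078, 0.070, 0.066, 0.064])

        # slope of log(err) vs log(n): a pure power law err ~ c * n^(-alpha)
        # would give a constant slope; a flattening slope signals saturation
        slope_small = np.polyfit(np.log(n[:3]), np.log(err[:3]), 1)[0]
        slope_large = np.polyfit(np.log(n[-3:]), np.log(err[-3:]), 1)[0]
        print(slope_small, slope_large)  # the large-n slope is much closer to zero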
    A Pathologist-Informed Workflow for Classification of Prostate Glands in Histopathology. (arXiv:2209.13408v1 [eess.IV])
    Pathologists diagnose and grade prostate cancer by examining tissue from needle biopsies on glass slides. The cancer's severity and risk of metastasis are determined by the Gleason grade, a score based on the organization and morphology of prostate cancer glands. For diagnostic work-up, pathologists first locate glands in the whole biopsy core, and -- if they detect cancer -- they assign a Gleason grade. This time-consuming process is subject to errors and significant inter-observer variability, despite strict diagnostic criteria. This paper proposes an automated workflow that follows pathologists' modus operandi, isolating and classifying multi-scale patches of individual glands in whole slide images (WSI) of biopsy tissues using distinct steps: (1) two fully convolutional networks segment epithelium versus stroma and gland boundaries, respectively; (2) a classifier network separates benign from cancer glands at high magnification; and (3) an additional classifier predicts the grade of each cancer gland at low magnification. Altogether, this process provides a gland-specific approach for prostate cancer grading that we compare against other machine-learning-based grading methods.  ( 2 min )
    Outlier Suppression: Pushing the Limit of Low-bit Transformer Language Models. (arXiv:2209.13325v1 [cs.LG])
    Transformer architecture has become the fundamental element of the widespread natural language processing (NLP) models. With the trends of large NLP models, the increasing memory and computation costs hinder their efficient deployment on resource-limited devices. Therefore, transformer quantization attracts wide research interest. Recent work recognizes that structured outliers are the critical bottleneck for quantization performance. However, their proposed methods increase the computation overhead and still leave the outliers there. To fundamentally address this problem, this paper delves into the inherent inducement and importance of the outliers. We discover that $\boldsymbol \gamma$ in LayerNorm (LN) acts as a sinful amplifier for the outliers, and the importance of outliers varies greatly where some outliers provided by a few tokens cover a large area but can be clipped sharply without negative impacts. Motivated by these findings, we propose an outlier suppression framework including two components: Gamma Migration and Token-Wise Clipping. The Gamma Migration migrates the outlier amplifier to subsequent modules in an equivalent transformation, contributing to a more quantization-friendly model without any extra burden. The Token-Wise Clipping takes advantage of the large variance of token range and designs a token-wise coarse-to-fine pipeline, obtaining a clipping range with minimal final quantization loss in an efficient way. This framework effectively suppresses the outliers and can be used in a plug-and-play mode. Extensive experiments prove that our framework surpasses the existing works and, for the first time, pushes the 6-bit post-training BERT quantization to the full-precision (FP) level. Our code is available at https://github.com/wimh966/outlier_suppression.  ( 3 min )
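    Gamma Migration is described as an equivalent transformation, and a minimal PyTorch sketch of the underlying algebra for a LayerNorm feeding a single linear layer looks as follows (the paper's implementation handles the actual Transformer wiring, e.g. residual branches; this only shows the parameter folding):

        import torch
        import torch.nn as nn

        def migrate_gamma(ln: nn.LayerNorm, linear: nn.Linear):
            """Fold LayerNorm's affine parameters into the next linear layer:
            W(gamma * xhat + beta) + b == (W * gamma) xhat + (W @ beta + b)."""
            with torch.no_grad():
                linear.bias.add_(linear.weight @ ln.bias)  # b' = b + W @ beta
                linear.weight.mul_(ln.weight)              # W' = W * gamma (per column)
                ln.bias.zero_()                            # LN now outputs plain xhat,
                ln.weight.fill_(1.0)                       # free of the outlier amplifier

        x = torch.randn(4, 16)
        ln, lin = nn.LayerNorm(16), nn.Linear(16, 8)
        nn.init.normal_(ln.weight); nn.init.normal_(ln.bias)
        before = lin(ln(x))
        migrate_gamma(ln, lin)
        print(torch.allclose(before, lin(ln(x)), atol=1e-5))  # expected True: equivalent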
    Formal Conceptual Views in Neural Networks. (arXiv:2209.13517v1 [cs.LG])
    Explaining neural network models is a challenging task that remains unsolved in its entirety to this day. This is especially true for high dimensional and complex data. With the present work, we introduce two notions for conceptual views of a neural network, specifically a many-valued and a symbolic view. Both provide novel analysis methods to enable a human AI analyst to grasp deeper insights into the knowledge that is captured by the neurons of a network. We test the conceptual expressivity of our novel views through different experiments on the ImageNet and Fruit-360 data sets. Furthermore, we show the extent to which the views allow us to quantify the conceptual similarity of different learning architectures. Finally, we demonstrate how conceptual views can be applied for abductive learning of human-comprehensible rules from neurons. In summary, with our work, we contribute to the highly relevant task of globally explaining neural network models.  ( 2 min )
    Magnitude and Angle Dynamics in Training Single ReLU Neurons. (arXiv:2209.13394v1 [cs.LG])
    To understand the learning dynamics of deep ReLU networks, we investigate the dynamic system of gradient flow $w(t)$ by decomposing it into magnitude $\|w(t)\|$ and angle $\phi(t) := \pi - \theta(t)$ components. In particular, for multi-layer single ReLU neurons with spherically symmetric data distribution and the square loss function, we provide upper and lower bounds for the magnitude and angle components to describe the dynamics of gradient flow. Using the obtained bounds, we conclude that small-scale initialization induces slow convergence speed for deep single ReLU neurons. Finally, by exploiting the relation of gradient flow and gradient descent, we extend our results to the gradient descent approach. All theoretical results are verified by experiments.  ( 2 min )
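    A quick numerical illustration of the small-initialization claim, under assumptions of our own choosing (a single one-layer ReLU neuron, Gaussian inputs, a fixed teacher direction), using small-step gradient descent as a surrogate for gradient flow and printing the magnitude and angle trajectories:

        import numpy as np

        rng = np.random.default_rng(0)
        d, n, lr = 10, 2000, 1e-2
        X = rng.normal(size=(n, d))            # spherically symmetric data
        v = np.zeros(d); v[0] = 1.0            # teacher direction
        y = np.maximum(X @ v, 0.0)             # teacher ReLU labels
        w = 1e-3 * rng.normal(size=d)          # small-scale initialization

        for t in range(5001):
            pred = np.maximum(X @ w, 0.0)
            grad = ((pred - y) * (X @ w > 0)) @ X / n  # square-loss gradient
            w -= lr * grad
            if t % 1000 == 0:
                mag = np.linalg.norm(w)
                ang = np.arccos(np.clip(w @ v / (mag * np.linalg.norm(v)), -1, 1))
                # magnitude grows slowly out of the tiny init while the angle shrinks
                print(t, mag, ang)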
    Activation Learning by Local Competitions. (arXiv:2209.13400v1 [cs.NE])
    The backpropagation that drives the success of deep learning is most likely different from the learning mechanism of the brain. In this paper, we develop a biology-inspired learning rule that discovers features by local competitions among neurons, following the idea of Hebb's famous proposal. It is demonstrated that the unsupervised features learned by this local learning rule can serve as a pre-training model to improve the performance of some supervised learning tasks. More importantly, this local learning rule enables us to build a new learning paradigm, named activation learning, that is very different from backpropagation, where the output activation of the neural network roughly measures how probable the input patterns are. Activation learning is capable of learning plentiful local features from a few shots of input patterns, and demonstrates significantly better performance than the backpropagation algorithm when the number of training samples is relatively small. This learning paradigm unifies unsupervised learning, supervised learning and generative models, and is also more secure against adversarial attacks, paving the way toward general-task neural networks.  ( 2 min )
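    One classical instantiation of feature discovery by local competition is a winner-take-all Hebbian update; the sketch below (our own minimal variant, not the paper's exact rule) conveys the flavor of such a backpropagation-free update:

        import numpy as np

        rng = np.random.default_rng(0)
        n_in, n_hidden, lr = 784, 64, 0.01
        W = rng.normal(scale=0.1, size=(n_hidden, n_in))

        def local_update(x):
            """Neurons compete; only the winner updates, moving its weights
            toward the input pattern. No error signal is propagated."""
            a = W @ x
            k = np.argmax(a)              # local competition: winner take all
            W[k] += lr * (x - W[k])       # Hebbian-style move toward the pattern
            W[k] /= np.linalg.norm(W[k])  # keep the weight vector bounded

        for _ in range(1000):
            x = rng.random(n_in)          # stand-in for real input patterns
            x /= np.linalg.norm(x)
            local_update(x)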
    Evolution TANN and the discovery of the internal variables and evolution equations in solid mechanics. (arXiv:2209.13269v1 [cs.CE])
    Data-driven and deep learning approaches have demonstrated the potential to replace classical constitutive models for complex materials displaying path-dependency and possessing multiple inherent scales. Yet, the necessity of structuring constitutive models with an incremental formulation has given rise to data-driven approaches where physical quantities, e.g. deformation, blend with artificial, non-physical ones, such as the increments in deformation and time. Neural networks and the consequent constitutive models thus depend on the particular incremental formulation, fail to identify material representations locally in time, and suffer from poor generalization. Here, we propose a new approach which allows, for the first time, to decouple the material representation from the incremental formulation. Inspired by the Thermodynamics-based Artificial Neural Networks (TANN) and the theory of internal variables, the evolution TANN (eTANN) are continuous-time and thus independent of the aforementioned artificial quantities. A key feature of the proposed approach is the discovery of the evolution equations of the internal variables in the form of ordinary differential equations, rather than in an incremental discrete-time form. In this work, we show how the various general notions of solid mechanics are implemented in eTANN. The laws of thermodynamics are hardwired in the structure of the network and allow predictions that are always consistent. We propose a methodology that allows to discover, from data and first principles, admissible sets of internal variables from the microscopic fields in complex materials. The capabilities as well as the scalability of the proposed approach are demonstrated through several applications involving a broad spectrum of complex material behaviors, from plasticity to damage and viscosity.  ( 3 min )
    Semi-Synchronous Personalized Federated Learning over Mobile Edge Networks. (arXiv:2209.13115v1 [cs.LG])
    Personalized Federated Learning (PFL) is a new Federated Learning (FL) approach to address the heterogeneity issue of the datasets generated by distributed user equipments (UEs). However, most existing PFL implementations rely on synchronous training to ensure good convergence performances, which may lead to a serious straggler problem, where the training time is heavily prolonged by the slowest UE. To address this issue, we propose a semi-synchronous PFL algorithm, termed as Semi-Synchronous Personalized Federated Averaging (PerFedS$^2$), over mobile edge networks. By jointly optimizing the wireless bandwidth allocation and UE scheduling policy, it not only mitigates the straggler problem but also provides convergent training loss guarantees. We derive an upper bound of the convergence rate of PerFedS$^2$ in terms of the number of participants per global round and the number of rounds. On this basis, the bandwidth allocation problem can be solved using analytical solutions and the UE scheduling policy can be obtained by a greedy algorithm. Experimental results verify the effectiveness of PerFedS$^2$ in saving training time as well as guaranteeing the convergence of training loss, in contrast to synchronous and asynchronous PFL algorithms.  ( 2 min )
    Towards Real Time Thermal Simulations for Design Optimization using Graph Neural Networks. (arXiv:2209.13348v1 [cs.CE])
    This paper presents a method to simulate the thermal behavior of 3D systems using a graph neural network. The method discussed achieves a significant speed-up with respect to a traditional finite-element simulation. The graph neural network is trained on a diverse dataset of 3D CAD designs and the corresponding finite-element simulations, representative of the different geometries, material properties and losses that appear in the design of electronic systems. We present results for the transient thermal behavior of a test system. The accuracy of the network result for one-step predictions is remarkable (0.003% error). After 400 time steps, the accumulated error reaches 0.78%. The computing time of each time step is 50 ms. Reducing the accumulated error is the current focus of our work. In the future, a tool such as the one we are presenting could provide nearly instantaneous approximations of the thermal behavior of a system that can be used for design optimization.  ( 2 min )
    Explainable Graph Pyramid Autoformer for Long-Term Traffic Forecasting. (arXiv:2209.13123v1 [cs.LG])
    Accurate traffic forecasting is vital to an intelligent transportation system. Although many deep learning models have achieved state-of-the-art performance for short-term traffic forecasting of up to 1 hour, long-term traffic forecasting that spans multiple hours remains a major challenge. Moreover, most of the existing deep learning traffic forecasting models are black box, presenting additional challenges related to explainability and interpretability. We develop Graph Pyramid Autoformer (X-GPA), an explainable attention-based spatial-temporal graph neural network that uses a novel pyramid autocorrelation attention mechanism. It enables learning from long temporal sequences on graphs and improves long-term traffic forecasting accuracy. Our model can achieve up to 35% better long-term traffic forecast accuracy than that of several state-of-the-art methods. The attention-based scores from the X-GPA model provide spatial and temporal explanations based on the traffic dynamics, which change for normal vs. peak-hour traffic and weekday vs. weekend traffic.  ( 2 min )
    Safe reinforcement learning of dynamic high-dimensional robotic tasks: navigation, manipulation, interaction. (arXiv:2209.13308v1 [cs.RO])
    Safety is a crucial property of every robotic platform: any control policy should always comply with actuator limits and avoid collisions with the environment and humans. In reinforcement learning, safety is even more fundamental for exploring an environment without causing any damage. While there are many proposed solutions to the safe exploration problem, only a few of them can deal with the complexity of the real world. This paper introduces a new formulation of safe exploration for reinforcement learning of various robotic tasks. Our approach applies to a wide class of robotic platforms and enforces safety even under complex collision constraints learned from data by exploring the tangent space of the constraint manifold. Our proposed approach achieves state-of-the-art performance in simulated high-dimensional and dynamic tasks while avoiding collisions with the environment. We show safe real-world deployment of our learned controller on a TIAGo++ robot, achieving remarkable performance in manipulation and human-robot interaction tasks.  ( 2 min )
    Paused Agent Replay Refresh. (arXiv:2209.13398v1 [cs.LG])
    Reinforcement learning algorithms have become more complex since the invention of target networks. Unfortunately, target networks have not kept up with this increased complexity, instead requiring approximate solutions to be computationally feasible. These approximations increase noise in the Q-value targets and in the replay sampling distribution. Paused Agent Replay Refresh (PARR) is a drop-in replacement for target networks that supports more complex learning algorithms without this need for approximation. Using a basic Q-network architecture, and refreshing the novelty values, target values, and replay sampling distribution, PARR gets 2500 points in Montezuma's Revenge after only 30.9 million Atari frames. Finally, interpreting PARR in the context of carbon-based learning offers a new reason for sleep.  ( 2 min )
    EPIC-KITCHENS VISOR Benchmark: VIdeo Segmentations and Object Relations. (arXiv:2209.13064v1 [cs.CV])
    We introduce VISOR, a new dataset of pixel annotations and a benchmark suite for segmenting hands and active objects in egocentric video. VISOR annotates videos from EPIC-KITCHENS, which comes with a new set of challenges not encountered in current video segmentation datasets. Specifically, we need to ensure both short- and long-term consistency of pixel-level annotations as objects undergo transformative interactions, e.g. an onion is peeled, diced and cooked - where we aim to obtain accurate pixel-level annotations of the peel, onion pieces, chopping board, knife, pan, as well as the acting hands. VISOR introduces an annotation pipeline, AI-powered in parts, for scalability and quality. In total, we publicly release 272K manual semantic masks of 257 object classes, 9.9M interpolated dense masks, 67K hand-object relations, covering 36 hours of 179 untrimmed videos. Along with the annotations, we introduce three challenges in video object segmentation, interaction understanding and long-term reasoning. For data, code and leaderboards: this http URL  ( 2 min )
    Stacking Ensemble Learning in Deep Domain Adaptation for Ophthalmic Image Classification. (arXiv:2209.13420v1 [cs.CV])
    Domain adaptation is an attractive approach given the availability of a large amount of labeled data with similar properties but from different domains. It is effective in image classification tasks where obtaining sufficient labeled data is challenging. We propose a novel method, named SELDA, for stacking ensemble learning that extends three domain adaptation methods to effectively solve real-world problems. The major assumption is that when base domain adaptation models are combined, we can obtain a more accurate and robust model by exploiting the abilities of each of the base models. We extend Maximum Mean Discrepancy (MMD), Low-rank coding, and Correlation Alignment (CORAL) to compute the adaptation loss in three base models. Also, we utilize a two-layer fully connected network as a meta-model to stack the output predictions of these three well-performing domain adaptation models to obtain high accuracy in ophthalmic image classification tasks. The experimental results using the Age-Related Eye Disease Study (AREDS) benchmark ophthalmic dataset demonstrate the effectiveness of the proposed model.  ( 2 min )
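    Of the three adaptation losses, MMD is the easiest to show compactly. A minimal (biased, Gaussian-kernel) estimator, assuming plain NumPy feature matrices:

        import numpy as np

        def gaussian_mmd2(X, Y, sigma=1.0):
            """Biased estimate of squared Maximum Mean Discrepancy with a
            Gaussian kernel; used as an adaptation loss between domains."""
            def k(A, B):
                d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
                return np.exp(-d2 / (2 * sigma ** 2))
            return k(X, X).mean() + k(Y, Y).mean() - 2 * k(X, Y).mean()

        source = np.random.default_rng(0).normal(size=(100, 16))
        target = np.random.default_rng(1).normal(loc=0.5, size=(100, 16))
        print(gaussian_mmd2(source, target))  # larger values mean larger domain gap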
    Exploring the Algorithm-Dependent Generalization of AUPRC Optimization with List Stability. (arXiv:2209.13262v1 [cs.LG])
    Stochastic optimization of the Area Under the Precision-Recall Curve (AUPRC) is a crucial problem for machine learning. Although various algorithms have been extensively studied for AUPRC optimization, the generalization is only guaranteed in the multi-query case. In this work, we present the first trial in the single-query generalization of stochastic AUPRC optimization. For sharper generalization bounds, we focus on algorithm-dependent generalization. There are both algorithmic and theoretical obstacles to our destination. From an algorithmic perspective, we notice that the majority of existing stochastic estimators are biased only when the sampling strategy is biased, and are leave-one-out unstable due to the non-decomposability. To address these issues, we propose a sampling-rate-invariant unbiased stochastic estimator with superior stability. On top of this, the AUPRC optimization is formulated as a composition optimization problem, and a stochastic algorithm is proposed to solve this problem. From a theoretical perspective, standard techniques of the algorithm-dependent generalization analysis cannot be directly applied to such a listwise compositional optimization problem. To fill this gap, we extend the model stability from instancewise losses to listwise losses and bridge the corresponding generalization and stability. Additionally, we construct state transition matrices to describe the recurrence of the stability, and simplify calculations via the matrix spectrum. Practically, experimental results on three image retrieval datasets speak to the effectiveness and soundness of our framework.  ( 3 min )
    Im2Oil: Stroke-Based Oil Painting Rendering with Linearly Controllable Fineness Via Adaptive Sampling. (arXiv:2209.13219v1 [cs.CV])
    This paper proposes a novel stroke-based rendering (SBR) method that translates images into vivid oil paintings. Previous SBR techniques usually formulate the oil painting problem as pixel-wise approximation. Departing from this approach, we treat oil painting creation as an adaptive sampling problem. Firstly, we compute a probability density map based on the texture complexity of the input image. Then we use the Voronoi algorithm to sample a set of pixels as the stroke anchors. Next, we search and generate an individual oil stroke at each anchor. Finally, we place all the strokes on the canvas to obtain the oil painting. By adjusting the maximum sampling probability hyper-parameter, we can control the oil painting fineness in a linear manner. Comparison with existing state-of-the-art oil painting techniques shows that our results have higher fidelity and more realistic textures. A user opinion test demonstrates that people show a stronger preference for our oil paintings than for the results of other methods. More interesting results and the code are available at https://github.com/TZYSJTU/Im2Oil.  ( 3 min )
    Deep learning and machine learning for Malaria detection: overview, challenges and future directions. (arXiv:2209.13292v1 [cs.LG])
    To have the greatest impact, public health initiatives must rely on evidence-based decision-making. Machine learning algorithms are designed to gather, store, process, and analyse data to provide knowledge and guide decisions. A crucial part of any surveillance system is image analysis, which has recently attracted growing interest from the computer vision and machine learning communities. This study uses a variety of machine learning and image processing approaches to detect and forecast malaria. In our research, we found that deep learning techniques hold promise as smart tools with broad applicability for malaria detection, benefiting physicians by assisting in the diagnosis of the condition. We examine the common limitations of deep learning for computer vision systems, including the need for training data, training overhead, real-time execution, and explainability, and outline future research directions that address these limitations.  ( 2 min )
    The use of deep learning in interventional radiotherapy (brachytherapy): a review with a focus on open source and open data. (arXiv:2205.07516v2 [physics.med-ph] UPDATED)
    Deep learning has advanced to become one of the most important technologies in almost all medical fields, playing an especially large role in areas related to medical imaging. However, in interventional radiotherapy (brachytherapy) deep learning is still in an early phase. In this review, we first investigated and scrutinised the role of deep learning in all processes of interventional radiotherapy and directly related fields. Additionally, we summarised the most recent developments. To reproduce the results of deep learning algorithms, both source code and training data must be available. Therefore, a second focus of this work was the analysis of the availability of open source code, open data and open models. In our analysis, we were able to show that deep learning already plays a major role in some areas of interventional radiotherapy, but is still hardly present in others. Nevertheless, its impact is increasing with the years, partly self-propelled but also influenced by closely related fields. Open source code, data and models are growing in number but are still scarce and unevenly distributed among different research groups. The reluctance to publish code, data and models limits reproducibility and restricts evaluation to mono-institutional datasets. In summary, deep learning will positively change the workflow of interventional radiotherapy, but there is room for improvement when it comes to reproducible results and standardised evaluation methods.  ( 3 min )
    Fast online ranking with fairness of exposure. (arXiv:2209.13019v1 [cs.IR])
    As recommender systems become increasingly central for sorting and prioritizing the content available online, they have a growing impact on the opportunities and revenue of item producers. For instance, they influence which recruiter a resume is recommended to, or to whom and how much a music track, video or news article is being exposed. This calls for recommendation approaches that not only maximize (a proxy of) user satisfaction, but also consider some notion of fairness in the exposure of items or groups of items. Formally, such recommendations are usually obtained by maximizing a concave objective function in the space of randomized rankings. When the total exposure of an item is defined as the sum of its exposure over users, the optimal rankings of all users become coupled, which makes the optimization process challenging. Existing approaches to find these rankings either solve the global optimization problem in a batch setting, i.e., for all users at once, which makes them inapplicable at scale, or are based on heuristics that have weak theoretical guarantees. In this paper, we propose the first efficient online algorithm to optimize concave objective functions in the space of rankings, which applies to every concave and smooth objective function, such as the ones found for fairness of exposure. Based on online variants of the Frank-Wolfe algorithm, we show that our algorithm is computationally fast, generating rankings on the fly with a computation cost dominated by the sort operation, is memory efficient, and has strong theoretical guarantees. Compared to baseline policies that only maximize user-side performance, our algorithm allows us to incorporate complex fairness of exposure criteria in the recommendations with negligible computational overhead.  ( 3 min )
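    The key computational point, that the Frank-Wolfe linear oracle over rankings reduces to a sort, can be sketched in a few lines. The snippet below is a static, single-user toy (the paper's algorithm is an online variant across users; pos_weights, the toy objective and all names are our assumptions):

        import numpy as np

        def frank_wolfe_ranking(grad_fn, pos_weights, n_items, n_iters=100):
            """Frank-Wolfe over the convex hull of rankings: the linear
            oracle simply sorts items by the objective's gradient."""
            K = len(pos_weights)  # exposure weight of each ranked position
            x = np.full(n_items, pos_weights.sum() / n_items)  # feasible start
            for t in range(1, n_iters + 1):
                g = grad_fn(x)                        # gradient of concave objective
                order = np.argsort(-g)                # LP oracle: rank by gradient
                vertex = np.zeros(n_items)
                vertex[order[:K]] = pos_weights       # best items get most exposure
                x += (2.0 / (t + 2)) * (vertex - x)   # standard FW step size
            return x

        # toy concave objective: item quality plus a log-exposure fairness term
        quality = np.random.default_rng(0).random(20)
        grad = lambda x: quality + 1.0 / (x + 1e-6)
        print(frank_wolfe_ranking(grad, pos_weights=np.array([1.0, 0.6, 0.3]), n_items=20))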
    Market Making with Scaled Beta Policies. (arXiv:2207.03352v4 [q-fin.TR] UPDATED)
    This paper introduces a new representation for the actions of a market maker in an order-driven market. This representation uses scaled beta distributions, and generalises three approaches taken in the artificial intelligence for market making literature: single price-level selection, ladder strategies and "market making at the touch". Ladder strategies place uniform volume across an interval of contiguous prices. Scaled beta distribution based policies generalise these, allowing volume to be skewed across the price interval. We demonstrate that this flexibility is useful for inventory management, one of the key challenges faced by a market maker. In this paper, we conduct three main experiments: first, we compare our more flexible beta-based actions with the special case of ladder strategies; then, we investigate the performance of simple fixed distributions; and finally, we devise and evaluate a simple and intuitive dynamic control policy that adjusts actions in a continuous manner depending on the signed inventory that the market maker has acquired. All empirical evaluations use a high-fidelity limit order book simulator based on historical data with 50 levels on each side.  ( 3 min )
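    A minimal sketch of the action representation itself: discretizing a scaled beta density over a ladder of contiguous price levels, assuming the policy outputs the shape parameters (a, b). Uniform ladders and skewed allocations are both special cases:

        import numpy as np
        from scipy.stats import beta

        def scaled_beta_volumes(total_volume, n_levels, a, b):
            """Spread quoted volume over n_levels contiguous price levels in
            proportion to Beta(a, b) probability mass per level; a = b = 1
            recovers a uniform ladder, skewed shapes tilt volume for
            inventory management."""
            edges = np.linspace(0.0, 1.0, n_levels + 1)
            mass = beta.cdf(edges[1:], a, b) - beta.cdf(edges[:-1], a, b)
            return total_volume * mass

        print(scaled_beta_volumes(100, 5, 1.0, 1.0))  # uniform ladder strategy
        print(scaled_beta_volumes(100, 5, 2.0, 5.0))  # volume skewed toward the touch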
    Electron energy loss spectroscopy database synthesis and automation of core-loss edge recognition by deep-learning neural networks. (arXiv:2209.13026v1 [cond-mat.mtrl-sci])
    The ionization edges encoded in electron energy loss spectroscopy (EELS) spectra enable advanced material analysis, including composition analyses and elemental quantification. The development of the parallel EELS instrument and fast, sensitive detectors have greatly improved the acquisition speed of EELS spectra. However, the traditional way of core-loss edge recognition is experience-based and dependent on human labor, which limits the processing speed. So far, the low signal-to-noise ratio and the low jump ratio of the core-loss edges in the raw EELS spectra have been challenging for the automation of edge recognition. In this work, a convolutional-bidirectional long short-term memory neural network (CNN-BiLSTM) is proposed to automate the detection and elemental identification of core-loss edges from raw spectra. An EELS spectral database is synthesized by using our forward model to assist in the training and validation of the neural network. To make the synthesized spectra resemble the real spectra, we collected a large library of experimentally acquired EELS core edges. To synthesize the training library, the edges are modeled by fitting a multi-Gaussian model to the real edges from experiments, and the noise and instrumental imperfections are simulated and added. The well-trained CNN-BiLSTM network is tested against both the simulated spectra and real spectra collected from experiments. The high accuracy of the network, 94.9%, proves that, without complicated preprocessing of the raw spectra, the proposed CNN-BiLSTM network achieves the automation of core-loss edge recognition for EELS spectra with high accuracy.  ( 3 min )
    Is your forecaster smarter than an energy engineer: a deep dive into electricity price forecasting. (arXiv:2209.13411v1 [cs.LG])
    The field of electricity price forecasting has seen significant advances in the last years, including the development of new, more accurate forecast models. These models leverage statistical relationships in previously observed data to predict the future; however, there is a lack of analysis explaining these models, which limits their real-world applicability in critical infrastructure. In this paper, using data from the Belgian electricity markets, we explore a state-of-the-art forecasting model to understand whether its predictions can be trusted in more general settings than the limited context it is trained in. If the model produces poor predictions in extreme conditions, or if its predictions are inconsistent with reality, it cannot be relied upon in the real world, where these forecasts are used in downstream decision-making activities. Our results show that, despite being largely accurate in general, even state-of-the-art forecasts struggle to remain consistent with reality.  ( 2 min )
    Explainable Global Fairness Verification of Tree-Based Classifiers. (arXiv:2209.13179v1 [cs.LG])
    We present a new approach to the global fairness verification of tree-based classifiers. Given a tree-based classifier and a set of sensitive features potentially leading to discrimination, our analysis synthesizes sufficient conditions for fairness, expressed as a set of traditional propositional logic formulas, which are readily understandable by human experts. The verified fairness guarantees are global, in that the formulas predicate over all the possible inputs of the classifier, rather than just a few specific test instances. Our analysis is formally proved both sound and complete. Experimental results on public datasets show that the analysis is precise, explainable to human experts and efficient enough for practical adoption.  ( 2 min )
    FedStack: Personalized activity monitoring using stacked federated learning. (arXiv:2209.13080v1 [cs.LG])
    Recent advances in remote patient monitoring (RPM) systems can recognize various human activities to measure vital signs, including subtle motions from superficial vessels. There is a growing interest in applying artificial intelligence (AI) to this area of healthcare by addressing known limitations and challenges such as predicting and classifying vital signs and physical movements, which are considered crucial tasks. Federated learning is a relatively new AI technique designed to enhance data privacy by decentralizing traditional machine learning modeling. However, traditional federated learning requires identical architectural models to be trained across the local clients and global servers. This constrains the global model architecture because of the lack of heterogeneity among local models. To overcome this, this study proposes FedStack, a novel federated learning architecture that supports ensembling heterogeneous client model architectures. This work offers a privacy-protecting system for hospitalized in-patients in a decentralized approach and identifies optimum sensor placement. The proposed architecture was applied to a mobile health sensor benchmark dataset from 10 different subjects to classify 12 routine activities. Three AI models, ANN, CNN, and Bi-LSTM, were trained on individual subject data. The federated learning architecture was applied to these models to build local and global models capable of state-of-the-art performance. The local CNN model outperformed the ANN and Bi-LSTM models on each subject's data. Our proposed work demonstrates better performance for heterogeneous stacking of the local models compared to homogeneous stacking. This work sets the stage for building an enhanced RPM system that incorporates client privacy to assist with clinical observations for patients in an acute mental health facility and ultimately help to prevent unexpected deaths.  ( 3 min )
    Controlling mean exit time of stochastic dynamical systems based on quasipotential and machine learning. (arXiv:2209.13098v1 [stat.ML])
    The mean exit time from a basin of attraction in the presence of white noise is of practical importance in various scientific fields. In this work, we propose a strategy to control the mean exit time of general stochastic dynamical systems to achieve a desired value, based on the quasipotential concept and machine learning. Specifically, we develop a neural network architecture to compute the global quasipotential function. Then we design a systematic iterative numerical algorithm to calculate the controller for a given mean exit time. Moreover, we identify the most probable path between metastable attractors with the help of an effective Hamilton-Jacobi scheme and the trained neural network. Numerical experiments demonstrate that our control strategy is effective and sufficiently accurate.  ( 2 min )
    Design of experiments for the calibration of history-dependent models via deep reinforcement learning and an enhanced Kalman filter. (arXiv:2209.13126v1 [cs.LG])
    Experimental data is costly to obtain, which makes it difficult to calibrate complex models. For many models an experimental design that produces the best calibration given a limited experimental budget is not obvious. This paper introduces a deep reinforcement learning (RL) algorithm for design of experiments that maximizes the information gain measured by Kullback-Leibler (KL) divergence obtained via the Kalman filter (KF). This combination enables experimental design for rapid online experiments where traditional methods are too costly. We formulate possible configurations of experiments as a decision tree and a Markov decision process (MDP), where a finite choice of actions is available at each incremental step. Once an action is taken, a variety of measurements are used to update the state of the experiment. This new data leads to a Bayesian update of the parameters by the KF, which is used to enhance the state representation. In contrast to the Nash-Sutcliffe efficiency (NSE) index, which requires additional sampling to test hypotheses for forward predictions, the KF can lower the cost of experiments by directly estimating the values of new data acquired through additional actions. In this work our applications focus on mechanical testing of materials. Numerical experiments with complex, history-dependent models are used to verify the implementation and benchmark the performance of the RL-designed experiments.  ( 3 min )
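    The reward signal at the heart of the method, information gain measured by the KL divergence between the Kalman filter posterior and prior over model parameters, can be sketched directly for the linear-Gaussian case (the H, R and measurement below are hypothetical placeholders for an experiment configuration):

        import numpy as np

        def kf_update(mu, P, H, R, y):
            """Kalman filter measurement update for parameters theta ~ N(mu, P)
            observed through y = H theta + noise, with noise ~ N(0, R)."""
            S = H @ P @ H.T + R
            K = P @ H.T @ np.linalg.inv(S)
            mu_post = mu + K @ (y - H @ mu)
            P_post = (np.eye(len(mu)) - K @ H) @ P
            return mu_post, P_post

        def kl_gauss(mu0, P0, mu1, P1):
            """KL( N(mu0,P0) || N(mu1,P1) ): the information-gain reward."""
            P1inv = np.linalg.inv(P1)
            d = len(mu0)
            return 0.5 * (np.trace(P1inv @ P0) + (mu1 - mu0) @ P1inv @ (mu1 - mu0)
                          - d + np.log(np.linalg.det(P1) / np.linalg.det(P0)))

        mu, P = np.zeros(2), np.eye(2)                    # prior over model parameters
        H, R = np.array([[1.0, 0.5]]), np.array([[0.1]])  # hypothetical experiment action
        mu2, P2 = kf_update(mu, P, H, R, y=np.array([0.7]))
        print(kl_gauss(mu2, P2, mu, P))  # reward handed to the RL experiment designer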
    Understanding Hindsight Goal Relabeling Requires Rethinking Divergence Minimization. (arXiv:2209.13046v1 [cs.LG])
    Hindsight goal relabeling has become a foundational technique for multi-goal reinforcement learning (RL). The idea is quite simple: any arbitrary trajectory can be seen as an expert demonstration for reaching the trajectory's end state. Intuitively, this procedure trains a goal-conditioned policy to imitate a sub-optimal expert. However, this connection between imitation and hindsight relabeling is not well understood. Modern imitation learning algorithms are described in the language of divergence minimization, and yet it remains an open problem how to recast hindsight goal relabeling into that framework. In this work, we develop a unified objective for goal-reaching that explains such a connection, from which we can derive goal-conditioned supervised learning (GCSL) and the reward function in hindsight experience replay (HER) from first principles. Experimentally, we find that despite recent advances in goal-conditioned behaviour cloning (BC), multi-goal Q-learning can still outperform BC-like methods; moreover, a vanilla combination of both actually hurts model performance. Under our framework, we study when BC is expected to help, and empirically validate our findings. Our work further bridges goal-reaching and generative modeling, illustrating the nuances and new pathways of extending the success of generative models to RL.  ( 2 min )
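    For concreteness, the hindsight relabeling mechanic that the paper analyzes looks roughly like this (a generic HER-style 'future' strategy with a sparse reward; the names and transition layout are our assumptions):

        import random

        def her_relabel(trajectory, reward_fn, k=4):
            """Hindsight relabeling sketch: states actually reached later in a
            trajectory are treated as the goals that were 'intended', turning
            arbitrary rollouts into successful demonstrations."""
            relabeled = []
            for t, (s, a, s_next, goal) in enumerate(trajectory):
                relabeled.append((s, a, s_next, goal, reward_fn(s_next, goal)))
                for _ in range(k):                    # 'future' relabeling strategy
                    future = random.randrange(t, len(trajectory))
                    new_goal = trajectory[future][2]  # a state actually reached
                    relabeled.append((s, a, s_next, new_goal,
                                      reward_fn(s_next, new_goal)))
            return relabeled

        reward_fn = lambda s, g: 0.0 if s == g else -1.0  # sparse goal-reaching reward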
    Enhanced Meta Reinforcement Learning using Demonstrations in Sparse Reward Environments. (arXiv:2209.13048v1 [cs.LG])
    Meta reinforcement learning (Meta-RL) is an approach wherein the experience gained from solving a variety of tasks is distilled into a meta-policy. The meta-policy, when adapted over only a small (or just a single) number of steps, is able to perform near-optimally on a new, related task. However, a major challenge to adopting this approach to solve real-world problems is that they are often associated with sparse reward functions that only indicate whether a task is completed partially or fully. We consider the situation where some data, possibly generated by a sub-optimal agent, is available for each task. We then develop a class of algorithms entitled Enhanced Meta-RL using Demonstrations (EMRLD) that exploit this information, even if it is sub-optimal, to obtain guidance during training. We show how EMRLD jointly utilizes RL and supervised learning over the offline data to generate a meta-policy that demonstrates monotone performance improvements. We also develop a warm-started variant called EMRLD-WS that is particularly efficient for sub-optimal demonstration data. Finally, we show that our EMRLD algorithms significantly outperform existing approaches in a variety of sparse reward environments, including that of a mobile robot.  ( 2 min )
    DCE: Offline Reinforcement Learning With Double Conservative Estimates. (arXiv:2209.13132v1 [cs.LG])
    Offline reinforcement learning has attracted much interest for addressing the challenges of applying traditional reinforcement learning in practice. Offline reinforcement learning uses previously collected datasets to train agents without any interaction. To address the overestimation of OOD (out-of-distribution) actions, conservative estimates assign a low value to all inputs. Previous conservative estimation methods usually struggle to avoid the impact of OOD actions on Q-value estimates. In addition, these algorithms usually sacrifice some computational efficiency to achieve conservative estimation. In this paper, we propose a simple conservative estimation method, double conservative estimates (DCE), which uses two conservative estimation methods to constrain the policy. Our algorithm introduces a V-function to avoid the error of in-distribution actions while implicitly achieving conservative estimation. In addition, our algorithm uses a controllable penalty term that changes the degree of conservatism during training. We theoretically show how this method influences the estimation of OOD actions and in-distribution actions. Our experiments show how the two conservative estimation methods affect the estimates of all state-action pairs. DCE demonstrates state-of-the-art performance on D4RL.  ( 2 min )
    PARSRec: Explainable Personalized Attention-fused Recurrent Sequential Recommendation Using Session Partial Actions. (arXiv:2209.13015v1 [cs.IR])
    The emerging meta- and multi-verse landscape is yet another step towards the more prevalent use of already ubiquitous online markets. In such markets, recommender systems play critical roles by offering items of interest to the users, thereby narrowing down a vast search space that comprises hundreds of thousands of products. Recommender systems are usually designed to learn common user behaviors and rely on them for inference. This approach, while effective, is oblivious to subtle idiosyncrasies that differentiate humans from each other. Focusing on this observation, we propose an architecture that relies on common patterns as well as individual behaviors to tailor its recommendations for each person. Simulations under a controlled environment show that our proposed model learns interpretable personalized user behaviors. Our empirical results on Nielsen Consumer Panel dataset indicate that the proposed approach achieves up to 27.9% performance improvement compared to the state-of-the-art.  ( 2 min )
    Dynamic Unicast-Multicast Scheduling for Age-Optimal Information Dissemination in Vehicular Networks. (arXiv:2209.13006v1 [cs.NI])
    This paper investigates the problem of minimizing the age-of-information (AoI) and transmit power consumption in a vehicular network, where a roadside unit (RSU) provides timely updates about a set of physical processes to vehicles. Each vehicle is interested in maintaining the freshness of its information status about one or more physical processes. A framework is proposed to optimize the decisions to unicast, multicast, broadcast, or not transmit updates to vehicles as well as power allocations to minimize the AoI and the RSU's power consumption over a time horizon. The formulated problem is a mixed-integer nonlinear programming problem (MINLP), thus a global optimal solution is difficult to achieve. In this context, we first develop an ant colony optimization (ACO) solution which provides near-optimal performance and thus serves as an efficient benchmark. Then, for real-time implementation, we develop a deep reinforcement learning (DRL) framework that captures the vehicles' demands and channel conditions in the state space and assigns processes to vehicles through dynamic unicast-multicast scheduling actions. Complexity analysis of the proposed algorithms is presented. Simulation results depict interesting trade-offs between AoI and power consumption as a function of the network parameters.  ( 2 min )
    Quantum Speedups of Optimizing Approximately Convex Functions with Applications to Logarithmic Regret Stochastic Convex Bandits. (arXiv:2209.12897v1 [quant-ph])
    We initiate the study of quantum algorithms for optimizing approximately convex functions. Given a convex set ${\cal K}\subseteq\mathbb{R}^{n}$ and a function $F\colon\mathbb{R}^{n}\to\mathbb{R}$ such that there exists a convex function $f\colon\mathcal{K}\to\mathbb{R}$ satisfying $\sup_{x\in{\cal K}}|F(x)-f(x)|\leq \epsilon/n$, our quantum algorithm finds an $x^{*}\in{\cal K}$ such that $F(x^{*})-\min_{x\in{\cal K}} F(x)\leq\epsilon$ using $\tilde{O}(n^{3})$ quantum evaluation queries to $F$. This achieves a polynomial quantum speedup compared to the best-known classical algorithms. As an application, we give a quantum algorithm for zeroth-order stochastic convex bandits with $\tilde{O}(n^{5}\log^{2} T)$ regret, an exponential speedup in $T$ compared to the classical $\Omega(\sqrt{T})$ lower bound. Technically, we achieve quantum speedup in $n$ by exploiting a quantum framework of simulated annealing and adopting a quantum version of the hit-and-run walk. Our speedup in $T$ for zeroth-order stochastic convex bandits is due to a quadratic quantum speedup in multiplicative error of mean estimation.  ( 2 min )
    Survey on Fairness Notions and Related Tensions. (arXiv:2209.13012v1 [cs.CY])
    Automated decision systems are increasingly used to take consequential decisions in problems such as job hiring and loan granting, with the hope of replacing subjective human decisions with objective machine learning (ML) algorithms. ML-based decision systems, however, are found to be prone to bias, which results in unfair decisions. Several notions of fairness have been defined in the literature to capture the different subtleties of this ethical and social concept (e.g. statistical parity, equal opportunity, etc.). Requiring fairness to be satisfied while learning models creates several types of tension among the different notions of fairness, as well as with other desirable properties such as privacy and classification accuracy. This paper surveys the commonly used fairness notions and discusses the tensions that exist among them and with privacy and accuracy. Different methods to address the fairness-accuracy trade-off (classified into four approaches, namely, pre-processing, in-processing, post-processing, and hybrid) are reviewed. The survey is consolidated with an experimental analysis carried out on fairness benchmark datasets to illustrate the relationship between fairness measures and accuracy in real-world scenarios.  ( 2 min )
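    Two of the cited notions are one-liners over predictions, which makes the tensions easy to probe empirically. A hedged NumPy sketch with synthetic data standing in for a real classifier:

        import numpy as np

        def statistical_parity_gap(y_pred, group):
            """|P(yhat=1 | group 0) - P(yhat=1 | group 1)| for binary groups."""
            return abs(y_pred[group == 0].mean() - y_pred[group == 1].mean())

        def equal_opportunity_gap(y_pred, y_true, group):
            """Gap in true positive rates between the two groups."""
            tpr = lambda g: y_pred[(group == g) & (y_true == 1)].mean()
            return abs(tpr(0) - tpr(1))

        rng = np.random.default_rng(0)
        y_true = rng.integers(0, 2, 1000)
        group = rng.integers(0, 2, 1000)
        y_pred = rng.integers(0, 2, 1000)  # stand-in for classifier decisions
        print(statistical_parity_gap(y_pred, group),
              equal_opportunity_gap(y_pred, y_true, group))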
    Defining and Characterizing Reward Hacking. (arXiv:2209.13085v1 [cs.LG])
    We provide the first formal definition of reward hacking, a phenomenon where optimizing an imperfect proxy reward function, $\mathcal{\tilde{R}}$, leads to poor performance according to the true reward function, $\mathcal{R}$. We say that a proxy is unhackable if increasing the expected proxy return can never decrease the expected true return. Intuitively, it might be possible to create an unhackable proxy by leaving some terms out of the reward function (making it "narrower") or overlooking fine-grained distinctions between roughly equivalent outcomes, but we show this is usually not the case. A key insight is that the linearity of reward (in state-action visit counts) makes unhackability a very strong condition. In particular, for the set of all stochastic policies, two reward functions can only be unhackable if one of them is constant. We thus turn our attention to deterministic policies and finite sets of stochastic policies, where non-trivial unhackable pairs always exist, and establish necessary and sufficient conditions for the existence of simplifications, an important special case of unhackability. Our results reveal a tension between using reward functions to specify narrow tasks and aligning AI systems with human values.  ( 2 min )
    A Comprehensive Review of Trends, Applications and Challenges In Out-of-Distribution Detection. (arXiv:2209.12935v1 [cs.LG])
    With recent advancements in artificial intelligence, its applications can be seen in every aspect of humans' daily life. From voice assistants to mobile healthcare and autonomous driving, we rely on the performance of AI methods for many critical tasks; therefore, it is essential to assess the performance of models by proper means to prevent damage. One of the shortfalls of AI models in general, and deep machine learning in particular, is a drop in performance when faced with shifts in the distribution of data. Nonetheless, these shifts are always expected in real-world applications; thus, a field of study has emerged, focusing on detecting out-of-distribution data subsets and enabling a more comprehensive generalization. Furthermore, as many deep learning based models have achieved near-perfect results on benchmark datasets, the need to evaluate these models' reliability and trustworthiness for pushing towards real-world applications is felt more strongly than ever. This has given rise to a growing number of studies in the field of out-of-distribution detection and domain generalization, which creates a need for surveys that compare these studies from various perspectives and highlight their strengths and weaknesses. This paper presents a survey that, in addition to reviewing more than 70 papers in this field, presents challenges and directions for future works and offers a unifying look into various types of data shifts and solutions for better generalization.  ( 3 min )
    FaRO 2: an Open Source, Configurable Smart City Framework for Real-Time Distributed Vision and Biometric Systems. (arXiv:2209.12962v1 [cs.CV])
    Recent global growth in the interest of smart cities has led to trillions of dollars of investment toward research and development. These connected cities have the potential to create a symbiosis of technology and society and revolutionize the cost of living, safety, ecological sustainability, and quality of life of societies on a world-wide scale. Some key components of the smart city construct are connected smart grids, self-driving cars, federated learning systems, smart utilities, large-scale public transit, and proactive surveillance systems. While exciting in prospect, these technologies and their subsequent integration cannot be attempted without addressing the potential societal impacts of such a high degree of automation and data sharing. Additionally, the feasibility of coordinating so many disparate tasks will require a fast, extensible, unifying framework. To that end, we propose FaRO2, a completely reimagined successor to FaRO1, built from the ground up. FaRO2 affords all of the same functionality as its predecessor, serving as a unified biometric API harness that allows for seamless evaluation, deployment, and simple pipeline creation for heterogeneous biometric software. FaRO2 additionally provides a fully declarative capability for defining and coordinating custom machine learning and sensor pipelines, allowing the distribution of processes across otherwise incompatible hardware and networks. FaRO2 ultimately provides a way to quickly configure, hot-swap, and expand large coordinated or federated systems online without interruptions for maintenance. Because much of the data collected in a smart city contains Personally Identifying Information (PII), FaRO2 also provides built-in tools and layers to ensure secure and encrypted streaming, storage, and access of PII data across distributed systems.  ( 3 min )
    Why neural networks find simple solutions: the many regularizers of geometric complexity. (arXiv:2209.13083v1 [cs.LG])
    In many contexts, simpler models are preferable to more complex models and the control of this model complexity is the goal for many methods in machine learning such as regularization, hyperparameter tuning and architecture design. In deep learning, it has been difficult to understand the underlying mechanisms of complexity control, since many traditional measures are not naturally suitable for deep neural networks. Here we develop the notion of geometric complexity, which is a measure of the variability of the model function, computed using a discrete Dirichlet energy. Using a combination of theoretical arguments and empirical results, we show that many common training heuristics such as parameter norm regularization, spectral norm regularization, flatness regularization, implicit gradient regularization, noise regularization and the choice of parameter initialization all act to control geometric complexity, providing a unifying framework in which to characterize the behavior of deep learning models.  ( 2 min )
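    As a rough illustration of the quantity the abstract describes (not the authors' code; the function and batch-sampling scheme are my own sketch), a discrete Dirichlet energy of a model over a batch can be estimated as the mean squared Frobenius norm of its input-output Jacobian:

        import torch

        def geometric_complexity(model, x):
            """Estimate a discrete Dirichlet energy: the batch mean of the
            squared Frobenius norm of the model's input-output Jacobian."""
            x = x.clone().requires_grad_(True)
            out = model(x)
            energy = 0.0
            for k in range(out.shape[1]):  # accumulate over output coordinates
                grads = torch.autograd.grad(out[:, k].sum(), x,
                                            retain_graph=True)[0]
                energy = energy + grads.pow(2).flatten(1).sum(dim=1)
            return energy.mean()

        model = torch.nn.Sequential(torch.nn.Linear(10, 32), torch.nn.ReLU(),
                                    torch.nn.Linear(32, 3))
        x = torch.randn(64, 10)
        print(geometric_complexity(model, x))

    Tracking a scalar like this during training is one way to observe whether heuristics such as weight decay or noise regularization implicitly drive it down.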
    Developing Machine-Learned Potentials for Coarse-Grained Molecular Simulations: Challenges and Pitfalls. (arXiv:2209.12948v1 [physics.comp-ph])
    Coarse graining (CG) enables the investigation of molecular properties for larger systems and at longer timescales than the ones attainable at the atomistic resolution. Machine learning techniques have been recently proposed to learn CG particle interactions, i.e. develop CG force fields. Graph representations of molecules and supervised training of a graph convolutional neural network architecture are used to learn the potential of mean force through a force matching scheme. In this work, the force acting on each CG particle is correlated to a learned representation of its local environment that goes under the name of SchNet, constructed via continuous filter convolutions. We explore the application of SchNet models to obtain a CG potential for liquid benzene, investigating the effect of model architecture and hyperparameters on the thermodynamic, dynamical, and structural properties of the simulated CG systems, reporting and discussing challenges encountered and future directions envisioned.  ( 2 min )
    ERASE-Net: Efficient Segmentation Networks for Automotive Radar Signals. (arXiv:2209.12940v1 [cs.RO])
    Among various sensors for assisted and autonomous driving systems, automotive radar has been considered as a robust and low-cost solution even in adverse weather or lighting conditions. With the recent development of radar technologies and open-sourced annotated data sets, semantic segmentation with radar signals has become very promising. However, existing methods are either computationally expensive or discard significant amounts of valuable information from raw 3D radar signals by reducing them to 2D planes via averaging. In this work, we introduce ERASE-Net, an Efficient RAdar SEgmentation Network to segment the raw radar signals semantically. The core of our approach is the novel detect-then-segment method for raw radar signals. It first detects the center point of each object, then extracts a compact radar signal representation, and finally performs semantic segmentation. We show that our method can achieve superior performance on radar semantic segmentation task compared to the state-of-the-art (SOTA) technique. Furthermore, our approach requires up to 20x less computational resources. Finally, we show that the proposed ERASE-Net can be compressed by 40% without significant loss in performance, significantly more than the SOTA network, which makes it a more promising candidate for practical automotive applications.  ( 2 min )
    Investigation of Machine Learning-based Coarse-Grained Mapping Schemes for Organic Molecules. (arXiv:2209.12946v1 [physics.comp-ph])
    Due to the wide range of timescales that are present in macromolecular systems, hierarchical multiscale strategies are necessary for their computational study. Coarse-graining (CG) allows one to establish a link between different system resolutions and provides the backbone for the development of robust multiscale simulations and analyses. The CG mapping process is typically system- and application-specific, and it relies on chemical intuition. In this work, we explored the application of a Machine Learning strategy, based on Variational Autoencoders, for the development of suitable mapping schemes from the atomistic to the coarse-grained space of molecules with increasing chemical complexity. An extensive evaluation of the effect of the model hyperparameters on the training process and on the final output was performed, and an existing method was extended with the definition of different loss functions and the implementation of a selection criterion that ensures physical consistency of the output. The relationship between the input feature choice and the reconstruction accuracy was analyzed, supporting the need to introduce rotational invariance into the system. Strengths and limitations of the approach, both in the mapping and in the backmapping steps, are highlighted and critically discussed.  ( 3 min )
    The effectiveness of factorization and similarity blending. (arXiv:2209.13011v1 [cs.IR])
    Collaborative Filtering (CF) is a widely used technique that leverages users' past preference data to identify behavioural patterns and exploit them to predict custom recommendations. In this work, we illustrate our review of different CF techniques in the context of the Computational Intelligence Lab (CIL) CF project at ETH Zürich. After evaluating the performances of the individual models, we show that blending factorization-based and similarity-based approaches can lead to a significant error decrease (-9.4%) relative to the best-performing stand-alone model. Moreover, we propose a novel stochastic extension of a similarity model, SCSR, which consistently reduces the asymptotic complexity of the original algorithm.  ( 2 min )
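    A generic sketch of the blending idea (synthetic data; not the project's code): choose the convex combination of the two models' validation predictions that minimizes RMSE.

        import numpy as np

        rng = np.random.default_rng(0)
        val_y = rng.normal(3.5, 1.0, size=1000)            # true ratings (synthetic)
        val_fact = val_y + rng.normal(0, 0.9, size=1000)   # factorization predictions
        val_sim = val_y + rng.normal(0, 1.0, size=1000)    # similarity predictions

        # pick the blending weight that minimizes validation RMSE
        weights = np.linspace(0, 1, 101)
        rmse = [np.sqrt(np.mean((w * val_fact + (1 - w) * val_sim - val_y) ** 2))
                for w in weights]
        best_w = weights[int(np.argmin(rmse))]
        print(f"best blend weight: {best_w:.2f}, blended RMSE: {min(rmse):.4f}")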
    Evaluation of Medical Image Segmentation Models for Uncertain, Small or Empty Reference Annotations. (arXiv:2209.13008v1 [cs.CV])
    Performance metrics for medical image segmentation models are used to measure agreement between the reference annotation and the prediction. A common set of metrics is used in the development of such models to make results more comparable. However, there is a mismatch between the distributions in public data sets and cases encountered in clinical practice. Many common metrics fail to measure the impact of this mismatch, especially for clinical data sets containing uncertain, small or empty reference annotations. Thus, models may not be validated for clinically meaningful agreement by such metrics. Dimensions of evaluating clinical value include independence from reference annotation volume size, consideration of uncertainty of reference annotations, reward of volumetric and/or location agreement and reward of correct classification of empty reference annotations. Unlike common public data sets, our in-house data set is more representative, as it contains uncertain, small or empty reference annotations. We examine publicly available metrics on the predictions of a deep learning framework in order to identify for which settings common metrics provide clinically meaningful results. We compare to a public benchmark data set without uncertain, small or empty reference annotations. The code will be published.  ( 3 min )
    Predicting Protein-Ligand Binding Affinity via Joint Global-Local Interaction Modeling. (arXiv:2209.13014v1 [q-bio.BM])
    The prediction of protein-ligand binding affinity is of great significance for discovering lead compounds in drug research. Facing this challenging task, most existing prediction methods rely on the topological and/or spatial structure of molecules and the local interactions while ignoring the multi-level inter-molecular interactions between proteins and ligands, which often leads to sub-optimal performance. To solve this issue, we propose a novel global-local interaction (GLI) framework to predict protein-ligand binding affinity. In particular, our GLI framework considers the inter-molecular interactions between proteins and ligands, which involve not only the high-energy short-range interactions between close atoms but also the low-energy long-range interactions between non-bonded atoms. For each pair of protein and ligand, our GLI embeds the long-range interactions globally and aggregates local short-range interactions, respectively. Such a joint global-local interaction modeling strategy helps to improve prediction accuracy, and the whole framework is compatible with various neural network-based modules. Experiments demonstrate that our GLI framework outperforms state-of-the-art methods with simple neural network architectures and moderate computational costs.  ( 2 min )
    Liquid Structural State-Space Models. (arXiv:2209.12951v1 [cs.LG])
    A proper parametrization of state transition matrices of linear state-space models (SSMs) followed by standard nonlinearities enables them to efficiently learn representations from sequential data, establishing the state-of-the-art on a large series of long-range sequence modeling benchmarks. In this paper, we show that we can improve further when a structural SSM such as S4 is given by a linear liquid time-constant (LTC) state-space model. LTC neural networks are causal continuous-time neural networks with an input-dependent state transition module, which makes them learn to adapt to incoming inputs at inference. We show that by using a diagonal plus low-rank decomposition of the state transition matrix introduced in S4, and a few simplifications, the LTC-based structural state-space model, dubbed Liquid-S4, achieves new state-of-the-art generalization across sequence modeling tasks with long-term dependencies such as image, text, audio, and medical time-series, with an average performance of 87.32% on the Long-Range Arena benchmark. On the full raw Speech Commands recognition dataset, Liquid-S4 achieves 96.78% accuracy with a 30% reduction in parameter count compared to S4. The additional gain in performance is the direct result of the Liquid-S4's kernel structure that takes into account the similarities of the input sequence samples during training and inference.  ( 2 min )
    Optical Neural Ordinary Differential Equations. (arXiv:2209.12898v1 [cs.LG])
    Increasing the layer number of on-chip photonic neural networks (PNNs) is essential to improve their model performance. However, successively cascading network hidden layers results in larger integrated photonic chip areas. To address this issue, we propose the optical neural ordinary differential equations (ON-ODE) architecture that parameterizes the continuous dynamics of hidden layers with optical ODE solvers. The ON-ODE comprises the PNNs followed by the photonic integrator and optical feedback loop, which can be configured to represent residual neural networks (ResNet) and recurrent neural networks with effectively reduced chip area occupancy. For the interference-based optoelectronic nonlinear hidden layer, the numerical experiments demonstrate that the single hidden layer ON-ODE can achieve approximately the same accuracy as the two-layer optical ResNet in image classification tasks. Besides, the ON-ODE improves the model classification accuracy for the diffraction-based all-optical linear hidden layer. The time-dependent dynamics property of ON-ODE is further applied for trajectory prediction with high accuracy.  ( 2 min )
    Towards Simple and Efficient Task-Adaptive Pre-training for Text Classification. (arXiv:2209.12943v1 [cs.CL])
    Language models are pre-trained using large corpora of generic data like BookCorpus, Common Crawl and Wikipedia, which is essential for the model to understand the linguistic characteristics of the language. New studies suggest using Domain Adaptive Pre-training (DAPT) and Task-Adaptive Pre-training (TAPT) as an intermediate step before the final finetuning task. This step helps cover the target domain vocabulary and improves the model performance on the downstream task. In this work, we study the impact of training only the embedding layer on the model's performance during TAPT and task-specific finetuning. Based on our study, we propose a simple approach to make the intermediate step of TAPT for BERT-based models more efficient by performing selective pre-training of BERT layers. We show that training only the BERT embedding layer during TAPT is sufficient to adapt to the vocabulary of the target domain and achieve comparable performance. Our approach is computationally efficient, with 78% fewer parameters trained during TAPT. The proposed embedding layer finetuning approach can also be an efficient domain adaptation technique.  ( 2 min )
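    A minimal sketch of the selective pre-training idea using Hugging Face Transformers (assuming a standard BERT checkpoint; the TAPT training loop itself is omitted): freeze everything, then unfreeze only the embedding layer before running the masked-language-modeling step.

        from transformers import BertForMaskedLM

        model = BertForMaskedLM.from_pretrained("bert-base-uncased")

        # Freeze every parameter, then unfreeze only the embedding layer,
        # so TAPT updates just the vocabulary-facing weights.
        for param in model.parameters():
            param.requires_grad = False
        for param in model.bert.embeddings.parameters():
            param.requires_grad = True
        # note: BERT ties the MLM decoder weights to the input embeddings,
        # so the tied output head is updated along with them

        trainable = sum(p.numel() for p in model.parameters() if p.requires_grad)
        total = sum(p.numel() for p in model.parameters())
        print(f"trainable: {trainable / total:.1%} of parameters")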
    Biologically-Plausible Determinant Maximization Neural Networks for Blind Separation of Correlated Sources. (arXiv:2209.12894v1 [eess.SP])
    Extraction of latent sources of complex stimuli is critical for making sense of the world. While the brain solves this blind source separation (BSS) problem continuously, its algorithms remain unknown. Previous work on biologically-plausible BSS algorithms assumed that observed signals are linear mixtures of statistically independent or uncorrelated sources, limiting the domain of applicability of these algorithms. To overcome this limitation, we propose novel biologically-plausible neural networks for the blind separation of potentially dependent/correlated sources. Differing from previous work, we assume some general geometric, not statistical, conditions on the source vectors allowing separation of potentially dependent/correlated sources. Concretely, we assume that the source vectors are sufficiently scattered in their domains which can be described by certain polytopes. Then, we consider recovery of these sources by the Det-Max criterion, which maximizes the determinant of the output correlation matrix to enforce a similar spread for the source estimates. Starting from this normative principle, and using a weighted similarity matching approach that enables arbitrary linear transformations adaptable by local learning rules, we derive two-layer biologically-plausible neural network algorithms that can separate mixtures into sources coming from a variety of source domains. We demonstrate that our algorithms outperform other biologically-plausible BSS algorithms on correlated source separation problems.  ( 3 min )
    Going Further With Winograd Convolutions: Tap-Wise Quantization for Efficient Inference on 4x4 Tile. (arXiv:2209.12982v1 [cs.AR])
    Most of today's computer vision pipelines are built around deep neural networks, where convolution operations require most of the generally high compute effort. The Winograd convolution algorithm computes convolutions with fewer MACs compared to the standard algorithm, reducing the operation count by a factor of 2.25x for 3x3 convolutions when using the version with 2x2-sized tiles $F_2$. Even though the gain is significant, the Winograd algorithm with larger tile sizes, i.e., $F_4$, offers even more potential in improving throughput and energy efficiency, as it reduces the required MACs by 4x. Unfortunately, the Winograd algorithm with larger tile sizes introduces numerical issues that prevent its use on integer domain-specific accelerators and higher computational overhead to transform input and output data between spatial and Winograd domains. To unlock the full potential of Winograd $F_4$, we propose a novel tap-wise quantization method that overcomes the numerical issues of using larger tiles, enabling integer-only inference. Moreover, we present custom hardware units that process the Winograd transformations in a power- and area-efficient way, and we show how to integrate such custom modules in an industrial-grade, programmable DSA. An extensive experimental evaluation on a large set of state-of-the-art computer vision benchmarks reveals that the tap-wise quantization algorithm makes the quantized Winograd $F_4$ network almost as accurate as the FP32 baseline. The Winograd-enhanced DSA achieves up to 1.85x gain in energy efficiency and up to 1.83x end-to-end speed-up for state-of-the-art segmentation and detection networks.  ( 3 min )
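    For reference, the 1-D Winograd transform F(2,3) that underlies the 2x2-tile case can be written out directly; it produces two convolution outputs with 4 multiplications instead of 6 (a worked sketch, not the paper's code):

        import numpy as np

        def winograd_f23(d, g):
            """1-D Winograd F(2,3): two outputs of a 3-tap convolution
            over a 4-sample tile, using 4 multiplications instead of 6."""
            m1 = (d[0] - d[2]) * g[0]
            m2 = (d[1] + d[2]) * (g[0] + g[1] + g[2]) / 2
            m3 = (d[2] - d[1]) * (g[0] - g[1] + g[2]) / 2
            m4 = (d[1] - d[3]) * g[2]
            return np.array([m1 + m2 + m3, m2 - m3 - m4])

        d, g = np.random.randn(4), np.random.randn(3)
        direct = np.array([d[0]*g[0] + d[1]*g[1] + d[2]*g[2],
                           d[1]*g[0] + d[2]*g[1] + d[3]*g[2]])
        assert np.allclose(winograd_f23(d, g), direct)

    Nesting this transform in 2-D gives F(2x2,3x3) and the 2.25x MAC reduction quoted above; the larger-tile $F_4$ variant cuts MACs further but introduces the numerical issues that the paper's tap-wise quantization targets.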
    Public Wisdom Matters! Discourse-Aware Hyperbolic Fourier Co-Attention for Social-Text Classification. (arXiv:2209.13017v1 [cs.CL])
    Social media has become the fulcrum of all forms of communication. Classifying social texts such as fake news, rumour, sarcasm, etc. has gained significant attention. The surface-level signals expressed by a social-text itself may not be adequate for such tasks; therefore, recent methods attempted to incorporate other intrinsic signals such as user behavior and the underlying graph structure. Oftentimes, the `public wisdom' expressed through the comments/replies to a social-text acts as a surrogate of crowd-sourced view and may provide us with complementary signals. State-of-the-art methods on social-text classification tend to ignore such a rich hierarchical signal. Here, we propose Hyphen, a discourse-aware hyperbolic spectral co-attention network. Hyphen is a fusion of hyperbolic graph representation learning with a novel Fourier co-attention mechanism in an attempt to generalise the social-text classification tasks by incorporating public discourse. We parse public discourse as an Abstract Meaning Representation (AMR) graph and use the powerful hyperbolic geometric representation to model graphs with hierarchical structure. Finally, we equip it with a novel Fourier co-attention mechanism to capture the correlation between the source post and public discourse. Extensive experiments on four different social-text classification tasks, namely detecting fake news, hate speech, rumour, and sarcasm, show that Hyphen generalises well, and achieves state-of-the-art results on ten benchmark datasets. We also employ a sentence-level fact-checked and annotated dataset to evaluate how Hyphen is capable of producing explanations as analogous evidence to the final prediction.  ( 3 min )

  • Open

    [D] Dumb question about sklearn classification report
    So, I get this classification report, right. I get precision, recall and F1 score (and some averages which I don't know what to do with yet). But when I look at ML papers, they only report one metric per performance measure. I have a binary classification problem. Which numbers from my report should I include in my final analysis? Like, when I want to report my model performance using recall, how can I report it with one number when I got two? .... lol submitted by /u/javagarbagecollector [link] [comments]  ( 88 min )
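    A hedged answer sketch: for a binary problem, papers typically quote the positive class's row (or a macro average over both classes); with output_dict=True the report becomes a dict you can pull one number from:

        from sklearn.metrics import classification_report

        y_true = [0, 1, 1, 0, 1, 0, 1, 1]
        y_pred = [0, 1, 0, 0, 1, 1, 1, 1]

        report = classification_report(y_true, y_pred, output_dict=True)
        # convention: report the positive class ("1") or the macro average
        print("positive-class recall:", report["1"]["recall"])
        print("macro-averaged F1:", report["macro avg"]["f1-score"])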
    [D] How to use AI to get more dates on dating apps
    I have an idea on how to use AI to get more dates on dating apps. The instructions are below; any new ideas are welcome. 1 - First download a pre-trained language prediction model (like GPT-2, which is available for free on Hugging Face). 2 - Then fine-tune the model on a dataset of “informal” chats between two people, like a dataset of Tinder messages or something. 3 - Create a classifier encoder that receives a sentence as input and predicts the latent representation of the sentence for sentiment analysis. 4 - Now, suppose you are talking to someone on a dating app. Every time the person sends you a message, you feed the message into the program. The goal is to simulate potential conversations using the language prediction model, and use the classifier encoder to “clip” what you want. For example, you could optimize the probability of the other person saying something with the meaning of “omg you are so sweet”, and the language prediction model will figure out automatically what you have to say to maximize the probability of the other person saying this. 5 - That is it. Of course, you have to analyze what character the model is creating while talking to the other person, so that on an in-person date you could keep the same personality. submitted by /u/QLaHPD [link] [comments]  ( 105 min )
    [D] Why are speech to text models trained on phrases instead of single words?
    What is the purpose of training LSTM/RNN models for speech to text on phrases like those from LibriSpeech? What if the user says a single word? Wouldn't the model return the closest phrase as the output, giving wildly wrong output? submitted by /u/Proof_Hyena4223 [link] [comments]  ( 90 min )
    [News] Speech-to-Speech: Use your own voice to control an AI voice with Resemble AI
    Just released a new way to create synthetic media using AI Voices. Speech-to-Speech by Resemble AI will allow you to control your AI voice with any audio file/mic input you provide it with. Here's a quick video showing how it works: https://youtu.be/cXtgdsWw1xI https://www.resemble.ai/speech-to-speech/ submitted by /u/resembleai [link] [comments]  ( 103 min )
    [R] Learning to Learn with Generative Models of Neural Network Checkpoints
    submitted by /u/TobusFire [link] [comments]  ( 90 min )
    [D]Can a ML model be trained using other ML models as input data to try and develop novel architectures?
    You'll have to excuse me if this is a daft question. I'm quite new to ML, but a thought occurred to me about how a ML model might be able to assist in developing new undiscovered architectures. So I thought I'd ask. Tried to find some examples on Google but found very limited information. The overall thing I'm getting at is: could a ML model design itself a more efficient architecture? Thanks. submitted by /u/Fibonacci1664 [link] [comments]  ( 90 min )
    [Research] Searching: Visualization of Trajectory of Gaussian States
    Hello, I am looking for a way to visualize the state trajectory of a learned Kalman filter. A state is a multivariate Gaussian with a diagonal covariance matrix. The dimension of the Gaussian states is d > 10. I want to compare the states of fully observed trajectories with predicted trajectories. Are there any ways to visualize those trajectories in a meaningful way? Thank you in advance! submitted by /u/Metallfrosch [link] [comments]  ( 88 min )
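    One common option (a sketch under my own assumptions, with synthetic trajectories): project the Gaussian means to 2-D with a single PCA fitted on the observed states, then overlay observed and predicted paths.

        import numpy as np
        import matplotlib.pyplot as plt
        from sklearn.decomposition import PCA

        T, d = 50, 12                              # time steps, state dim > 10
        t = np.linspace(0, 1, T)[:, None]
        mu_obs = np.sin(2 * np.pi * t * np.arange(1, d + 1))   # observed means
        mu_pred = mu_obs + np.random.normal(0, 0.1, (T, d))    # predicted means

        pca = PCA(n_components=2).fit(mu_obs)      # one shared 2-D projection
        obs2d, pred2d = pca.transform(mu_obs), pca.transform(mu_pred)

        plt.plot(obs2d[:, 0], obs2d[:, 1], "-o", label="observed")
        plt.plot(pred2d[:, 0], pred2d[:, 1], "-s", label="predicted")
        plt.legend()
        plt.title("Mean trajectories in PCA space")
        plt.show()

    The per-dimension variances of the diagonal Gaussians could additionally be encoded as marker sizes along each path.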
    [D] How to use Categorical Cross Entropy for Multi-Label Classification?
    Say my target with classes `A, B, C, D, E` is `[0, 1, 1, 0, 0]`, and my output layer is of shape B x N, where N is the number of classes. How do I use Categorical Cross Entropy for this? submitted by /u/sarmientoj24 [link] [comments]  ( 106 min )
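    A hedged sketch of the usual answer: categorical cross-entropy assumes exactly one active class per sample, so multi-hot targets are handled with a per-class binary cross-entropy instead, e.g. PyTorch's BCEWithLogitsLoss:

        import torch
        import torch.nn as nn

        B, N = 4, 5                                  # batch size, num classes
        logits = torch.randn(B, N)                   # raw model outputs
        targets = torch.tensor([[0., 1., 1., 0., 0.]]).repeat(B, 1)

        # one independent binary cross-entropy per class, averaged
        loss = nn.BCEWithLogitsLoss()(logits, targets)
        probs = torch.sigmoid(logits)                # per-class probabilities
        preds = (probs > 0.5).float()                # multi-hot prediction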
    [D] Dreambooth Stable Diffusion training in just 12.5 GB VRAM, using the 8bit adam optimizer from bitsandbytes along with xformers while being 2 times faster.
    Tested on an Nvidia A10G; took 15-20 mins to train. We can finally run on Colab notebooks. Colab: https://colab.research.google.com/github/ShivamShrirao/diffusers/blob/main/examples/dreambooth/DreamBooth_Stable_Diffusion.ipynb Code: https://github.com/ShivamShrirao/diffusers/blob/main/examples/dreambooth/ More details: https://github.com/huggingface/diffusers/pull/554#issuecomment-1259522002 submitted by /u/0x00groot [link] [comments]  ( 89 min )
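    For context, much of the memory saving comes from swapping the optimizer; a minimal sketch of the bitsandbytes usage (the stand-in model is illustrative):

        import torch.nn as nn
        import bitsandbytes as bnb

        model = nn.Linear(768, 768)   # stand-in for the fine-tuned UNet
        # drop-in replacement for torch.optim.AdamW that keeps optimizer
        # state in 8 bits, cutting optimizer memory roughly 4x
        optimizer = bnb.optim.AdamW8bit(model.parameters(), lr=5e-6,
                                        betas=(0.9, 0.999), weight_decay=1e-2)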
    [D] Is Midjourney AI more-or-less the same architecture as DALL-E 2? Can I read about the model in detail somewhere or is there anything published in this regard?
    I would like to study how these modern image generators work, and Midjourney really caught my eye. As far as I know, DALL-E 2 uses a combination of transformers and diffusion networks, which is quite fascinating, but is it the same with Midjourney? It's not a GAN, right? Or is it? Is there any published work for it similar to DALL-E 2? submitted by /u/narkoface [link] [comments]  ( 91 min )
    [P] Efficient Few-shot Learning with Sentence Transformers
    Hi there, it's Lewis here from the open-source team at Hugging Face 🤗 I'm excited to share new research on few-shot learning with language models that we've been working on with Intel 🧑‍🔬. We've also open-sourced a library that lets you train our models with a few lines of code 👉: https://github.com/huggingface/setfit tl;dr we found a way to apply pretrained Sentence Transformers in regimes where one has little labeled data. The method involves a two-stage training process: (1) fine-tune the Sentence Transformer with a few labeled examples (e.g. 8 per class) using a contrastive loss; (2) freeze the weights of the tuned Sentence Transformer and train a simple classification head (e.g. logistic regression). Surprisingly, this simple technique outperforms GPT-3 on the RAFT benchmark, despite using models that are 350x smaller! This means you can now do few-shot learning in around 30s on Google Colab (or even on your CPU if you are willing to wait a few minutes) 🤓 For more details, check out our blog post: https://huggingface.co/blog/setfit submitted by /u/lewtun [link] [comments]  ( 90 min )
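    A usage sketch roughly matching the library's interface at release (check the repo for the current API; the dataset choice is illustrative):

        from datasets import load_dataset
        from sentence_transformers.losses import CosineSimilarityLoss
        from setfit import SetFitModel, SetFitTrainer

        # 16 labeled examples (~8 per class) from SST-2
        train_ds = load_dataset("SetFit/sst2", split="train") \
            .shuffle(seed=42).select(range(16))

        model = SetFitModel.from_pretrained(
            "sentence-transformers/paraphrase-mpnet-base-v2")
        trainer = SetFitTrainer(
            model=model,
            train_dataset=train_ds,
            loss_class=CosineSimilarityLoss,
            num_iterations=20,   # contrastive pairs generated per example
        )
        trainer.train()
        preds = model(["a gorgeous, witty film", "a dull mess"])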
    [D] Matrix Dot Product of [B, N] and [N, N] Tensors
    I have a pre-computed co-occurrence matrix of shape [N, N], where N is the number of classes. I want to utilize this info on the last layer of my multi-label classification of size [B, N]. Is a dot product the best way to do it? How do I compute it for tensors of shape [B, N] and [N, N]? submitted by /u/sarmientoj24 [link] [comments]  ( 88 min )
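    A hedged sketch of the usual answer: this is a matrix multiplication rather than an elementwise dot product; [B, N] @ [N, N] yields [B, N]:

        import torch

        B, N = 32, 10
        logits = torch.randn(B, N)        # last-layer outputs
        cooc = torch.randn(N, N)          # precomputed co-occurrence matrix

        # [B, N] @ [N, N] -> [B, N]: each row is re-weighted by co-occurrence
        adjusted = logits @ cooc          # equivalently torch.matmul(logits, cooc)

    Whether this actually helps will depend on how the co-occurrence matrix is normalized (e.g. row-normalizing it so each class redistributes unit mass).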
    [D] Replacement Options for the Stellargraph Library
    I've been using the Stellargraph library consistently. Mostly for the GraphSAGE implementation. But it's been two years since their last release and the previous release (1.2.1) requires a version of Python that is either 3.6.x, 3.7.x, or 3.8.x, which is frustrating now that we're on 3.10.x. Are there actively developed libraries that support GraphSAGE and work well with Keras? submitted by /u/UnknownBinary [link] [comments]  ( 107 min )
    [D] Is there a way to filter out "low quality" text sequences for a text classification task? Is there even a way to define "low quality?"
    Hi. I'm currently working on a text classification task in the e-commerce domain where the objective is to receive a product name as input and output an item category. I've noticed that there are many cases where input texts aren't particularly informative (e.g., some are simply product codes like FALLSW12302) and would like to filter these samples out. I'm currently testing two models on this task - one that performs more classical classification and another that casts this as a text generation problem and uses sequence-to-sequence generation to predict labels - and I've thought of using the confidence score or perplexity of the model predictions. However, I'm wondering if there are any better methods to proceed with this. If not I may resort to creating my own binary classification dataset and training a classifier on that to filter out samples. Thanks. submitted by /u/Seankala [link] [comments]  ( 90 min )
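    A heuristic baseline sketch (the thresholds are illustrative): flag names with a high digit ratio or no plain-alphabetic tokens before falling back to model confidence:

        import re

        def looks_uninformative(name: str,
                                max_digit_ratio: float = 0.3,
                                min_alpha_tokens: int = 1) -> bool:
            """Heuristic filter: flag product names that look like bare codes."""
            digits = sum(c.isdigit() for c in name)
            digit_ratio = digits / max(len(name), 1)
            # tokens made only of letters, length >= 3 ("FALLSW12302" has none)
            alpha_tokens = [t for t in name.split()
                            if re.fullmatch(r"[A-Za-z]{3,}", t)]
            return (digit_ratio > max_digit_ratio
                    or len(alpha_tokens) < min_alpha_tokens)

        print(looks_uninformative("FALLSW12302"))        # True
        print(looks_uninformative("Cotton crew socks"))  # False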
  • Open

    Will AI take over humans one day?
    submitted by /u/wisereputationmkr [link] [comments]  ( 87 min )
    DREAMBOOTH Tutorial: Train Stable Diffusion With Your Images Using Google's AI!
    submitted by /u/PuppetHere [link] [comments]  ( 87 min )
    Speech-to-Speech: Use your own voice to control an AI voice with Resemble AI
    Hello Redditors! I'm very excited to announce the launch of Speech-to-Speech, which will allow you to control your AI voice with any audio file/mic input you provide. Let me know your thoughts. https://youtu.be/cXtgdsWw1xI https://www.resemble.ai/speech-to-speech/ submitted by /u/resembleai [link] [comments]  ( 92 min )
    Artificial intelligence reduces a 100,000-equation quantum physics problem to only four equations
    submitted by /u/Black_RL [link] [comments]  ( 91 min )
    Is the future of AI Chinese?
    submitted by /u/BiologyNerd100 [link] [comments]  ( 86 min )
    AI-generated support for Iranian Revolutionaries
    submitted by /u/volfmont [link] [comments]  ( 87 min )
    5G Humanoid AI Robot For 170K USD To Automate Service Industry Tasks | New Nvidia AI Creates 3D Renderings | OpenAI Open-Sources "Whisper" AI Model | Autonomous Microrobots
    submitted by /u/kenickh [link] [comments]  ( 87 min )
    How dead celebrities would look today with artificial intelligence
    submitted by /u/magenta_placenta [link] [comments]  ( 87 min )
    Question regarding merging a model with long term memory from previous messages
    Could someone please point me to where I could learn how to implement a model with long-term memory and the ability to not go off topic? I was thinking of making a database that stores every input and doing a check targeting specific words before the output. This might seem too generic, but that's the only solution I could come up with. An example of what I mean: user: My favourite color is red, what is yours? <--- now this is stored in a database model: Red is my favourite color too * after chatting for a while the user will test the model's memory * user: what is my favourite color? model: your favourite color is red <-- this is the desired output, or at least one of its forms. Of course, it is not limited to the user input; the same applies to whatever facts the model might state. submitted by /u/MeNootka [link] [comments]  ( 87 min )
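    A minimal sketch of the retrieve-then-generate idea described above (a toy keyword-overlap lookup standing in for a real database):

        memory = []  # naive long-term store; a real system would use a database

        def remember(text: str) -> None:
            memory.append(text)

        def recall(query: str, k: int = 3) -> list:
            """Return stored lines sharing the most words with the query."""
            q = set(query.lower().split())
            scored = sorted(memory,
                            key=lambda m: -len(q & set(m.lower().split())))
            return scored[:k]

        remember("My favourite color is red, what is yours?")
        context = recall("what is my favourite color?")
        prompt = ("Known facts:\n" + "\n".join(context)
                  + "\nUser: what is my favourite color?\nBot:")
        # feed `prompt` to the language model so the answer can mention "red"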
    Anonymous Internet commenter muses on the moral/ethical backlash toward AI generated art (Stable Diffusion, etc.) and accusations of plagiarism that are currently dominating social media discussion
    submitted by /u/DraconicLegacy [link] [comments]  ( 91 min )
    Upscale all my photos on iPhone?
    Hey, is there any software to upscale all the photos on my iPhone (good quality photos can be skipped)? submitted by /u/ArgyleDiamonds [link] [comments]  ( 87 min )
    Announcing the Future Fund's AI Worldview Prize - EA Forum
    submitted by /u/estasfuera [link] [comments]  ( 86 min )
    Creating cinema: the new frontier of generative AI. This technology combines the creative power of the machine with the artistic inspiration of humans for the cinema of the future. People will be able to write and generate their own dreamy stories right at home without actors, crew, etc. An evolution of creativity.
    submitted by /u/globeworldmap [link] [comments]  ( 89 min )
    AI Now Well-Set to Alter the Laws of Physics
    submitted by /u/kaykaymarieog [link] [comments]  ( 86 min )
    Make a good prompt workflow for AI images and resource links for Stable ...
    submitted by /u/prfitofthesngularity [link] [comments]  ( 87 min )
    Are there enough safety mechanisms/contingencies in place to ensure AI improves safely? If not, how do we get the general public to care about these issues?
    submitted by /u/tuccigucci_ [link] [comments]  ( 87 min )
    Best way to change a body of text to add slang?
    A couple of my friends and I are looking into changing the text of an absolutely massive literary work to include regional pronunciations/slang/idioms, and we figure it would be much, much easier to go about this with an AI program rather than doing it by hand. What would be the easiest, cheapest way to get an AI to recognize and replace text this way? submitted by /u/GLaDOSunit [link] [comments]  ( 87 min )
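    For a fixed mapping of regional spellings and idioms, a regex substitution pass is the cheapest starting point (the mapping entries here are illustrative); context-sensitive rewrites would need something like a seq2seq model fine-tuned on parallel standard/slang text.

        import re

        slang = {r"\bgoing to\b": "gonna",
                 r"\bmy friend\b": "me mate",
                 r"\bisn't\b": "ain't"}  # illustrative regional mapping

        def apply_slang(text: str) -> str:
            for pattern, repl in slang.items():
                text = re.sub(pattern, repl, text, flags=re.IGNORECASE)
            return text

        print(apply_slang("I am going to see my friend."))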
    "a contest of magic between the two greatest mages in the history of the cosmos"
    submitted by /u/Agaeon [link] [comments]  ( 91 min )
    Open source local GPT-3 alternative that can train on custom sets?
    I want to scrape all of my personal reddit history and other ramblings through time and train a chat bot on them. Any suggestions? I'd prefer something that runs locally, but if there is something already put together on colab and wouldn't be hindered by using free tpus that would work as well. I'd prefer to stay away from any type of api to token access system, however. I appreciate any guidance or consideration. submitted by /u/a4mula [link] [comments]  ( 87 min )
  • Open

    Need a Fast, Safe & Flexible App? Go for Cloud-Based Mobile Apps
    Are you worried about the security and safety of your apps? If yes, then let me tell you that this is a common concern, as everyone is looking for a fast, safe, and flexible app; security is the major concern that comes to mind for startups and entrepreneurs. The post Need a Fast, Safe & Flexible App? Go for Cloud-Based Mobile Apps appeared first on Data Science Central.  ( 21 min )
    7 Key Steps to Comply with California Consumer Privacy Act (CCPA)
    The CCPA entitles consumers to know what personal information is being collected and how it is further shared to be used by third parties. Moreover, it is well within the consumers’ rights to stop any business from sharing their data and remove it completely. The post 7 Key Steps to Comply with California Consumer Privacy Act (CCPA) appeared first on Data Science Central.  ( 22 min )
    Cloudy Skies: The Rise of Federated Containers and Scrutiny
    The Cloud has been a dominant paradigm over the last decade but is now attracting regulatory scrutiny.  The post Cloudy Skies: The Rise of Federated Containers and Scrutiny appeared first on Data Science Central.  ( 19 min )
    Internet of Things Security: Safeguarding Connected Devices and Networks in IoT Era
    The IoT security product industry promises to become a safe road for digital commercialization. IoT security products safeguard networks and interconnected devices, catering to various business needs such as data encryption, authentication, and regulatory compliance. The post Internet of Things Security: Safeguarding Connected Devices and Networks in IoT Era appeared first on Data Science Central.  ( 19 min )
    The Similarities of Solving Data Problems and Rubik’s Cubes
    In 1974, two distinct but interestingly similar milestones were achieved that would greatly affect the lives of data engineers: the Rubik’s Cube was invented, and IBM released the first relational database. Since its original rise in the 1980s, the Rubik’s Cube has become the world’s most popular puzzle toy. The post The Similarities of Solving Data Problems and Rubik’s Cubes appeared first on Data Science Central.  ( 23 min )
    What Careers are Available After Blockchain Certifications?
    Blockchain experts are in demand. Due to its multiple uses, it needs people handling this new technology. Like any other great profession, these aren't for everyone. You must have or acquire talents by becoming a certified blockchain professional and give reasons to recruiters to hire you. The post What Careers are Available After Blockchain Certifications? appeared first on Data Science Central.  ( 20 min )
    Pakistan Serves As a Great Reminder for Climate Justice
    A few months ago, Pakistan also faced one of the worst heatwaves in the world, with at one stage, the top 5 of the ten hottest places on earth were in Pakistan. The post Pakistan Serves As a Great Reminder for Climate Justice appeared first on Data Science Central.  ( 20 min )
    10 steps to data profiling for successful data discovery: Part II
    Before embarking on the data profiling exercise, an analyst must prepare by going through a data profiling analysis. The post 10 steps to data profiling for successful data discovery: Part II appeared first on Data Science Central.  ( 21 min )
    Point – Counterpoint on Why Organizations Suck at AI
    I love this infographic recently floating around LinkedIn.  Sorry, don’t know to whom to give credit, but it does provide an interesting depiction of how senior management thinks AI works and the realities of what’s required to make AI work. The post Point – Counterpoint on Why Organizations Suck at AI appeared first on Data Science Central.  ( 22 min )
    New Book: Intuitive Machine Learning
    Intuitive Machine Learning with focus on explainable AI, human-friendly intelligence, powerful visualizations and applications. By Vincent Granville Ph.D, published in September 2022. PDF format, 156 pages. Version 1.0 with Python code. The book is available here. This book covers the foundations of machine learning, with modern approaches to solving complex problems. Emphasis is on scalability, automation,… Read More »New Book: Intuitive Machine Learning The post New Book: Intuitive Machine Learning appeared first on Data Science Central.  ( 19 min )
    Allegrograph: From Lisp to SHACL
    This is an interview with Dr. Jans Aasman, CEO of Franz, Inc. and designer of the Allegrograph knowledge graph engine. In this interview, we cover everything from the role of Lisp (and Lispers), the versatility of RDF hypergraphs, the value of Allegrograph, and the future of artificial intelligence, machine learning and inferential logic in the graph space. The post Allegrograph: From Lisp to SHACL appeared first on Data Science Central.  ( 18 min )
    Privacy Center: The Key to Meeting Data Privacy Obligations
    Organizations must opt for a more centralized approach to automate their privacy functions to reduce risk and build transparency and trust with their consumers. The post Privacy Center: The Key to Meeting Data Privacy Obligations appeared first on Data Science Central.  ( 21 min )
  • Open

    Agility Robotics's "Cassie" bipedal robot can run 100 meters in 25s (also does stairs, & 5K run on 1 battery)
    submitted by /u/gwern [link] [comments]  ( 87 min )
    Use multi armed bandits to pick the correct word
    Let's say I want to find target words/information in some structured document texts (like receipts) using the MAB method. My idea is to treat each word as a bandit arm, with the target words having high rewards. During the training phase the feedback is given by annotated data (which words are our target words), and during the online learning phase it is provided by human checking. Can this work? submitted by /u/dannytty [link] [comments]  ( 87 min )
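    It can work as a starting point; a sketch of an epsilon-greedy version (names and reward scheme are illustrative). One caveat: since new documents bring new words, value should generalize by word features rather than word identity, which makes a contextual bandit the more natural fit.

        import random
        from collections import defaultdict

        counts = defaultdict(int)    # pulls per word
        values = defaultdict(float)  # running mean reward per word

        def pick_word(candidates, eps=0.1):
            if random.random() < eps:                        # explore
                return random.choice(candidates)
            return max(candidates, key=lambda w: values[w])  # exploit

        def update(word, reward):
            counts[word] += 1
            values[word] += (reward - values[word]) / counts[word]

        # one document: candidates are its words; reward 1 if the human
        # (or annotation) confirms the picked word was a target field
        candidates = ["TOTAL", "17.99", "VISA", "2022-09-28"]
        w = pick_word(candidates)
        update(w, reward=1.0 if w == "17.99" else 0.0)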
    Has anyone built a perfect Deep Reinforcement Learning Connect 4 bot?!
    I have been trying to solve Connect 4 several times now, but have not succeeded yet. I tried several variants of DQN, but it seems like the performance plateaus at the level of an intermediate player. I have not implemented MCTS yet and am wondering if that could be the issue. Would be really nice if someone had succeeded here and could share their code with me :) submitted by /u/spadel_ [link] [comments]  ( 104 min )
    I want to vectorise my custom gym env. How can I do this: is using the DummyVecEnv wrapper enough, or do I need to change the whole env to incorporate vector actions and rewards?
    Currently I am doing this:

        # list of envs
        num_envs = 3
        envs = [lambda: NeuroRL4(label_name) for i in range(num_envs)]

        # Vec Env
        envs = DummyVecEnv(envs)
        model = DDPG("MlpPolicy", envs, action_noise=action_noise, verbose=1)
        model.learn(total_timesteps=1, log_interval=1)
        model.save("sb3_envs_ddpg_model")

    But it's giving an error that you must use only one env when doing episodic training. submitted by /u/Historical-Stock-750 [link] [comments]  ( 88 min )
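    A hedged sketch of the usual resolution: SB3's off-policy algorithms such as DDPG train with a single environment in this setting, while on-policy algorithms accept several (newer SB3 releases relax this for off-policy models via the train_freq argument; check the docs). Pendulum-v1 stands in for the poster's custom env:

        import gym
        from stable_baselines3 import DDPG, PPO
        from stable_baselines3.common.vec_env import DummyVecEnv

        # off-policy DDPG: wrap a single environment
        env = DummyVecEnv([lambda: gym.make("Pendulum-v1")])  # or NeuroRL4(...)
        model = DDPG("MlpPolicy", env, verbose=1)
        model.learn(total_timesteps=1_000)

        # if parallel rollouts are the goal, on-policy algorithms take n_envs > 1
        envs = DummyVecEnv([lambda: gym.make("Pendulum-v1") for _ in range(3)])
        model = PPO("MlpPolicy", envs, verbose=1)
        model.learn(total_timesteps=1_000)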
    Why do we use a diagonal Gaussian rather than a multivariate Gaussian (with full covariance matrix)?
    TSIA. Is that proven empirically better, or is there any theory related to that? submitted by /u/ad26kr [link] [comments]  ( 87 min )
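    The usual argument is computational and empirical rather than deep theory (a worked comparison in standard notation): a full covariance needs d(d+1)/2 parameters and an O(d^3) factorization per density evaluation, while a diagonal one needs d parameters and its log-density factorizes coordinate-wise,

        \log \mathcal{N}\!\left(x;\,\mu,\,\operatorname{diag}(\sigma^2)\right)
          = -\frac{1}{2}\sum_{i=1}^{d}\left[\frac{(x_i-\mu_i)^2}{\sigma_i^2}
            + \log \sigma_i^2 + \log 2\pi\right],

    which keeps sampling, the reparameterization trick, and gradients cheap and numerically stable. The price is that correlations between dimensions cannot be represented.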
  • Open

    Index your Dropbox content using the Dropbox connector for Amazon Kendra
    Amazon Kendra is a highly accurate and simple-to-use intelligent search service powered by machine learning (ML). Amazon Kendra offers a suite of data source connectors to simplify the process of ingesting and indexing your content, wherever it resides. Valuable data in organizations is stored in both structured and unstructured repositories. An enterprise search solution should […]  ( 7 min )
    Provision and manage ML environments with Amazon SageMaker Canvas using AWS CDK and AWS Service Catalog
    The proliferation of machine learning (ML) across a wide range of use cases is becoming prevalent in every industry. However, this outpaces the increase in the number of ML practitioners who have traditionally been responsible for implementing these technical solutions to realize business outcomes. In today’s enterprise, there is a need for machine learning to […]  ( 9 min )
    New features for Amazon SageMaker Pipelines and the Amazon SageMaker SDK
    Amazon SageMaker Pipelines allows data scientists and machine learning (ML) engineers to automate training workflows, which helps you create a repeatable process to orchestrate model development steps for rapid experimentation and model retraining. You can automate the entire model build workflow, including data preparation, feature engineering, model training, model tuning, and model validation, and catalog […]  ( 13 min )
    Reduce the time taken to deploy your models to Amazon SageMaker for testing
    Data scientists often train their models locally and look for a proper hosting service to deploy their models. Unfortunately, there’s no one set mechanism or guide to deploying pre-trained models to the cloud. In this post, we look at deploying trained models to Amazon SageMaker hosting to reduce your deployment time. SageMaker is a fully […]  ( 7 min )
  • Open

    Neurodegenerative disease can progress in newly identified patterns
    A machine-learning method finds patterns of health decline in ALS, informing future clinical trial designs and mechanism discovery. The technique also extends to Alzheimer’s and Parkinson’s.  ( 8 min )
    New program to support translational research in AI, data science, and machine learning
    The MIT-Pillar AI Collective will cultivate prospective entrepreneurs and drive innovation.  ( 5 min )
  • Open

    Quantization for Fast and Environmentally Sustainable Reinforcement Learning
    Posted by Srivatsan Krishnan, Student Researcher, and Aleksandra Faust, Senior Staff Research Scientist, Google Research, Brain Team Deep reinforcement learning (RL) continues to make great strides in solving real-world sequential decision-making problems such as balloon navigation, nuclear physics, robotics, and games. Despite its promise, one of its limiting factors is long training times. While the current approach to speed up RL training on complex and difficult tasks leverages distributed training scaling up to hundreds or even thousands of computing nodes, it still requires the use of significant hardware resources which makes RL training expensive, while increasing its environmental impact. However, recent work [1, 2] indicates that performance optimizations on existing hardware …  ( 23 min )
  • Open

    Top Artificial Intelligence Tools For Content Writers
    Content writers are responsible for producing websites and blogs that provide readers with information on a specific topic. Their job is to…  ( 12 min )
  • Open

    Visualizing correlations with graphs
    Yesterday I found a statistics textbook for geologists [1] for $1 at a library book sale. When I thumbed through the book, an image similar to the one below caught my eye. This image approximates Figure 15.2 in [1]. The nodes represent six factors of the thickness of rock formations and the edges are labeled […] Visualizing correlations with graphs first appeared on John D. Cook.  ( 5 min )
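    A small sketch of how such a figure could be reproduced (synthetic correlations; the layout and threshold choices are mine):

        import networkx as nx
        import numpy as np
        import matplotlib.pyplot as plt

        names = ["A", "B", "C", "D", "E", "F"]        # six factors
        corr = np.corrcoef(np.random.randn(100, 6), rowvar=False)

        G = nx.Graph()
        for i in range(6):
            for j in range(i + 1, 6):
                if abs(corr[i, j]) > 0.1:             # draw only notable edges
                    G.add_edge(names[i], names[j], weight=round(corr[i, j], 2))

        pos = nx.circular_layout(G)
        nx.draw(G, pos, with_labels=True, node_color="lightgray")
        nx.draw_networkx_edge_labels(
            G, pos, edge_labels=nx.get_edge_attributes(G, "weight"))
        plt.show()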
  • Open

    ProtoShotXAI: Using Prototypical Few-Shot Architecture for Explainable AI. (arXiv:2110.11597v2 [cs.LG] UPDATED)
    Unexplainable black-box models create scenarios where anomalies cause deleterious responses, thus creating unacceptable risks. These risks have motivated the field of eXplainable Artificial Intelligence (XAI) to improve trust by evaluating local interpretability in black-box neural networks. Unfortunately, the ground truth is unavailable for the model's decision, so evaluation is limited to qualitative assessment. Further, interpretability may lead to inaccurate conclusions about the model or a false sense of trust. We propose to improve XAI from the vantage point of the user's trust by exploring a black-box model's latent feature space. We present an approach, ProtoShotXAI, that uses a Prototypical few-shot network to explore the contrastive manifold between nonlinear features of different classes. A user explores the manifold by perturbing the input features of a query sample and recording the response for a subset of exemplars from any class. Our approach is the first locally interpretable XAI model that can be extended to, and demonstrated on, few-shot networks. We compare ProtoShotXAI to the state-of-the-art XAI approaches on MNIST, Omniglot, and ImageNet to demonstrate, both quantitatively and qualitatively, that ProtoShotXAI provides more flexibility for model exploration. Finally, ProtoShotXAI also demonstrates novel explainability and detectability on adversarial samples.  ( 3 min )
    MolGraph: a Python package for the implementation of small molecular graphs and graph neural networks with TensorFlow and Keras. (arXiv:2208.09944v3 [cs.LG] UPDATED)
    Molecular machine learning (ML) has proven important for tackling various molecular problems, including the prediction of protein-drug interactions and blood brain-barrier permeability. Since relatively recently, so-called graph neural networks (GNNs) have been implemented for molecular ML, showing comparable or superior performance to descriptor-based approaches. Although various tools and packages exist to apply GNNs for molecular ML, a new GNN package, named MolGraph, was developed in this work with the motivation to create GNNs highly compatible with the TensorFlow and Keras application programming interface (API). As MolGraph focuses specifically and exclusively on molecular ML, a chemistry module was implemented to accommodate the generation of small molecular graphs, which could then be input to the GNNs for molecular ML. To validate the GNNs, they were benchmarked against the datasets of MoleculeNet, as well as three chromatographic retention time datasets. The results on these benchmarks show that the GNNs performed as expected. Additionally, the GNNs proved useful for molecular identification and improved interpretability of chromatographic retention time data. MolGraph is available at https://github.com/akensert/molgraph.  ( 3 min )
    Dynamical softassign and adaptive parameter tuning for graph matching. (arXiv:2208.08233v2 [math.CO] UPDATED)
    This paper studies a framework, the projected fixed-point method, for graph matching. The framework contains a class of popular graph matching algorithms, including graduated assignment (GA), the integer projected fixed-point method (IPFP) and the doubly stochastic projected fixed-point method (DSPFP). We propose an adaptive strategy to tune the step size parameter in this framework. Such a strategy improves these algorithms in efficiency and accuracy, and it guarantees the convergence of the underlying algorithms. Some preliminary analysis based on distance geometry suggests that the optimal step size parameter is 1 with high probability when graphs are fully connected. Secondly, it is observed that a popular projection method, softassign, is sensitive to graphs' cardinality (size). We propose a dynamical softassign algorithm that is robust to graphs' cardinality. Combining the adaptive step size and the dynamical softassign, we propose a novel graph matching algorithm: the adaptive projected fixed-point method with dynamical softassign. Various experiments demonstrate that the proposed algorithm is significantly faster than several other state-of-the-art algorithms with no loss of accuracy.  ( 2 min )
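    For orientation, the classic softassign projection that the paper builds on exponentiates a score matrix and then alternates row and column normalization toward a doubly stochastic matrix (a sketch of the standard method, not the paper's dynamical variant):

        import numpy as np

        def softassign(M, beta=5.0, iters=50):
            """Classic softassign: exponentiate a score matrix, then
            Sinkhorn-normalize rows and columns toward doubly stochastic."""
            P = np.exp(beta * (M - M.max()))       # subtract max for stability
            for _ in range(iters):
                P /= P.sum(axis=1, keepdims=True)  # row normalize
                P /= P.sum(axis=0, keepdims=True)  # column normalize
            return P

        P = softassign(np.random.randn(5, 5))
        print(P.sum(axis=0), P.sum(axis=1))        # both close to 1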
    Motley: Benchmarking Heterogeneity and Personalization in Federated Learning. (arXiv:2206.09262v6 [cs.LG] UPDATED)
    Personalized federated learning considers learning models unique to each client in a heterogeneous network. The resulting client-specific models have been purported to improve metrics such as accuracy, fairness, and robustness in federated networks. However, despite a plethora of work in this area, it remains unclear: (1) which personalization techniques are most effective in various settings, and (2) how important personalization truly is for realistic federated applications. To better answer these questions, we propose Motley, a benchmark for personalized federated learning. Motley consists of a suite of cross-device and cross-silo federated datasets from varied problem domains, as well as thorough evaluation metrics for better understanding the possible impacts of personalization. We establish baselines on the benchmark by comparing a number of representative personalized federated learning methods. These initial results highlight strengths and weaknesses of existing approaches, and raise several open questions for the community. Motley aims to provide a reproducible means with which to advance developments in personalized and heterogeneity-aware federated learning, as well as the related areas of transfer learning, meta-learning, and multi-task learning.  ( 3 min )
    Automatic Sleep Scoring from Large-scale Multi-channel Pediatric EEG. (arXiv:2207.06921v2 [eess.SP] UPDATED)
    Sleep is particularly important to the health of infants, children, and adolescents, and sleep scoring is the first step to accurate diagnosis and treatment of potentially life-threatening conditions. But pediatric sleep is severely under-researched compared to adult sleep in the context of machine learning for health, and sleep scoring algorithms developed for adults usually perform poorly on infants. Here, we present the first automated sleep scoring results on a recent large-scale pediatric sleep study dataset that was collected during standard clinical care. We develop a transformer-based supervised learning model that learns to classify five sleep stages from millions of multi-channel electroencephalogram (EEG) sleep epochs with 78% overall accuracy. Further, we conduct an in-depth analysis of the model performance based on patient demographics and EEG channels. The results point to the growing need for machine learning research on pediatric sleep.  ( 2 min )
    Graph Rationalization with Environment-based Augmentations. (arXiv:2206.02886v2 [cs.LG] UPDATED)
    Rationale is defined as a subset of input features that best explains or supports the prediction by machine learning models. Rationale identification has improved the generalizability and interpretability of neural networks on vision and language data. In graph applications such as molecule and polymer property prediction, identifying representative subgraph structures, known as graph rationales, plays an essential role in the performance of graph neural networks. Existing graph pooling and/or distribution intervention methods suffer from a lack of examples to learn to identify optimal graph rationales. In this work, we introduce a new augmentation operation called environment replacement that automatically creates virtual data examples to improve rationale identification. We propose an efficient framework that performs rationale-environment separation and representation learning on the real and augmented examples in latent spaces to avoid the high complexity of explicit graph decoding and encoding. Comparing against recent techniques, experiments on seven molecular and four polymer real datasets demonstrate the effectiveness and efficiency of the proposed augmentation-based graph rationalization framework.  ( 2 min )
    ALMA: Hierarchical Learning for Composite Multi-Agent Tasks. (arXiv:2205.14205v2 [cs.LG] UPDATED)
    Despite significant progress on multi-agent reinforcement learning (MARL) in recent years, coordination in complex domains remains a challenge. Work in MARL often focuses on solving tasks where agents interact with all other agents and entities in the environment; however, we observe that real-world tasks are often composed of several isolated instances of local agent interactions (subtasks), and each agent can meaningfully focus on one subtask to the exclusion of all else in the environment. In these composite tasks, successful policies can often be decomposed into two levels of decision-making: agents are allocated to specific subtasks and each agent acts productively towards their assigned subtask alone. This decomposed decision making provides a strong structural inductive bias, significantly reduces agent observation spaces, and encourages subtask-specific policies to be reused and composed during training, as opposed to treating each new composition of subtasks as unique. We introduce ALMA, a general learning method for taking advantage of these structured tasks. ALMA simultaneously learns a high-level subtask allocation policy and low-level agent policies. We demonstrate that ALMA learns sophisticated coordination behavior in a number of challenging environments, outperforming strong baselines. ALMA's modularity also enables it to better generalize to new environment configurations. Finally, we find that while ALMA can integrate separately trained allocation and action policies, the best performance is obtained only by training all components jointly. Our code is available at https://github.com/shariqiqbal2810/ALMA  ( 3 min )
    Geometric Regularization from Overparameterization. (arXiv:2202.09276v2 [cs.LG] UPDATED)
    The volume of the distribution of weight sets associated with a loss value may be the source of implicit regularization from overparameterization, due to the phenomenon of contracting volume with increasing dimensions for geometric figures demonstrated by hyperspheres. We introduce the geometric regularization conjecture and extract an explanation for the double descent phenomenon by considering a similar property resulting from the shrinking intrinsic dimensionality of the distribution of potential weight set updates available along the training path: if that distribution retracts across a volume-versus-dimensionality curve peak when approaching the global minimum, we could expect geometric regularization to re-emerge. We illustrate how data fidelity representational complexity may influence model capacity double descent interpolation thresholds. The existence of epoch and model capacity double descent curves originating from different geometric forms may imply universality of closed n-manifolds having dimensionally adjusted n-sphere volumetric correspondence.  ( 2 min )
    Prioritized Training on Points that are Learnable, Worth Learning, and Not Yet Learnt. (arXiv:2206.07137v3 [cs.LG] UPDATED)
    Training on web-scale data can take months. But most computation and time is wasted on redundant and noisy points that are already learnt or not learnable. To accelerate training, we introduce Reducible Holdout Loss Selection (RHO-LOSS), a simple but principled technique which selects approximately those points for training that most reduce the model's generalization loss. As a result, RHO-LOSS mitigates the weaknesses of existing data selection methods: techniques from the optimization literature typically select 'hard' (e.g. high loss) points, but such points are often noisy (not learnable) or less task-relevant. Conversely, curriculum learning prioritizes 'easy' points, but such points need not be trained on once learned. In contrast, RHO-LOSS selects points that are learnable, worth learning, and not yet learnt. RHO-LOSS trains in far fewer steps than prior art, improves accuracy, and speeds up training on a wide range of datasets, hyperparameters, and architectures (MLPs, CNNs, and BERT). On the large web-scraped image dataset Clothing-1M, RHO-LOSS trains in 18x fewer steps and reaches 2% higher final accuracy than uniform data shuffling.  ( 3 min )
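    A sketch of the selection rule as the abstract states it (the helper and its arguments are my own framing): score each batch point by current training loss minus a precomputed irreducible (holdout-model) loss, and train only on the top-k.

        import torch
        import torch.nn.functional as F

        def rho_loss_select(model, xb, yb, irreducible, k):
            """Score each point by its reducible holdout loss: current
            training loss minus the per-example loss of a small model
            trained on holdout data (`irreducible`, aligned with the batch)."""
            with torch.no_grad():
                train_loss = F.cross_entropy(model(xb), yb, reduction="none")
            idx = (train_loss - irreducible).topk(k).indices
            return xb[idx], yb[idx]   # train on this subset for the step

    Points that are already learnt have low training loss, and noisy or irrelevant points have high irreducible loss; both get scored down, leaving the learnable, worth-learning, not-yet-learnt points.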
    RSD-GAN: Regularized Sobolev Defense GAN Against Speech-to-Text Adversarial Attacks. (arXiv:2207.06858v2 [cs.SD] UPDATED)
    This paper introduces a new synthesis-based defense algorithm for counteracting a variety of adversarial attacks developed to challenge the performance of cutting-edge speech-to-text transcription systems. Our algorithm implements a Sobolev-based GAN and proposes a novel regularizer for effectively controlling the functionality of the entire generative model, particularly the discriminator network, during training. Results from numerous experiments on the victim DeepSpeech, Kaldi, and Lingvo speech transcription systems corroborate the remarkable performance of our defense approach against a comprehensive range of targeted and non-targeted adversarial attacks.  ( 2 min )
    A Near-Optimal Algorithm for Univariate Zeroth-Order Budget Convex Optimization. (arXiv:2208.06720v2 [math.OC] UPDATED)
    This paper studies a natural generalization of the problem of minimizing a univariate convex function $f$ by querying its values sequentially. At each time-step $t$, the optimizer can invest a budget $b_t$ in a query point $X_t$ of their choice to obtain a fuzzy evaluation of $f$ at $X_t$ whose accuracy depends on the amount of budget invested in $X_t$ over time. This setting is motivated by the minimization of objectives whose values can only be determined approximately through lengthy or expensive computations. We design an any-time parameter-free algorithm called Dyadic Search, for which we prove near-optimal optimization error guarantees. As a byproduct of our analysis, we show that the classical dependence on the global Lipschitz constant in the error bounds is an artifact of the granularity of the budget. Finally, we illustrate our theoretical findings with numerical simulations.  ( 2 min )
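    The paper's Dyadic Search is not reproduced here, but a toy version of the query model makes the setting concrete. The sketch below assumes Gaussian noise whose standard deviation shrinks as $1/\sqrt{b}$ with invested budget $b$, and uses a naive trisection with a fixed per-query budget:

        import numpy as np

        def fuzzy_eval(f, x, budget, rng):
            # accuracy improves with invested budget: noise std ~ 1/sqrt(budget)
            return f(x) + rng.normal(0.0, 1.0 / np.sqrt(budget))

        def budgeted_trisection(f, lo, hi, total_budget, rng, per_query=25):
            spent = 0
            while spent + 2 * per_query <= total_budget and hi - lo > 1e-3:
                m1, m2 = lo + (hi - lo) / 3, hi - (hi - lo) / 3
                v1 = fuzzy_eval(f, m1, per_query, rng)
                v2 = fuzzy_eval(f, m2, per_query, rng)
                spent += 2 * per_query
                if v1 < v2:      # by convexity the minimum (probably) lies left of m2
                    hi = m2
                else:
                    lo = m1
            return (lo + hi) / 2

        rng = np.random.default_rng(1)
        est = budgeted_trisection(lambda x: (x - 0.3) ** 2, 0.0, 1.0, 5000, rng)
        print(f"estimated minimizer: {est:.3f}")

    Unlike this naive baseline with its fixed per-query budget, the actual algorithm adapts how much budget to invest at each point, which is where the near-optimal guarantees come from.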
    Multimodal Attention-based Deep Learning for Alzheimer's Disease Diagnosis. (arXiv:2206.08826v2 [cs.LG] UPDATED)
    Alzheimer's Disease (AD) is the most common neurodegenerative disorder with one of the most complex pathogeneses, making effective and clinically actionable decision support difficult. The objective of this study was to develop a novel multimodal deep learning framework to aid medical professionals in AD diagnosis. We present a Multimodal Alzheimer's Disease Diagnosis framework (MADDi) to accurately detect the presence of AD and mild cognitive impairment (MCI) from imaging, genetic, and clinical data. MADDi is novel in that we use cross-modal attention, which captures interactions between modalities - a method not previously explored in this domain. We perform multi-class classification, a challenging task considering the strong similarities between MCI and AD. We compare with previous state-of-the-art models, evaluate the importance of attention, and examine the contribution of each modality to the model's performance. MADDi classifies MCI, AD, and controls with 96.88% accuracy on a held-out test set. When examining the contribution of different attention schemes, we found that the combination of cross-modal attention with self-attention performed the best, while the model with no attention layers performed the worst, with a 7.9% difference in F1-scores. Our experiments underlined the importance of structured clinical data to help machine learning models contextualize and interpret the remaining modalities. Extensive ablation studies showed that any multimodal mixture of input features without access to structured clinical information suffered marked performance losses. This study demonstrates the merit of combining multiple input modalities via cross-modal attention to deliver highly accurate AD diagnostic decision support.  ( 3 min )
    Fast Lifelong Adaptive Inverse Reinforcement Learning from Demonstrations. (arXiv:2209.11908v1 [cs.LG])
    Learning from Demonstration (LfD) approaches empower end-users to teach robots novel tasks via demonstrations of the desired behaviors, democratizing access to robotics. However, current LfD frameworks are not capable of fast adaptation to heterogeneous human demonstrations, nor of large-scale deployment in ubiquitous robotics applications. In this paper, we propose a novel LfD framework, Fast Lifelong Adaptive Inverse Reinforcement learning (FLAIR). Our approach (1) leverages learned strategies to construct policy mixtures for fast adaptation to new demonstrations, allowing for quick end-user personalization; (2) distills common knowledge across demonstrations, achieving accurate task inference; and (3) expands its model only when needed in lifelong deployments, maintaining a concise set of prototypical strategies that can approximate all behaviors via policy mixtures. We empirically validate that FLAIR achieves adaptability (i.e., the robot adapts to heterogeneous, user-specific task preferences), efficiency (i.e., the robot achieves sample-efficient adaptation), and scalability (i.e., the model grows sublinearly with the number of demonstrations while maintaining high performance). FLAIR surpasses benchmarks across three continuous control tasks with an average 57% improvement in policy returns and an average 78% fewer episodes required for demonstration modeling using policy mixtures. Finally, we demonstrate the success of FLAIR in a real-robot table tennis task.  ( 2 min )
    FedVLN: Privacy-preserving Federated Vision-and-Language Navigation. (arXiv:2203.14936v3 [cs.AI] UPDATED)
    Data privacy is a central problem for embodied agents that can perceive the environment, communicate with humans, and act in the real world. While helping humans complete tasks, the agent may observe and process sensitive information of users, such as house environments, human activities, etc. In this work, we introduce privacy-preserving embodied agent learning for the task of Vision-and-Language Navigation (VLN), where an embodied agent navigates house environments by following natural language instructions. We view each house environment as a local client, which shares nothing other than local updates with the cloud server and other clients, and propose a novel federated vision-and-language navigation (FedVLN) framework to protect data privacy during both training and pre-exploration. Particularly, we propose a decentralized training strategy to limit the data of each client to its local model training and a federated pre-exploration method to do partial model aggregation to improve model generalizability to unseen environments. Extensive results on R2R and RxR datasets show that under our FedVLN framework, decentralized VLN models achieve comparable results with centralized training while protecting seen environment privacy, and federated pre-exploration significantly outperforms centralized pre-exploration while preserving unseen environment privacy.  ( 3 min )
    An Application of Online Learning to Spacecraft Memory Dump Optimization. (arXiv:2202.06617v2 [cs.LG] UPDATED)
    In this paper, we present a real-world application of online learning with expert advice to the field of Space Operations, testing our theory on real-life data coming from the Copernicus Sentinel-6 satellite. We show that in Spacecraft Memory Dump Optimization, a lightweight Follow-The-Leader algorithm leads to an increase in performance of over $60\%$ when compared to traditional techniques.  ( 2 min )
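    For readers unfamiliar with the method, Follow-The-Leader in the full-information expert setting fits in a few lines; the sketch below uses synthetic losses in place of the satellite telemetry (the number of strategies and the loss model are illustrative assumptions):

        import numpy as np

        rng = np.random.default_rng(0)
        K, T = 5, 200
        losses = rng.random((T, K))     # loss of each candidate strategy per round

        cumulative = np.zeros(K)
        total_loss = 0.0
        for t in range(T):
            leader = int(np.argmin(cumulative))  # follow the current leader
            total_loss += losses[t, leader]
            cumulative += losses[t]              # full-information update

        print("FTL average loss:   ", total_loss / T)
        print("best fixed strategy:", losses.mean(axis=0).min())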
    Global Optimization for Cardinality-constrained Minimum Sum-of-Squares Clustering via Semidefinite Programming. (arXiv:2209.08901v2 [math.OC] UPDATED)
    The minimum sum-of-squares clustering (MSSC), or k-means type clustering, has been recently extended to exploit prior knowledge on the cardinality of each cluster. Such knowledge is used to increase performance as well as solution quality. In this paper, we propose an exact approach based on the branch-and-cut technique to solve the cardinality-constrained MSSC. For the lower bound routine, we use the semidefinite programming (SDP) relaxation recently proposed by Rujeerapaiboon et al. [SIAM J. Optim. 29(2), 1211-1239, (2019)]. However, this relaxation can be used in a branch-and-cut method only for small-size instances. Therefore, we derive a new SDP relaxation that scales better with the instance size and the number of clusters. In both cases, we strengthen the bound by adding polyhedral cuts. Benefiting from a tailored branching strategy which enforces pairwise constraints, we reduce the complexity of the problems arising in the child nodes. For the upper bound, instead, we present a local search procedure that exploits the solution of the SDP relaxation solved at each node. Computational results show that the proposed algorithm globally solves, for the first time, real-world instances of size 10 times larger than those solved by state-of-the-art exact methods.  ( 3 min )
    EF-BV: A Unified Theory of Error Feedback and Variance Reduction Mechanisms for Biased and Unbiased Compression in Distributed Optimization. (arXiv:2205.04180v2 [cs.LG] UPDATED)
    In distributed or federated optimization and learning, communication between the different computing units is often the bottleneck and gradient compression is widely used to reduce the number of bits sent within each communication round of iterative methods. There are two classes of compression operators and separate algorithms making use of them. In the case of unbiased random compressors with bounded variance (e.g., rand-k), the DIANA algorithm of Mishchenko et al. (2019), which implements a variance reduction technique for handling the variance introduced by compression, is the current state of the art. In the case of biased and contractive compressors (e.g., top-k), the EF21 algorithm of Richtárik et al. (2021), which instead implements an error-feedback mechanism, is the current state of the art. These two classes of compression schemes and algorithms are distinct, with different analyses and proof techniques. In this paper, we unify them into a single framework and propose a new algorithm, recovering DIANA and EF21 as particular cases. Our general approach works with a new, larger class of compressors, which has two parameters, the bias and the variance, and includes unbiased and biased compressors as particular cases. This allows us to inherit the best of the two worlds: like EF21 and unlike DIANA, biased compressors, like top-k, whose good performance in practice is recognized, can be used. And like DIANA and unlike EF21, independent randomness at the compressors makes it possible to mitigate the effects of compression, with the convergence rate improving when the number of parallel workers is large. This is the first time that an algorithm with all these features has been proposed. We prove its linear convergence under certain conditions. Our approach takes a step towards a better understanding of two so-far distinct worlds of communication-efficient distributed learning.  ( 3 min )
    RORL: Robust Offline Reinforcement Learning via Conservative Smoothing. (arXiv:2206.02829v2 [cs.LG] UPDATED)
    Offline reinforcement learning (RL) provides a promising direction to exploit the massive amount of offline data for complex decision-making tasks. Due to the distribution shift issue, current offline RL algorithms are generally designed to be conservative in value estimation and action selection. However, such conservatism can impair the robustness of learned policies when encountering observation deviation under realistic conditions, such as sensor errors and adversarial attacks. To trade off robustness and conservatism, we propose Robust Offline Reinforcement Learning (RORL) with a novel conservative smoothing technique. In RORL, we explicitly introduce regularization on the policy and the value function for states near the dataset, as well as additional conservative value estimation on out-of-distribution (OOD) states. Theoretically, we show RORL enjoys a tighter suboptimality bound than recent theoretical results in linear MDPs. We demonstrate that RORL can achieve state-of-the-art performance on the general offline RL benchmark and is considerably robust to adversarial observation perturbations.  ( 2 min )
    Modeling Mask Uncertainty in Hyperspectral Image Reconstruction. (arXiv:2112.15362v4 [eess.IV] UPDATED)
    Recently, hyperspectral imaging (HSI) has attracted increasing research attention, especially for methods based on a coded aperture snapshot spectral imaging (CASSI) system. Existing deep HSI reconstruction models are generally trained on paired data to retrieve original signals from 2D compressed measurements given by a particular optical hardware mask in CASSI, where the mask largely impacts the reconstruction performance and could work as a "model hyperparameter" governing data augmentation. This mask-specific training style leads to a hardware miscalibration issue, which sets up barriers to deploying deep HSI models across different hardware and noisy environments. To address this challenge, we introduce mask uncertainty for HSI with a complete variational Bayesian learning treatment and explicitly model it through a mask decomposition inspired by real hardware. Specifically, we propose a novel Graph-based Self-Tuning (GST) network to reason about uncertainties, adapting to the varying spatial structures of masks across different hardware. Moreover, we develop a bilevel optimization framework to balance HSI reconstruction and uncertainty estimation, accounting for the hyperparameter property of masks. Extensive experimental results and model discussions validate the effectiveness (over 33/30 dB) of the proposed GST method under two miscalibration scenarios and demonstrate a highly competitive performance compared with state-of-the-art well-calibrated methods. Our code and pre-trained model are available at https://github.com/Jiamian-Wang/mask_uncertainty_spectral_SCI  ( 3 min )
    Self-Consistent Dynamical Field Theory of Kernel Evolution in Wide Neural Networks. (arXiv:2205.09653v2 [stat.ML] UPDATED)
    We analyze feature learning in infinite-width neural networks trained with gradient flow through a self-consistent dynamical field theory. We construct a collection of deterministic dynamical order parameters which are inner-product kernels for hidden unit activations and gradients in each layer at pairs of time points, providing a reduced description of network activity through training. These kernel order parameters collectively define the hidden layer activation distribution, the evolution of the neural tangent kernel, and consequently output predictions. We show that the field theory derivation recovers the recursive stochastic process of infinite-width feature learning networks obtained by Yang and Hu (2021) with Tensor Programs. For deep linear networks, these kernels satisfy a set of algebraic matrix equations. For nonlinear networks, we provide an alternating sampling procedure to self-consistently solve for the kernel order parameters. We provide comparisons of the self-consistent solution to various approximation schemes including the static NTK approximation, gradient independence assumption, and leading order perturbation theory, showing that each of these approximations can break down in regimes where general self-consistent solutions still provide an accurate description. Lastly, we provide experiments in more realistic settings which demonstrate that the loss and kernel dynamics of CNNs at fixed feature learning strength are preserved across different widths on a CIFAR classification task.
    Applying Machine Learning to Life Insurance: some knowledge sharing to master it. (arXiv:2209.02057v2 [stat.ML] UPDATED)
    Machine Learning permeates many industries, bringing new sources of benefit for companies. However, within the life insurance industry, Machine Learning is not widely used in practice, as over the past years statistical models have shown their efficiency for risk assessment. Insurers may thus face difficulties in assessing the value of artificial intelligence. Examining how the life insurance industry has changed over time highlights what is at stake in using Machine Learning for insurers and the benefits it can bring by unleashing data value. This paper reviews traditional actuarial methodologies for survival modeling and extends them with Machine Learning techniques. It points out differences with regular machine learning models and emphasizes the importance of specific implementations for handling censored data with the machine learning model family. In complement to this article, a Python library has been developed. Different open-source Machine Learning algorithms have been adjusted to the specificities of life insurance data, namely censoring and truncation. Such models can be easily applied from this SCOR library to accurately model life insurance risks.
    Efficient Reconstruction of Stochastic Pedigrees: Some Steps From Theory to Practice. (arXiv:2204.04573v2 [q-bio.PE] UPDATED)
    In an extant population, how much information do extant individuals provide on the pedigree of their ancestors? Recent work by Kim, Mossel, Ramnarayan and Turner (2020) studied this question under a number of simplifying assumptions, including random mating, fixed length inheritance blocks and sufficiently large founding population. They showed that under these conditions if the average number of offspring is a sufficiently large constant, then it is possible to recover a large fraction of the pedigree structure and genetic content by an algorithm they named REC-GEN. We are interested in studying the performance of REC-GEN on simulated data generated according to the model. As a first step, we improve the running time of the algorithm. However, we observe that even the faster version of the algorithm does not do well in any simulations in recovering the pedigree beyond 2 generations. We claim that this is due to the inbreeding present in any setting where the algorithm can be run, even on simulated data. To support the claim we show that a main step of the algorithm, called ancestral reconstruction, performs accurately in an idealized setting with no inbreeding but performs poorly in random mating populations. To overcome the poor behavior of REC-GEN we introduce a Belief-Propagation based heuristic that accounts for the inbreeding and performs much better in our simulations.
    Random Feature Amplification: Feature Learning and Generalization in Neural Networks. (arXiv:2202.07626v3 [cs.LG] UPDATED)
    In this work, we provide a characterization of the feature-learning process in two-layer ReLU networks trained by gradient descent on the logistic loss following random initialization. We consider data with binary labels that are generated by an XOR-like function of the input features. We permit a constant fraction of the training labels to be corrupted by an adversary. We show that, although linear classifiers are no better than random guessing for the distribution we consider, two-layer ReLU networks trained by gradient descent achieve generalization error close to the label noise rate. We develop a novel proof technique that shows that at initialization, the vast majority of neurons function as random features that are only weakly correlated with useful features, and the gradient descent dynamics 'amplify' these weak, random features to strong, useful features.
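    The setting is easy to reproduce in miniature. Below is a sketch with illustrative dimensions, width, noise rate, and learning rate (not the regime covered by the theory):

        import torch

        torch.manual_seed(0)
        n, d, noise_rate = 2000, 20, 0.1
        X = torch.randn(n, d)
        y_clean = torch.sign(X[:, 0] * X[:, 1])   # XOR-like labels in two features
        y = y_clean.clone()
        flip = torch.rand(n) < noise_rate
        y[flip] = -y[flip]                         # corrupted training labels

        net = torch.nn.Sequential(torch.nn.Linear(d, 512), torch.nn.ReLU(),
                                  torch.nn.Linear(512, 1))
        opt = torch.optim.SGD(net.parameters(), lr=0.05)
        for step in range(2000):
            opt.zero_grad()
            margin = y * net(X).squeeze(1)
            loss = torch.nn.functional.softplus(-margin).mean()   # logistic loss
            loss.backward()
            opt.step()

        with torch.no_grad():
            acc = ((net(X).squeeze(1) * y_clean) > 0).float().mean()
        print(f"accuracy w.r.t. clean labels: {acc:.3f}")

    A linear model on the same data stays near 50% accuracy, which is the contrast the paper formalizes.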
    When to Classify Events in Open Times Series?. (arXiv:2204.00392v2 [cs.LG] UPDATED)
    In numerous applications, such as predictive maintenance, there is pressure to predict events ahead of time with as much accuracy as possible while not delaying the decision unduly. This translates into the optimization of a trade-off between the earliness and accuracy of decisions, which has been studied for time series of finite length carrying a unique label, leading to powerful algorithms for Early Classification of Time Series (ECTS). This paper, for the first time, investigates such a trade-off when events of different classes occur in a streaming fashion, with no predefined end. In the Early Classification in Open Time Series problem (ECOTS), the task is to predict events, i.e. their class and time interval, at the moment that optimizes the accuracy vs. earliness trade-off. Interestingly, we find that ECTS algorithms can be sensibly adapted in a principled way to this new problem. We illustrate our methodology by transforming two state-of-the-art ECTS algorithms for the ECOTS scenario. Among the wide variety of applications that this new approach opens up, we develop a predictive maintenance use case that optimizes alarm triggering times, demonstrating its practical power.  ( 3 min )
    Exploiting the Relationship Between Kendall's Rank Correlation and Cosine Similarity for Attribution Protection. (arXiv:2205.07279v2 [cs.LG] UPDATED)
    Model attributions are important in deep neural networks as they aid practitioners in understanding the models, but recent studies reveal that attributions can be easily perturbed by adding imperceptible noise to the input. The non-differentiable Kendall's rank correlation is a key performance index for attribution protection. In this paper, we first show that the expected Kendall's rank correlation is positively correlated to cosine similarity and then indicate that the direction of attribution is the key to attribution robustness. Based on these findings, we explore the vector space of attribution to explain the shortcomings of attribution defense methods using $\ell_p$ norm and propose integrated gradient regularizer (IGR), which maximizes the cosine similarity between natural and perturbed attributions. Our analysis further exposes that IGR encourages neurons with the same activation states for natural samples and the corresponding perturbed samples, which is shown to induce robustness to gradient-based attribution methods. Our experiments on different models and datasets confirm our analysis on attribution protection and demonstrate a decent improvement in adversarial robustness.  ( 2 min )
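    A minimal sketch of the regularizer's shape is below, assuming plain input gradients as the attribution and a random perturbation as a stand-in for the attack; the paper's integrated-gradient computation is more involved:

        import torch

        def attribution(model, x, y):
            x = x.clone().requires_grad_(True)
            loss = torch.nn.functional.cross_entropy(model(x), y)
            (grad,) = torch.autograd.grad(loss, x, create_graph=True)
            return grad.flatten(1)           # one attribution vector per sample

        def igr_style_loss(model, x, y, eps=0.1, lam=1.0):
            x_pert = x + eps * torch.randn_like(x)   # stand-in perturbation
            a_nat = attribution(model, x, y)
            a_pert = attribution(model, x_pert, y)
            cos = torch.nn.functional.cosine_similarity(a_nat, a_pert, dim=1)
            task = torch.nn.functional.cross_entropy(model(x), y)
            return task + lam * (1.0 - cos).mean()   # push attributions to align

        model = torch.nn.Sequential(torch.nn.Flatten(), torch.nn.Linear(784, 10))
        x, y = torch.randn(8, 1, 28, 28), torch.randint(0, 10, (8,))
        print(igr_style_loss(model, x, y).item())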
    AlphaZero-Inspired Game Learning: Faster Training by Using MCTS Only at Test Time. (arXiv:2204.13307v3 [cs.LG] UPDATED)
    Recently, the seminal algorithms AlphaGo and AlphaZero have started a new era in game learning and deep reinforcement learning. While the achievements of AlphaGo and AlphaZero - playing Go and other complex games at a superhuman level - are truly impressive, these architectures have the drawback that they require high computational resources. Many researchers are looking for methods that are similar to AlphaZero, but have lower computational demands and are thus more easily reproducible. In this paper, we pick an important element of AlphaZero - the Monte Carlo Tree Search (MCTS) planning stage - and combine it with temporal difference (TD) learning agents. We wrap MCTS, for the first time, around TD n-tuple networks, and use this wrapping only at test time to create versatile agents while keeping computational demands low. We apply this new architecture to several complex games (Othello, ConnectFour, Rubik's Cube) and show the advantages achieved with this AlphaZero-inspired MCTS wrapper. In particular, we present results showing that this agent is the first trained on standard hardware (no GPU or TPU) to beat the very strong Othello program Edax up to and including level 7 (where most other learning-from-scratch algorithms could only defeat Edax up to level 2).  ( 3 min )
    How Does Data Freshness Affect Real-time Supervised Learning?. (arXiv:2208.06948v2 [cs.NI] UPDATED)
    In this paper, we analyze the impact of data freshness on real-time supervised learning, where a neural network is trained to infer a time-varying target (e.g., the position of the vehicle in front) based on features (e.g., video frames) observed at a sensing node (e.g., camera or lidar). One might expect that the performance of real-time supervised learning degrades monotonically as the feature becomes stale. Using an information-theoretic analysis, we show that this is true if the feature and target data sequence can be closely approximated as a Markov chain; it is not true if the data sequence is far from Markovian. Hence, the prediction error of real-time supervised learning is a function of the Age of Information (AoI), where the function could be non-monotonic. Several experiments are conducted to illustrate the monotonic and non-monotonic behaviors of the prediction error. To minimize the inference error in real-time, we propose a new "selection-from-buffer" model for sending the features, which is more general than the "generate-at-will" model used in earlier studies. By using Gittins and Whittle indices, low-complexity scheduling strategies are developed to minimize the inference error, where a new connection between the Gittins index theory and Age of Information (AoI) minimization is discovered. These scheduling results hold (i) for minimizing general AoI functions (monotonic or non-monotonic) and (ii) for general feature transmission time distributions. Data-driven evaluations are presented to illustrate the benefits of the proposed scheduling algorithms.  ( 3 min )
    Fair Incentives for Repeated Engagement. (arXiv:2111.00002v2 [cs.GT] UPDATED)
    We study a decision-maker's problem of finding optimal monetary incentive schemes when faced with agents whose participation decisions (stochastically) depend on the incentive they receive. Our focus is on policies constrained to fulfill two fairness properties that preclude outcomes wherein different groups of agents experience different treatment on average. We formulate the problem as a high-dimensional stochastic optimization problem, and study it through the use of a closely related deterministic variant. We show that the optimal static solution to this deterministic variant is asymptotically optimal for the dynamic problem under fairness constraints. Though solving for the optimal static solution gives rise to a non-convex optimization problem, we uncover a structural property that allows us to design a tractable, fast-converging heuristic policy. Traditional schemes for stakeholder retention ignore fairness constraints; indeed, the goal in these is to use differentiation to incentivize repeated engagement with the system. Our work (i) shows that even in the absence of explicit discrimination, dynamic policies may unintentionally discriminate between agents of different types by varying the type composition of the system, and (ii) presents an asymptotically optimal policy to avoid such discriminatory outcomes.
    Contextual Squeeze-and-Excitation for Efficient Few-Shot Image Classification. (arXiv:2206.09843v2 [cs.CV] UPDATED)
    Recent years have seen a growth in user-centric applications that require effective knowledge transfer across tasks in the low-data regime. An example is personalization, where a pretrained system is adapted by learning on small amounts of labeled data belonging to a specific user. This setting requires high accuracy under low computational complexity, therefore the Pareto frontier of accuracy vs. adaptation cost plays a crucial role. In this paper we push this Pareto frontier in the few-shot image classification setting with a key contribution: a new adaptive block called Contextual Squeeze-and-Excitation (CaSE) that adjusts a pretrained neural network on a new task to significantly improve performance with a single forward pass of the user data (context). We use meta-trained CaSE blocks to conditionally adapt the body of a network and a fine-tuning routine to adapt a linear head, defining a method called UpperCaSE. UpperCaSE achieves a new state-of-the-art accuracy relative to meta-learners on the 26 datasets of VTAB+MD and on a challenging real-world personalization benchmark (ORBIT), narrowing the gap with leading fine-tuning methods with the benefit of orders of magnitude lower adaptation cost.
    On the speed of uniform convergence in Mercer's theorem. (arXiv:2205.00487v2 [cs.LG] UPDATED)
    The classical Mercer's theorem claims that a continuous positive definite kernel $K({\mathbf x}, {\mathbf y})$ on a compact set can be represented as $\sum_{i=1}^\infty \lambda_i\phi_i({\mathbf x})\phi_i({\mathbf y})$ where $\{(\lambda_i,\phi_i)\}$ are eigenvalue-eigenvector pairs of the corresponding integral operator. This infinite representation is known to converge uniformly to the kernel $K$. We estimate the speed of this convergence in terms of the decay rate of eigenvalues and demonstrate that for $2m$ times differentiable kernels the first $N$ terms of the series approximate $K$ as $\mathcal{O}\big((\sum_{i=N+1}^\infty\lambda_i)^{\frac{m}{m+n}}\big)$ or $\mathcal{O}\big((\sum_{i=N+1}^\infty\lambda^2_i)^{\frac{m}{2m+n}}\big)$. Finally, we demonstrate some applications of our results to a spectral characterization of integral operators with continuous roots and other powers.
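    The statement is easy to probe numerically. A Nyström-style sketch (uniform grid quadrature and a Gaussian kernel, both illustrative choices) shows the uniform error of the truncated expansion decaying as $N$ grows:

        import numpy as np

        x = np.linspace(0, 1, 400)
        K = np.exp(-(x[:, None] - x[None, :]) ** 2 / 0.1)   # smooth kernel
        w = 1.0 / len(x)                                    # quadrature weight
        lam, U = np.linalg.eigh(K * w)                      # integral-operator spectrum
        lam, U = lam[::-1], U[:, ::-1]                      # decreasing eigenvalues

        for N in (1, 2, 4, 8, 16):
            K_N = (U[:, :N] * lam[:N]) @ U[:, :N].T / w     # truncated Mercer sum
            print(N, np.abs(K - K_N).max())                 # sup-norm error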
    MLExchange: A web-based platform enabling exchangeable machine learning workflows for scientific studies. (arXiv:2208.09751v3 [cs.LG] UPDATED)
    Machine learning (ML) algorithms are increasingly helping scientific communities across different disciplines and institutions to address large and diverse data problems. However, many available ML tools are programmatically demanding and computationally costly. The MLExchange project aims to build a collaborative platform equipped with enabling tools that allow scientists and facility users who do not have a profound ML background to use ML and computational resources in scientific discovery. At a high level, we are targeting a full user experience where ML algorithms, workflows, and data can be managed and exchanged through web applications. Since each component is an independent container, the whole platform or its individual service(s) can be easily deployed at servers of different scales, ranging from a personal device (laptop, smart phone, etc.) to high-performance clusters (HPC) accessed (simultaneously) by many users. Thus, MLExchange supports flexible usage scenarios: users can either access the services and resources from a remote server or run the whole platform or its individual service(s) within their local network.
    SCALE: Online Self-Supervised Lifelong Learning without Prior Knowledge. (arXiv:2208.11266v2 [cs.LG] UPDATED)
    Unsupervised lifelong learning refers to the ability to learn over time while memorizing previous patterns without supervision. Previous works assumed strong prior knowledge about the incoming data (e.g., knowing the class boundaries), which can be impossible to obtain in complex and unpredictable environments. In this paper, motivated by real-world scenarios, we formally define the online unsupervised lifelong learning problem with class-incremental streaming data, which is non-iid and single-pass. The problem is more challenging than existing lifelong learning problems due to the absence of labels and prior knowledge. To address the issue, we propose Self-Supervised ContrAstive Lifelong LEarning (SCALE), which extracts and memorizes knowledge on-the-fly. SCALE is designed around three major components: a pseudo-supervised contrastive loss, a self-supervised forgetting loss, and an online memory update for uniform subset selection. All three components are designed to work collaboratively to maximize learning performance. Our loss functions leverage pairwise similarity and thus remove the dependency on supervision or prior knowledge. We perform comprehensive experiments with SCALE under iid and four non-iid data streams. SCALE outperforms the best state-of-the-art algorithm in all settings, with improvements of up to 3.83%, 2.77% and 5.86% kNN accuracy on the CIFAR-10, CIFAR-100 and SubImageNet datasets.
    Error-correcting neural networks for semi-Lagrangian advection in the level-set method. (arXiv:2110.11611v3 [cs.LG] UPDATED)
    We present a machine learning framework that blends image super-resolution technologies with passive, scalar transport in the level-set method. Here, we investigate whether we can compute on-the-fly, data-driven corrections to minimize numerical viscosity in the coarse-mesh evolution of an interface. The proposed system's starting point is the semi-Lagrangian formulation. And, to reduce numerical dissipation, we introduce an error-quantifying multilayer perceptron. The role of this neural network is to improve the numerically estimated surface trajectory. To do so, it processes localized level-set, velocity, and positional data in a single time frame for select vertices near the moving front. Our main contribution is thus a novel machine-learning-augmented transport algorithm that operates alongside selective redistancing and alternates with conventional advection to keep the adjusted interface trajectory smooth. Consequently, our procedure is more efficient than full-scan convolutional-based applications because it concentrates computational effort only around the free boundary. Also, we show through various tests that our strategy is effective at counteracting both numerical diffusion and mass loss. In simple advection problems, for example, our method can achieve the same precision as the baseline scheme at twice the resolution but at a fraction of the cost. Similarly, our hybrid technique can produce feasible solidification fronts for crystallization processes. On the other hand, tangential shear flows and highly deforming simulations can precipitate bias artifacts and inference deterioration. Likewise, stringent design velocity constraints can limit our solver's application to problems involving rapid interface changes. In the latter cases, we have identified several opportunities to enhance robustness without forgoing our approach's basic concept.
    FedOBD: Opportunistic Block Dropout for Efficiently Training Large-scale Neural Networks through Federated Learning. (arXiv:2208.05174v2 [cs.LG] UPDATED)
    Large-scale neural networks possess considerable expressive power. They are well-suited for complex learning tasks in industrial applications. However, large-scale models pose significant challenges for training under the current Federated Learning (FL) paradigm. Existing approaches for efficient FL training often leverage model parameter dropout. However, manipulating individual model parameters is not only inefficient in meaningfully reducing the communication overhead when training large-scale FL models, but may also be detrimental to the scaling efforts and model performance as shown by recent research. To address these issues, we propose the Federated Opportunistic Block Dropout (FedOBD) approach. The key novelty is that it decomposes large-scale models into semantic blocks so that FL participants can opportunistically upload quantized blocks, which are deemed to be significant towards training the model, to the FL server for aggregation. Extensive experiments evaluating FedOBD against five state-of-the-art approaches based on multiple real-world datasets show that it reduces the overall communication overhead by more than 70% compared to the best performing baseline approach, while achieving the highest test accuracy. To the best of our knowledge, FedOBD is the first approach to perform dropout on FL models at the block level rather than at the individual parameter level.
    Interpretable Machine Learning Models for Modal Split Prediction in Transportation Systems. (arXiv:2203.14191v2 [cs.LG] UPDATED)
    Modal split prediction in transportation networks has the potential to support network operators in managing traffic congestion and improving transit service reliability. We focus on the problem of hourly prediction of the fraction of travelers choosing one mode of transportation over another using high-dimensional travel time data. We use logistic regression as the base model and employ various regularization techniques for variable selection to prevent overfitting and resolve multicollinearity issues. Importantly, we interpret the prediction accuracy results with respect to the inherent variability of modal splits and travelers' aggregate responsiveness to changes in travel time. By visualizing model parameters, we conclude that the subset of segments found important for predictive accuracy changes from hour to hour and includes segments that are topologically central and/or highly congested. We apply our approach to the San Francisco Bay Area freeway and rapid transit network and demonstrate superior prediction accuracy and interpretability of our method compared to pre-specified variable selection methods.
    A Concise Framework of Memory Efficient Training via Dual Activation Precision. (arXiv:2208.04187v2 [cs.LG] UPDATED)
    Activation compressed training (ACT) has been shown to be a promising way to reduce the memory cost of training deep neural networks (DNNs). However, existing work on ACT relies on searching for the optimal bit-width during DNN training to reduce the quantization noise, which makes the procedure complicated and less transparent. To this end, we propose a simple and effective method to compress DNN training. Our method is motivated by an instructive observation: DNN backward propagation mainly utilizes the low-frequency component (LFC) of the activation maps, while the majority of memory is for caching the high-frequency component (HFC) during the training. This indicates the HFC of activation maps is highly redundant and compressible during DNN training, which inspires our proposed Dual Activation Precision (DIVISION). During the training, DIVISION preserves the high-precision copy of LFC and compresses the HFC into a light-weight copy with low numerical precision. This can significantly reduce the memory cost without negatively affecting the precision of backward propagation such that DIVISION maintains competitive model accuracy. Experimental results show DIVISION achieves over $10\times$ compression of activation maps and significantly higher training throughput than state-of-the-art ACT methods, without loss of model accuracy.
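    A conceptual sketch of the dual-precision idea follows; a simple FFT low-pass stands in for the paper's frequency split and a uniform quantizer for the low-precision copy, both assumptions made for illustration:

        import numpy as np

        def split_and_compress(act, freq_cutoff=0.125, hfc_bits=2):
            F = np.fft.fft2(act)
            fy = np.fft.fftfreq(act.shape[0])[:, None]
            fx = np.fft.fftfreq(act.shape[1])[None, :]
            low = (np.abs(fy) < freq_cutoff) & (np.abs(fx) < freq_cutoff)
            lfc = np.real(np.fft.ifft2(np.where(low, F, 0)))   # high-precision copy
            hfc = act - lfc
            levels = 2 ** hfc_bits - 1                          # crude quantizer
            lo, hi = hfc.min(), hfc.max()
            q = np.round((hfc - lo) / (hi - lo + 1e-12) * levels)
            hfc_hat = q / levels * (hi - lo) + lo               # light-weight copy
            return lfc.astype(np.float32), hfc_hat.astype(np.float32)

        act = np.random.default_rng(0).normal(size=(32, 32))
        lfc, hfc_hat = split_and_compress(act)
        recon = lfc + hfc_hat
        print("reconstruction RMSE:", np.sqrt(((recon - act) ** 2).mean()))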
    Robust Reinforcement Learning as a Stackelberg Game via Adaptively-Regularized Adversarial Training. (arXiv:2202.09514v2 [cs.LG] UPDATED)
    Robust Reinforcement Learning (RL) focuses on improving performances under model errors or adversarial attacks, which facilitates the real-life deployment of RL agents. Robust Adversarial Reinforcement Learning (RARL) is one of the most popular frameworks for robust RL. However, most of the existing literature models RARL as a zero-sum simultaneous game with Nash equilibrium as the solution concept, which could overlook the sequential nature of RL deployments, produce overly conservative agents, and induce training instability. In this paper, we introduce a novel hierarchical formulation of robust RL - a general-sum Stackelberg game model called RRL-Stack - to formalize the sequential nature and provide extra flexibility for robust training. We develop the Stackelberg Policy Gradient algorithm to solve RRL-Stack, leveraging the Stackelberg learning dynamics by considering the adversary's response. Our method generates challenging yet solvable adversarial environments which benefit RL agents' robust learning. Our algorithm demonstrates better training stability and robustness against different testing conditions in the single-agent robotics control and multi-agent highway merging tasks.
    Tiered Reinforcement Learning: Pessimism in the Face of Uncertainty and Constant Regret. (arXiv:2205.12418v2 [cs.LG] UPDATED)
    We propose a new learning framework that captures the tiered structure of many real-world user-interaction applications, where the users can be divided into two groups based on their different tolerance for exploration risk and should be treated separately. In this setting, we simultaneously maintain two policies $\pi^{\text{O}}$ and $\pi^{\text{E}}$: $\pi^{\text{O}}$ ("O" for "online") interacts with more risk-tolerant users from the first tier and minimizes regret by balancing exploration and exploitation as usual, while $\pi^{\text{E}}$ ("E" for "exploit") exclusively focuses on exploitation for risk-averse users from the second tier utilizing the data collected so far. An important question is whether such a separation yields advantages over the standard online setting (i.e., $\pi^{\text{E}}=\pi^{\text{O}}$) for the risk-averse users. We consider the gap-independent and gap-dependent settings separately. For the former, we prove that the separation is indeed not beneficial from a minimax perspective. For the latter, we show that if we choose Pessimistic Value Iteration as the exploitation algorithm to produce $\pi^{\text{E}}$, we can achieve constant regret for risk-averse users independent of the number of episodes $K$, which is in sharp contrast to the $\Omega(\log K)$ regret of any online RL algorithm in the same setting, while the regret of $\pi^{\text{O}}$ (almost) maintains its online regret optimality and does not need to compromise for the success of $\pi^{\text{E}}$.
    VRL3: A Data-Driven Framework for Visual Deep Reinforcement Learning. (arXiv:2202.10324v2 [cs.CV] UPDATED)
    We propose VRL3, a powerful data-driven framework with a simple design for solving challenging visual deep reinforcement learning (DRL) tasks. We analyze a number of major obstacles in taking a data-driven approach, and present a suite of design principles, novel findings, and critical insights about data-driven visual DRL. Our framework has three stages: in stage 1, we leverage non-RL datasets (e.g. ImageNet) to learn task-agnostic visual representations; in stage 2, we use offline RL data (e.g. a limited number of expert demonstrations) to convert the task-agnostic representations into more powerful task-specific representations; in stage 3, we fine-tune the agent with online RL. On a set of challenging hand manipulation tasks with sparse reward and realistic visual inputs, compared to the previous SOTA, VRL3 achieves an average of 780% better sample efficiency. And on the hardest task, VRL3 is 1220% more sample efficient (2440% when using a wider encoder) and solves the task with only 10% of the computation. These significant results clearly demonstrate the great potential of data-driven deep reinforcement learning.
    A Framework for Adversarial Streaming via Differential Privacy and Difference Estimators. (arXiv:2107.14527v2 [cs.DS] UPDATED)
    Classical streaming algorithms operate under the (not always reasonable) assumption that the input stream is fixed in advance. Recently, there is a growing interest in designing robust streaming algorithms that provide provable guarantees even when the input stream is chosen adaptively as the execution progresses. We propose a new framework for robust streaming that combines techniques from two recently suggested frameworks by Hassidim et al. [NeurIPS 2020] and by Woodruff and Zhou [FOCS 2021]. These recently suggested frameworks rely on very different ideas, each with its own strengths and weaknesses. We combine these two frameworks into a single hybrid framework that obtains the "best of both worlds", thereby solving a question left open by Woodruff and Zhou.
    How do Variational Autoencoders Learn? Insights from Representational Similarity. (arXiv:2205.08399v3 [cs.LG] UPDATED)
    The ability of Variational Autoencoders (VAEs) to learn disentangled representations has made them popular for practical applications. However, their behaviour is not yet fully understood. For example, the questions of when they can provide disentangled representations, or suffer from posterior collapse are still areas of active research. Despite this, there are no layerwise comparisons of the representations learned by VAEs, which would further our understanding of these models. In this paper, we thus look into the internal behaviour of VAEs using representational similarity techniques. Specifically, using the CKA and Procrustes similarities, we found that the encoders' representations are learned long before the decoders', and this behaviour is independent of hyperparameters, learning objectives, and datasets. Moreover, the encoders' representations in all but the mean and variance layers are similar across hyperparameters and learning objectives.
    Towards Auditing Unsupervised Learning Algorithms and Human Processes For Fairness. (arXiv:2209.11762v1 [cs.AI])
    Existing work on fairness typically focuses on making known machine learning algorithms fairer. Fair variants of classification, clustering, outlier detection and other styles of algorithms exist. However, an understudied area is the topic of auditing an algorithm's output to determine fairness. Existing work has explored the two group classification problem for binary protected status variables using standard definitions of statistical parity. Here we build upon the area of auditing by exploring the multi-group setting under more complex definitions of fairness.
    A unified framework for dataset shift diagnostics. (arXiv:2205.08340v2 [stat.ML] UPDATED)
    Most machine learning (ML) methods assume that the data used in the training phase comes from the target population. However, in practice one often faces dataset shift, which, if not properly taken into account, may decrease the predictive performance of the ML models. In general, if the practitioner knows which type of shift is taking place -- e.g., covariate shift or label shift -- they may apply transfer learning methods to obtain better predictions. Unfortunately, current methods for detecting shift are only designed to detect specific types of shift or cannot formally test their presence. We introduce a general and unified framework that gives insight into how to improve prediction methods by detecting the presence of different types of shift and quantifying how strong they are. Our approach can be used for any data type (tabular/image/text) and for both classification and regression tasks. Moreover, it uses formal hypothesis tests that control false alarms. We illustrate how our framework is useful in practice using both artificial and real datasets, including an example of how our framework leads to insights that indeed improve the predictive power of a supervised model. Our package for dataset shift detection can be found in https://github.com/felipemaiapolo/detectshift.
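    The flavor of such a test can be illustrated with a generic classifier two-sample check for covariate shift (a standalone sketch, not the detectshift API; the data, model, and permutation count are illustrative):

        import numpy as np
        from sklearn.ensemble import RandomForestClassifier
        from sklearn.model_selection import cross_val_score

        rng = np.random.default_rng(0)
        X_src = rng.normal(0.0, 1.0, size=(500, 5))
        X_tgt = rng.normal(0.3, 1.0, size=(500, 5))   # shifted mean

        X = np.vstack([X_src, X_tgt])
        z = np.r_[np.zeros(len(X_src)), np.ones(len(X_tgt))]  # domain labels

        def domain_auc(labels, seed):
            clf = RandomForestClassifier(n_estimators=100, random_state=seed)
            return cross_val_score(clf, X, labels, cv=5, scoring="roc_auc").mean()

        obs = domain_auc(z, 0)                         # AUC ~ 0.5 means no shift
        null = [domain_auc(rng.permutation(z), i) for i in range(20)]
        p_value = (1 + sum(n >= obs for n in null)) / (1 + len(null))
        print(f"AUC={obs:.3f}, permutation p-value={p_value:.3f}")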
    Task-Agnostic Graph Explanations. (arXiv:2202.08335v2 [cs.LG] UPDATED)
    Graph Neural Networks (GNNs) have emerged as powerful tools to encode graph-structured data. Due to their broad applications, there is an increasing need to develop tools to explain how GNNs make decisions given graph-structured data. Existing learning-based GNN explanation approaches are task-specific in training and hence suffer from crucial drawbacks. Specifically, they are incapable of producing explanations for a multitask prediction model with a single explainer. They are also unable to provide explanations in cases where the GNN is trained in a self-supervised manner, and the resulting representations are used in future downstream tasks. To address these limitations, we propose a Task-Agnostic GNN Explainer (TAGE) that is independent of downstream models and trained under self-supervision with no knowledge of downstream tasks. TAGE enables the explanation of GNN embedding models with unseen downstream tasks and allows efficient explanation of multitask models. Our extensive experiments show that TAGE can significantly speed up the explanation efficiency by using the same model to explain predictions for multiple downstream tasks while achieving explanation quality as good as or even better than current state-of-the-art GNN explanation approaches. Our code is publicly available as part of the DIG library at https://github.com/divelab/DIG/tree/main/dig/xgraph/TAGE/.
    Learning Bidirectional Translation between Descriptions and Actions with Small Paired Data. (arXiv:2203.04218v2 [cs.RO] UPDATED)
    This study achieved bidirectional translation between descriptions and actions using small paired data from different modalities. The ability to mutually generate descriptions and actions is essential for robots to collaborate with humans in their daily lives, but it generally requires a large dataset that maintains comprehensive pairs of data from both modalities. However, a paired dataset is expensive to construct and difficult to collect. To address this issue, this study proposes a two-stage training method for bidirectional translation. In the proposed method, we train recurrent autoencoders (RAEs) for descriptions and actions with a large amount of non-paired data. Then, we finetune the entire model to bind their intermediate representations using small paired data. Because the data used for pre-training do not require pairing, behavior-only data or a large language corpus can be used. We experimentally evaluated our method using a paired dataset consisting of motion-captured actions and descriptions. The results showed that our method performed well, even when the amount of paired training data was small. The visualization of the intermediate representations of each RAE showed that similar actions were encoded in a clustered position and the corresponding feature vectors were well aligned.
    Unsupervised Model-based Pre-training for Data-efficient Control from Pixels. (arXiv:2209.12016v1 [cs.AI])
    Controlling artificial agents from visual sensory data is an arduous task. Reinforcement learning (RL) algorithms can succeed in this but require large amounts of interactions between the agent and the environment. To alleviate the issue, unsupervised RL proposes to employ self-supervised interaction and learning, for adapting faster to future tasks. Yet, whether current unsupervised strategies improve generalization capabilities is still unclear, especially in visual control settings. In this work, we design an effective unsupervised RL strategy for data-efficient visual control. First, we show that world models pre-trained with data collected using unsupervised RL can facilitate adaptation for future tasks. Then, we analyze several design choices to adapt efficiently, effectively reusing the agents' pre-trained components, and learning and planning in imagination, with our hybrid planner, which we dub Dyna-MPC. By combining the findings of a large-scale empirical study, we establish an approach that strongly improves performance on the Unsupervised RL Benchmark, requiring 20$\times$ less data to match the performance of supervised methods. The approach also demonstrates robust performance on the Real-World RL benchmark, hinting that the approach generalizes to noisy environments.
    Feature Encodings for Gradient Boosting with Automunge. (arXiv:2209.12309v1 [cs.LG])
    Selecting a default feature encoding strategy for gradient boosted learning may weigh metrics of training duration against the predictive performance achieved with different feature representations. The Automunge library for dataframe preprocessing offers a default of binarization for categoric features and z-score normalization for numeric ones. The presented study sought to validate those defaults by benchmarking encoding variations with tuned gradient boosted learning on a series of diverse data sets. We found that on average our chosen defaults were top performers both from a tuning duration and a model performance standpoint. Another key finding was that, in comparison to categoric binarization, one-hot encoding did not perform in a manner consistent with suitability to serve as the categoric default. We present here these and further benchmarks.  ( 2 min )
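    The two defaults under study can be written out generically; the sketch below uses plain pandas rather than the Automunge API, and its bit-packing scheme is a simplified stand-in for the library's binarization:

        import numpy as np
        import pandas as pd

        df = pd.DataFrame({"price": [3.0, 10.0, 7.5, 1.2],
                           "color": ["red", "green", "blue", "red"]})

        # numeric default: z-score normalization
        df["price_zscore"] = (df["price"] - df["price"].mean()) / df["price"].std()

        # categoric default: binarization, ceil(log2(K)) columns instead of K
        codes = {c: i for i, c in enumerate(df["color"].unique())}
        width = max(1, int(np.ceil(np.log2(len(codes)))))
        for b in range(width):
            df[f"color_bin{b}"] = df["color"].map(codes).apply(lambda v: (v >> b) & 1)

        print(df)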
    Gradient Optimization for Single-State RMDPs. (arXiv:2209.12295v1 [cs.LG])
    As modern problems such as autonomous driving, control of robotic components, and medical diagnostics have become increasingly difficult to solve analytically, data-driven decision-making has seen a large gain in interest. For problems with more dimensions of complexity than people can grasp, data-driven solutions are a strong option. Many of these methods belong to a subdivision of machine learning known as reinforcement learning. Unfortunately, data-driven models often come with uncertainty about how they will perform in the worst of scenarios. Since the solutions are often not derived analytically, these models can fail unpredictably. In fields such as autonomous driving and medicine, the consequences of these failures could be catastrophic. Various methods are being explored to resolve this issue, and one of them is known as adversarial learning. It pits two models against each other by having one model optimize its goals as the opposite of the other model's goals. This type of training has the potential to find models which perform reliably in complex and high-stakes settings, although it is not certain when this type of training will work. The goal is to gain insight into when these types of models will reach stable solutions.  ( 2 min )
    Unsupervised Reward Shaping for a Robotic Sequential Picking Task from Visual Observations in a Logistics Scenario. (arXiv:2209.12350v1 [cs.RO])
    We focus on an unloading problem, typical of the logistics sector, modeled as a sequential pick-and-place task. In this type of task, modern machine learning techniques have been shown to work better than classic systems, since they are more adaptable to stochasticity and better able to cope with large uncertainties. More specifically, supervised and imitation learning have achieved outstanding results in this regard, with the shortcoming of requiring some form of supervision, which is not always obtainable in all settings. On the other hand, reinforcement learning (RL) requires a much milder form of supervision but remains impractical due to its inefficiency. In this paper, we propose and theoretically motivate a novel Unsupervised Reward Shaping algorithm from expert observations, which relaxes the level of supervision required by the agent and improves RL performance on our task.  ( 2 min )
    Valuation of Public Bus Electrification with Open Data. (arXiv:2209.12107v1 [eess.SY])
    This research provides a novel framework to estimate the economic, environmental, and social values of electrifying public transit buses, for cities across the world, based on open-source data. Electric buses are a compelling candidate to replace diesel buses given their environmental and social benefits. However, state-of-the-art models for evaluating the value of bus electrification are limited in applicability because they require granular and bespoke data on bus operation that can be difficult to procure. Our valuation tool uses the General Transit Feed Specification, a standard data format used by transit agencies worldwide, to provide high-level guidance on developing a prioritization strategy for electrifying a bus fleet. We develop physics-informed machine learning models to evaluate the energy consumption, the carbon emissions, the health impacts, and the total cost of ownership for each transit route. We demonstrate the scalability of our tool with a case study of the bus lines in the Greater Boston and Milan metropolitan areas.  ( 2 min )
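    As a flavor of the open data involved, here is a toy sketch of a GTFS-style route-level estimate; the tables are synthetic stand-ins for trips.txt and stop_times.txt, and the kWh/km figure is a placeholder rather than the paper's learned consumption model:

        import pandas as pd

        trips = pd.DataFrame({"trip_id": ["t1", "t2"], "route_id": ["r1", "r1"]})
        stop_times = pd.DataFrame({
            "trip_id": ["t1", "t1", "t2", "t2"],
            "shape_dist_traveled": [0.0, 12.5, 0.0, 12.5],   # km along the trip
        })

        km_per_trip = stop_times.groupby("trip_id")["shape_dist_traveled"].max()
        km_per_route = (trips.join(km_per_trip, on="trip_id")
                             .groupby("route_id")["shape_dist_traveled"].sum())

        KWH_PER_KM = 1.3   # placeholder average e-bus consumption
        print((km_per_route * KWH_PER_KM).rename("daily_kWh"))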
    Explainable Reinforcement Learning via Model Transforms. (arXiv:2209.12006v1 [cs.AI])
    Understanding emerging behaviors of reinforcement learning (RL) agents may be difficult since such agents are often trained in complex environments using highly complex decision making procedures. This has given rise to a variety of approaches to explainability in RL that aim to reconcile discrepancies that may arise between the behavior of an agent and the behavior that is anticipated by an observer. Most recent approaches have relied either on domain knowledge, that may not always be available, on an analysis of the agent's policy, or on an analysis of specific elements of the underlying environment, typically modeled as a Markov Decision Process (MDP). Our key claim is that even if the underlying MDP is not fully known (e.g., the transition probabilities have not been accurately learned) or is not maintained by the agent (i.e., when using model-free methods), it can nevertheless be exploited to automatically generate explanations. For this purpose, we suggest using formal MDP abstractions and transforms, previously used in the literature for expediting the search for optimal policies, to automatically produce explanations. Since such transforms are typically based on a symbolic representation of the environment, they may represent meaningful explanations for gaps between the anticipated and actual agent behavior. We formally define this problem, suggest a class of transforms that can be used for explaining emergent behaviors, and suggest methods that enable efficient search for an explanation. We demonstrate the approach on a set of standard benchmarks.
    A Review on Deep Learning in Medical Image Reconstruction. (arXiv:1906.10643v2 [eess.IV] UPDATED)
    Medical imaging is crucial in modern clinics to guide the diagnosis and treatment of diseases. Medical image reconstruction is one of the most fundamental and important components of medical imaging, whose major objective is to acquire high-quality medical images for clinical usage at the minimal cost and risk to the patients. Mathematical models in medical image reconstruction or, more generally, image restoration in computer vision, have been playing a prominent role. Earlier mathematical models are mostly designed by human knowledge or hypothesis on the image to be reconstructed, and we shall call these models handcrafted models. Later, handcrafted plus data-driven modeling started to emerge which still mostly relies on human designs, while part of the model is learned from the observed data. More recently, as more data and computation resources are made available, deep learning based models (or deep models) pushed the data-driven modeling to the extreme where the models are mostly based on learning with minimal human designs. Both handcrafted and data-driven modeling have their own advantages and disadvantages. One of the major research trends in medical imaging is to combine handcrafted modeling with deep modeling so that we can enjoy benefits from both approaches. The major part of this article is to provide a conceptual review of some recent works on deep modeling from the unrolling dynamics viewpoint. This viewpoint stimulates new designs of neural network architectures with inspirations from optimization algorithms and numerical differential equations. Given the popularity of deep modeling, there are still vast remaining challenges in the field, as well as opportunities which we shall discuss at the end of this article.
    An Asymptotically Optimal Batched Algorithm for the Dueling Bandit Problem. (arXiv:2209.12108v1 [cs.LG])
    We study the $K$-armed dueling bandit problem, a variation of the traditional multi-armed bandit problem in which feedback is obtained in the form of pairwise comparisons. Previous learning algorithms have focused on the $\textit{fully adaptive}$ setting, where the algorithm can make updates after every comparison. The "batched" dueling bandit problem is motivated by large-scale applications like web search ranking and recommendation systems, where performing sequential updates may be infeasible. In this work, we ask: $\textit{is there a solution using only a few adaptive rounds that matches the asymptotic regret bounds of the best sequential algorithms for $K$-armed dueling bandits?}$ We answer this in the affirmative $\textit{under the Condorcet condition}$, a standard setting of the $K$-armed dueling bandit problem. We obtain asymptotic regret of $O(K^2\log^2(K)) + O(K\log(T))$ in $O(\log(T))$ rounds, where $T$ is the time horizon. Our regret bounds nearly match the best regret bounds known in the fully sequential setting under the Condorcet condition. Finally, in computational experiments over a variety of real-world datasets, we observe that our algorithm using $O(\log(T))$ rounds achieves almost the same performance as fully sequential algorithms (that use $T$ rounds).
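    To make the batched protocol concrete, the following is a minimal Python sketch of the interaction model only: all comparisons for a round are committed in advance, their outcomes arrive together, and arms are pruned between rounds. The preference matrix, the geometric round schedule, and the confidence-interval pruning rule are illustrative stand-ins, not the paper's algorithm.

```python
import numpy as np

rng = np.random.default_rng(8)
K, T = 6, 20_000
idx = np.arange(K)
P = 0.5 + 0.4 * np.sign(idx[:, None] - idx[None, :])  # P[i, j] = Pr(i beats j); arm K-1 is the Condorcet winner

alive = list(range(K))
t, rounds, batch = 0, 0, 500
while t < T and len(alive) > 1:
    wins, plays = np.zeros((K, K)), np.zeros((K, K))
    for _ in range(min(batch, T - t)):                 # all comparisons of the round fixed up front
        i, j = rng.choice(alive, size=2, replace=False)
        wins[i, j] += rng.random() < P[i, j]
        plays[i, j] += 1
    t += min(batch, T - t); rounds += 1; batch *= 2    # geometrically growing batches: O(log T) rounds
    rates = np.divide(wins, plays, out=np.full_like(wins, 0.5), where=plays > 0)
    alive = [i for i in alive                           # drop arms that clearly lose to some survivor
             if all(rates[j, i] <= 0.5 + 2 / np.sqrt(max(plays[j, i], 1))
                    for j in alive if j != i)]
print(rounds, alive)   # the Condorcet winner should survive the pruning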
    Communication-Efficient Federated Learning Using Censored Heavy Ball Descent. (arXiv:2209.11944v1 [cs.LG])
    Distributed machine learning enables scalability and computational offloading, but requires significant levels of communication. Consequently, communication efficiency in distributed learning settings is an important consideration, especially when the communications are wireless and battery-driven devices are employed. In this paper, we develop a censoring-based heavy ball (CHB) method for distributed learning in a server-worker architecture. Each worker self-censors unless its local gradient is sufficiently different from the previously transmitted one. The significant practical advantages of the HB method for learning problems are well known, but the question of reducing communications has not been addressed. CHB takes advantage of the HB smoothing to eliminate reporting of small changes, and provably achieves a linear convergence rate equivalent to that of the classical HB method for smooth and strongly convex objective functions. The convergence guarantee of CHB is theoretically justified for both convex and nonconvex cases. In addition, we prove that, under some conditions, at least half of all communications can be eliminated without any impact on the convergence rate. Extensive numerical results validate the communication efficiency of CHB on both synthetic and real datasets, for convex, nonconvex, and nondifferentiable cases. Given a target accuracy, CHB can significantly reduce the number of communications compared to existing algorithms, achieving the same accuracy without slowing down the optimization process.
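    The censoring rule is simple to state in code. Below is a minimal sketch, assuming a toy quadratic objective and a synchronous server-worker loop: each worker transmits its gradient only when it deviates from the last transmitted one by more than a threshold tau, and the server otherwise reuses the stale gradient inside a standard heavy-ball update. The threshold schedule and aggregation details of the actual CHB method are omitted.

```python
import numpy as np

def chb_simulation(grad_fn, x0, workers, lr=0.1, beta=0.9, tau=1e-3, steps=100):
    """Censoring-based heavy-ball sketch: workers self-censor small gradient changes."""
    x = x0.copy()
    momentum = np.zeros_like(x)
    last_sent = [np.zeros_like(x) for _ in workers]   # last transmitted gradient per worker
    comms = 0
    for _ in range(steps):
        agg = np.zeros_like(x)
        for i, data in enumerate(workers):
            g = grad_fn(x, data)                      # local gradient
            if np.linalg.norm(g - last_sent[i]) > tau:
                last_sent[i] = g                      # transmit (one communication)
                comms += 1
            agg += last_sent[i]                       # server reuses the stale gradient otherwise
        agg /= len(workers)
        momentum = beta * momentum - lr * agg         # heavy-ball (momentum) step
        x = x + momentum
    return x, comms

# toy quadratic: each worker holds a target vector; the global optimum is their mean
rng = np.random.default_rng(0)
targets = [rng.normal(size=5) for _ in range(4)]
grad = lambda x, t: x - t
x_star, n_comms = chb_simulation(grad, np.zeros(5), targets)
print(x_star, n_comms)   # n_comms is well below the 4 * 100 uncensored transmissions
```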
    Constitutive model characterization and discovery using physics-informed deep learning. (arXiv:2203.09789v2 [cs.LG] UPDATED)
    Classically, the mechanical response of materials is described through constitutive models, often in the form of constrained ordinary differential equations. These models have a very limited number of parameters, yet they are extremely efficient at reproducing complex responses observed in experiments. Additionally, in their discretized form they are computationally very efficient, often resulting in a simple algebraic relation, and therefore they have been extensively used within large-scale explicit and implicit finite element models. However, it is very challenging to formulate new constitutive models, particularly for materials with complex microstructures such as composites. A recent trend in constitutive modeling leverages complex neural network architectures to construct material responses where a constitutive model does not yet exist. While very accurate, these approaches suffer from two deficiencies. First, they are interpolation models and often do poorly in extrapolation. Second, due to their complex architecture and numerous parameters, they are too inefficient to be used as a constitutive model within a large-scale finite element model. In this study, we propose a novel approach based on physics-informed learning machines for the characterization and discovery of constitutive models. Unlike data-driven constitutive models, we leverage the foundations of elastoplasticity theory as regularization terms in the total loss function to find parametric constitutive models that are also theoretically sound. We demonstrate that our proposed framework can efficiently identify the underlying constitutive model describing different datasets from the von Mises family.
    Rewarding Episodic Visitation Discrepancy for Exploration in Reinforcement Learning. (arXiv:2209.08842v2 [cs.LG] UPDATED)
    Exploration is critical for deep reinforcement learning in complex environments with high-dimensional observations and sparse rewards. To address this problem, recent approaches have proposed leveraging intrinsic rewards to improve exploration, such as novelty-based exploration and prediction-based exploration. However, many intrinsic reward modules require sophisticated structures and representation learning, resulting in prohibitive computational complexity and unstable performance. In this paper, we propose Rewarding Episodic Visitation Discrepancy (REVD), a computation-efficient and quantified exploration method. More specifically, REVD provides intrinsic rewards by evaluating the R\'enyi divergence-based visitation discrepancy between episodes. To enable efficient divergence estimation, a k-nearest neighbor estimator is utilized with a randomly-initialized state encoder. Finally, REVD is tested on Atari games and PyBullet Robotics Environments. Extensive experiments demonstrate that REVD significantly improves the sample efficiency of reinforcement learning algorithms and outperforms the benchmark methods.
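    A minimal sketch of the mechanism follows, assuming a fixed randomly initialized linear encoder and a simplified Renyi-flavoured k-NN ratio; the exact estimator, constants, and normalization in REVD may differ.

```python
import numpy as np

def revd_style_reward(curr_episode, prev_episode, encoder, k=3, alpha=0.5):
    """Per-state intrinsic reward from a k-NN estimate of the visitation
    discrepancy between the current and the previous episode (illustrative)."""
    z_c = encoder(curr_episode)            # (N, d) features of current episode
    z_p = encoder(prev_episode)            # (M, d) features of previous episode
    # distance from each current state to its k-th nearest neighbor in the previous episode
    d_cross = np.linalg.norm(z_c[:, None, :] - z_p[None, :, :], axis=-1)
    knn_cross = np.sort(d_cross, axis=1)[:, k - 1]
    # distance to the k-th nearest neighbor within the current episode (column 0 is the point itself)
    d_self = np.linalg.norm(z_c[:, None, :] - z_c[None, :, :], axis=-1)
    knn_self = np.sort(d_self, axis=1)[:, k]
    ratio = (knn_cross + 1e-8) / (knn_self + 1e-8)
    return ratio ** (1.0 - alpha)          # larger when the current episode visits new regions

rng = np.random.default_rng(1)
W = rng.normal(size=(4, 8))                # random, untrained state encoder
enc = lambda s: s @ W
r_int = revd_style_reward(rng.normal(size=(20, 4)), rng.normal(size=(20, 4)), enc)
print(r_int.shape)                         # (20,): one intrinsic reward per state
```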
    Capacity dependent analysis for functional online learning algorithms. (arXiv:2209.12198v1 [stat.ML])
    This article provides a convergence analysis of online stochastic gradient descent algorithms for functional linear models. By adopting characterizations of the slope function regularity, the kernel space capacity, and the capacity of the sampling process covariance operator, significant improvements on the convergence rates are achieved. Both prediction problems and estimation problems are studied, where we show that the capacity assumption can alleviate the saturation of the convergence rate as the regularity of the target function increases. We show that with a properly selected kernel, capacity assumptions can fully compensate for the regularity assumptions for prediction problems (but not for estimation problems). This demonstrates the significant difference between prediction problems and estimation problems in functional data analysis.
    Efficient Long Sequential User Data Modeling for Click-Through Rate Prediction. (arXiv:2209.12212v1 [cs.IR])
    Recent studies on Click-Through Rate (CTR) prediction have reached new levels by modeling longer user behavior sequences. Among others, two-stage methods stand out as the state-of-the-art (SOTA) solution for industrial applications. The two-stage methods first train a retrieval model to truncate the long behavior sequence beforehand and then use the truncated sequences to train a CTR model. However, the retrieval model and the CTR model are trained separately, so the subsequences retrieved for the CTR model are inaccurate, which degrades the final performance. In this paper, we propose an end-to-end paradigm to model long behavior sequences, which is able to achieve superior performance along with remarkable cost-efficiency compared to existing models. Our contribution is three-fold: First, we propose a hashing-based efficient target attention (TA) network named ETA-Net to enable end-to-end user behavior retrieval based on low-cost bit-wise operations. The proposed ETA-Net can reduce the complexity of standard TA by orders of magnitude for sequential data modeling. Second, we propose a general system architecture as one viable solution to deploy ETA-Net on industrial systems. In particular, ETA-Net has been deployed on the recommender system of Taobao, and brought a 1.8% lift in CTR and a 3.1% lift in Gross Merchandise Value (GMV) compared to the SOTA two-stage methods. Third, we conduct extensive experiments on both offline datasets and an online A/B test. The results verify that the proposed model outperforms existing CTR models considerably, in terms of both CTR prediction performance and online cost-efficiency. ETA-Net now serves the main traffic of Taobao, delivering services to hundreds of millions of users across billions of items every day.
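    The core idea of replacing float attention scores with bit-wise operations for retrieval can be sketched with SimHash codes, as below. The code length, top-k budget, and Hamming-similarity scoring are illustrative assumptions; ETA-Net's actual hashing and attention design may differ.

```python
import numpy as np

def simhash(x, proj):
    """Binary codes from random projections (SimHash): the sign pattern of x @ proj."""
    return (x @ proj > 0).astype(np.uint8)

def hash_retrieve(target, behaviors, proj, top_k=8):
    """Score each behavior by Hamming similarity of its hash code to the target
    item's code (bit-wise, no float dot products), keep the top-k behaviors."""
    code_t = simhash(target, proj)                    # (bits,)
    code_b = simhash(behaviors, proj)                 # (L, bits)
    ham_sim = (code_b == code_t).sum(axis=1)          # matching bits per behavior
    idx = np.argsort(-ham_sim)[:top_k]
    return behaviors[idx], idx

rng = np.random.default_rng(2)
proj = rng.normal(size=(16, 64))                      # 64-bit codes for 16-dim embeddings
long_seq = rng.normal(size=(10_000, 16))              # long user behavior sequence
target_item = rng.normal(size=16)
sub_seq, kept = hash_retrieve(target_item, long_seq, proj, top_k=48)
print(sub_seq.shape)                                  # (48, 16): fed to the target attention / CTR model
```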
    On Gender Bias in Fake News. (arXiv:2209.11984v1 [cs.CY])
    Data science research into fake news has gathered much momentum in recent years, arguably facilitated by the emergence of large public benchmark datasets. While it has been well established within media studies that gender bias pervades news media, there has been very little exploration of the relationship between gender bias and fake news. In this work, we provide the first empirical analysis of gender bias vis-a-vis fake news, leveraging simple and transparent lexicon-based methods over public benchmark datasets. Our analysis establishes the increased prevalence of gender bias in fake news across three facets, viz. abundance, affect, and proximal words. The insights from our analysis provide a strong argument that gender bias needs to be an important consideration in research into fake news.
    Deep Feature Selection Using a Novel Complementary Feature Mask. (arXiv:2209.12282v1 [cs.LG])
    Feature selection has drawn much attention over the last decades in machine learning because it can reduce data dimensionality while maintaining the original physical meaning of features, which enables better interpretability than feature extraction. However, most existing feature selection approaches, especially deep-learning-based ones, often focus only on features with high importance scores, neglecting both those with lower importance scores during training and the order of important candidate features. This can be risky, since some important and relevant features might be ignored during training, leading to suboptimal solutions or misleading selections. In our work, we address feature selection by also exploiting the features with lower importance scores, and propose a feature selection framework based on a novel complementary feature mask. Our method is generic and can be easily integrated into existing deep-learning-based feature selection approaches to improve their performance. Experiments conducted on benchmark datasets show that the proposed method can select more representative and informative features than the state of the art.  ( 2 min )
    Self-Adaptive Forecasting for Improved Deep Learning on Non-Stationary Time-Series. (arXiv:2202.02403v3 [cs.LG] UPDATED)
    Real-world time-series datasets often violate the assumptions of standard supervised learning for forecasting -- their distributions evolve over time, rendering the conventional training and model selection procedures suboptimal. In this paper, we propose a novel method, Self-Adaptive Forecasting (SAF), to modify the training of time-series forecasting models to improve their performance on forecasting tasks with such non-stationary time-series data. SAF integrates a self-adaptation stage prior to forecasting based on "backcasting", i.e., predicting masked inputs backward in time. This is a form of test-time training that creates a self-supervised learning problem on test samples before performing the prediction task. In this way, our method enables efficient adaptation of encoded representations to evolving distributions, leading to superior generalization. SAF can be integrated with any canonical encoder-decoder based time-series architecture such as recurrent neural networks or attention-based architectures. On synthetic and real-world datasets in domains where time-series data are known to be notoriously non-stationary, such as healthcare and finance, we demonstrate a significant benefit of SAF in improving forecasting accuracy.  ( 2 min )
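    A minimal PyTorch sketch of the test-time adaptation pattern follows. The tiny GRU encoder, the single-step backcast target, and the one-gradient-step adaptation are illustrative assumptions standing in for SAF's actual architecture and masking scheme.

```python
import torch
import torch.nn as nn

class SAFForecaster(nn.Module):
    """Shared encoder with two heads: a self-supervised backcast head and a forecast head."""
    def __init__(self, d=1, h=32):
        super().__init__()
        self.enc = nn.GRU(d, h, batch_first=True)
        self.backcast = nn.Linear(h, d)   # predicts the masked first step of the window
        self.forecast = nn.Linear(h, d)   # predicts the next step after the window

    def encode(self, x):
        _, hn = self.enc(x)
        return hn[-1]                      # (batch, h) summary of the window

def predict_with_adaptation(model, x, lr=1e-2):
    # self-supervised step: hide the first point, backcast it from the rest
    target, context = x[:, :1, :], x[:, 1:, :]
    opt = torch.optim.SGD(model.parameters(), lr=lr)
    loss = nn.functional.mse_loss(model.backcast(model.encode(context)), target[:, 0, :])
    opt.zero_grad(); loss.backward(); opt.step()   # adapt encoder to the test distribution
    with torch.no_grad():                          # then forecast with the adapted weights
        return model.forecast(model.encode(x))

model = SAFForecaster()
window = torch.randn(4, 24, 1)                     # batch of 4 test windows of length 24
print(predict_with_adaptation(model, window).shape)   # torch.Size([4, 1])
```

    In practice the adaptation would be applied per test window (resetting or snapshotting the weights), which this sketch omits for brevity.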
    MGIC: Multigrid-in-Channels Neural Network Architectures. (arXiv:2011.09128v4 [cs.CV] UPDATED)
    We present a multigrid-in-channels (MGIC) approach that tackles the quadratic growth of the number of parameters with respect to the number of channels in standard convolutional neural networks (CNNs). Thereby our approach addresses the redundancy in CNNs that is also exposed by the recent success of lightweight CNNs. Lightweight CNNs can achieve comparable accuracy to standard CNNs with fewer parameters; however, the number of weights still scales quadratically with the CNN's width. Our MGIC architectures replace each CNN block with an MGIC counterpart that utilizes a hierarchy of nested grouped convolutions of small group size to address this. Hence, our proposed architectures scale linearly with respect to the network's width while retaining full coupling of the channels as in standard CNNs. Our extensive experiments on image classification, segmentation, and point cloud classification show that applying this strategy to different architectures like ResNet and MobileNetV3 reduces the number of parameters while obtaining similar or better accuracy.  ( 3 min )
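    The parameter-scaling argument is easy to verify directly. The snippet below only contrasts a dense convolution with the small-group grouped convolutions that MGIC composes hierarchically; the group size of 8 and channel width are illustrative, and the actual MGIC block nests such convolutions over a multigrid hierarchy of channel resolutions.

```python
import torch.nn as nn

def n_params(m):
    return sum(p.numel() for p in m.parameters())

c = 256
dense = nn.Conv2d(c, c, 3, padding=1)                    # weights scale as O(c^2)
grouped = nn.Conv2d(c, c, 3, padding=1, groups=c // 8)   # group size 8: weights scale as O(c)
mix = nn.Conv2d(c, c, 1, groups=c // 8)                  # cheap 1x1 mixing within groups

print(n_params(dense))    # 590,080
print(n_params(grouped))  # 18,688  -- linear, not quadratic, in the width
print(n_params(mix))      # 2,304
```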
    BED: A Real-Time Object Detection System for Edge Devices. (arXiv:2202.07503v4 [cs.CV] UPDATED)
    Deploying deep neural networks (DNNs) on edge devices provides efficient and effective solutions for real-world tasks. Edge devices have been used for collecting a large volume of data efficiently in different domains, and DNNs have been an effective tool for data processing and analysis. However, designing DNNs for edge devices is challenging due to the limited computational resources and memory. To tackle this challenge, we demonstrate an Object Detection System for Edge Devices (BED) on the MAX78000 DNN accelerator. It integrates on-device DNN inference with a camera and an LCD display for image acquisition and detection exhibition, respectively. BED is a concise, effective and detailed solution, including model training, quantization, synthesis and deployment. The entire repository is open-sourced on GitHub, including a Graphical User Interface (GUI) for on-chip debugging. Experiment results indicate that BED can produce accurate detection with a 300-KB tiny DNN model, which takes only 91.9 ms of inference time and 1.845 mJ of energy. A demonstration of the real-time detection is available on YouTube.  ( 3 min )
    Open-Ended Diverse Solution Discovery with Regulated Behavior Patterns for Cross-Domain Adaptation. (arXiv:2209.12029v1 [cs.LG])
    While reinforcement learning can achieve impressive results for complex tasks, the learned policies are generally prone to fail in downstream tasks with even minor model mismatch or unexpected perturbations. Recent works have demonstrated that a policy population with diverse behavior characteristics can generalize to downstream environments with various discrepancies. However, such policies might result in catastrophic damage during deployment in practical scenarios like real-world systems, due to the unrestricted behaviors of trained policies. Furthermore, training diverse policies without regulation of their behavior can result in inadequate feasible policies for extrapolating to a wide range of test conditions with dynamics shifts. In this work, we aim to train diverse policies under the regularization of behavior patterns. We motivate our paradigm by observing the inverse dynamics in the environment with partial state information and propose Diversity in Regulation (DiR), which trains diverse policies with regulated behaviors to discover desired patterns that benefit generalization. Considerable empirical results on various variations of different environments indicate that our method attains improvements over other diversity-driven counterparts.  ( 2 min )
    Whodunit? Learning to Contrast for Authorship Attribution. (arXiv:2209.11887v1 [cs.CL])
    Authorship attribution is the task of identifying the author of a given text. Most existing approaches use manually designed features that capture a dataset's content and style. However, this dataset-dependent approach yields inconsistent performance. Thus, we propose to fine-tune pre-trained language representations using a combination of contrastive learning and supervised learning (Contra-X). We show that Contra-X advances the state-of-the-art on multiple human and machine authorship attribution benchmarks, enabling improvements of up to 6.8%. We also show Contra-X to be consistently superior to cross-entropy fine-tuning across different data regimes. Crucially, we present qualitative and quantitative analyses of these improvements. Our learned representations form highly separable clusters for different authors. However, we find that contrastive learning improves overall accuracy at the cost of sacrificing performance for some authors. Resolving this tension will be an important direction for future work. To the best of our knowledge, we are the first to analyze the effect of combining contrastive learning with cross-entropy fine-tuning for authorship attribution.  ( 2 min )
    Online Allocation and Learning in the Presence of Strategic Agents. (arXiv:2209.12112v1 [cs.GT])
    We study the problem of allocating $T$ sequentially arriving items among $n$ homogeneous agents under the constraint that each agent must receive a pre-specified fraction of all items, with the objective of maximizing the agents' total valuation of items allocated to them. The agents' valuations for the item in each round are assumed to be i.i.d. but their distribution is a priori unknown to the central planner. Therefore, the central planner needs to implicitly learn these distributions from the observed values in order to pick a good allocation policy. However, an added challenge here is that the agents are strategic with incentives to misreport their valuations in order to receive better allocations. This sets our work apart both from the online auction design settings which typically assume known valuation distributions and/or involve payments, and from the online learning settings that do not consider strategic agents. To that end, our main contribution is an online learning based allocation mechanism that is approximately Bayesian incentive compatible, and when all agents are truthful, guarantees a sublinear regret for individual agents' utility compared to that under the optimal offline allocation policy.  ( 2 min )
    One-Shot Learning of Stochastic Differential Equations with Computational Graph Completion. (arXiv:2209.12086v1 [stat.ML])
    We consider the problem of learning Stochastic Differential Equations of the form $dX_t = f(X_t)dt+\sigma(X_t)dW_t $ from one sample trajectory. This problem is more challenging than learning deterministic dynamical systems because one sample trajectory only provides indirect information on the unknown functions $f$, $\sigma$, and stochastic process $dW_t$ representing the drift, the diffusion, and the stochastic forcing terms, respectively. We propose a simple kernel-based solution to this problem that can be decomposed as follows: (1) Represent the time-increment map $X_t \rightarrow X_{t+dt}$ as a Computational Graph in which $f$, $\sigma$ and $dW_t$ appear as unknown functions and random variables. (2) Complete the graph (approximate unknown functions and random variables) via Maximum a Posteriori Estimation (given the data) with Gaussian Process (GP) priors on the unknown functions. (3) Learn the covariance functions (kernels) of the GP priors from data with randomized cross-validation. Numerical experiments illustrate the efficacy, robustness, and scope of our method.  ( 2 min )
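    As a much simpler stand-in for the paper's GP-based computational graph completion, the sketch below recovers $f$ and $\sigma$ pointwise from a single trajectory using the Euler-Maruyama moments $\mathbb{E}[dX \mid X=x] \approx f(x)\,dt$ and $\mathrm{Var}[dX \mid X=x] \approx \sigma(x)^2\,dt$, smoothed with a Nadaraya-Watson kernel regressor; the bandwidth and test SDE are illustrative.

```python
import numpy as np

def nw_drift_diffusion(X, dt, x_query, bw=0.2):
    """Kernel-smoothed conditional moments of the increments of one trajectory."""
    dX = np.diff(X)
    w = np.exp(-0.5 * ((X[:-1][None, :] - x_query[:, None]) / bw) ** 2)
    w /= w.sum(axis=1, keepdims=True)             # Nadaraya-Watson weights per query point
    m1 = w @ dX                                   # conditional mean of increments
    m2 = w @ dX**2                                # conditional second moment
    f_hat = m1 / dt
    sigma_hat = np.sqrt(np.maximum(m2 - m1**2, 0.0) / dt)
    return f_hat, sigma_hat

# simulate one trajectory of dX = -X dt + 0.5 dW, then recover f and sigma on a grid
rng = np.random.default_rng(3)
dt, n = 1e-3, 100_000
X = np.empty(n); X[0] = 1.0
for t in range(n - 1):
    X[t + 1] = X[t] - X[t] * dt + 0.5 * np.sqrt(dt) * rng.normal()
grid = np.linspace(-1, 1, 5)
f_hat, s_hat = nw_drift_diffusion(X, dt, grid)
print(f_hat)   # roughly -grid (the true drift)
print(s_hat)   # roughly 0.5 (the true diffusion)
```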
    Consistency of Constrained Spectral Clustering under Graph Induced Fair Planted Partitions. (arXiv:2105.03714v2 [cs.LG] UPDATED)
    Spectral clustering is popular among practitioners and theoreticians alike. While performance guarantees for spectral clustering are well understood, recent studies have focused on enforcing "fairness" in clusters, requiring them to be "balanced" with respect to a categorical sensitive node attribute (e.g. the race distribution in clusters must match the race distribution in the population). In this paper, we consider a setting where sensitive attributes indirectly manifest in an auxiliary representation graph rather than being directly observed. This graph specifies node pairs that can represent each other with respect to sensitive attributes and is observed in addition to the usual similarity graph. Our goal is to find clusters in the similarity graph while respecting a new individual-level fairness constraint encoded by the representation graph. We develop variants of unnormalized and normalized spectral clustering for this task and analyze their performance under a fair planted partition model induced by the representation graph. This model uses both the cluster membership of the nodes and the structure of the representation graph to generate random similarity graphs. To the best of our knowledge, these are the first consistency results for constrained spectral clustering under an individual-level fairness constraint. Numerical results corroborate our theoretical findings.  ( 3 min )
    Identifying latent activity behaviors and lifestyles using mobility data to describe urban dynamics. (arXiv:2209.12095v1 [physics.soc-ph])
    Urbanization and its problems require an in-depth and comprehensive understanding of urban dynamics, especially the complex and diversified lifestyles in modern cities. Digitally acquired data can accurately capture complex human activity, but it lacks the interpretability of demographic data. In this paper, we study a privacy-enhanced dataset of the mobility visitation patterns of 1.2 million people to 1.1 million places in 11 metro areas in the U.S. to detect the latent mobility behaviors and lifestyles in the largest American cities. Despite the considerable complexity of mobility visitations, we found that lifestyles can be automatically decomposed into only 12 latent interpretable activity behaviors on how people combine shopping, eating, working, or using their free time. Rather than describing individuals with a single lifestyle, we find that city dwellers' behavior is a mixture of those behaviors. Those detected latent activity behaviors are equally present across cities and cannot be fully explained by main demographic features. Finally, we find those latent behaviors are associated with dynamics like experienced income segregation, transportation, or healthy behaviors in cities, even after controlling for demographic features. Our results signal the importance of complementing traditional census data with activity behaviors to understand urban dynamics.  ( 2 min )
    Graph Representation Learning for Energy Demand Data: Application to Joint Energy System Planning under Emissions Constraints. (arXiv:2209.12035v1 [cs.LG])
    A rapid transformation of current electric power and natural gas (NG) infrastructure is imperative to meet the mid-century goal of CO2 emissions reduction. This necessitates long-term planning of the joint power-NG system under representative demand and supply patterns, operational constraints, and policy considerations. Our work is motivated by the computational and practical challenges associated with solving the generation and transmission expansion problem (GTEP) for joint planning of power-NG systems. Specifically, we focus on efficiently extracting a set of representative days from power and NG data in the respective networks and using this set to reduce the computational burden required to solve the GTEP. We propose a Graph Autoencoder for Multiple time resolution Energy Systems (GAMES) to capture the spatio-temporal demand patterns in interdependent networks and account for differences in the temporal resolution of available data. The resulting embeddings are used in a clustering algorithm to select representative days. We evaluate the effectiveness of our approach in solving a GTEP formulation calibrated for the joint power-NG system in New England. This formulation accounts for the physical interdependencies between power and NG systems, including the joint emissions constraint. Our results show that the set of representative days obtained from GAMES not only allows us to tractably solve the GTEP formulation, but also achieves a lower cost of implementing the joint planning decisions.  ( 3 min )
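    A sketch of the final selection stage only: given per-day embeddings (in GAMES these come from the graph autoencoder over joint power-NG demand, which is omitted here), cluster the days and keep the day closest to each centroid as a representative, weighted by its cluster size. The random stand-in embeddings and the choice of 12 clusters are illustrative assumptions.

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(4)
day_embeddings = rng.normal(size=(365, 16))           # stand-in for learned GAMES embeddings

k = 12
km = KMeans(n_clusters=k, n_init=10, random_state=0).fit(day_embeddings)
rep_days, weights = [], []
for c in range(k):
    members = np.flatnonzero(km.labels_ == c)
    dists = np.linalg.norm(day_embeddings[members] - km.cluster_centers_[c], axis=1)
    rep_days.append(int(members[np.argmin(dists)]))   # medoid-style representative day
    weights.append(len(members))                      # how many days it stands for

print(rep_days, weights)                              # these weighted days parameterize the GTEP
```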
    An Empirical Exploration of Cross-domain Alignment between Language and Electroencephalogram. (arXiv:2208.06348v3 [q-bio.NC] UPDATED)
    Electroencephalography (EEG) and language have been widely explored independently for many downstream tasks (e.g., sentiment analysis, relation detection, etc.). Multimodal approaches that study both domains have not been well explored, even though in recent years multimodal learning has been seen to be more powerful than its unimodal counterparts. In this study, we want to explore the relationship and dependency between EEG and language, i.e., how one domain reflects and represents the other. To study the relationship at the representation level, we introduce MTAM, a Multimodal Transformer Alignment Model, to observe coordinated representations between the two modalities, and thus employ the transformed representations for downstream applications. We used various relationship alignment-seeking techniques, such as Canonical Correlation Analysis and Wasserstein Distance, as loss functions to transform low-level language and EEG features into high-level transformed features. On the downstream applications of sentiment analysis and relation detection, we achieved new state-of-the-art results on two datasets, ZuCo and K-EmoCon. Our method achieved an F1-score improvement of 16.5% on sentiment analysis for K-EmoCon, 27% on sentiment analysis of ZuCo, and 31.1% on relation detection of ZuCo. In addition, we provide interpretations of the performance improvement by: (1) visualizing the original feature distribution and the transformed feature distribution, showing the effectiveness of the alignment module for discovering and encoding the relationship between EEG and language; (2) visualizing word-level and sentence-level EEG-language alignment weights, showing the influence of different language semantics as well as EEG frequency features; and (3) visualizing brain topographical maps to provide an intuitive demonstration of the connectivity of EEG and language response in the brain regions.  ( 3 min )
    TransPOS: Transformers for Consolidating Different POS Tagset Datasets. (arXiv:2209.11959v1 [cs.CL])
    In the hope of expanding training data, researchers often want to merge two or more datasets that were created using different labeling schemes. This paper considers two datasets that label part-of-speech (POS) tags under different tagging schemes and leverages the supervised labels of one dataset to help generate labels for the other. This paper further discusses the theoretical difficulties of this approach and proposes a novel supervised architecture employing Transformers to tackle the problem of consolidating two completely disjoint datasets. The results diverge from initial expectations and discourage exploration into the use of disjoint labels to consolidate datasets with different labels.  ( 2 min )
    Local Advantage Networks for Cooperative Multi-Agent Reinforcement Learning. (arXiv:2112.12458v2 [cs.LG] UPDATED)
    Multi-agent reinforcement learning (MARL) enables us to create adaptive agents in challenging environments, even when the agents have limited observation. Modern MARL methods have focused on finding factorized value functions. While successful, the resulting methods have convoluted network structures. We take a radically different approach and build on the structure of independent Q-learners. Our algorithm, LAN, leverages a dueling architecture to represent decentralized policies as separate individual advantage functions w.r.t. a centralized critic that is cast aside after training. The critic works as a stabilizer that coordinates the learning and formulates DQN targets. This enables LAN to keep the number of parameters of its centralized network independent of the number of agents, without imposing additional constraints like monotonic value functions. When evaluated on the SMAC benchmark, LAN shows state-of-the-art performance overall and scores more than 80% wins in two super-hard maps where even QPLEX obtains almost no wins. Moreover, when the number of agents becomes large, LAN uses significantly fewer parameters than QPLEX or even QMIX. We thus show that LAN's structure forms a key improvement that helps MARL methods remain scalable.  ( 3 min )
    On Variance Estimation of Random Forests. (arXiv:2202.09008v3 [stat.ML] UPDATED)
    Ensemble methods, such as random forests, are popular in applications due to their high predictive accuracy. Existing literature views a random forest prediction as an infinite-order incomplete U-statistic to quantify its uncertainty. However, these methods focus on a small subsampling size of each tree, which is theoretically valid but practically limited. This paper develops an unbiased variance estimator based on incomplete U-statistics, which allows the tree size to be comparable with the overall sample size, making statistical inference possible in a broader range of real applications. Simulation results demonstrate that our estimators enjoy lower bias and more accurate coverage rate without additional computational costs. We also propose a local smoothing procedure to reduce the variation of our estimator, which shows improved numerical performance when the number of trees is relatively small. Further, we investigate the ratio consistency of our proposed variance estimator under specific scenarios. In particular, we develop a new "double U-statistic" formulation to analyze the Hoeffding decomposition of the estimator's variance.  ( 2 min )
    Highly Scalable Task Grouping for Deep Multi-Task Learning in Prediction of Epigenetic Events. (arXiv:2209.11892v1 [cs.LG])
    Deep neural networks trained to predict cellular events from DNA sequence have become emerging tools to help elucidate the biological mechanisms underlying the associations identified in genome-wide association studies. To enhance training, multi-task learning (MTL) has been commonly exploited in previous works, where trained networks were needed for multiple profiles differing in either event modality or cell type. All existing works adopted a simple MTL framework where all tasks share a single feature extraction network. Such a strategy, even though effective to a certain extent, leads to substantial negative transfer, meaning that for a large portion of tasks, models obtained through MTL perform worse than those trained by single-task learning. Methods have been developed to address such negative transfer in other domains, such as computer vision; however, these methods are generally difficult to scale up to handle a large number of tasks. In this paper, we propose a highly scalable task grouping framework to address negative transfer by jointly training only those tasks that are potentially beneficial to each other. The proposed method exploits the network weights associated with task-specific classification heads, which can be cheaply obtained by a one-time joint training of all tasks. Our results on a dataset consisting of 367 epigenetic profiles demonstrate the effectiveness of the proposed approach and its superiority over baseline methods.  ( 3 min )
    Non-monotonic Resource Utilization in the Bandits with Knapsacks Problem. (arXiv:2209.12013v1 [cs.LG])
    Bandits with knapsacks (BwK) is an influential model of sequential decision-making under uncertainty that incorporates resource consumption constraints. In each round, the decision-maker observes an outcome consisting of a reward and a vector of nonnegative resource consumptions, and the budget of each resource is decremented by its consumption. In this paper we introduce a natural generalization of the stochastic BwK problem that allows non-monotonic resource utilization. In each round, the decision-maker observes an outcome consisting of a reward and a vector of resource drifts that can be positive, negative or zero, and the budget of each resource is incremented by its drift. Our main result is a Markov decision process (MDP) policy that has constant regret against a linear programming (LP) relaxation when the decision-maker knows the true outcome distributions. We build upon this to develop a learning algorithm that has logarithmic regret against the same LP relaxation when the decision-maker does not know the true outcome distributions. We also present a reduction from BwK to our model that shows our regret bound matches existing results.  ( 2 min )
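    A toy simulation of the generalized interaction protocol follows: budgets are incremented by per-round resource drifts, which may be positive, negative, or zero, and the process stops when the budget is exhausted. The epsilon-greedy policy is a naive placeholder, not the paper's LP-based MDP policy, and the reward/drift distributions are invented for illustration.

```python
import numpy as np

rng = np.random.default_rng(5)
K, B0 = 3, 50.0
mean_reward = np.array([0.2, 0.5, 0.8])
mean_drift = np.array([+0.1, -0.05, -0.3])            # arm 0 replenishes the resource

budget, total_reward, pulls, est = B0, 0.0, np.zeros(K), np.zeros(K)
for t in range(10_000):
    if budget <= 0:
        break                                         # resource exhausted: the horizon is endogenous
    a = rng.integers(K) if rng.random() < 0.1 else int(np.argmax(est))
    r = rng.normal(mean_reward[a], 0.1)
    drift = rng.normal(mean_drift[a], 0.1)
    budget += drift                                   # non-monotonic budget update (the key generalization)
    total_reward += r
    pulls[a] += 1
    est[a] += (r - est[a]) / pulls[a]                 # running mean reward per arm

print(total_reward, budget, pulls)
```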
    Enhancing Claim Classification with Feature Extraction from Anomaly-Detection-Derived Routine and Peculiarity Profiles. (arXiv:2209.11763v1 [cs.LG])
    Usage-based insurance is becoming the new standard in vehicle insurance; it is therefore relevant to find efficient ways of using insureds' driving data. Applying anomaly detection to vehicles' trip summaries, we develop a method that derives a "routine" and a "peculiarity" anomaly profile for each vehicle. To this end, anomaly detection algorithms are used to compute a routine and a peculiarity anomaly score for each trip a vehicle makes. The former measures the anomaly degree of the trip compared to the other trips made by the concerned vehicle, while the latter measures its anomaly degree compared to trips made by any vehicle. The resulting anomaly score vectors are used as routine and peculiarity profiles. Features are then extracted from these profiles, and we investigate their predictive power in the claim classification framework. Using real data, we find that features extracted from the vehicles' peculiarity profiles improve classification.  ( 2 min )
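    A minimal sketch of the two profiles, using isolation forests as the anomaly detector (the paper does not prescribe this particular choice): the routine score compares each trip with the vehicle's own trips, while the peculiarity score compares it with trips from the whole portfolio. The synthetic trip summaries and the two summary features are illustrative.

```python
import numpy as np
from sklearn.ensemble import IsolationForest

rng = np.random.default_rng(6)
trips = {v: rng.normal(loc=v, size=(200, 5)) for v in range(3)}   # trip summaries per vehicle
all_trips = np.vstack(list(trips.values()))

global_model = IsolationForest(random_state=0).fit(all_trips)
profiles = {}
for v, X in trips.items():
    own_model = IsolationForest(random_state=0).fit(X)
    routine = -own_model.score_samples(X)             # anomaly vs. the vehicle's own history
    peculiarity = -global_model.score_samples(X)      # anomaly vs. every vehicle's trips
    profiles[v] = {"routine_mean": routine.mean(),    # example extracted features
                   "peculiarity_q90": np.quantile(peculiarity, 0.9)}

print(profiles)   # candidate covariates for the claim classifier
```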
    Raising the Bar on the Evaluation of Out-of-Distribution Detection. (arXiv:2209.11960v1 [cs.CV])
    In image classification, a lot of development has happened in detecting out-of-distribution (OoD) data. However, most OoD detection methods are evaluated on a standard set of datasets, arbitrarily different from training data. There is no clear definition of what forms a "good" OoD dataset. Furthermore, the state-of-the-art OoD detection methods already achieve near perfect results on these standard benchmarks. In this paper, we define 2 categories of OoD data using the subtly different concepts of perceptual/visual and semantic similarity to in-distribution (iD) data. We define Near OoD samples as perceptually similar but semantically different from iD samples, and Shifted samples as points which are visually different but semantically akin to iD data. We then propose a GAN based framework for generating OoD samples from each of these 2 categories, given an iD dataset. Through extensive experiments on MNIST, CIFAR-10/100 and ImageNet, we show that a) state-of-the-art OoD detection methods which perform exceedingly well on conventional benchmarks are significantly less robust to our proposed benchmark, and b) models performing well on our setup also perform well on conventional real-world OoD detection benchmarks and vice versa, thereby indicating that one might not even need a separate OoD set to reliably evaluate performance in OoD detection.  ( 2 min )
    Toward Intention Discovery for Early Malice Detection in Bitcoin. (arXiv:2209.12001v1 [cs.LG])
    Bitcoin has been subject to illicit activities more often than probably any other financial asset, due to the pseudo-anonymous nature of its transacting entities. An ideal detection model is expected to achieve all three properties of (I) early detection, (II) good interpretability, and (III) versatility for various illicit activities. However, existing solutions cannot meet all these requirements, as most of them heavily rely on deep learning without satisfying interpretability and are only available for retrospective analysis of a specific illicit type. First, we present asset transfer paths, which aim to describe addresses' early characteristics. Next, with a decision tree based strategy for feature selection and segmentation, we split the entire observation period into different segments and encode each as a segment vector. After clustering all these segment vectors, we get the global status vectors, essentially the basic unit to describe the whole intention. Finally, a hierarchical self-attention predictor predicts the label for the given address in real time. A survival module tells the predictor when to stop and proposes the status sequence, namely the intention. With the type-dependent selection strategy and global status vectors, our model can be applied to detect various illicit activities with strong interpretability. The well-designed predictor and particular loss functions strengthen the model's prediction speed and interpretability one step further. Extensive experiments on three real-world datasets show that our proposed algorithm outperforms state-of-the-art methods. Besides, additional case studies justify that our model can not only explain existing illicit patterns but also find new suspicious characters.
    Deep Attentive Belief Propagation: Integrating Reasoning and Learning for Solving Constraint Optimization Problems. (arXiv:2209.12000v1 [cs.AI])
    Belief Propagation (BP) is an important message-passing algorithm for various reasoning tasks over graphical models, including solving the Constraint Optimization Problems (COPs). It has been shown that BP can achieve state-of-the-art performance on various benchmarks by mixing old and new messages before sending the new one, i.e., damping. However, existing methods of tuning a static damping factor for BP not only are laborious but also harm their performance. Moreover, existing BP algorithms treat each variable node's neighbors equally when composing a new message, which also limits their exploration ability. To address these issues, we seamlessly integrate BP, Gated Recurrent Units (GRUs), and Graph Attention Networks (GATs) within the message-passing framework to reason about dynamic weights and damping factors for composing new BP messages. Our model, Deep Attentive Belief Propagation (DABP), takes the factor graph and the BP messages in each iteration as the input and infers the optimal weights and damping factors through GRUs and GATs, followed by a multi-head attention layer. Furthermore, unlike existing neural-based BP variants, we propose a novel self-supervised learning algorithm for DABP with a smoothed solution cost, which does not require expensive training labels and also avoids the common out-of-distribution issue through efficient online learning. Extensive experiments show that our model significantly outperforms state-of-the-art baselines.
    Unsupervised domain adaptation for speech recognition with unsupervised error correction. (arXiv:2209.12043v1 [cs.SD])
    The transcription quality of automatic speech recognition (ASR) systems degrades significantly when transcribing audio coming from unseen domains. We propose an unsupervised error correction method for unsupervised ASR domain adaptation, aiming to recover transcription errors caused by domain mismatch. Unlike existing correction methods that rely on transcribed audio for training, our approach requires only unlabeled data of the target domains, to which a pseudo-labeling technique is applied to generate correction training samples. To reduce over-fitting to the pseudo data, we also propose an encoder-decoder correction model that can take into account additional information such as dialogue context and acoustic features. Experiment results show that our method obtains a significant word error rate (WER) reduction over non-adapted ASR systems. The correction model can also be applied on top of other adaptation approaches to bring an additional relative improvement of 10%.
    DDP-GCN: Multi-Graph Convolutional Network for Spatiotemporal Traffic Forecasting. (arXiv:1905.12256v3 [cs.LG] UPDATED)
    Traffic speed forecasting is one of the core problems in transportation systems. For more accurate prediction, recent studies have started using not only the temporal speed patterns but also spatial information on the road network through graph convolutional networks. Even though the road network is highly complex due to its non-Euclidean and directional characteristics, previous approaches mainly focused on modeling the spatial dependencies using distance only. In this paper, we identify two essential spatial dependencies in traffic forecasting in addition to distance, namely direction and positional relationship, for designing basic graph elements as the fundamental building blocks. Using these building blocks, we suggest DDP-GCN (Distance, Direction, and Positional relationship Graph Convolutional Network) to incorporate the three spatial relationships into deep neural networks. We evaluate the proposed model on two large-scale real-world datasets, and find positive improvements for long-term forecasting in highly complex urban networks. The improvement can be larger for commute hours, but it can also be limited for short-term forecasting.
    FLAT: An Optimized Dataflow for Mitigating Attention Bottlenecks. (arXiv:2107.06419v7 [cs.LG] UPDATED)
    Attention mechanisms, primarily designed to capture pairwise correlations between words, have become the backbone of machine learning, expanding beyond natural language processing into other domains. This growth in adoption comes at the cost of prohibitively large memory requirements and computational complexity, especially for higher numbers of input elements. This limitation is due to inherently limited data reuse opportunities and quadratic growth in memory footprints, leading to severe memory-boundedness and limited scalability in the number of input elements. This work addresses these challenges by devising a tailored dataflow optimization, called FLAT, for attention mechanisms without altering their functionality. This dataflow processes costly attention operations through a unique fusion mechanism, transforming the quadratic growth of the memory footprint into a merely linear one. To realize the full potential of this bespoke mechanism, we propose a tiling approach to enhance the data reuse across attention operations. Our method both mitigates the off-chip bandwidth bottleneck and reduces the on-chip memory requirement. FLAT delivers 1.94x (1.76x) speedup and 49% (42%) energy savings compared to state-of-the-art Edge (Cloud) accelerators with no customized dataflow optimization. When on-chip resources are scarce (20 KB-200 KB), FLAT yields, on average, 1.5x end-to-end latency reduction across a diverse range of conventional attention-based models with input sequence lengths ranging from 512 tokens to 64K tokens. Our evaluations demonstrate that state-of-the-art DNN dataflows applied to attention operations reach their efficiency limit for inputs above 512 elements. In contrast, FLAT unblocks transformer models for inputs with up to 64K elements.
    Finite-Time Complexity of Online Primal-Dual Natural Actor-Critic Algorithm for Constrained Markov Decision Processes. (arXiv:2110.11383v2 [math.OC] UPDATED)
    We consider a discounted cost constrained Markov decision process (CMDP) policy optimization problem, in which an agent seeks to maximize a discounted cumulative reward subject to a number of constraints on discounted cumulative utilities. To solve this constrained optimization program, we study an online actor-critic variant of a classic primal-dual method where the gradients of both the primal and dual functions are estimated using samples from a single trajectory generated by the underlying time-varying Markov processes. This online primal-dual natural actor-critic algorithm maintains and iteratively updates three variables: a dual variable (or Lagrangian multiplier), a primal variable (or actor), and a critic variable used to estimate the gradients of both primal and dual variables. These variables are updated simultaneously but on different time scales (using different step sizes) and they are all intertwined with each other. Our main contribution is to derive a finite-time analysis for the convergence of this algorithm to the global optimum of a CMDP problem. Specifically, we show that with a proper choice of step sizes the optimality gap and constraint violation converge to zero in expectation at a rate $\mathcal{O}(1/K^{1/6})$, where $K$ is the number of iterations. To our knowledge, this paper is the first to study the finite-time complexity of an online primal-dual actor-critic method for solving a CMDP problem. We also validate the effectiveness of this algorithm through numerical simulations.
    Undersampling is a Minimax Optimal Robustness Intervention in Nonparametric Classification. (arXiv:2205.13094v3 [cs.LG] UPDATED)
    While a broad range of techniques have been proposed to tackle distribution shift, the simple baseline of training on an $\textit{undersampled}$ balanced dataset often achieves close to state-of-the-art-accuracy across several popular benchmarks. This is rather surprising, since undersampling algorithms discard excess majority group data. To understand this phenomenon, we ask if learning is fundamentally constrained by a lack of minority group samples. We prove that this is indeed the case in the setting of nonparametric binary classification. Our results show that in the worst case, an algorithm cannot outperform undersampling unless there is a high degree of overlap between the train and test distributions (which is unlikely to be the case in real-world datasets), or if the algorithm leverages additional structure about the distribution shift. In particular, in the case of label shift we show that there is always an undersampling algorithm that is minimax optimal. In the case of group-covariate shift we show that there is an undersampling algorithm that is minimax optimal when the overlap between the group distributions is small. We also perform an experimental case study on a label shift dataset and find that in line with our theory, the test accuracy of robust neural network classifiers is constrained by the number of minority samples.
    Latent Variable Method Demonstrator -- Software for Understanding Multivariate Data Analytics Algorithms. (arXiv:2205.08132v2 [stat.ML] UPDATED)
    The ever-increasing quantity of multivariate process data is driving a need for skilled engineers to analyze, interpret, and build models from such data. Multivariate data analytics relies heavily on linear algebra, optimization, and statistics and can be challenging for students to understand given that most curricula do not have strong coverage in these three topics. This article describes interactive software, the Latent Variable Demonstrator (LAVADE), for teaching, learning, and understanding latent variable methods. In this software, users can interactively compare latent variable methods such as Partial Least Squares (PLS) and Principal Component Regression (PCR) with other regression methods such as the Least Absolute Shrinkage and Selection Operator (lasso), Ridge Regression (RR), and Elastic Net (EN). LAVADE helps to build intuition on choosing appropriate methods, hyperparameter tuning, and model coefficient interpretation, fostering a conceptual understanding of the algorithms' differences. The software contains a data generation method and three chemical process datasets, allowing for comparing results of datasets with different levels of complexity. LAVADE is released as open-source software so that others can apply and advance the tool for use in teaching or research.
    Clustering-Based Representation Learning through Output Translation and Its Application to Remote--Sensing Images. (arXiv:2107.05948v4 [cs.LG] UPDATED)
    In supervised deep learning, learning good representations for remote-sensing images (RSI) relies on manual annotations. However, in the area of remote sensing, it is hard to obtain huge amounts of labeled data. Recently, self-supervised learning has shown an outstanding capability to learn representations of images, especially through methods of instance discrimination. Among instance discrimination methods, clustering-based ones view not only the transformations of the same image as "positive" samples but also similar images. In this paper, we propose a new clustering-based method for representation learning. We first introduce a quantity to measure representations' discriminativeness, from which we show that an even distribution requires the most discriminative representations. This provides a theoretical insight into why evenly distributing the images works well. We notice that only the even distributions that preserve representations' neighborhood relations are desirable. Therefore, we develop an algorithm that translates the outputs of a neural network to achieve the goal of evenly distributing the samples while preserving outputs' neighborhood relations. Extensive experiments demonstrate that our method can learn representations that are as good as or better than state-of-the-art approaches, and that it performs computationally efficiently and robustly on various RSI datasets.
    On Representing Linear Programs by Graph Neural Networks. (arXiv:2209.12288v1 [cs.LG])
    Learning to optimize is a rapidly growing area that aims to solve optimization problems or improve existing optimization algorithms using machine learning (ML). In particular, the graph neural network (GNN) is considered a suitable ML model for optimization problems whose variables and constraints are permutation-invariant, for example, the linear program (LP). While the literature has reported encouraging numerical results, this paper establishes the theoretical foundation of applying GNNs to solving LPs. Given any size limit of LPs, we construct a GNN that maps different LPs to different outputs. We show that properly built GNNs can reliably predict feasibility, boundedness, and an optimal solution for each LP in a broad class. Our proofs are based upon the recently discovered connections between the Weisfeiler-Lehman isomorphism test and the GNN. To validate our results, we train a simple GNN and present its accuracy in mapping LPs to their feasibilities and solutions.
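    The standard encoding behind such results represents an LP as a bipartite graph between constraint and variable nodes. The sketch below follows that common practice with one hand-written round of message passing; the embedding dimension and the random placeholder weights (which a real GNN would learn) are assumptions for illustration.

```python
import numpy as np

# encode  min c^T x  s.t.  A x <= b, x >= 0  as a bipartite graph:
# constraint nodes (features b_i) <-> variable nodes (features c_j), edge weights A_ij
rng = np.random.default_rng(7)
A = np.array([[1.0, 2.0], [3.0, 1.0]])
b = np.array([4.0, 6.0])
c = np.array([-1.0, -2.0])

d = 4
U_con, U_var = rng.normal(size=(1, d)), rng.normal(size=(1, d))
h_con = b[:, None] @ U_con            # (m, d) initial constraint embeddings
h_var = c[:, None] @ U_var            # (n, d) initial variable embeddings
W_vc, W_cv = rng.normal(size=(2 * d, d)), rng.normal(size=(2 * d, d))

m_con = A @ h_var                     # variable -> constraint messages, weighted by A_ij
h_con = np.tanh(np.hstack([h_con, m_con]) @ W_vc)
m_var = A.T @ h_con                   # constraint -> variable messages
h_var = np.tanh(np.hstack([h_var, m_var]) @ W_cv)

print(h_var)   # per-variable embeddings a readout head could map to feasibility/solution predictions
```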
    GCF: Generalized Causal Forest for Heterogeneous Treatment Effect Estimation in Online Marketplace. (arXiv:2203.10975v2 [stat.ML] UPDATED)
    Uplift modeling is a rapidly growing approach that utilizes causal inference and machine learning methods to directly estimate the heterogeneous treatment effects, which has been widely applied to various online marketplaces to assist large-scale decision-making in recent years. The existing popular models, like causal forest (CF), are limited to either discrete treatments or posing parametric assumptions on the outcome-treatment relationship that may suffer model misspecification. However, continuous treatments (e.g., price, duration) often arise in marketplaces. To alleviate these restrictions, we use a kernel-based doubly robust estimator to recover the non-parametric dose-response functions that can flexibly model continuous treatment effects. Moreover, we propose a generic distance-based splitting criterion to capture the heterogeneity for the continuous treatments. We call the proposed algorithm generalized causal forest (GCF) as it generalizes the use case of CF to a much broader setting. We show the effectiveness of GCF by deriving the asymptotic property of the estimator and comparing it to popular uplift modeling methods on both synthetic and real-world datasets. We implement GCF on Spark and successfully deploy it into a large-scale online pricing system at a leading ride-sharing company. Online A/B testing results further validate the superiority of GCF.
    Solutions to preference manipulation in recommender systems require knowledge of meta-preferences. (arXiv:2209.11801v1 [cs.IR])
    Iterative machine learning algorithms used to power recommender systems often change people's preferences by trying to learn them. Further, a recommender can better predict what a user will do by making its users more predictable. Some preference changes on the part of the user are self-induced and desired, whether the recommender caused them or not. This paper proposes that solutions to preference manipulation in recommender systems must take into account certain meta-preferences (preferences over another preference) in order to respect the autonomy of the user and not be manipulative.
    Variational Inference as Iterative Projection in a Bayesian Hilbert Space with Application to Robotic State Estimation. (arXiv:2005.07275v3 [cs.LG] UPDATED)
    Variational Bayesian inference is an important machine-learning tool that finds application from statistics to robotics. The goal is to find an approximate probability density function (PDF) from a chosen family that is in some sense 'closest' to the full Bayesian posterior. Closeness is typically defined through the selection of an appropriate loss functional such as the Kullback-Leibler (KL) divergence. In this paper, we explore a new formulation of variational inference by exploiting the fact that (most) PDFs are members of a Bayesian Hilbert space under careful definitions of vector addition, scalar multiplication and an inner product. We show that, under the right conditions, variational inference based on KL divergence can amount to iterative projection, in the Euclidean sense, of the Bayesian posterior onto a subspace corresponding to the selected approximation family. We work through the details of this general framework for the specific case of the Gaussian approximation family and show the equivalence to another Gaussian variational inference approach. We furthermore discuss the implications for systems that exhibit sparsity, which is handled naturally in Bayesian space, and give an example of a high-dimensional robotic state estimation problem that can be handled as a result. We provide some preliminary examples of how the approach could be applied to non-Gaussian inference and discuss the limitations of the approach in detail to encourage follow-on work along these lines.
    Bias-reduced Multi-step Hindsight Experience Replay for Efficient Multi-goal Reinforcement Learning. (arXiv:2102.12962v3 [cs.LG] UPDATED)
    Multi-goal reinforcement learning is widely applied in planning and robot manipulation. Two main challenges in multi-goal reinforcement learning are sparse rewards and sample inefficiency. Hindsight Experience Replay (HER) aims to tackle the two challenges via goal relabeling. However, HER-related works still need millions of samples and a huge computation. In this paper, we propose Multi-step Hindsight Experience Replay (MHER), incorporating multi-step relabeled returns based on $n$-step relabeling to improve sample efficiency. Despite the advantages of $n$-step relabeling, we theoretically and experimentally prove the off-policy $n$-step bias introduced by $n$-step relabeling may lead to poor performance in many environments. To address the above issue, two bias-reduced MHER algorithms, MHER($\lambda$) and Model-based MHER (MMHER) are presented. MHER($\lambda$) exploits the $\lambda$ return while MMHER benefits from model-based value expansions. Experimental results on numerous multi-goal robotic tasks show that our solutions can successfully alleviate off-policy $n$-step bias and achieve significantly higher sample efficiency than HER and Curriculum-guided HER with little additional computation beyond HER.
    Joint Speech Activity and Overlap Detection with Multi-Exit Architecture. (arXiv:2209.11906v1 [cs.SD])
    Overlapped speech detection (OSD) is critical for speech applications in multi-party conversation scenarios. Despite numerous research efforts and much progress, compared with voice activity detection (VAD), OSD remains an open challenge and its overall performance is far from satisfactory. The majority of prior research typically formulates the OSD problem as a standard classification problem, identifying speech at frame level with binary (OSD) or three-class labels (joint VAD and OSD). In contrast to the mainstream, this study investigates the joint VAD and OSD task from a new perspective. In particular, we propose to extend a traditional classification network with a multi-exit architecture. Such an architecture empowers our system with the unique capability to identify the class using either low-level features from early exits or high-level features from the last exit. In addition, two training schemes, knowledge distillation and dense connection, are adopted to further boost our system performance. Experimental results on benchmark datasets (AMI and DIHARD-III) validate the effectiveness and generality of our proposed system. Our ablations further reveal the complementary contribution of the proposed schemes. With $F_1$ scores of 0.792 on AMI and 0.625 on DIHARD-III, our proposed system not only outperforms several top-performing models on these datasets, but also surpasses the current state-of-the-art by large margins across both datasets. Besides the performance benefit, our proposed system offers another appealing potential for quality-complexity trade-offs, which is highly preferred for efficient OSD deployment.
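    A minimal PyTorch sketch of the multi-exit pattern follows: each block appends an auxiliary classification head ("exit"), so frames can be labeled from low-level features at early exits or high-level features at the last exit, with all exits trained jointly. The layer sizes, the plain linear blocks, and the three-class target are illustrative, not the paper's exact design.

```python
import torch
import torch.nn as nn

class MultiExitTagger(nn.Module):
    """Frame classifier with one classification head per block (a 'multi-exit' network)."""
    def __init__(self, d_in=40, d_h=64, n_classes=3, n_blocks=3):
        super().__init__()
        self.blocks, self.exits = nn.ModuleList(), nn.ModuleList()
        d = d_in
        for _ in range(n_blocks):
            self.blocks.append(nn.Sequential(nn.Linear(d, d_h), nn.ReLU()))
            self.exits.append(nn.Linear(d_h, n_classes))
            d = d_h

    def forward(self, x):
        logits = []
        for block, exit_head in zip(self.blocks, self.exits):
            x = block(x)
            logits.append(exit_head(x))               # one prediction per exit depth
        return logits

model = MultiExitTagger()
frames = torch.randn(8, 40)                           # batch of per-frame features
outs = model(frames)
labels = torch.randint(0, 3, (8,))                    # silence / single speaker / overlap
loss = sum(nn.functional.cross_entropy(o, labels) for o in outs)   # joint deep-supervision loss
print(len(outs), loss.item())
```

    At inference, one can stop at an early exit for cheaper predictions, which is what enables the quality-complexity trade-off mentioned above.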
    Are Graph Neural Networks Really Helpful for Knowledge Graph Completion?. (arXiv:2205.10652v2 [cs.AI] UPDATED)
    Knowledge graphs (KGs) facilitate a wide variety of applications due to their ability to store relational knowledge applicable to many areas. Despite great efforts invested in creation and maintenance, even the largest KGs are far from complete. Hence, KG completion (KGC) has become one of the most crucial tasks for KG research. Recently, considerable literature in this space has centered on using Graph Neural Networks (GNNs) to learn powerful embeddings that leverage the topological structure of KGs. Specifically, dedicated efforts have been made to extend GNNs, which are commonly designed for simple homogeneous, uni-relational graphs, to the KG context, where entities are connected by diverse, multi-relational edges, by designing more complex aggregation schemes over neighboring nodes (crucial to GNN performance) to appropriately exploit multi-relational information. The success of these methods is naturally attributed to the use of GNNs over simpler multi-layer perceptron (MLP) models, owing to their additional aggregation functionality. In this work, we find that, surprisingly, simple MLP models are able to achieve comparable performance to GNNs, suggesting that aggregation may not be as crucial as previously believed. With further exploration, we show that careful scoring function and loss function design have a much stronger influence on KGC model performance, and that aggregation is not practically required. This suggests a conflation of scoring function design, loss function design, and aggregation in prior work, offering promising insights into the scalability of state-of-the-art KGC methods today, as well as motivating careful attention to more suitable aggregation designs for KGC tasks tomorrow. The implementation is available online: https://github.com/Juanhui28/Are_MPNNs_helpful.
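    As a hedged illustration of the paper's point, a minimal aggregation-free KGC model: per-entity embeddings refined by an MLP and scored with a DistMult-style function; all dimensions and the scoring choice are assumptions, not the paper's exact setup:

```python
import torch
import torch.nn as nn

class MLPKGC(nn.Module):
    """Entity/relation embeddings refined by a per-entity MLP (no neighbor
    aggregation), scored with a DistMult-style triple score."""
    def __init__(self, n_entities, n_relations, dim=200):
        super().__init__()
        self.ent = nn.Embedding(n_entities, dim)
        self.rel = nn.Embedding(n_relations, dim)
        self.mlp = nn.Sequential(nn.Linear(dim, dim), nn.ReLU(), nn.Linear(dim, dim))

    def score(self, h, r, t):
        eh = self.mlp(self.ent(h))           # refine head embedding, no aggregation
        et = self.mlp(self.ent(t))           # refine tail embedding
        return (eh * self.rel(r) * et).sum(-1)   # DistMult score for (h, r, t)

model = MLPKGC(n_entities=10000, n_relations=200)
s = model.score(torch.tensor([1]), torch.tensor([5]), torch.tensor([42]))
```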
    How Far Should We Look Back to Achieve Effective Real-Time Time-Series Anomaly Detection?. (arXiv:2102.06560v3 [cs.LG] UPDATED)
    Anomaly detection is the process of identifying unexpected events or abnormalities in data, and it has been applied in many areas such as system monitoring, fraud detection, healthcare, and intrusion detection. Providing real-time, lightweight, and proactive anomaly detection for time series, with neither human intervention nor domain knowledge, could be highly valuable since it reduces human effort and enables appropriate countermeasures to be taken before a disastrous event occurs. To our knowledge, RePAD (Real-time Proactive Anomaly Detection algorithm) is a generic approach with all of the above-mentioned features. To achieve real-time and lightweight detection, RePAD utilizes Long Short-Term Memory (LSTM) to detect whether each upcoming data point is anomalous based on short-term historical data points. However, it is unclear how different amounts of historical data points affect the performance of RePAD. Therefore, in this paper, we investigate the impact of different amounts of historical data on RePAD by introducing a set of performance metrics that cover novel detection accuracy measures, time efficiency, readiness, and resource consumption. Empirical experiments based on real-world time series datasets are conducted to evaluate RePAD in different scenarios, and the experimental results are presented and discussed.
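    A hedged, simplified sketch of the RePAD-style loop: an LSTM predicts each upcoming point from the last $b$ points, and a point is flagged when its error leaves the recent error distribution; the window size, threshold rule, and omitted training loop are illustrative assumptions, not RePAD's exact procedure:

```python
import numpy as np
import torch
import torch.nn as nn

b = 10                                           # look-back window (illustrative)
lstm = nn.LSTM(input_size=1, hidden_size=16, batch_first=True)
head = nn.Linear(16, 1)
# NOTE: training of the LSTM on past windows is omitted here for brevity.

def predict_next(window):
    x = torch.tensor(window, dtype=torch.float32).view(1, b, 1)
    with torch.no_grad():
        out, _ = lstm(x)
        return head(out[:, -1]).item()

series = np.sin(np.linspace(0, 20, 300)) + 0.05 * np.random.randn(300)
errors = []
for t in range(b, len(series)):
    err = abs(predict_next(series[t - b:t]) - series[t])
    # flag when the error leaves the recent error distribution (mean + 3 std)
    if len(errors) > 30 and err > np.mean(errors) + 3 * np.std(errors):
        print(f"t={t}: possible anomaly (error {err:.3f})")
    errors.append(err)
```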
    Tradeoffs between convergence rate and noise amplification for momentum-based accelerated optimization algorithms. (arXiv:2209.11920v1 [math.OC])
    We study momentum-based first-order optimization algorithms in which the iterations utilize information from the two previous steps and are subject to an additive white noise. This class of algorithms includes heavy-ball and Nesterov's accelerated methods as special cases. For strongly convex quadratic problems, we use the steady-state variance of the error in the optimization variable to quantify noise amplification and exploit a novel geometric viewpoint to establish analytical lower bounds on the product between the settling time and the smallest/largest achievable noise amplification. For all stabilizing parameters, these bounds scale quadratically with the condition number. We also use the geometric insight developed in the paper to introduce two parameterized families of algorithms that strike a balance between noise amplification and settling time while preserving order-wise Pareto optimality. Finally, for a class of continuous-time gradient flow dynamics, whose suitable discretization yields the two-step momentum algorithm, we establish analogous lower bounds that also scale quadratically with the condition number.
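    A hedged numerical sketch of the setting: heavy-ball iterations on a strongly convex quadratic with white noise, with the steady-state error variance as the noise-amplification proxy; the parameters follow textbook heavy-ball tuning and are not the paper's:

```python
import numpy as np

# f(x) = 0.5 * x' H x with condition number kappa; standard heavy-ball tuning.
kappa = 100.0
H = np.diag([kappa, 1.0])
alpha = 4 / (np.sqrt(kappa) + 1) ** 2
beta = ((np.sqrt(kappa) - 1) / (np.sqrt(kappa) + 1)) ** 2

x_prev = x = np.zeros(2)
samples = []
rng = np.random.default_rng(0)
for t in range(200000):
    # Noise enters through the gradient here; the paper's additive white noise
    # on the iterates behaves analogously up to scaling.
    grad = H @ x + rng.normal(scale=0.01, size=2)
    x, x_prev = x - alpha * grad + beta * (x - x_prev), x
    if t > 50000:                      # discard the transient
        samples.append(x.copy())

print("steady-state error variance per coordinate:",
      np.var(np.array(samples), axis=0))
```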
    A Stochastic Variance-Reduced Coordinate Descent Algorithm for Learning Sparse Bayesian Network from Discrete High-Dimensional Data. (arXiv:2108.09501v2 [cs.LG] UPDATED)
    This paper addresses the problem of learning a Bayesian network with sparse structure from high-dimensional discrete data. Compared to continuous Bayesian networks, learning a discrete Bayesian network is a challenging problem due to the large parameter space. Although many approaches have been developed for learning continuous Bayesian networks, few have been proposed for discrete ones. In this paper, we cast learning Bayesian networks as an optimization problem and propose a score function which guarantees that the learnt structure is a sparse directed acyclic graph. Besides, we implement a block-wise stochastic coordinate descent algorithm to optimize the score function. Specifically, we use a variance-reduction method in our optimization algorithm to make the algorithm work efficiently on high-dimensional data. The proposed approach is applied to synthetic data from well-known benchmark networks, and the quality, scalability, and robustness of the constructed networks are measured. The results reveal that our algorithm outperforms several well-known competing methods.
    Learned Benchmarks for Subseasonal Forecasting. (arXiv:2109.10399v2 [physics.ao-ph] UPDATED)
    We benchmark a subseasonal forecasting toolkit of simple learned models that outperform both operational practice and state-of-the-art machine learning and deep learning methods. These models, introduced by Mouatadid et al. (2022), include (a) Climatology++, an adaptive alternative to climatology that, for precipitation, is 9% more accurate and 250% more skillful than the United States operational Climate Forecasting System (CFSv2); (b) CFSv2++, a learned CFSv2 correction that improves temperature and precipitation accuracy by 7-8% and skill by 50-275%; and (c) Persistence++, an augmented persistence model that combines CFSv2 forecasts with lagged measurements to improve temperature and precipitation accuracy by 6-9% and skill by 40-130%. Across the contiguous U.S., the Climatology++, CFSv2++, and Persistence++ toolkit consistently outperforms standard meteorological baselines, state-of-the-art machine and deep learning methods, and the European Centre for Medium-Range Weather Forecasts ensemble.
    Tuning Frequency Bias in Neural Network Training with Nonuniform Data. (arXiv:2205.14300v2 [cs.LG] UPDATED)
    Small generalization errors of over-parameterized neural networks (NNs) can be partially explained by the frequency biasing phenomenon, where gradient-based algorithms minimize the low-frequency misfit before reducing the high-frequency residuals. Using the Neural Tangent Kernel (NTK), one can provide a theoretically rigorous analysis for training where data are drawn from constant or piecewise-constant probability densities. Since most training data sets are not drawn from such distributions, we use the NTK model and a data-dependent quadrature rule to theoretically quantify the frequency biasing of NN training given fully nonuniform data. By replacing the loss function with a carefully selected Sobolev norm, we can further amplify, dampen, counterbalance, or reverse the intrinsic frequency biasing in NN training.
    Realizable Learning is All You Need. (arXiv:2111.04746v2 [cs.LG] UPDATED)
    The equivalence of realizable and agnostic learnability is a fundamental phenomenon in learning theory. With variants ranging from classical settings like PAC learning and regression to recent trends such as adversarially robust and private learning, it's surprising that we still lack a unified theory; traditional proofs of the equivalence tend to be disparate, and rely on strong model-specific assumptions like uniform convergence and sample compression. In this work, we give the first model-independent framework explaining the equivalence of realizable and agnostic learnability: a three-line blackbox reduction that simplifies, unifies, and extends our understanding across a wide variety of settings. This includes models with no known characterization of learnability such as learning with arbitrary distributional assumptions or general loss, as well as a host of other popular settings such as robust learning, partial learning, fair learning, and the statistical query model. More generally, we argue that the equivalence of realizable and agnostic learning is actually a special case of a broader phenomenon we call property generalization: any desirable property of a learning algorithm (e.g.\ noise tolerance, privacy, stability) that can be satisfied over finite hypothesis classes extends (possibly in some variation) to any learnable hypothesis class.
    On the Stability Analysis of Open Federated Learning Systems. (arXiv:2209.12307v1 [cs.LG])
    We consider open federated learning (FL) systems, where clients may join and/or leave the system during the FL process. Given the variability of the number of present clients, convergence to a fixed model cannot be guaranteed in open systems. Instead, we resort to a new performance metric that we term the stability of open FL systems, which quantifies the magnitude of the learned model in open systems. Under the assumption that local clients' functions are strongly convex and smooth, we theoretically quantify the radius of stability for two FL algorithms, namely local SGD and local Adam. We observe that this radius depends on several key parameters, including the function condition number and the variance of the stochastic gradient. Our theoretical results are further verified by numerical simulations on both synthetic and real-world benchmark datasets.
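    A hedged toy simulation of an open FL system: the active client set changes every round, so the local-SGD average wanders within a bounded radius rather than converging to a fixed point; all constants are illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)
# Each client i minimizes the strongly convex quadratic ||w - minima[i]||^2.
minima = rng.normal(size=(20, 5))
w = np.zeros(5)

for rnd in range(500):
    # clients join/leave: a different random subset is active each round
    active = rng.choice(20, size=rng.integers(3, 10), replace=False)
    updates = []
    for i in active:
        wi = w.copy()
        for _ in range(5):                         # local SGD steps
            grad = 2 * (wi - minima[i]) + rng.normal(scale=0.1, size=5)
            wi -= 0.05 * grad
        updates.append(wi)
    w = np.mean(updates, axis=0)                   # server averaging

print("distance of final model from the mean of client minimizers:",
      np.linalg.norm(w - minima.mean(axis=0)))
```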
    Approximate better, Attack stronger: Adversarial Example Generation via Asymptotically Gaussian Mixture Distribution. (arXiv:2209.11964v1 [cs.LG])
    Strong adversarial examples are the keys to evaluating and enhancing the robustness of deep neural networks. The popular adversarial attack algorithms maximize the non-concave loss function using gradient ascent. However, the performance of each attack is usually sensitive to, for instance, minor image transformations, due to insufficient information (only one input example, few white-box source models, and unknown defense strategies). Hence, the crafted adversarial examples are prone to overfit the source model, which limits their transferability to unidentified architectures. In this paper, we propose Multiple Asymptotically Normal Distribution Attacks (MultiANDA), a novel method that explicitly characterizes adversarial perturbations from a learned distribution. Specifically, we approximate the posterior distribution over the perturbations by taking advantage of the asymptotic normality property of stochastic gradient ascent (SGA), then apply the ensemble strategy on this procedure to estimate a Gaussian mixture model for a better exploration of the potential optimization space. Drawing perturbations from the learned distribution allows us to generate any number of adversarial examples for each input. The approximated posterior essentially describes the stationary distribution of SGA iterations, which captures the geometric information around the local optimum. Thus, the samples drawn from the distribution reliably maintain the transferability. Our proposed method outperforms nine state-of-the-art black-box attacks on deep learning models with or without defenses, as shown by extensive experiments on seven normally trained and seven defense models.
    Energy-Environment evaluation and Forecast of a Novel Regenerative turboshaft engine combined cycle with DNN application. (arXiv:2209.12020v1 [eess.SP])
    In this integrated study, a turboshaft engine with added inlet air cooling and regenerative cooling was evaluated through an energy-environment analysis. First, using hydrogen as the working fuel, we investigated the impacts of flight Mach number, flight altitude, the compression ratio of compressor-1 in the main cycle, the inlet temperature of turbine-1 in the main cycle, the temperature fraction of turbine-2, the compression ratio of the accessory cycle, and the inlet air temperature variation of the inlet air cooling system on several performance parameters of the regenerative turboshaft engine cycle equipped with an inlet air cooling system, namely power-specific fuel consumption, power output, thermal efficiency, and the mass flow rate of nitrogen oxides (NOx), including NO and NO2. Based on this analysis, a model was then developed to predict the energy-environment performance of the cycle using a deep neural network (DNN) with two hidden layers of 625 neurons each. The model predicts the thermal efficiency and the NOx mass flow rate. The results demonstrate the accuracy of the integrated DNN model, with low values of the MSE, MAE, and RMSD cost functions for both predicted outputs on both the training and testing data. In addition, R and R^2 were calculated to be very close to 1 for both the thermal efficiency and NOx emission mass flow rate predictions on the training and testing data.
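    A hedged sketch of the described network, two hidden layers of 625 neurons each, mapping the seven cycle parameters to the two predicted outputs; the activation, optimizer, and input ordering are assumptions:

```python
import torch
import torch.nn as nn

model = nn.Sequential(
    nn.Linear(7, 625), nn.ReLU(),   # 7 inputs: Mach, altitude, ratios, temperatures, ...
    nn.Linear(625, 625), nn.ReLU(),
    nn.Linear(625, 2),              # outputs: [thermal efficiency, NOx mass flow rate]
)
loss_fn = nn.MSELoss()              # MSE is one of the reported cost functions
opt = torch.optim.Adam(model.parameters(), lr=1e-3)

x, y = torch.randn(64, 7), torch.randn(64, 2)   # placeholder batch
opt.zero_grad()
loss = loss_fn(model(x), y)
loss.backward()
opt.step()
```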
    VAESim: A probabilistic approach for self-supervised prototype discovery. (arXiv:2209.12279v1 [cs.CV])
    In medicine, curated image datasets often employ discrete labels to describe what is known to be a continuous spectrum from healthy to pathological conditions, such as the Alzheimer's Disease Continuum, or other areas where the image plays a pivotal role in diagnosis. We propose an architecture for image stratification based on a conditional variational autoencoder. Our framework, VAESim, leverages a continuous latent space to represent the continuum of disorders and finds clusters during training, which can then be used for image/patient stratification. The core of the method learns a set of prototypical vectors, each associated with a cluster. First, we perform a soft assignment of each data sample to the clusters. Then, we reconstruct the sample based on a similarity measure between the sample embedding and the prototypical vectors of the clusters. To update the prototypical embeddings, we use an exponential moving average of the most similar representations between the current prototypes and the samples in the batch. We test our approach on the MNIST handwritten digit dataset and on a medical benchmark dataset called PneumoniaMNIST. We demonstrate that our method outperforms baselines in terms of kNN accuracy measured on a classification task against a standard VAE (up to a 15% improvement in performance) on both datasets, and also performs on par with classification models trained in a fully supervised way. We also demonstrate how our model outperforms current, end-to-end models for unsupervised stratification.
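    A hedged sketch of the prototype update described above: soft-assign batch embeddings to prototypes by similarity, then move each prototype toward its assigned embeddings with an exponential moving average; the momentum value and cosine similarity are illustrative choices:

```python
import torch
import torch.nn.functional as F

def update_prototypes(prototypes, embeddings, momentum=0.99):
    """EMA update of prototypical vectors from a batch of embeddings."""
    sim = F.normalize(embeddings, dim=1) @ F.normalize(prototypes, dim=1).T  # (B, K)
    assign = sim.softmax(dim=1)                                              # soft assignment
    # weighted mean of the batch embeddings per prototype
    target = (assign.T @ embeddings) / (assign.sum(dim=0, keepdim=True).T + 1e-8)
    return momentum * prototypes + (1 - momentum) * target

protos = torch.randn(8, 64)                 # K = 8 prototypical vectors
protos = update_prototypes(protos, torch.randn(32, 64))
```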
    Adnexal Mass Segmentation with Ultrasound Data Synthesis. (arXiv:2209.12305v1 [eess.IV])
    Ovarian cancer is the most lethal gynaecological malignancy. The disease is most commonly asymptomatic at its early stages, and its diagnosis relies on expert evaluation of transvaginal ultrasound images. Although ultrasound is the first-line imaging modality for characterising adnexal masses, it requires significant expertise, and its analysis is subjective and labour-intensive, and therefore open to error. Hence, automating processes to facilitate and standardise the evaluation of scans is desired in clinical practice. Using supervised learning, we have demonstrated that segmentation of adnexal masses is possible; however, prevalence and label imbalance restrict performance on under-represented classes. To mitigate this, we apply a novel pathology-specific data synthesiser. We create synthetic medical images with their corresponding ground-truth segmentations by using Poisson image editing to integrate less common masses into other samples. Our approach achieves the best performance across all classes, including an improvement of up to 8% when compared with nnU-Net baseline approaches.
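    A hedged sketch of the synthesis step using OpenCV's Poisson-editing routine; the file names, placement, and all-ones blend mask are placeholder assumptions, not the paper's pipeline:

```python
import cv2
import numpy as np

host = cv2.imread("host_ultrasound.png")        # illustrative file names
mass = cv2.imread("rare_mass_crop.png")
mask = 255 * np.ones(mass.shape[:2], np.uint8)  # blend the whole crop

# Poisson image editing: seamlessly blend the rare mass into the host image.
center = (host.shape[1] // 2, host.shape[0] // 2)
synthetic = cv2.seamlessClone(mass, host, mask, center, cv2.NORMAL_CLONE)

# Paste the segmentation label for the inserted mass into the host's mask.
host_mask = np.zeros(host.shape[:2], np.uint8)
y0, x0 = center[1] - mass.shape[0] // 2, center[0] - mass.shape[1] // 2
host_mask[y0:y0 + mass.shape[0], x0:x0 + mass.shape[1]] = 1
```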
    Transfer learning for self-supervised, blind-spot seismic denoising. (arXiv:2209.12210v1 [physics.geo-ph])
    Noise in seismic data arises from numerous sources and is continually evolving. The use of supervised deep learning procedures for denoising of seismic datasets often results in poor performance: this is due to the lack of noise-free field data to act as training targets and the large difference in characteristics between synthetic and field datasets. Self-supervised, blind-spot networks typically overcome these limitations by training directly on the raw, noisy data. However, such networks often rely on a random noise assumption, and their denoising capabilities quickly decrease in the presence of even minimally-correlated noise. Extending from blind-spots to blind-masks can efficiently suppress coherent noise along a specific direction, but it cannot adapt to the ever-changing properties of noise. To preempt the network's ability to predict the signal and reduce its opportunity to learn the noise properties, we propose an initial, supervised training of the network on a frugally-generated synthetic dataset prior to fine-tuning in a self-supervised manner on the field dataset of interest. Considering the change in peak signal-to-noise ratio, as well as the volume of noise reduced and signal leakage observed, we illustrate the clear benefit of initialising the self-supervised network with the weights from a supervised base-training. This is further supported by a test on a field dataset where the fine-tuned network strikes the best balance between signal preservation and noise reduction. Finally, the use of the unrealistic, frugally-generated synthetic dataset for the supervised base-training offers a number of benefits: it requires minimal prior geological knowledge, substantially reduces the computational cost of dataset generation, and reduces the need to re-train the network should recording conditions change, to name a few.
    Paraphrasing Is All You Need for Novel Object Captioning. (arXiv:2209.12343v1 [cs.CV])
    Novel object captioning (NOC) aims to describe images containing objects without observing their ground truth captions during training. Due to the absence of caption annotations, captioning models cannot be directly optimized via sequence-to-sequence training or CIDEr optimization. As a result, we present Paraphrasing-to-Captioning (P2C), a two-stage learning framework for NOC that heuristically optimizes the output captions via paraphrasing. With P2C, the captioning model first learns paraphrasing from a language model pre-trained on a text-only corpus, allowing expansion of the word bank to improve linguistic fluency. To further enforce that the output caption sufficiently describes the visual content of the input image, we perform self-paraphrasing for the captioning model, introducing fidelity and adequacy objectives. Since no ground truth captions are available for novel object images during training, P2C leverages cross-modality (image-text) association modules to ensure the above caption characteristics are properly preserved. In the experiments, we not only show that P2C achieves state-of-the-art performance on the nocaps and COCO Caption datasets, but also verify the effectiveness and flexibility of our learning framework by replacing the language and cross-modality association models for NOC. Implementation details and code are available in the supplementary materials.
    Deep Empirical Risk Minimization in finance: looking into the future. (arXiv:2011.09349v3 [stat.ML] UPDATED)
    Many modern computational approaches to classical problems in quantitative finance are formulated as empirical risk minimization (ERM), allowing direct applications of classical results from statistical machine learning. These methods, designed to directly construct the optimal feedback representation of hedging or investment decisions, are analyzed in this framework, demonstrating their effectiveness as well as their susceptibility to generalization error. Classical techniques show that over-training renders trained investment decisions anticipative, and prove overlearning for large hypothesis spaces. On the other hand, non-asymptotic estimates based on Rademacher complexity show convergence for sufficiently large training sets. These results emphasize the importance of synthetic data generation and the appropriate calibration of complex models to market data. A numerically studied stylized example illustrates these possibilities, including the importance of problem dimension in the degree of overlearning and the effectiveness of this approach.
    Cooperative Online Learning with Feedback Graphs. (arXiv:2106.04982v4 [cs.LG] UPDATED)
    We study the interplay between feedback and communication in a cooperative online learning setting where a network of agents solves a task in which the learners' feedback is determined by an arbitrary graph. We characterize regret in terms of the independence number of the strong product between the feedback graph and the communication network. Our analysis recovers as special cases many previously known bounds for distributed online learning with either expert or bandit feedback. A more detailed version of our results also captures the dependence of the regret on the delay caused by the time the information takes to traverse each graph. Experiments run on synthetic data show that the empirical behavior of our algorithm is consistent with the theoretical results.
    Composing Neural Learning and Symbolic Reasoning with an Application to Visual Discrimination. (arXiv:1907.05878v3 [cs.LG] UPDATED)
    We consider the problem of combining machine learning models to perform higher-level cognitive tasks with clear specifications. We propose the novel problem of Visual Discrimination Puzzles (VDP) that requires finding interpretable discriminators that classify images according to a logical specification. Humans can solve these puzzles with ease and they give robust, verifiable, and interpretable discriminators as answers. We propose a compositional neurosymbolic framework that combines a neural network to detect objects and relationships with a symbolic learner that finds interpretable discriminators. We create large classes of VDP datasets involving natural and artificial images and show that our neurosymbolic framework performs favorably compared to several purely neural approaches.
    Value Penalized Q-Learning for Recommender Systems. (arXiv:2110.07923v2 [cs.LG] UPDATED)
    Scaling reinforcement learning (RL) to recommender systems (RS) is promising since maximizing the expected cumulative rewards for RL agents meets the objective of RS, i.e., improving customers' long-term satisfaction. A key approach to this goal is offline RL, which aims to learn policies from logged data. However, the high-dimensional action space and the non-stationary dynamics in commercial RS intensify distributional shift issues, making it challenging to apply offline RL methods to RS. To alleviate the action distribution shift problem in extracting RL policy from static trajectories, we propose Value Penalized Q-learning (VPQ), an uncertainty-based offline RL algorithm. It penalizes unstable Q-values in the regression target via uncertainty-aware weights, without the need to estimate the behavior policy, making it suitable for RS with a large number of items. We derive the penalty weights from the variances across an ensemble of Q-functions. To alleviate distributional shift issues at test time, we further introduce a critic framework to integrate the proposed method with classic RS models. Extensive experiments conducted on two real-world datasets show that the proposed method can serve as a gain plugin for existing RS models.  ( 3 min )
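    A hedged sketch of the uncertainty-penalized regression target; `q_ensemble` is an assumed list of Q-networks and the penalty coefficient is illustrative:

```python
import torch

def penalized_target(q_ensemble, reward, next_state, next_action, gamma=0.99, lam=1.0):
    """Target = reward + gamma * (ensemble mean - lam * ensemble std).

    The standard deviation across the Q-ensemble serves as the uncertainty
    that down-weights unstable Q-values in the regression target.
    """
    with torch.no_grad():
        qs = torch.stack([q(next_state, next_action) for q in q_ensemble])  # (E, B)
        mean, std = qs.mean(dim=0), qs.std(dim=0)
        return reward + gamma * (mean - lam * std)
```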
    Partial annotations for the segmentation of large structures with low annotation cost. (arXiv:2209.12216v1 [eess.IV])
    Deep learning methods have been shown to be effective for the automatic segmentation of structures and pathologies in medical imaging. However, they require large annotated datasets, whose manual segmentation is a tedious and time-consuming task, especially for large structures. We present a new method of partial annotations that uses a small set of consecutive annotated slices from each scan, with an annotation effort equal to that of only a few fully annotated cases. Training with partial annotations is performed by using only the annotated blocks, incorporating information about slices outside the structure of interest, and modifying the batch loss function to consider only the annotated slices. To facilitate training in a low data regime, we use a two-step optimization process. We tested the method with the popular soft Dice loss on the fetal body segmentation task in two MRI sequences, TRUFI and FIESTA, and compared the full-annotation regime to partial annotations with a similar annotation effort. For TRUFI data, partial annotations yielded slightly better performance on average compared to full annotations, with an increase in Dice score from 0.936 to 0.942, a substantial 22% decrease in the standard deviation (STD) of the Dice score, and a 15% decrease in the Average Symmetric Surface Distance (ASSD). For the FIESTA sequence, partial annotations also decreased the STD of the Dice score and ASSD metrics by 27.5% and 33% respectively on in-distribution data, and substantially improved average performance on out-of-distribution data, increasing the Dice score from 0.84 to 0.9 and decreasing the ASSD from 7.46 to 4.01 mm. The two-step optimization process was helpful for partial annotations on both in-distribution and out-of-distribution data. The partial annotations method with the two-step optimizer is therefore recommended for improving segmentation performance in a low data regime.
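    A hedged sketch of a soft Dice loss restricted to annotated slices, in the spirit of the modified batch loss described above; tensor shapes and the slice mask are illustrative assumptions:

```python
import torch

def partial_dice_loss(probs, labels, annotated, eps=1e-6):
    """Soft Dice loss over annotated slices only.

    probs, labels: (batch, slices, H, W); annotated: (batch, slices) boolean
    mask marking which slices of each scan carry annotations.
    """
    probs = probs[annotated]          # keep only annotated slices
    labels = labels[annotated]
    inter = (probs * labels).sum()
    return 1 - (2 * inter + eps) / (probs.sum() + labels.sum() + eps)

probs = torch.rand(2, 32, 64, 64)
labels = (torch.rand(2, 32, 64, 64) > 0.5).float()
annotated = torch.zeros(2, 32, dtype=torch.bool)
annotated[:, 10:16] = True            # six consecutive annotated slices per scan
loss = partial_dice_loss(probs, labels, annotated)
```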
    Personalizing Text-to-Image Generation via Aesthetic Gradients. (arXiv:2209.12330v1 [cs.CV])
    This work proposes aesthetic gradients, a method to personalize a CLIP-conditioned diffusion model by guiding the generative process towards custom aesthetics defined by the user from a set of images. The approach is validated with qualitative and quantitative experiments, using the recent stable diffusion model and several aesthetically-filtered datasets. Code is released at https://github.com/vicgalle/stable-diffusion-aesthetic-gradients
    Persformer: A Transformer Architecture for Topological Machine Learning. (arXiv:2112.15210v2 [cs.LG] UPDATED)
    One of the main challenges of Topological Data Analysis (TDA) is to extract features from persistence diagrams directly usable by machine learning algorithms. Indeed, persistence diagrams are intrinsically (multi-)sets of points in $\mathbb{R}^2$ and cannot be seen in a straightforward manner as vectors. In this article, we introduce $\texttt{Persformer}$, the first Transformer neural network architecture that accepts persistence diagrams as input. The $\texttt{Persformer}$ architecture significantly outperforms previous topological neural network architectures on classical synthetic and graph benchmark datasets. Moreover, it satisfies a universal approximation theorem. This allows us to introduce the first interpretability method for topological machine learning, which we explore in two examples.
    Deep Reinforcement Learning for Adaptive Mesh Refinement. (arXiv:2209.12351v1 [cs.CE])
    Finite element discretizations of problems in computational physics often rely on adaptive mesh refinement (AMR) to preferentially resolve regions containing important features during simulation. However, these spatial refinement strategies are often heuristic and rely on domain-specific knowledge or trial-and-error. We treat the process of adaptive mesh refinement as a local, sequential decision-making problem under incomplete information, formulating AMR as a partially observable Markov decision process. Using a deep reinforcement learning approach, we train policy networks for AMR strategy directly from numerical simulation. The training process does not require an exact solution or a high-fidelity ground truth to the partial differential equation at hand, nor does it require a pre-computed training dataset. The local nature of our reinforcement learning formulation allows the policy network to be trained inexpensively on much smaller problems than those on which they are deployed. The methodology is not specific to any particular partial differential equation, problem dimension, or numerical discretization, and can flexibly incorporate diverse problem physics. To that end, we apply the approach to a diverse set of partial differential equations, using a variety of high-order discontinuous Galerkin and hybridizable discontinuous Galerkin finite element discretizations. We show that the resultant deep reinforcement learning policies are competitive with common AMR heuristics, generalize well across problem classes, and strike a favorable balance between accuracy and cost such that they often lead to a higher accuracy per problem degree of freedom.
    Deep Network Approximation: Achieving Arbitrary Accuracy with Fixed Number of Neurons. (arXiv:2107.02397v7 [cs.LG] UPDATED)
    This paper develops simple feed-forward neural networks that achieve the universal approximation property for all continuous functions with a fixed finite number of neurons. These neural networks are simple because they are designed with a simple, computable, and continuous activation function $\sigma$ leveraging a triangular-wave function and the softsign function. We first prove that $\sigma$-activated networks with width $36d(2d+1)$ and depth $11$ can approximate any continuous function on a $d$-dimensional hypercube within an arbitrarily small error. Hence, for supervised learning and its related regression problems, the hypothesis space generated by these networks with a size not smaller than $36d(2d+1)\times 11$ is dense in the continuous function space $C([a,b]^d)$ and therefore dense in the Lebesgue spaces $L^p([a,b]^d)$ for $p\in [1,\infty)$. Furthermore, we show that classification functions arising from image and signal classification are in the hypothesis space generated by $\sigma$-activated networks with width $36d(2d+1)$ and depth $12$ when there exist pairwise disjoint bounded closed subsets of $\mathbb{R}^d$ such that the samples of the same class are located in the same subset. Finally, we use numerical experimentation to show that replacing the rectified linear unit (ReLU) activation function with ours improves the experimental results.
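    Since the abstract only states that $\sigma$ leverages a triangular-wave function and the softsign function, the composition below is a guess for illustration, not the paper's exact activation:

```python
import numpy as np

def triangle(x, period=2.0):
    # triangular wave with values in [0, 1]
    return 2 * np.abs(x / period - np.floor(x / period + 0.5))

def softsign(x):
    return x / (1 + np.abs(x))

def sigma(x):
    # one plausible composition: an oscillatory component plus a saturating one
    return triangle(x) + softsign(x)

xs = np.linspace(-4, 4, 9)
print(np.round(sigma(xs), 3))
```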
    MedMNIST v2 -- A large-scale lightweight benchmark for 2D and 3D biomedical image classification. (arXiv:2110.14795v2 [cs.CV] UPDATED)
    We introduce MedMNIST v2, a large-scale MNIST-like dataset collection of standardized biomedical images, including 12 datasets for 2D and 6 datasets for 3D. All images are pre-processed into a small size of 28x28 (2D) or 28x28x28 (3D) with the corresponding classification labels so that no background knowledge is required for users. Covering primary data modalities in biomedical images, MedMNIST v2 is designed to perform classification on lightweight 2D and 3D images with various dataset scales (from 100 to 100,000) and diverse tasks (binary/multi-class, ordinal regression, and multi-label). The resulting dataset, consisting of 708,069 2D images and 10,214 3D images in total, could support numerous research / educational purposes in biomedical image analysis, computer vision, and machine learning. We benchmark several baseline methods on MedMNIST v2, including 2D / 3D neural networks and open-source / commercial AutoML tools. The data and code are publicly available at https://medmnist.com/.  ( 3 min )
    Bigger&Faster: Two-stage Neural Architecture Search for Quantized Transformer Models. (arXiv:2209.12127v1 [cs.LG])
    Neural architecture search (NAS) for transformers has been used to create state-of-the-art models that target certain latency constraints. In this work we present Bigger&Faster, a novel quantization-aware parameter sharing NAS that finds architectures for 8-bit integer (int8) quantized transformers. Our results show that our method is able to produce BERT models that outperform the current state-of-the-art technique, AutoTinyBERT, at all latency targets we tested, achieving up to a 2.68% accuracy gain. Additionally, although the models found by our technique have a larger number of parameters than their float32 counterparts, due to their parameters being int8, they have significantly smaller memory footprints.
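    As a hedged illustration of the general int8 idea (not the paper's quantization-aware NAS pipeline), PyTorch's post-training dynamic quantization stores Linear weights as int8, shrinking a BERT-style model's memory footprint; the model name is an example and the transformers package is assumed:

```python
import torch
from transformers import AutoModel

model = AutoModel.from_pretrained("bert-base-uncased")
# Replace Linear layers with dynamically quantized int8 counterparts.
quantized = torch.quantization.quantize_dynamic(
    model, {torch.nn.Linear}, dtype=torch.qint8
)
# int8 weights cut Linear-layer storage roughly 4x relative to float32.
```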
    Application of Deep Learning in Generating Structured Radiology Reports: A Transformer-Based Technique. (arXiv:2209.12177v1 [cs.CL])
    Since the radiology reports needed for clinical practice and research are written and stored as free-text narrations, extracting relevant information for further analysis is difficult. In these circumstances, natural language processing (NLP) techniques can facilitate automatic information extraction and the transformation of free-text formats into structured data. In recent years, deep learning (DL)-based models have been adapted for NLP experiments with promising results. Despite the significant potential of DL models based on artificial neural networks (ANN) and convolutional neural networks (CNN), these models face limitations that hinder their implementation in clinical practice. Transformers, another new DL architecture, have been increasingly applied to improve the process. Therefore, in this study, we propose a transformer-based fine-grained named entity recognition (NER) architecture for clinical information extraction. We collected 88 abdominopelvic sonography reports in free-text format and annotated them based on our developed information schema. The text-to-text transfer transformer (T5) model and SciFive, a pre-trained domain-specific adaptation of the T5 model, were fine-tuned to extract entities and relations and to transform the input into a structured format. Our transformer-based model outperformed previously applied approaches such as ANN and CNN models, achieving ROUGE-1, ROUGE-2, ROUGE-L, and BLEU scores of 0.816, 0.668, 0.528, and 0.743, respectively, while providing an interpretable structured report.
    Neural Stochastic PDEs: Resolution-Invariant Learning of Continuous Spatiotemporal Dynamics. (arXiv:2110.10249v8 [cs.LG] UPDATED)
    Stochastic partial differential equations (SPDEs) are the mathematical tool of choice for modelling spatiotemporal PDE-dynamics under the influence of randomness. Based on the notion of mild solution of an SPDE, we introduce a novel neural architecture to learn solution operators of PDEs with (possibly stochastic) forcing from partially observed data. The proposed Neural SPDE model provides an extension to two popular classes of physics-inspired architectures. On the one hand, it extends Neural CDEs and variants -- continuous-time analogues of RNNs -- in that it is capable of processing incoming sequential information arriving at arbitrary spatial resolutions. On the other hand, it extends Neural Operators -- generalizations of neural networks to model mappings between spaces of functions -- in that it can parameterize solution operators of SPDEs depending simultaneously on the initial condition and a realization of the driving noise. By performing operations in the spectral domain, we show how a Neural SPDE can be evaluated in two ways, either by calling an ODE solver (emulating a spectral Galerkin scheme), or by solving a fixed point problem. Experiments on various semilinear SPDEs, including the stochastic Navier-Stokes equations, demonstrate how the Neural SPDE model is capable of learning complex spatiotemporal dynamics in a resolution-invariant way, with better accuracy and lighter training data requirements compared to alternative models, and up to 3 orders of magnitude faster than traditional solvers.  ( 3 min )
    DCSAU-Net: A Deeper and More Compact Split-Attention U-Net for Medical Image Segmentation. (arXiv:2202.00972v2 [eess.IV] UPDATED)
    Deep learning architectures based on convolutional neural networks (CNN) have achieved outstanding success in the field of computer vision. U-Net, an encoder-decoder architecture built on CNNs, made a great breakthrough in biomedical image segmentation and has been applied in a wide range of practical scenarios. However, the identical design of every downsampling layer in the encoder part and the simply stacked convolutions do not allow U-Net to extract sufficient feature information from different depths. The increasing complexity of medical images brings new challenges to the existing methods. In this paper, we propose a deeper and more compact split-attention u-shape network (DCSAU-Net), which efficiently utilises low-level and high-level semantic information based on two novel frameworks: primary feature conservation and the compact split-attention block. We evaluate the proposed model on the CVC-ClinicDB, 2018 Data Science Bowl, ISIC-2018, and SegPC-2021 datasets. As a result, DCSAU-Net displays better performance than other state-of-the-art (SOTA) methods in terms of the mean Intersection over Union (mIoU) and F1-score. More significantly, the proposed model demonstrates excellent segmentation performance on challenging images. The code for our work and more technical details can be found at https://github.com/xq141839/DCSAU-Net.  ( 3 min )
    All are Worth Words: a ViT Backbone for Score-based Diffusion Models. (arXiv:2209.12152v1 [cs.CV])
    Vision transformers (ViT) have shown promise in various vision tasks, including low-level ones, while the U-Net remains dominant in score-based diffusion models. In this paper, we perform a systematic empirical study of ViT-based architectures in diffusion models. Our results suggest that adding extra long skip connections (as in the U-Net) to ViT is crucial for diffusion models. The new ViT architecture, together with other improvements, is referred to as U-ViT. On several popular visual datasets, U-ViT achieves generation results competitive with the SOTA U-Net while requiring a comparable amount of parameters and computation, if not less.
    Exploring Example Influence in Continual Learning. (arXiv:2209.12241v1 [cs.LG])
    Continual Learning (CL) sequentially learns new tasks like human beings, with the goal of achieving better Stability (S, remembering past tasks) and Plasticity (P, adapting to new tasks). Because past training data is not available, it is valuable to explore how the influence on S and P differs among training examples, which may improve the learning pattern towards better SP. Inspired by the Influence Function (IF), we first study example influence by adding a perturbation to an example's weight and computing the influence derivation. To avoid the storage and computational burden of the Hessian inverse in neural networks, we propose a simple yet effective MetaSP algorithm to simulate the two key steps in the computation of IF and obtain S- and P-aware example influence. Moreover, we propose to fuse the two kinds of example influence by solving a dual-objective optimization problem, obtaining a fused influence towards SP Pareto optimality. The fused influence can be used to control the model update and to optimize the storage of rehearsal examples. Empirical results show that our algorithm significantly outperforms state-of-the-art methods on both task- and class-incremental benchmark CL datasets.
    Composite Layers for Deep Anomaly Detection on 3D Point Clouds. (arXiv:2209.11796v1 [cs.CV])
    Deep neural networks require specific layers to process point clouds, as the scattered and irregular location of points prevents us from using convolutional filters. Here we introduce the composite layer, a new convolutional operator for point clouds. The peculiarity of our composite layer is that it extracts and compresses the spatial information from the position of points before combining it with their feature vectors. Compared to well-known point-convolutional layers such as those of ConvPoint and KPConv, our composite layer provides additional regularization and guarantees greater flexibility in terms of design and number of parameters. To demonstrate the design flexibility, we also define an aggregate composite layer that combines spatial information and features in a nonlinear manner, and we use these layers to implement a convolutional and an aggregate CompositeNet. We train our CompositeNets to perform classification and, most remarkably, unsupervised anomaly detection. Our experiments on synthetic and real-world datasets show that, in both tasks, our CompositeNets outperform ConvPoint and achieve similar results as KPConv despite having a much simpler architecture. Moreover, our CompositeNets substantially outperform existing solutions for anomaly detection on point clouds.
    Self-Supervised Masked Convolutional Transformer Block for Anomaly Detection. (arXiv:2209.12148v1 [cs.CV])
    Anomaly detection has recently gained increasing attention in the field of computer vision, likely due to its broad set of applications ranging from product fault detection on industrial production lines and impending event detection in video surveillance to finding lesions in medical scans. Regardless of the domain, anomaly detection is typically framed as a one-class classification task, where the learning is conducted on normal examples only. An entire family of successful anomaly detection methods is based on learning to reconstruct masked normal inputs (e.g. patches, future frames, etc.) and exerting the magnitude of the reconstruction error as an indicator for the abnormality level. Unlike other reconstruction-based methods, we present a novel self-supervised masked convolutional transformer block (SSMCTB) that comprises the reconstruction-based functionality at a core architectural level. The proposed self-supervised block is extremely flexible, enabling information masking at any layer of a neural network and being compatible with a wide range of neural architectures. In this work, we extend our previous self-supervised predictive convolutional attentive block (SSPCAB) with a 3D masked convolutional layer, as well as a transformer for channel-wise attention. Furthermore, we show that our block is applicable to a wider variety of tasks, adding anomaly detection in medical images and thermal videos to the previously considered tasks based on RGB images and surveillance videos. We exhibit the generality and flexibility of SSMCTB by integrating it into multiple state-of-the-art neural models for anomaly detection, bringing forth empirical results that confirm considerable performance improvements on five benchmarks: MVTec AD, BRATS, Avenue, ShanghaiTech, and Thermal Rare Event. We release our code and data as open source at https://github.com/ristea/ssmctb.
    High-Resolution Satellite Imagery for Modeling the Impact of Aridification on Crop Production. (arXiv:2209.12238v1 [cs.CV])
    The availability of well-curated datasets has driven the success of Machine Learning (ML) models. Despite increased access to earth observation data for agriculture, there is a scarcity of curated, labelled datasets, which limits their potential for training ML models for remote sensing (RS) in agriculture. To this end, we introduce a first-of-its-kind dataset, SICKLE, containing time-series images at different spatial resolutions from 3 different satellites, annotated with multiple key cropping parameters for paddy cultivation in the Cauvery Delta region of Tamil Nadu, India. The dataset comprises 2,398 season-wise samples from 388 unique plots distributed across 4 districts of the Delta. The dataset covers multi-spectral, thermal, and microwave data for the period January 2018 to March 2021. The paddy samples are annotated with 4 key cropping parameters, i.e. sowing date, transplanting date, harvesting date, and crop yield. This is one of the first studies to consider the growing season (using sowing and harvesting dates) as part of a dataset. We also propose a yield prediction strategy that uses time-series data generated based on the observed growing season and the standard seasonal information obtained from Tamil Nadu Agricultural University for the region. The consequent performance improvement highlights the impact of ML techniques that leverage domain knowledge consistent with the standard practices followed by farmers in a specific region. We benchmark the dataset on 3 separate tasks, namely crop type, phenology date (sowing, transplanting, harvesting), and yield prediction, and develop an end-to-end framework for predicting key crop parameters in a real-world setting.
    An Empirical Study on Cross-X Transfer for Legal Judgment Prediction. (arXiv:2209.12325v1 [cs.CL])
    Cross-lingual transfer learning has proven useful in a variety of Natural Language Processing (NLP) tasks, but it is understudied in the context of legal NLP, and not at all in Legal Judgment Prediction (LJP). We explore transfer learning techniques on LJP using the trilingual Swiss-Judgment-Prediction dataset, which includes cases written in three languages. We find that cross-lingual transfer improves the overall results across languages, especially when we use adapter-based fine-tuning. We then further improve the model's performance by augmenting the training dataset with machine-translated versions of the original documents, using a 3x larger training corpus. Furthermore, we perform an analysis exploring the effect of cross-domain and cross-regional transfer, i.e., training a model across domains (legal areas) or regions. We find that in both settings (legal areas, origin regions), models trained across all groups perform better overall, and also have improved results in the worst-case scenarios. Finally, we report improved results when we ambitiously apply cross-jurisdiction transfer, further augmenting our dataset with Indian legal cases.
    Asset Pricing and Deep Learning. (arXiv:2209.12014v1 [q-fin.ST])
    Traditional machine learning methods have been widely studied in financial innovation. My study focuses on the application of deep learning methods to asset pricing. I investigate various deep learning methods for asset pricing, especially for risk premia measurement. All models take the same set of predictive signals (firm characteristics, systematic risks, and macroeconomics). I demonstrate the high performance of a range of state-of-the-art (SOTA) deep learning methods and find that RNNs with memory mechanisms and attention have the best predictive performance. Furthermore, I demonstrate large economic gains to investors using deep learning forecasts. The results of my comparative experiments highlight the importance of domain knowledge and financial theory when designing deep learning models. I also show that return prediction tasks bring new challenges to deep learning: the time-varying distribution causes a distribution shift problem, which is essential for financial time series prediction. I demonstrate that deep learning methods can improve asset risk premium measurement and, as deep learning research continues to boom, can steadily advance the study of the underlying financial mechanisms behind asset pricing. I also propose a promising research approach: learning from data and uncovering the underlying economic mechanisms through explainable artificial intelligence (AI) methods. My findings not only justify the value of deep learning in the booming fintech development, but also highlight its prospects and advantages over traditional machine learning methods.
    The impacts of various parameters on learning process and machine learning based performance prediction in online coding competitions. (arXiv:2112.14407v3 [cs.HC] UPDATED)
    Various parameters affect the performance of students in online coding competitions. Students' behavior, approach, emotions, and problem difficulty levels significantly impact their performance. We organized two coding competitions to understand the effects of these parameters and conducted an online survey at the end of each competition, with questions about students' behavior, approach, and emotions during the competition. Students were evaluated based on the time and status of their submissions. We carried out a detailed analysis of the impact of students' approach, behavior, and emotions on the learning process in online coding competitions. Two difficulty levels are proposed based on the time and status of submissions, and the impact of difficulty levels on machine learning-based performance prediction is presented in this work. Based on time, coding solution submissions fall into two classes, "Less than 15 minutes" and "More than 15 minutes"; based on submission status, there are three classes, "Complete solution", "Partial solution", and "Not submitted at all". For both competitions, we identify the approaches associated with submitting a solution within 15 minutes. Machine learning classifiers are trained and evaluated for the above classification problems. The impacts of mood, emotions, and difficulty levels on the learning process are also assessed by comparing the results of the machine learning models for both coding competitions.  ( 3 min )
    Generating Formal Safety Assurances for High-Dimensional Reachability. (arXiv:2209.12336v1 [cs.RO])
    Providing formal safety and performance guarantees for autonomous systems is becoming increasingly important as they are integrated in our society. Hamilton-Jacobi (HJ) reachability analysis is a popular formal verification tool for providing these guarantees, since it can handle general nonlinear system dynamics, bounded adversarial system disturbances, and state and input constraints. However, it involves solving a PDE, whose computational and memory complexity scales exponentially with respect to the state dimensionality, making its direct use on large-scale systems intractable. A recently proposed method called DeepReach overcomes this challenge by leveraging a sinusoidal neural network PDE solver for high-dimensional reachability problems, whose computational requirements scale with the complexity of the underlying reachable tube rather than the state space dimension. Unfortunately, neural networks can make errors and thus the computed solution may not be safe, which falls short of achieving our overarching goal to provide formal safety assurances. In this work, we propose a method to compute an error bound for the DeepReach solution. This error bound can then be used for reachable tube correction, resulting in a provably safe approximation of the true reachable tube. We also propose a scenario optimization-based approach to compute this error bound for general nonlinear dynamical systems. We demonstrate the efficacy of the proposed approach in obtaining reachable tubes for high-dimensional rocket-landing and multi-vehicle collision-avoidance problems.
    Temporally Extended Successor Representations. (arXiv:2209.12331v1 [cs.LG])
    We present a temporally extended variation of the successor representation, which we term t-SR. t-SR captures the expected state transition dynamics of temporally extended actions by constructing successor representations over primitive action repeats. This form of temporal abstraction does not learn a top-down hierarchy of pertinent task structures, but rather a bottom-up composition of coupled actions and action repetitions. This lessens the number of decisions required in control without learning a hierarchical policy. As such, t-SR directly considers the time horizon of temporally extended action sequences without the need for predefined or domain-specific options. We show that in environments with dynamic reward structure, t-SR is able to leverage both the flexibility of the successor representation and the abstraction afforded by temporally extended actions. Thus, in a series of sparsely rewarded gridworld environments, t-SR adapts learnt policies far faster than comparable value-based, model-free reinforcement learning methods. We also show that the manner in which t-SR learns to solve these tasks requires the learnt policy to be sampled consistently less often than non-temporally extended policies.
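    A hedged tabular sketch of a successor representation over action repeats; the repeat set, learning rate, and update rule are illustrative simplifications of t-SR, not the paper's exact algorithm:

```python
import numpy as np

# M[k][s] estimates expected discounted future state occupancies when an
# action is repeated k times starting from state s.
n_states, gamma, lr = 25, 0.95, 0.1
repeats = [1, 2, 4]                        # primitive action repeated k times
M = {k: np.zeros((n_states, n_states)) for k in repeats}

def sr_update(k, s, s_next, elapsed):
    """TD-style SR update after repeating an action k times, landing in s_next."""
    onehot = np.eye(n_states)[s]
    target = onehot + gamma**elapsed * M[k][s_next]
    M[k][s] += lr * (target - M[k][s])

sr_update(k=2, s=0, s_next=3, elapsed=2)   # toy transition taking 2 time steps
```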
    Towards Demystifying Representation Learning with Non-contrastive Self-supervision. (arXiv:2110.04947v2 [cs.LG] UPDATED)
    Non-contrastive methods of self-supervised learning (such as BYOL and SimSiam) learn representations by minimizing the distance between two views of the same image. These approaches have achieved remarkable performance in practice, but the theoretical understanding lags behind. Tian et al. 2021 explained why the representation does not collapse to zero, however, how the feature is learned still remains mysterious. In our work, we prove in a linear network, non-contrastive methods learn a desirable projection matrix and also reduce the sample complexity on downstream tasks. Our analysis suggests that weight decay acts as an implicit threshold that discards the features with high variance under data augmentations, and keeps the features with low variance. Inspired by our theory, we design a simpler and more computationally efficient algorithm DirectCopy by removing the eigen-decomposition step in the original DirectPred algorithm in Tian et al. 2021. Our experiments show that DirectCopy rivals or even outperforms DirectPred on STL-10, CIFAR-10, CIFAR-100, and ImageNet.  ( 2 min )
    An Efficient Algorithm for Fair Multi-Agent Multi-Armed Bandit with Low Regret. (arXiv:2209.11817v1 [cs.LG])
    Recently, a multi-agent variant of the classical multi-armed bandit was proposed to tackle fairness issues in online learning. Inspired by a long line of work in social choice and economics, the goal is to optimize the Nash social welfare instead of the total utility. Unfortunately, previous algorithms are either inefficient or achieve sub-optimal regret in terms of the number of rounds $T$. We propose a new efficient algorithm with lower regret than even previous inefficient ones. For $N$ agents, $K$ arms, and $T$ rounds, our approach has a regret bound of $\tilde{O}(\sqrt{NKT} + NK)$. This improves on the previous approach, which has a regret bound of $\tilde{O}( \min(NK, \sqrt{N} K^{3/2})\sqrt{T})$. We also complement our efficient algorithm with an inefficient approach with $\tilde{O}(\sqrt{KT} + N^2K)$ regret. The experimental findings confirm the effectiveness of our efficient algorithm compared to the previous approaches.
    The network signature of constellation line figures. (arXiv:2110.12329v4 [cs.SI] UPDATED)
    In traditional astronomies across the world, groups of stars in the night sky were linked into constellations -- symbolic representations rich in meaning and with practical roles. In some sky cultures, constellations are represented as line (or connect-the-dot) figures, which are spatial networks drawn over the fixed background of stars. We analyse 1802 line figures from 56 sky cultures spanning all continents, in terms of their network, spatial, and brightness features, and ask what associations exist between these visual features and culture type or sky region. First, an embedded map of constellations is learnt, to show clusters of line figures. We then form the network of constellations (as linked by their similarity), to study how similar cultures are by computing their assortativity (or homophily) over the network. Finally, we measure the diversity (or entropy) index for the set of constellations drawn per sky region. Our results show distinct types of line figures, and that many folk astronomies with oral traditions have widespread similarities in constellation design, which do not align with cultural ancestry. In a minority of sky regions, certain line designs appear universal, but this is not the norm: in the majority of sky regions, the line geometries are diverse.
    Doubly Fair Dynamic Pricing. (arXiv:2209.11837v1 [cs.LG])
    We study the problem of online dynamic pricing with two types of fairness constraints: a "procedural fairness" which requires the proposed prices to be equal in expectation among different groups, and a "substantive fairness" which requires the accepted prices to be equal in expectation among different groups. A policy that is simultaneously procedurally and substantively fair is referred to as "doubly fair". We show that a doubly fair policy must be random to have higher revenue than the best trivial policy that assigns the same price to different groups. In a two-group setting, we propose an online learning algorithm for the two-group pricing problem that achieves $\tilde{O}(\sqrt{T})$ regret, zero procedural unfairness and $\tilde{O}(\sqrt{T})$ substantive unfairness over $T$ rounds of learning. We also prove two lower bounds showing that these results on regret and unfairness are both information-theoretically optimal up to iterated logarithmic factors. To the best of our knowledge, this is the first dynamic pricing algorithm that learns to price while satisfying two fairness constraints at the same time.
    Solving Seismic Wave Equations on Variable Velocity Models with Fourier Neural Operator. (arXiv:2209.12340v1 [cs.LG])
    In the study of subsurface seismic imaging, solving the acoustic wave equation is a pivotal component of existing models. With the advancement of deep learning, neural networks are applied to numerically solve partial differential equations by learning the mapping between the inputs and the solution of the equation, the wave equation in particular, since traditional methods can be time-consuming if numerous instances are to be solved. Previous works that concentrate on solving the wave equation with neural networks consider either a single velocity model or multiple simple velocity models, which is restrictive in practice. Therefore, inspired by the idea of operator learning, this work leverages the Fourier neural operator (FNO) to effectively learn frequency-domain seismic wavefields under variable velocity models. Moreover, we propose a new framework, the paralleled Fourier neural operator (PFNO), for efficiently training the FNO-based solver given multiple source locations and frequencies. Numerical experiments demonstrate the high accuracy of both FNO and PFNO with complicated velocity models in the OpenFWI datasets. Furthermore, the cross-dataset generalization test verifies that PFNO adapts to out-of-distribution velocity models. Also, PFNO has robust performance in the presence of random noise in the labels. Finally, PFNO admits higher computational efficiency on large-scale testing datasets, compared with the traditional finite-difference method. The aforementioned advantages endow the FNO-based solver with the potential to build powerful models for research on seismic waves.  ( 3 min )
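    The building block behind both FNO and PFNO is the Fourier layer: transform the input to the frequency domain, apply a learned complex multiplier to the lowest few modes, and transform back. A minimal 1D sketch (not the authors' implementation; layer sizes and initialization are illustrative):

```python
import torch

class SpectralConv1d(torch.nn.Module):
    """One 1D Fourier layer: FFT -> learned multiplier on low modes -> inverse FFT."""
    def __init__(self, channels: int, modes: int):
        super().__init__()
        self.modes = modes
        self.weight = torch.nn.Parameter(
            torch.randn(channels, channels, modes, dtype=torch.cfloat) / channels)

    def forward(self, x: torch.Tensor) -> torch.Tensor:  # x: (batch, channels, n)
        x_ft = torch.fft.rfft(x)                          # frequency-domain input
        out_ft = torch.zeros_like(x_ft)
        out_ft[..., :self.modes] = torch.einsum(          # mix channels per kept mode
            "bcm,com->bom", x_ft[..., :self.modes], self.weight)
        return torch.fft.irfft(out_ft, n=x.shape[-1])     # back to physical space
```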
    DS6, Deformation-aware Semi-supervised Learning: Application to Small Vessel Segmentation with Noisy Training Data. (arXiv:2006.10802v3 [eess.IV] UPDATED)
    Blood vessels of the brain provide the human brain with the required nutrients and oxygen. As a vulnerable part of the cerebral blood supply, pathology of small vessels can cause serious problems such as Cerebral Small Vessel Diseases (CSVD). It has also been shown that CSVD is related to neurodegeneration, such as Alzheimer's disease. With the advancement of 7 Tesla MRI systems, higher spatial image resolution can be achieved, enabling the depiction of very small vessels in the brain. Non-deep-learning-based approaches for vessel segmentation, e.g., Frangi's vessel enhancement with subsequent thresholding, are capable of segmenting medium to large vessels but often fail to segment small vessels. The sensitivity of these methods to small vessels can be increased by extensive parameter tuning or by manual corrections, albeit making them time-consuming, laborious, and infeasible for larger datasets. This paper proposes a deep learning architecture to automatically segment small vessels in 7 Tesla 3D Time-of-Flight (ToF) Magnetic Resonance Angiography (MRA) data. The algorithm was trained and evaluated on a small, imperfect, semi-automatically segmented dataset of only 11 subjects: six for training, two for validation, and three for testing. The deep learning model based on U-Net Multi-Scale Supervision was trained on the training subset and made equivariant to elastic deformations in a self-supervised manner using deformation-aware learning to improve generalisation performance. The proposed technique was evaluated quantitatively and qualitatively against the test set and achieved a Dice score of 80.44 $\pm$ 0.83. Furthermore, the result of the proposed method was compared against a selected manually segmented region (Dice score of 62.07) and showed a considerable improvement (18.98\%) with deformation-aware learning.
    Nonstochastic Bandits with Composite Anonymous Feedback. (arXiv:2112.02866v2 [cs.LG] UPDATED)
    We investigate a nonstochastic bandit setting in which the loss of an action is not immediately charged to the player, but rather spread over the subsequent rounds in an adversarial way. The instantaneous loss observed by the player at the end of each round is then a sum of many loss components of previously played actions. This setting encompasses as a special case the easier task of bandits with delayed feedback, a well-studied framework where the player observes the delayed losses individually. Our first contribution is a general reduction transforming a standard bandit algorithm into one that can operate in the harder setting: We bound the regret of the transformed algorithm in terms of the stability and regret of the original algorithm. Then, we show that the transformation of a suitably tuned FTRL with Tsallis entropy has a regret of order $\sqrt{(d+1)KT}$, where $d$ is the maximum delay, $K$ is the number of arms, and $T$ is the time horizon. Finally, we show that our results cannot be improved in general by exhibiting a matching (up to a log factor) lower bound on the regret of any algorithm operating in this setting.  ( 3 min )
    Machine Learning and Artificial Intelligence-Driven Multi-Scale Modeling for High Burnup Accident-Tolerant Fuels for Light Water-Based SMR Applications. (arXiv:2209.12146v1 [eess.SY])
    The concept of the small modular reactor has changed the outlook for tackling future energy crises. This new reactor technology is very promising considering its lower investment requirements, modularity, design simplicity, and enhanced safety features. The application of artificial intelligence-driven multi-scale modeling (neutronics, thermal hydraulics, fuel performance, etc.) incorporating Digital Twin and associated uncertainties in the research of small modular reactors is a recent concept. In this work, a comprehensive study is conducted on the multiscale modeling of accident-tolerant fuels. The application of these fuels in light water-based small modular reactors is explored. This chapter also focuses on the application of machine learning and artificial intelligence in the design optimization, control, and monitoring of small modular reactors. Finally, a brief assessment of the research gap on the application of artificial intelligence to the development of high burnup composite accident-tolerant fuels is provided. Necessary actions to fill these gaps are also discussed.
    SPRITZ-1.5C: Employing Deep Ensemble Learning for Improving the Security of Computer Networks against Adversarial Attacks. (arXiv:2209.12195v1 [cs.CR])
    In the past few years, Convolutional Neural Networks (CNN) have demonstrated promising performance in various real-world cybersecurity applications, such as network and multimedia security. However, the underlying fragility of CNN structures poses major security problems, making them inappropriate for use in security-oriented applications, such as computer networks. Protecting these architectures from adversarial attacks necessitates using security-wise architectures that are challenging to attack. In this study, we present a novel architecture based on an ensemble classifier that combines the enhanced security of 1-Class classification (known as 1C) with the high performance of conventional 2-Class classification (known as 2C) in the absence of attacks. Our architecture is referred to as the 1.5-Class (SPRITZ-1.5C) classifier and is constructed from a final dense classifier, one 2C classifier (i.e., CNNs), and two parallel 1C classifiers (i.e., auto-encoders). In our experiments, we evaluated the robustness of our proposed architecture by considering eight possible adversarial attacks in various scenarios. We performed these attacks on the 2C and SPRITZ-1.5C architectures separately. The experimental results of our study showed that the Attack Success Rate (ASR) of the I-FGSM attack against a 2C classifier trained with the N-BaIoT dataset is 0.9900. In contrast, the ASR is 0.0000 for the SPRITZ-1.5C classifier.
    DeepChrome 2.0: Investigating and Improving Architectures, Visualizations, & Experiments. (arXiv:2209.11923v1 [cs.LG])
    Histone modifications play a critical role in gene regulation. Consequently, predicting gene expression from histone modification signals is a highly motivated problem in epigenetics. We build upon the work of DeepChrome by Singh et al. (2016), who trained classifiers that map histone modification signals to gene expression. We present a novel visualization technique for providing insight into combinatorial relationships among histone modifications for gene regulation that uses a generative adversarial network to generate histone modification signals. We also explore and compare various architectural changes, with results suggesting that the 645k-parameter convolutional neural network from DeepChrome has the same predictive power as a 12-parameter linear network. Results from cross-cell prediction experiments, where the model is trained and tested on datasets of varying sizes, cell-types, and correlations, suggest the relationship between histone modification signals and gene expression is independent of cell type. We release our PyTorch re-implementation of DeepChrome on GitHub: \url{github.com/ssss1029/gene_expression_294}.
    Blinder: End-to-end Privacy Protection in Sensing Systems via Personalized Federated Learning. (arXiv:2209.12046v1 [cs.LG])
    This paper proposes a sensor data anonymization model that is trained on decentralized data and strikes a desirable trade-off between data utility and privacy, even in heterogeneous settings where the collected sensor data have different underlying distributions. Our anonymization model, dubbed Blinder, is based on a variational autoencoder and discriminator networks trained in an adversarial fashion. We use the model-agnostic meta-learning framework to adapt the anonymization model trained via federated learning to each user's data distribution. We evaluate Blinder under different settings and show that it provides end-to-end privacy protection at the cost of increasing privacy loss by up to 4.00% and decreasing data utility by up to 4.24%, compared to the state-of-the-art anonymization model trained on centralized data. Our experiments confirm that Blinder can obscure multiple private attributes at once, and has sufficiently low power consumption and computational overhead for it to be deployed on edge devices and smartphones to perform real-time anonymization of sensor data.
    Weather2vec: Representation Learning for Causal Inference with Non-Local Confounding in Air Pollution and Climate Studies. (arXiv:2209.12316v1 [cs.LG])
    Estimating the causal effects of a spatially-varying intervention on a spatially-varying outcome may be subject to non-local confounding (NLC), a phenomenon that can bias estimates when the treatments and outcomes of a given unit are dictated in part by the covariates of other nearby units. In particular, NLC is a challenge for evaluating the effects of environmental policies and climate events on health-related outcomes such as air pollution exposure. This paper first formalizes NLC using the potential outcomes framework, providing a comparison with the related phenomenon of causal interference. Then, it proposes a broadly applicable framework, termed "weather2vec", that uses the theory of balancing scores to learn representations of non-local information into a scalar or vector defined for each observational unit, which is subsequently used to adjust for confounding in conjunction with causal inference methods. The framework is evaluated in a simulation study and two case studies on air pollution where the weather is an (inherently regional) known confounder.
    Algorithms that Approximate Data Removal: New Results and Limitations. (arXiv:2209.12269v1 [stat.ML])
    We study the problem of deleting user data from machine learning models trained using empirical risk minimization. Our focus is on learning algorithms which return the empirical risk minimizer and on approximate unlearning algorithms that comply with deletion requests arriving in streaming minibatches. Leveraging the infinitesimal jackknife, we develop an online unlearning algorithm that is both computationally and memory efficient. Unlike prior memory-efficient unlearning algorithms, we target models that minimize objectives with non-smooth regularizers, such as the commonly used $\ell_1$, elastic net, or nuclear norm penalties. We also provide generalization, deletion capacity, and unlearning guarantees that are consistent with state-of-the-art methods. Across a variety of benchmark datasets, our algorithm empirically improves upon the runtime of prior methods while maintaining the same memory requirements and test accuracy. Finally, we open a new direction of inquiry by proving that all approximate unlearning algorithms introduced so far fail to unlearn in problem settings where common hyperparameter tuning methods, such as cross-validation, have been used to select models.
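    As a rough illustration of the infinitesimal-jackknife idea (on a smooth toy objective, not the paper's non-smooth setting), deleting a minibatch amounts to one Newton-style correction of the trained parameters using the Hessian and gradient computed on the retained data:

```python
import numpy as np

def delete_batch(theta: np.ndarray, X: np.ndarray, y: np.ndarray,
                 idx: np.ndarray, lam: float = 1e-2) -> np.ndarray:
    """Remove examples `idx` from a ridge-regression fit with one Newton step.
    For this quadratic toy objective the step is exact; the paper's method
    additionally handles non-smooth regularizers, which this sketch does not."""
    keep = np.setdiff1d(np.arange(len(y)), idx)
    Xk, yk = X[keep], y[keep]
    d = X.shape[1]
    H = Xk.T @ Xk / len(keep) + lam * np.eye(d)              # retained-data Hessian
    g = Xk.T @ (Xk @ theta - yk) / len(keep) + lam * theta   # retained-data gradient
    return theta - np.linalg.solve(H, g)                     # corrected parameters
```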
    Data Efficient Human Intention Prediction: Leveraging Neural Network Verification and Expert Guidance. (arXiv:2108.06871v3 [cs.LG] UPDATED)
    Predicting human intention is critical to facilitating safe and efficient human-robot collaboration (HRC). However, it is challenging to build data-driven models for human intention prediction. One major challenge is due to the diversity and noise in human motion data. It is expensive to collect a massive motion dataset that comprehensively covers all possible scenarios, which leads to the scarcity of human motion data in certain scenarios, and therefore, causes difficulties in constructing robust and reliable intention predictors. To address the challenge, this paper proposes an iterative adversarial data augmentation (IADA) framework to learn neural network models from an insufficient amount of training data. The method uses neural network verification to identify the most "confusing" input samples and leverages expert guidance to safely and iteratively augment the training data with these samples. The proposed framework is applied to collected human datasets. The experiments demonstrate that our method can achieve more robust and accurate prediction performance compared to existing training methods.  ( 2 min )
    On the Opportunities and Challenges of using Animals Videos in Reinforcement Learning. (arXiv:2209.12347v1 [eess.SY])
    We investigate the possibility of using animal videos to improve Reinforcement Learning (RL) efficiency and performance. From a theoretical perspective, we motivate the use of weighted policy optimization for off-policy RL, describe the main challenges when learning from videos, and propose solutions. We test our ideas in both offline and online RL and show encouraging results on a series of 2D navigation tasks.
    Joint Triplet Loss Learning for Next New POI Recommendation. (arXiv:2209.12162v1 [cs.IR])
    Sparsity of the user-POI matrix is a well-established problem in next POI recommendation, which hinders effective learning of user preferences. Focusing on a more granular extension of the problem, we propose a Joint Triplet Loss Learning (JTLL) module for the Next New ($N^2$) POI recommendation task, which is more challenging. Our JTLL module first computes additional training samples from the users' historical POI visit sequences; then a triplet loss function is designed to decrease and increase the distances of POI and user embeddings based on their respective relations. Next, the JTLL module is jointly trained with recent approaches to additionally learn unvisited relations for the recommendation task. Experiments conducted on two well-known real-world LBSN datasets show that our joint training module improves the performance of recent existing works.  ( 2 min )
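    A minimal sketch of a triplet objective over user and POI embeddings, in the spirit described above (the margin, distance, and sampling scheme here are illustrative, not the paper's exact JTLL formulation):

```python
import torch
import torch.nn.functional as F

def jtll_style_triplet_loss(user: torch.Tensor, pos_poi: torch.Tensor,
                            neg_poi: torch.Tensor, margin: float = 0.5):
    """All inputs: (batch, dim) embeddings; positives are visited POIs,
    negatives are sampled unvisited POIs (sampling scheme not shown)."""
    d_pos = F.pairwise_distance(user, pos_poi)   # pull user toward visited POI
    d_neg = F.pairwise_distance(user, neg_poi)   # push away from unvisited POI
    return torch.clamp(d_pos - d_neg + margin, min=0.0).mean()
```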
    Expanding the Deployment Envelope of Behavior Prediction via Adaptive Meta-Learning. (arXiv:2209.11820v1 [cs.LG])
    Learning-based behavior prediction methods are increasingly being deployed in real-world autonomous systems, e.g., in fleets of self-driving vehicles, which are beginning to commercially operate in major cities across the world. Despite their advancements, however, the vast majority of prediction systems are specialized to a set of well-explored geographic regions or operational design domains, complicating deployment to additional cities, countries, or continents. Towards this end, we present a novel method for efficiently adapting behavior prediction models to new environments. Our approach leverages recent advances in meta-learning, specifically Bayesian regression, to augment existing behavior prediction models with an adaptive layer that enables efficient domain transfer via offline fine-tuning, online adaptation, or both. Experiments across multiple real-world datasets demonstrate that our method can efficiently adapt to a variety of unseen environments.
    CryptoGCN: Fast and Scalable Homomorphically Encrypted Graph Convolutional Network Inference. (arXiv:2209.11904v1 [cs.CR])
    Recently, cloud-based graph convolutional networks (GCN) have demonstrated great success and potential in many privacy-sensitive applications such as personal healthcare and financial systems. Despite its high inference accuracy and performance on the cloud, maintaining data privacy in GCN inference, which is of paramount importance to these practical applications, remains largely unexplored. In this paper, we take an initial attempt towards this and develop $\textit{CryptoGCN}$--a homomorphic encryption (HE) based GCN inference framework. A key to the success of our approach is to reduce the tremendous computational overhead of HE operations, which can be orders of magnitude higher than their counterparts in the plaintext space. To this end, we develop an approach that can effectively take advantage of the sparsity of matrix operations in GCN inference to significantly reduce the computational overhead. Specifically, we propose a novel AMA data formatting method and associated spatial convolution methods, which can exploit the complex graph structure and perform efficient matrix-matrix multiplication in HE computation, thus greatly reducing the number of HE operations. We also develop a co-optimization framework that can explore the trade-offs among accuracy, security level, and computational overhead by judicious pruning and polynomial approximation of the activation module in GCNs. Based on the NTU-XVIEW skeleton joint dataset, i.e., the largest dataset evaluated homomorphically to date as far as we are aware, our experimental results demonstrate that $\textit{CryptoGCN}$ outperforms state-of-the-art solutions in terms of latency and number of homomorphic operations, i.e., achieving as much as a 3.10$\times$ speedup in latency and reducing the total Homomorphic Operation Count by 77.4\%, with a small accuracy loss of 1-1.5$\%$.
    Hebbian Deep Learning Without Feedback. (arXiv:2209.11883v1 [cs.NE])
    Recent approximations to backpropagation (BP) have mitigated many of BP's computational inefficiencies and incompatibilities with biology, but important limitations still remain. Moreover, the approximations significantly decrease accuracy in benchmarks, suggesting that an entirely different approach may be more fruitful. Here, grounded in recent theory for Hebbian learning in soft winner-take-all networks, we present multilayer SoftHebb, i.e., an algorithm that trains deep neural networks without any feedback, target, or error signals. As a result, it achieves efficiency by avoiding weight transport, non-local plasticity, time-locking of layer updates, iterative equilibria, and (self-) supervisory or other feedback signals -- which were necessary in other approaches. Its increased efficiency and biological compatibility do not trade off accuracy compared to state-of-the-art bio-plausible learning, but rather improve it. With up to five hidden layers and an added linear classifier, accuracies on MNIST, CIFAR-10, STL-10, and ImageNet respectively reach 99.4%, 80.3%, 76.2%, and 27.3%. In conclusion, SoftHebb shows, with a radically different approach from BP, that Deep Learning over few layers may be plausible in the brain and increases the accuracy of bio-plausible machine learning.
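    To convey the flavor of feedback-free learning, here is a simplified soft winner-take-all Hebbian update for one layer: activations are a softmax over weight-input similarities, and each unit's weights drift toward the inputs it responds to. This is a schematic competitive-Hebbian rule of our own, not the paper's exact plasticity rule:

```python
import numpy as np

def softhebb_step(W: np.ndarray, x: np.ndarray,
                  lr: float = 0.01, temperature: float = 1.0):
    """W: (units, d) weights; x: (d,) one input sample. No error signal used."""
    y = np.exp(W @ x / temperature)
    y /= y.sum()                               # soft winner-take-all activation
    W += lr * y[:, None] * (x[None, :] - W)    # Hebbian pull toward the input
    return W, y
```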
    Learning Chess With Language Models and Transformers. (arXiv:2209.11902v1 [cs.AI])
    Representing a board game and its positions by text-based notation enables the possibility of NLP applications. Language models can help gain insight into a variety of interesting problems such as unsupervised learning of the rules of a game, detecting player behavior patterns, player attribution, and ultimately learning the game to beat the state of the art. In this study, we applied BERT models, first to the simple Nim game, to analyze their performance in the presence of noise in a few-shot learning architecture. We analyzed the model performance via three virtual players, namely Nim Guru, a random player, and a Q-learner. In the second part, we applied the game-learning language model to the game of chess and a large set of grandmaster games with exhaustive encyclopedia openings. Finally, we show that the model practically learns the rules of the chess game and can survive games against Stockfish at a category-A rating level.
    Hurricane Forecasting: A Novel Multimodal Machine Learning Framework. (arXiv:2011.06125v4 [cs.LG] UPDATED)
    This paper describes a novel machine learning (ML) framework for tropical cyclone intensity and track forecasting, combining multiple ML techniques and utilizing diverse data sources. Our multimodal framework, called Hurricast, efficiently combines spatial-temporal data with statistical data by extracting features with deep-learning encoder-decoder architectures and predicting with gradient-boosted trees. We evaluate our models in the North Atlantic and Eastern Pacific basins on 2016-2019 for 24-hour lead time track and intensity forecasts and show they achieve comparable mean absolute error and skill to current operational forecast models while computing in seconds. Furthermore, the inclusion of Hurricast into an operational forecast consensus model could improve over the National Hurricane Center's official forecast, thus highlighting the complementary properties with existing approaches. In summary, our work demonstrates that utilizing machine learning techniques to combine different data sources can lead to new opportunities in tropical cyclone forecasting.  ( 3 min )
    Small random initialization is akin to spectral learning: Optimization and generalization guarantees for overparameterized low-rank matrix reconstruction. (arXiv:2106.15013v4 [cs.LG] UPDATED)
    Recently there has been significant theoretical progress on understanding the convergence and generalization of gradient-based methods on nonconvex losses with overparameterized models. Nevertheless, many aspects of optimization and generalization, and in particular the critical role of small random initialization, are not fully understood. In this paper, we take a step towards demystifying this role by proving that small random initialization followed by a few iterations of gradient descent behaves akin to popular spectral methods. We also show that this implicit spectral bias from small random initialization, which is provably more prominent for overparameterized models, also puts the gradient descent iterations on a particular trajectory towards solutions that are not only globally optimal but also generalize well. Concretely, we focus on the problem of reconstructing a low-rank matrix from a few measurements via a natural nonconvex formulation. In this setting, we show that the trajectory of the gradient descent iterations from small random initialization can be approximately decomposed into three phases: (I) a spectral or alignment phase, where we show that the iterates have an implicit spectral bias akin to spectral initialization, allowing us to show that at the end of this phase the column space of the iterates and the underlying low-rank matrix are sufficiently aligned, (II) a saddle avoidance/refinement phase, where we show that the trajectory of the gradient iterates moves away from certain degenerate saddle points, and (III) a local refinement phase, where we show that after avoiding the saddles the iterates converge quickly to the underlying low-rank matrix. Underlying our analysis are insights for the analysis of overparameterized nonconvex optimization schemes that may have implications for computational problems beyond low-rank reconstruction.  ( 3 min )
    Periodic Graph Transformers for Crystal Material Property Prediction. (arXiv:2209.11807v1 [cs.LG])
    We consider representation learning on periodic graphs encoding crystal materials. Different from regular graphs, periodic graphs consist of a minimum unit cell repeating itself on a regular lattice in 3D space. How to effectively encode these periodic structures poses unique challenges not present in regular graph representation learning. In addition to being E(3) invariant, periodic graph representations need to be periodic invariant. That is, the learned representations should be invariant to shifts of cell boundaries as they are artificially imposed. Furthermore, the periodic repeating patterns need to be captured explicitly as lattices of different sizes and orientations may correspond to different materials. In this work, we propose a transformer architecture, known as Matformer, for periodic graph representation learning. Our Matformer is designed to be invariant to periodicity and can capture repeating patterns explicitly. In particular, Matformer encodes periodic patterns by efficient use of geometric distances between the same atoms in neighboring cells. Experimental results on multiple common benchmark datasets show that our Matformer outperforms baseline methods consistently. In addition, our results demonstrate the importance of periodic invariance and explicit repeating pattern encoding for crystal representation learning.
    Tighter Variational Bounds are Not Necessarily Better. A Research Report on Implementation, Ablation Study, and Extensions. (arXiv:2209.11875v1 [stat.ML])
    This report explains, implements and extends the works presented in "Tighter Variational Bounds are Not Necessarily Better" (T. Rainforth et al., 2018). We provide theoretical and empirical evidence that increasing the number of importance samples $K$ in the importance weighted autoencoder (IWAE) (Burda et al., 2016) degrades the signal-to-noise ratio (SNR) of the gradient estimator in the inference network and thereby affects the full learning process. In other words, even though increasing $K$ decreases the standard deviation of the gradients, it also reduces the magnitude of the true gradient faster, thereby increasing the relative variance of the gradient updates. Extensive experiments are performed to understand the importance of $K$. These experiments suggest that tighter variational bounds are beneficial for the generative network, whereas looser bounds are preferable for the inference network. With these insights, three methods are implemented and studied: the partially importance weighted autoencoder (PIWAE), the multiply importance weighted autoencoder (MIWAE) and the combination importance weighted autoencoder (CIWAE). Each of these three methods entails IWAE as a special case but employs the importance weights in different ways to ensure a higher SNR of the gradient estimators. In our study and analysis, the efficacy of these algorithms is tested on multiple datasets, such as MNIST and Omniglot. Finally, we demonstrate that the three presented IWAE variations are able to generate approximate posterior distributions that are much closer to the true posterior distribution than for the IWAE, while matching the performance of the IWAE generative network or potentially outperforming it in the case of PIWAE.
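    For reference, the $K$-sample IWAE bound discussed above is $\log \frac{1}{K} \sum_k w_k$ with $w_k = p(x, z_k)/q(z_k \mid x)$, computed in log space for stability. A minimal sketch (the tensor shapes are our own convention):

```python
import math
import torch

def iwae_bound(log_p_xz: torch.Tensor, log_q_zx: torch.Tensor) -> torch.Tensor:
    """Both inputs: (batch, K), holding log p(x, z_k) and log q(z_k | x)."""
    log_w = log_p_xz - log_q_zx                 # log importance weights
    K = log_w.shape[1]
    return (torch.logsumexp(log_w, dim=1) - math.log(K)).mean()
```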
    In-context Learning and Induction Heads. (arXiv:2209.11895v1 [cs.LG])
    "Induction heads" are attention heads that implement a simple algorithm to complete token sequences like [A][B] ... [A] -> [B]. In this work, we present preliminary and indirect evidence for a hypothesis that induction heads might constitute the mechanism for the majority of all "in-context learning" in large transformer models (i.e. decreasing loss at increasing token indices). We find that induction heads develop at precisely the same point as a sudden sharp increase in in-context learning ability, visible as a bump in the training loss. We present six complementary lines of evidence, arguing that induction heads may be the mechanistic source of general in-context learning in transformer models of any size. For small attention-only models, we present strong, causal evidence; for larger models with MLPs, we present correlational evidence.
    Batch size-invariance for policy optimization. (arXiv:2110.00641v3 [cs.LG] UPDATED)
    We say an algorithm is batch size-invariant if changes to the batch size can largely be compensated for by changes to other hyperparameters. Stochastic gradient descent is well-known to have this property at small batch sizes, via the learning rate. However, some policy optimization algorithms (such as PPO) do not have this property, because of how they control the size of policy updates. In this work we show how to make these algorithms batch size-invariant. Our key insight is to decouple the proximal policy (used for controlling policy updates) from the behavior policy (used for off-policy corrections). Our experiments help explain why these algorithms work, and additionally show how they can make more efficient use of stale data.  ( 2 min )
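    A hedged sketch of the decoupling described above: the clipped surrogate controls the step size relative to a proximal policy (e.g., an EMA of recent policies), while an importance weight against the behavior policy handles off-policy correction. Function and argument names are illustrative, not the paper's code:

```python
import torch

def decoupled_ppo_loss(logp_new: torch.Tensor, logp_prox: torch.Tensor,
                       logp_behav: torch.Tensor, adv: torch.Tensor,
                       clip: float = 0.2) -> torch.Tensor:
    """All inputs: (batch,). logp_prox is from the proximal policy; logp_behav
    is from the policy that actually collected the data."""
    iw = torch.exp(logp_prox - logp_behav)            # off-policy correction
    ratio = torch.exp(logp_new - logp_prox)           # size-controlled update
    surrogate = torch.minimum(ratio * adv,
                              torch.clamp(ratio, 1 - clip, 1 + clip) * adv)
    return -(iw * surrogate).mean()
```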
    Physics-Informed Graph Neural Network for Spatial-temporal Production Forecasting. (arXiv:2209.11885v1 [cs.LG])
    Production forecasting based on historical data provides essential value for developing hydrocarbon resources. The classic history matching workflow is often computationally intense and geometry-dependent. Analytical data-driven models like decline curve analysis (DCA) and capacitance resistance models (CRM) provide a grid-free solution with a relatively simple model capable of integrating some degree of physics constraints. However, the analytical solution may ignore subsurface geometries, is appropriate only for specific flow regimes, and may otherwise violate physics conditions, resulting in degraded model prediction accuracy. Machine learning-based predictive models for time series provide non-parametric, assumption-free solutions for production forecasting, but are prone to overfitting due to training data sparsity and therefore may be accurate only over short prediction time intervals. We propose a grid-free, physics-informed graph neural network (PI-GNN) for production forecasting. A customized graph convolution layer aggregates neighborhood information from historical data and has the flexibility to integrate domain expertise into the data-driven model. The proposed method relaxes the dependence on closed-form solutions like CRM and honors the given physics-based constraints. Our proposed method is robust, with improved performance and model interpretability relative to the conventional CRM and a GNN baseline without physics constraints.
    Optimal Binary Classification Beyond Accuracy. (arXiv:2107.01777v3 [math.ST] UPDATED)
    The vast majority of statistical theory on binary classification characterizes performance in terms of accuracy. However, accuracy is known in many cases to poorly reflect the practical consequences of classification error, most famously in imbalanced binary classification, where data are dominated by samples from one of two classes. The first part of this paper derives a novel generalization of the Bayes-optimal classifier from accuracy to any performance metric computed from the confusion matrix. Specifically, this result (a) demonstrates that stochastic classifiers sometimes outperform the best possible deterministic classifier and (b) removes an empirically unverifiable absolute continuity assumption that is poorly understood but pervades existing results. We then demonstrate how to use this generalized Bayes classifier to obtain regret bounds in terms of the error of estimating regression functions under uniform loss. Finally, we use these results to develop some of the first finite-sample statistical guarantees specific to imbalanced binary classification. Specifically, we demonstrate that optimal classification performance depends on properties of class imbalance, such as a novel notion called Uniform Class Imbalance, that have not previously been formalized. We further illustrate these contributions numerically in the case of $k$-nearest neighbor classification.
    Interventional Causal Representation Learning. (arXiv:2209.11924v1 [stat.ML])
    The theory of identifiable representation learning aims to build general-purpose methods that extract high-level latent (causal) factors from low-level sensory data. Most existing works focus on identifiable representation learning with observational data, relying on distributional assumptions on latent (causal) factors. However, in practice, we often also have access to interventional data for representation learning. How can we leverage interventional data to help identify high-level latents? To this end, we explore the role of interventional data for identifiable representation learning in this work. We study the identifiability of latent causal factors with and without interventional data, under minimal distributional assumptions on the latents. We prove that, if the true latent variables map to the observed high-dimensional data via a polynomial function, then representation learning via minimizing the standard reconstruction loss of autoencoders identifies the true latents up to affine transformation. If we further have access to interventional data generated by hard $do$ interventions on some of the latents, then we can identify these intervened latents up to permutation, shift and scaling.
    M2TRec: Metadata-aware Multi-task Transformer for Large-scale and Cold-start free Session-based Recommendations. (arXiv:2209.11824v1 [cs.IR])
    Session-based recommender systems (SBRSs) have shown superior performance over conventional methods. However, they show limited scalability on large-scale industrial datasets since most models learn one embedding per item. This leads to a large memory requirement (of storing one vector per item) and poor performance on sparse sessions with cold-start or unpopular items. Using one public and one large industrial dataset, we experimentally show that state-of-the-art SBRSs have low performance on sparse sessions with sparse items. We propose M2TRec, a Metadata-aware Multi-task Transformer model for session-based recommendations. Our proposed method learns a transformation function from item metadata to embeddings, and is thus item-ID free (i.e., does not need to learn one embedding per item). It integrates item metadata to learn shared representations of diverse item attributes. During inference, new or unpopular items will be assigned identical representations for the attributes they share with items previously observed during training, and thus will have similar representations with those items, enabling recommendations of even cold-start and sparse items. Additionally, M2TRec is trained in a multi-task setting to predict the next item in the session along with its primary category and subcategories. Our multi-task strategy makes the model converge faster and significantly improves the overall performance. Experimental results show significant performance gains using our proposed approach on sparse items on the two datasets.
    Concordance based Survival Cobra with regression type weak learners. (arXiv:2209.11919v1 [stat.ML])
    In this paper, we predict conditional survival functions through a combined regression strategy, taking different random survival trees as weak learners. We propose to maximize concordance in the right-censored setup to find the optimal parameters. We explore two approaches: a usual survival cobra and a novel weighted predictor based on the concordance index. Our proposed formulations use two different norms, the max-norm and the Frobenius norm, to find a proximity set of predictions from query points in the test dataset. We illustrate our algorithms on three different real-life datasets.
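    The concordance index being maximized is, informally, the fraction of comparable pairs (those where the earlier time is an observed event) that the model ranks correctly by risk. A brute-force $O(n^2)$ sketch for the right-censored case:

```python
import numpy as np

def concordance_index(time, event, risk):
    """time: survival/censoring times; event: 1 if observed, 0 if censored;
    risk: model risk scores (higher risk should mean earlier event)."""
    num, den = 0.0, 0
    for i in range(len(time)):
        for j in range(len(time)):
            if time[i] < time[j] and event[i] == 1:   # comparable pair
                den += 1
                if risk[i] > risk[j]:
                    num += 1.0                         # concordant
                elif risk[i] == risk[j]:
                    num += 0.5                         # tied risk
    return num / den
```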
    Creating Compact Regions of Social Determinants of Health. (arXiv:2209.11836v1 [cs.LG])
    Regionalization is the act of breaking a dataset into contiguous homogeneous regions that are heterogeneous from each other. Many different algorithms exist for performing regionalization; however, using these algorithms on large real-world datasets has only become computationally feasible in recent years. Very few studies have compared different regionalization methods, and those that do lack analysis of memory, scalability, geographic metrics, and large-scale real-world applications. This study compares state-of-the-art regionalization methods, namely Agglomerative Clustering, SKATER, REDCAP, AZP, and Max-P-Regions, using real-world social determinants of health (SDOH) data. The scale of real-world SDOH data, up to 1 million data points in this study, not only compares the algorithms over different datasets but provides a stress test for each individual regionalization algorithm, most of which have never been run at such scales previously. We use several new geographic metrics to compare algorithms and perform a comparative memory analysis. The prevailing regionalization method is then compared with unconstrained K-Means clustering on the ability to separate real health data in Virginia and Washington DC.
    Contrastive learning for unsupervised medical image clustering and reconstruction. (arXiv:2209.12005v1 [cs.CV])
    The lack of large labeled medical imaging datasets, along with significant inter-individual variability compared to clinically established disease classes, poses significant challenges for exploiting medical imaging information in a precision medicine paradigm, where in principle dense patient-specific data can be employed to formulate individual predictions and/or stratify patients into finer-grained groups that may follow more homogeneous trajectories and therefore empower clinical trials. In order to efficiently explore the effective degrees of freedom underlying variability in medical images in an unsupervised manner, in this work we propose an unsupervised autoencoder framework augmented with a contrastive loss to encourage high separability in the latent space. The model is validated on (medical) benchmark datasets. As cluster labels are assigned to each example according to cluster assignments, we compare performance with a supervised transfer learning baseline. Our method achieves similar performance to the supervised architecture, indicating that separation in the latent space reproduces expert medical observer-assigned labels. The proposed method could be beneficial for patient stratification, exploring new subdivisions of larger classes or pathological continua, or, due to its sampling abilities in a variational setting, data augmentation in medical image processing.
    Hybrid Multimodal Fusion for Humor Detection. (arXiv:2209.11949v1 [cs.LG])
    In this paper, we present our solution to the MuSe-Humor sub-challenge of the Multimodal Emotional Challenge (MuSe) 2022. The goal of the MuSe-Humor sub-challenge is to detect humor and calculate AUC from audiovisual recordings of German football Bundesliga press conferences. It is annotated for humor displayed by the coaches. For this sub-challenge, we first build a discriminant model using the transformer module and BiLSTM module, and then propose a hybrid fusion strategy to use the prediction results of each modality to improve the performance of the model. Our experiments demonstrate the effectiveness of our proposed model and hybrid fusion strategy on multimodal fusion, and the AUC of our proposed model on the test set is 0.8972.
    Removal of Ocular Artifacts in EEG Using Deep Learning. (arXiv:2209.11980v1 [eess.SP])
    EEG signals are complex, low-frequency signals. Therefore, they are easily influenced by external factors. EEG artifact removal is crucial in neuroscience because artifacts have a significant impact on the results of EEG analysis. The removal of ocular artifacts is the most challenging among these artifacts. In this study, a novel ocular artifact removal method is presented by developing bidirectional long short-term memory (BiLSTM)-based deep learning (DL) models. We created a benchmarking dataset to train and test the proposed DL models by combining the EEGdenoiseNet and DEAP datasets. We also augmented the data by contaminating ground-truth clean EEG signals with EOG at various SNR levels. The BiLSTM network is then fed features extracted from the augmented signals using highly localized time-frequency (TF) coefficients obtained by the wavelet synchrosqueezed transform (WSST). We also compare the WSST-based DL model results with traditional TF analysis (TFA) methods, namely the short-time Fourier transform (STFT) and the continuous wavelet transform (CWT), as well as with augmented raw signals. The best average MSE value of 0.3066 was obtained by the newly proposed BiLSTM-based WSST-Net model. Our results demonstrate that the WSST-Net model significantly improves artifact removal performance compared to traditional TF and raw signal methods. The proposed EOG removal approach also outperforms many conventional and DL-based ocular artifact removal methods in the literature.
    Speech Enhancement with Perceptually-motivated Optimization and Dual Transformations. (arXiv:2209.11905v1 [cs.SD])
    To address the monaural speech enhancement problem, numerous research studies have been conducted to enhance speech via operations either in the time domain, on an inner domain learned from the speech mixture, or in the time-frequency domain, on fixed full-band short-time Fourier transform (STFT) spectrograms. Very recently, a few studies on sub-band based speech enhancement have been proposed. By enhancing speech via operations on sub-band spectrograms, those studies demonstrated competitive performance on the DNS2020 benchmark dataset. Despite being attractive, this new research direction has not been fully explored and there is still room for improvement. As such, in this study, we delve into this latest research direction and propose a sub-band based speech enhancement system with perceptually-motivated optimization and dual transformations, called PT-FSE. Specifically, our proposed PT-FSE model improves its backbone, a full-band and sub-band fusion model, through three efforts. First, we design a frequency transformation module that aims to strengthen the global frequency correlation. Then a temporal transformation is introduced to capture long-range temporal contexts. Lastly, a novel loss, leveraging properties of human auditory perception, is proposed to make the model focus on low-frequency enhancement. To validate the effectiveness of our proposed model, extensive experiments are conducted on the DNS2020 dataset. Experimental results show that our PT-FSE system not only achieves substantial improvements over its backbone but also outperforms the current state-of-the-art while being 27\% smaller than the SOTA model. With an average NB-PESQ of 3.57 on the benchmark dataset, our system offers the best speech enhancement results reported to date.
    Tiered Pruning for Efficient Differentiable Inference-Aware Neural Architecture Search. (arXiv:2209.11785v1 [cs.LG])
    We propose three novel pruning techniques to improve the cost and results of inference-aware Differentiable Neural Architecture Search (DNAS). First, we introduce a stochastic bi-path building block for DNAS, which can search over inner hidden dimensions with low memory and compute complexity. Second, we present an algorithm for pruning blocks within a stochastic layer of the SuperNet during the search. Third, we describe a novel technique for pruning unnecessary stochastic layers during the search. The optimized models resulting from the search are called PruNet and establish a new state-of-the-art Pareto frontier for NVIDIA V100 in terms of inference latency for ImageNet Top-1 image classification accuracy. PruNet as a backbone also outperforms GPUNet and EfficientNet on the COCO object detection task in terms of inference latency relative to mean Average Precision (mAP).
    Two Bicomplex Least Mean Square (BLMS) algorithms. (arXiv:2209.11899v1 [cs.LG])
    We study and introduce new gradient operators in the complex and bicomplex settings, inspired by the well-known Least Mean Square (LMS) algorithm invented in 1960 by Widrow and Hoff for the Adaptive Linear Neuron (ADALINE). These gradient operators are used to formulate new learning rules for the Bicomplex Least Mean Square (BLMS) algorithms. This approach extends both the classical real and complex LMS algorithms.
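    As background, the classical complex LMS filter that these bicomplex rules generalize updates its weights as $w \leftarrow w + \mu\, \bar{e}\, x$, where $e$ is the a-priori estimation error; the bicomplex versions replace complex conjugation with the appropriate bicomplex conjugations. A minimal complex-valued sketch:

```python
import numpy as np

def complex_lms(x: np.ndarray, d: np.ndarray, taps: int = 4, mu: float = 0.05):
    """x: complex input signal; d: desired signal; returns adapted filter weights."""
    w = np.zeros(taps, dtype=complex)
    for n in range(taps, len(x)):
        xn = x[n - taps:n][::-1]           # tap-delay input vector
        e = d[n] - np.vdot(w, xn)          # a-priori error; output is w^H x
        w = w + mu * np.conj(e) * xn       # classical complex LMS update
    return w
```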
    A Deep Learning Approach to Analyzing Continuous-Time Systems. (arXiv:2209.12128v1 [cs.LG])
    Scientists often use observational time series data to study complex natural processes, from climate change to civil conflict to brain activity. But regression analyses of these data often assume simplistic dynamics. Recent advances in deep learning have yielded startling improvements to the performance of models of complex processes, from speech comprehension to nuclear physics to competitive gaming. But deep learning is generally not used for scientific analysis. Here, we bridge this gap by showing that deep learning can be used, not just to imitate, but to analyze complex processes, providing flexible function approximation while preserving interpretability. Our approach -- the continuous-time deconvolutional regressive neural network (CDRNN) -- relaxes standard simplifying assumptions (e.g., linearity, stationarity, and homoscedasticity) that are implausible for many natural systems and may critically affect the interpretation of data. We evaluate CDRNNs on incremental human language processing, a domain with complex continuous dynamics. We demonstrate dramatic improvements to predictive likelihood in behavioral and neuroimaging data, and we show that CDRNNs enable flexible discovery of novel patterns in exploratory analyses, provide robust control of possible confounds in confirmatory analyses, and open up research questions that are otherwise hard to study using observational data.  ( 2 min )
    Mental arithmetic task classification with convolutional neural network based on spectral-temporal features from EEG. (arXiv:2209.11767v1 [eess.SP])
    In recent years, neuroscientists have been interested in the development of brain-computer interface (BCI) devices. Patients with motor disorders may benefit from BCIs as a means of communication and for the restoration of motor functions. Electroencephalography (EEG) is one of the most widely used techniques for evaluating neuronal activity. In many computer vision applications, deep neural networks (DNN) show significant advantages. Towards the ultimate usage of DNNs, we present here a shallow neural network that uses mainly two convolutional neural network (CNN) layers, with relatively few parameters, to quickly learn spectral-temporal features from EEG. We compared this model to three other neural network models of different depths, applied to a mental arithmetic task using the eyes-closed state, adapted for patients suffering from motor disorders and a decline in visual functions. Experimental results showed that the shallow CNN model outperformed all the other models and achieved the highest classification accuracy of 90.68%. It is also more robust to cross-subject classification issues: only 3% standard deviation of accuracy instead of 15.6% for the conventional method.
    Multistage Large Segment Imputation Framework Based on Deep Learning and Statistic Metrics. (arXiv:2209.11766v1 [cs.LG])
    Missing values are a very common and unavoidable problem for sensors, and researchers have made numerous attempts at missing value imputation, particularly with deep learning models. However, for real sensor data, the specific data distribution and data periods are rarely considered, making it difficult to choose appropriate evaluation indexes and models for different sensors. To address this issue, this study proposes a multistage imputation framework based on deep learning with adaptability for missing value imputation. The model presents a mixture measurement index of low- and higher-order statistics for the data distribution and a new perspective on data imputation performance metrics, which is more adaptive and effective than the traditional mean squared error. A multistage imputation strategy and dynamic data length are introduced into the imputation process to account for data periods. Experimental results on different types of sensor data show that the multistage imputation strategy and the mixture index are superior and that missing value imputation is improved to some extent, particularly for the large segment imputation problem. The code and experimental results have been uploaded to GitHub.
    ALLSH: Active Learning Guided by Local Sensitivity and Hardness. (arXiv:2205.04980v2 [cs.CL] UPDATED)
    Active learning, which effectively collects informative unlabeled data for annotation, reduces the demand for labeled data. In this work, we propose to retrieve unlabeled samples with a local sensitivity and hardness-aware acquisition function. The proposed method generates data copies through local perturbations and selects data points whose predictive likelihoods diverge the most from their copies. We further empower our acquisition function by injecting the selected worst-case perturbation. Our method achieves consistent gains over the commonly used active learning strategies in various classification tasks. Furthermore, we observe consistent improvements over the baselines in the study of prompt selection in prompt-based few-shot learning. These experiments demonstrate that our acquisition guided by local sensitivity and hardness can be effective and beneficial for many NLP tasks.
    GDA-HIN: A Generalized Domain Adaptive Model across Heterogeneous Information Networks. (arXiv:2012.05688v3 [cs.LG] UPDATED)
    Domain adaptation using graph-structured networks learns label-discriminative and network-invariant node embeddings by sharing graph parameters. Most existing works focus on domain adaptation of homogeneous networks. The few works that study heterogeneous cases only consider shared node types but ignore private node types in individual networks. However, for given source and target heterogeneous networks, they generally contain shared and private node types, where private types bring an extra challenge for graph domain adaptation. In this paper, we investigate Heterogeneous Information Networks (HINs) with both shared and private node types and propose a Generalized Domain Adaptive model across HINs (GDA-HIN) to handle the domain shift between them. GDA-HIN can not only align the distribution of identical-type nodes and edges in two HINs but also make full use of different-type nodes and edges to improve the performance of knowledge transfer. Extensive experiments on several datasets demonstrate that GDA-HIN can outperform state-of-the-art methods in various domain adaptation tasks across heterogeneous networks.
    Uniform Complexity for Text Generation. (arXiv:2204.05185v2 [cs.CL] UPDATED)
    Large pre-trained language models have shown promising results in a wide array of tasks such as narrative generation, question answering, and machine translation. Likewise, the current trend in the literature has focused deeply on controlling salient properties of generated texts, including sentiment, topic, and coherence, to produce more human-like outputs. In this work, we introduce Uniform Complexity for Text Generation, or UCTG, which challenges existing models to generate uniformly complex text with respect to the inputs or prompts used. For example, if the reading level of an input text prompt is appropriate for low-level learners (e.g., A2 in the CEFR), then the text generated by an NLG system should also assume this particular level for increased readability. In a controlled narrative generation task, we surveyed over 160 linguistic and cognitively motivated features for evaluating text readability and found that GPT-2 models, and even humans, struggle to preserve the linguistic complexity of the input prompts used. Ultimately, we lay down potential methods and approaches which can be incorporated into the general framework of steering language models towards addressing this important challenge.
    Generalized Permutants and Graph GENEOs. (arXiv:2206.14798v2 [math.CO] UPDATED)
    In this paper we establish a bridge between Topological Data Analysis and Geometric Deep Learning, adapting the topological theory of group equivariant non-expansive operators (GENEOs) to act on the space of all graphs weighted on vertices or edges. This is done by showing how the general concept of GENEO can be used to transform graphs and to give information about their structure. This requires the introduction of the new concepts of generalized permutant and generalized permutant measure and the mathematical proof that these concepts allow us to build GENEOs between graphs. An experimental section concludes the paper, illustrating the possible use of our operators to extract information from graphs. This paper is part of a line of research devoted to developing a compositional and geometric theory of GENEOs for Geometric Deep Learning.
    Stochastic Gradient Descent Captures How Children Learn About Physics. (arXiv:2209.12344v1 [cs.LG])
    As children grow older, they develop an intuitive understanding of the physical processes around them. They move along developmental trajectories, which have been mapped out extensively in previous empirical research. We investigate how children's developmental trajectories compare to the learning trajectories of artificial systems. Specifically, we examine the idea that cognitive development results from some form of stochastic optimization procedure. For this purpose, we train a modern generative neural network model using stochastic gradient descent. We then use methods from the developmental psychology literature to probe the physical understanding of this model at different degrees of optimization. We find that the model's learning trajectory captures the developmental trajectories of children, thereby providing support to the idea of development as stochastic optimization.
    From Local to Global: Spectral-Inspired Graph Neural Networks. (arXiv:2209.12054v1 [stat.ML])
    Graph Neural Networks (GNNs) are powerful deep learning methods for Non-Euclidean data. Popular GNNs are message-passing algorithms (MPNNs) that aggregate and combine signals in a local graph neighborhood. However, shallow MPNNs tend to miss long-range signals and perform poorly on some heterophilous graphs, while deep MPNNs can suffer from issues like over-smoothing or over-squashing. To mitigate such issues, existing works typically borrow normalization techniques from training neural networks on Euclidean data or modify the graph structures. Yet these approaches are not well-understood theoretically and could increase the overall computational complexity. In this work, we draw inspirations from spectral graph embedding and propose $\texttt{PowerEmbed}$ -- a simple layer-wise normalization technique to boost MPNNs. We show $\texttt{PowerEmbed}$ can provably express the top-$k$ leading eigenvectors of the graph operator, which prevents over-smoothing and is agnostic to the graph topology; meanwhile, it produces a list of representations ranging from local features to global signals, which avoids over-squashing. We apply $\texttt{PowerEmbed}$ in a wide range of simulated and real graphs and demonstrate its competitive performance, particularly for heterophilous graphs.
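    A hedged sketch of the layer-wise normalization idea: propagate features with the normalized graph operator and re-normalize after every hop, so that deep propagation tends toward the leading eigenvectors (a global signal) instead of collapsing, while each intermediate (more local) representation is kept in a list. We use a QR step as the normalizer purely for illustration; the paper's exact normalization may differ:

```python
import numpy as np

def power_embed(A: np.ndarray, X: np.ndarray, layers: int = 8):
    """A: (n, n) adjacency (no isolated nodes assumed); X: (n, d) node features."""
    deg = A.sum(axis=1)
    S = A / np.sqrt(np.outer(deg, deg))    # symmetrically normalized operator
    reps, H = [X], X
    for _ in range(layers):
        H = S @ H                          # one message-passing / propagation hop
        H, _ = np.linalg.qr(H)             # layer-wise normalization step
        reps.append(H)
    return reps                            # local-to-global list for downstream use
```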
    GPatch: Patching Graph Neural Networks for Cold-Start Recommendations. (arXiv:2209.12215v1 [cs.IR])
    Cold start is an essential and persistent problem in recommender systems. State-of-the-art solutions rely on training hybrid models for both cold-start and existing users/items based on auxiliary information. Such a hybrid model would compromise the performance of existing users/items, which might make these solutions inapplicable in real-world recommender systems where the experience of existing users/items must be guaranteed. Meanwhile, graph neural networks (GNNs) have been demonstrated to perform warm (non-cold-start) recommendations effectively. However, they have never been applied to handle the cold-start problem in a user-item bipartite graph. This is a challenging but rewarding task since cold-start users/items have no links. Besides, it is nontrivial to design an appropriate GNN to conduct cold-start recommendations while maintaining the performance for existing users/items. To bridge the gap, we propose a tailored GNN-based framework (GPatch) that contains two separate but correlated components. First, an efficient GNN architecture -- GWarmer -- is designed to model the warm users/items. Second, we construct correlated Patching Networks to simulate and patch GWarmer by conducting cold-start recommendations. Experiments on benchmark and large-scale commercial datasets demonstrate that GPatch is significantly superior in providing recommendations for both existing and cold-start users/items.
    Emb-GAM: an Interpretable and Efficient Predictor using Pre-trained Language Models. (arXiv:2209.11799v1 [cs.AI])
    Deep learning models have achieved impressive prediction performance but often sacrifice interpretability, a critical consideration in high-stakes domains such as healthcare or policymaking. In contrast, generalized additive models (GAMs) can maintain interpretability but often suffer from poor prediction performance due to their inability to effectively capture feature interactions. In this work, we aim to bridge this gap by using pre-trained neural language models to extract embeddings for each input before learning a linear model in the embedding space. The final model (which we call Emb-GAM) is a transparent, linear function of its input features and feature interactions. Leveraging the language model allows Emb-GAM to learn far fewer linear coefficients, model larger interactions, and generalize well to novel inputs (e.g. unseen ngrams in text). Across a variety of natural-language-processing datasets, Emb-GAM achieves strong prediction performance without sacrificing interpretability. All code is made available on Github.
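    A minimal sketch of the idea as we read it: embed each ngram with a pre-trained LM, sum the embeddings, and fit a linear model in the embedding space. The embed() stub below is a hashing placeholder standing in for a real LM encoder so the sketch runs end to end; it is not the authors' code.

        import numpy as np
        from sklearn.linear_model import LogisticRegression

        def embed(ngram):
            # Placeholder for a pre-trained LM embedding (e.g., mean-pooled
            # hidden states); deterministic within a run so the sketch executes.
            rng = np.random.default_rng(abs(hash(ngram)) % (2**32))
            return rng.standard_normal(64)

        def featurize(text, n=2):
            toks = text.split()
            ngrams = toks + [" ".join(toks[i:i + n]) for i in range(len(toks) - n + 1)]
            return np.sum([embed(g) for g in ngrams], axis=0)

        texts = ["a great movie", "a terrible movie"]
        X = np.stack([featurize(t) for t in texts])
        clf = LogisticRegression().fit(X, [1, 0])
        # The fitted model is linear in the summed ngram embeddings, so each
        # ngram's contribution to a prediction can be inspected directly.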
    Toward Smart Doors: A Position Paper. (arXiv:2209.11770v1 [cs.HC])
    Conventional automatic doors cannot distinguish between people wishing to pass through the door and people passing by the door, so they often open unnecessarily. This leads to the need to adopt new systems in both commercial and non-commercial environments: smart doors. In particular, a smart door system predicts the intention of people near the door based on the social context of the surrounding environment and then makes rational decisions about whether or not to open the door. This work proposes the first position paper on smart doors, without bells and whistles. We first point out that the problem concerns not only reliability, climate control, safety, and mode of operation: a system that predicts the intention of people near the door also requires a deeper understanding of the social context of the scene through a complex combined analysis of proxemics and scene reasoning. Furthermore, we conduct an exhaustive literature review on automatic doors and provide a novel system formulation. Finally, we present an analysis of possible future applications of smart doors, a description of their ethical shortcomings, and legislative issues.
    Privacy-Preserving Online Content Moderation: A Federated Learning Use Case. (arXiv:2209.11843v1 [cs.LG])
    Users are exposed daily to a large volume of harmful content on various social network platforms. One solution is to develop online moderation tools using Machine Learning (ML) techniques. However, the processing of user data by online platforms requires compliance with privacy policies. Federated Learning (FL) is an ML paradigm where training is performed locally on the users' devices. Although the FL framework complies, in theory, with GDPR policies, privacy leaks can still occur. For instance, an attacker accessing the final trained model can successfully perform unwanted inference of the data belonging to the users who participated in the training process. In this paper, we propose a privacy-preserving FL framework for online content moderation that incorporates Differential Privacy (DP). To demonstrate the feasibility of our approach, we focus on detecting harmful content on Twitter, but the overall concept can be generalized to other types of misbehavior. We simulate a text classifier, in FL fashion, which can detect tweets with harmful content. We show that the performance of the proposed FL framework can be close to the centralized approach, for both the DP and non-DP FL versions. Moreover, it achieves high performance even when only a small number of clients (each with a small number of data points) are available for FL training. When reducing the number of clients (from 50 to 10) or the data points per client (from 1K to 0.1K), the classifier can still achieve ~81% AUC. Furthermore, we extend the evaluation to four other Twitter datasets that capture different types of user misbehavior and still obtain promising performance (61% - 80% AUC). Finally, we explore the overhead on the users' devices during the FL training phase and show that the local training does not introduce excessive CPU utilization or memory consumption overhead.
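    The abstract does not spell out how DP enters the training loop; one common construction, sketched minimally below (DP-FedAvg-style aggregation, not necessarily the authors' exact mechanism), clips each client update and adds Gaussian noise to the average. All constants and names here are illustrative assumptions.

        import numpy as np

        def dp_fedavg_round(global_w, client_updates, clip=1.0, noise_mult=1.0,
                            lr=0.1, rng=np.random.default_rng(0)):
            # Clip each client's update to bound its sensitivity.
            clipped = [u * min(1.0, clip / (np.linalg.norm(u) + 1e-12))
                       for u in client_updates]
            avg = np.mean(clipped, axis=0)
            # Gaussian noise calibrated to the clipping norm (the DP step).
            noise = rng.normal(0.0, noise_mult * clip / len(client_updates),
                               size=avg.shape)
            return global_w + lr * (avg + noise)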
    Are Machine Programming Systems using Right Source-Code Measures to Select Code Repositories?. (arXiv:2209.11946v1 [cs.SE])
    Machine programming (MP) is an emerging field at the intersection of deterministic and probabilistic computing, and it aims to assist software and hardware engineers, among other applications. Along with powerful compute resources, MP systems often rely on vast amounts of open-source code to learn interesting properties about code and programming and to solve problems in the areas of debugging, code recommendation, auto-completion, etc. Unfortunately, several existing MP systems either do not consider the quality of code repositories or select them using quality measures atypical of those used in the software engineering community. As such, the impact of code repository quality on the performance of these systems needs to be studied. In this preliminary paper, we evaluate the impact of repositories of different quality on the performance of a candidate MP system. Towards that objective, we develop a framework, named GitRank, to rank open-source repositories on quality, maintainability, and popularity by leveraging existing research on this topic. We then apply GitRank to evaluate the correlation between the quality measures used by the candidate MP system and the quality measures used by our framework. Our preliminary results reveal some correlation between the quality measures used in GitRank and ControlFlag's performance, suggesting that some of the measures used in GitRank are applicable to ControlFlag. But it also raises questions about the right quality measures for code repositories used in MP systems. We believe that our findings also generate interesting insights into code quality measures that affect the performance of MP systems.
    PPG2ABP: Translating Photoplethysmogram (PPG) Signals to Arterial Blood Pressure (ABP) Waveforms using Fully Convolutional Neural Networks. (arXiv:2005.01669v2 [eess.SP] UPDATED)
    Cardiovascular diseases are among the most severe causes of mortality, taking a heavy toll of lives annually throughout the world. Continuous monitoring of blood pressure seems to be the most viable option, but this demands an invasive process, bringing about several layers of complexity. This motivates us to develop a method to predict the continuous arterial blood pressure (ABP) waveform through a non-invasive approach using photoplethysmogram (PPG) signals. In addition, we explore the advantage of deep learning, which frees us from relying only on ideally shaped PPG signals by making handcrafted feature computation irrelevant, a shortcoming of existing approaches. Thus, we present PPG2ABP, a deep learning based method that predicts the continuous ABP waveform from the input PPG signal with a mean absolute error of 4.604 mmHg, preserving the shape, magnitude, and phase in unison. More strikingly, the values of DBP, MAP, and SBP computed from the predicted ABP waveform outperform those of existing works under several metrics, even though PPG2ABP is not explicitly trained to do so.
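    For readers unfamiliar with signal-to-signal translation, a minimal 1D fully convolutional stand-in is sketched below. It is far simpler than PPG2ABP's actual networks and is only meant to show the input/output contract: a PPG window in, an ABP window of the same length out.

        import torch
        import torch.nn as nn

        class TinyFCN1D(nn.Module):
            # Minimal signal-to-signal stand-in, not PPG2ABP's architecture.
            def __init__(self):
                super().__init__()
                self.net = nn.Sequential(
                    nn.Conv1d(1, 16, kernel_size=9, padding=4), nn.ReLU(),
                    nn.Conv1d(16, 16, kernel_size=9, padding=4), nn.ReLU(),
                    nn.Conv1d(16, 1, kernel_size=9, padding=4),
                )

            def forward(self, ppg):          # ppg: (batch, 1, T)
                return self.net(ppg)         # abp: (batch, 1, T), same length

        model = TinyFCN1D()
        abp_hat = model(torch.randn(8, 1, 1024))   # dummy batch of PPG windows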
  • Open

    Concordance based Survival Cobra with regression type weak learners. (arXiv:2209.11919v1 [stat.ML])
    In this paper, we predict conditional survival functions through a combined regression strategy. We take different random survival trees as weak learners. We propose to maximize concordance in the right-censored setup to find the optimal parameters. We explore two approaches: a standard survival COBRA and a novel weighted predictor based on the concordance index. Our proposed formulations use two different norms, namely the max-norm and the Frobenius norm, to find a proximity set of predictions from query points in the test dataset. We illustrate our algorithms on three different real-life datasets.
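    The max-norm proximity rule can be made concrete. Below is a minimal sketch of COBRA-style aggregation with a max-norm proximity set; the eps threshold, array shapes, and the fallback behavior are our illustrative assumptions, not the authors' exact formulation (the Frobenius-norm variant is analogous).

        import numpy as np

        def cobra_aggregate(train_preds, query_preds, train_targets, eps=0.1):
            # train_preds: (M, n) predictions of M weak learners on n training points
            # query_preds: (M,)   the same learners' predictions at one query point
            # Max-norm proximity set: training points whose M predictions are all
            # within eps of the query's predictions.
            close = np.max(np.abs(train_preds - query_preds[:, None]), axis=0) <= eps
            if not close.any():
                return train_targets.mean(axis=0)      # fallback: global average
            return train_targets[close].mean(axis=0)   # average over proximity set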
    AlphaZero-Inspired Game Learning: Faster Training by Using MCTS Only at Test Time. (arXiv:2204.13307v3 [cs.LG] UPDATED)
    Recently, the seminal algorithms AlphaGo and AlphaZero have started a new era in game learning and deep reinforcement learning. While the achievements of AlphaGo and AlphaZero - playing Go and other complex games at superhuman level - are truly impressive, these architectures have the drawback that they require high computational resources. Many researchers are looking for methods that are similar to AlphaZero but have lower computational demands and are thus more easily reproducible. In this paper, we pick an important element of AlphaZero - the Monte Carlo Tree Search (MCTS) planning stage - and combine it with temporal difference (TD) learning agents. We wrap MCTS for the first time around TD n-tuple networks, and we use this wrapping only at test time to create versatile agents while keeping computational demands low. We apply this new architecture to several complex games (Othello, ConnectFour, Rubik's Cube) and show the advantages achieved with this AlphaZero-inspired MCTS wrapper. In particular, we present results showing that this agent is the first trained on standard hardware (no GPU or TPU) to beat the very strong Othello program Edax up to and including level 7 (where most other learning-from-scratch algorithms could only defeat Edax up to level 2).  ( 3 min )
    One-Shot Learning of Stochastic Differential Equations with Computational Graph Completion. (arXiv:2209.12086v1 [stat.ML])
    We consider the problem of learning Stochastic Differential Equations of the form $dX_t = f(X_t)dt+\sigma(X_t)dW_t $ from one sample trajectory. This problem is more challenging than learning deterministic dynamical systems because one sample trajectory only provides indirect information on the unknown functions $f$, $\sigma$, and stochastic process $dW_t$ representing the drift, the diffusion, and the stochastic forcing terms, respectively. We propose a simple kernel-based solution to this problem that can be decomposed as follows: (1) Represent the time-increment map $X_t \rightarrow X_{t+dt}$ as a Computational Graph in which $f$, $\sigma$ and $dW_t$ appear as unknown functions and random variables. (2) Complete the graph (approximate unknown functions and random variables) via Maximum a Posteriori Estimation (given the data) with Gaussian Process (GP) priors on the unknown functions. (3) Learn the covariance functions (kernels) of the GP priors from data with randomized cross-validation. Numerical experiments illustrate the efficacy, robustness, and scope of our method.
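    The time-increment map in step (1) corresponds to a standard discretization; assuming the Euler-Maruyama scheme (one natural choice, not necessarily the exact graph the authors build), the computational graph encodes
    $X_{t+dt} \approx X_t + f(X_t)\,dt + \sigma(X_t)\,\Delta W_t, \qquad \Delta W_t \sim \mathcal{N}(0,\, dt),$
    where $f$ and $\sigma$ carry Gaussian process priors and the $\Delta W_t$ are latent random variables; step (2) then amounts to maximizing the posterior over these unknowns given the observed trajectory.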
    Optimal Binary Classification Beyond Accuracy. (arXiv:2107.01777v3 [math.ST] UPDATED)
    The vast majority of statistical theory on binary classification characterizes performance in terms of accuracy. However, accuracy is known in many cases to poorly reflect the practical consequences of classification error, most famously in imbalanced binary classification, where data are dominated by samples from one of two classes. The first part of this paper derives a novel generalization of the Bayes-optimal classifier from accuracy to any performance metric computed from the confusion matrix. Specifically, this result (a) demonstrates that stochastic classifiers sometimes outperform the best possible deterministic classifier and (b) removes an empirically unverifiable absolute continuity assumption that is poorly understood but pervades existing results. We then demonstrate how to use this generalized Bayes classifier to obtain regret bounds in terms of the error of estimating regression functions under uniform loss. Finally, we use these results to develop some of the first finite-sample statistical guarantees specific to imbalanced binary classification. Specifically, we demonstrate that optimal classification performance depends on properties of class imbalance, such as a novel notion called Uniform Class Imbalance, that have not previously been formalized. We further illustrate these contributions numerically in the case of $k$-nearest neighbor classification.
    GDA-HIN: A Generalized Domain Adaptive Model across Heterogeneous Information Networks. (arXiv:2012.05688v3 [cs.LG] UPDATED)
    Domain adaptation using graph-structured networks learns label-discriminative and network-invariant node embeddings by sharing graph parameters. Most existing works focus on domain adaptation of homogeneous networks. The few works that study heterogeneous cases only consider shared node types but ignore private node types in individual networks. However, for given source and target heterogeneous networks, they generally contain shared and private node types, where private types bring an extra challenge for graph domain adaptation. In this paper, we investigate Heterogeneous Information Networks (HINs) with both shared and private node types and propose a Generalized Domain Adaptive model across HINs (GDA-HIN) to handle the domain shift between them. GDA-HIN can not only align the distribution of identical-type nodes and edges in two HINs but also make full use of different-type nodes and edges to improve the performance of knowledge transfer. Extensive experiments on several datasets demonstrate that GDA-HIN can outperform state-of-the-art methods in various domain adaptation tasks across heterogeneous networks.
    Latent Variable Method Demonstrator -- Software for Understanding Multivariate Data Analytics Algorithms. (arXiv:2205.08132v2 [stat.ML] UPDATED)
    The ever-increasing quantity of multivariate process data is driving a need for skilled engineers to analyze, interpret, and build models from such data. Multivariate data analytics relies heavily on linear algebra, optimization, and statistics and can be challenging for students to understand given that most curricula do not have strong coverage in the latter three topics. This article describes interactive software - the Latent Variable Demonstrator (LAVADE) - for teaching, learning, and understanding latent variable methods. In this software, users can interactively compare latent variable methods such as Partial Least Squares (PLS) and Principal Component Regression (PCR) with other regression methods such as Least Absolute Shrinkage and Selection Operator (lasso), Ridge Regression (RR), and Elastic Net (EN). LAVADE helps to build intuition on choosing appropriate methods, hyperparameter tuning, and model coefficient interpretation, fostering a conceptual understanding of the algorithms' differences. The software contains a data generation method and three chemical process datasets, allowing for comparing results of datasets with different levels of complexity. LAVADE is released as open-source software so that others can apply and advance the tool for use in teaching or research.
    Batch size-invariance for policy optimization. (arXiv:2110.00641v3 [cs.LG] UPDATED)
    We say an algorithm is batch size-invariant if changes to the batch size can largely be compensated for by changes to other hyperparameters. Stochastic gradient descent is well-known to have this property at small batch sizes, via the learning rate. However, some policy optimization algorithms (such as PPO) do not have this property, because of how they control the size of policy updates. In this work we show how to make these algorithms batch size-invariant. Our key insight is to decouple the proximal policy (used for controlling policy updates) from the behavior policy (used for off-policy corrections). Our experiments help explain why these algorithms work, and additionally show how they can make more efficient use of stale data.  ( 2 min )
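    The decoupling admits a compact sketch: importance-weight each sample with respect to the behavior policy while clipping against a separate proximal policy. The code below is a minimal illustration of that idea under assumed array inputs, not the authors' reference implementation.

        import numpy as np

        def decoupled_ppo_loss(logp_new, logp_prox, logp_behav, adv, eps=0.2):
            # Off-policy correction: weight samples by pi_prox / pi_behav.
            iw = np.exp(logp_prox - logp_behav)
            # Proximal ratio pi_new / pi_prox controls the update size via clipping.
            r = np.exp(logp_new - logp_prox)
            surrogate = np.minimum(r * adv, np.clip(r, 1 - eps, 1 + eps) * adv)
            return -np.mean(iw * surrogate)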
    Consistency of Constrained Spectral Clustering under Graph Induced Fair Planted Partitions. (arXiv:2105.03714v2 [cs.LG] UPDATED)
    Spectral clustering is popular among practitioners and theoreticians alike. While performance guarantees for spectral clustering are well understood, recent studies have focused on enforcing ``fairness'' in clusters, requiring them to be ``balanced'' with respect to a categorical sensitive node attribute (e.g. the race distribution in clusters must match the race distribution in the population). In this paper, we consider a setting where sensitive attributes indirectly manifest in an auxiliary \textit{representation graph} rather than being directly observed. This graph specifies node pairs that can represent each other with respect to sensitive attributes and is observed in addition to the usual \textit{similarity graph}. Our goal is to find clusters in the similarity graph while respecting a new individual-level fairness constraint encoded by the representation graph. We develop variants of unnormalized and normalized spectral clustering for this task and analyze their performance under a \emph{fair} planted partition model induced by the representation graph. This model uses both the cluster membership of the nodes and the structure of the representation graph to generate random similarity graphs. To the best of our knowledge, these are the first consistency results for constrained spectral clustering under an individual-level fairness constraint. Numerical results corroborate our theoretical findings.  ( 3 min )
    Deep Network Approximation: Achieving Arbitrary Accuracy with Fixed Number of Neurons. (arXiv:2107.02397v7 [cs.LG] UPDATED)
    This paper develops simple feed-forward neural networks that achieve the universal approximation property for all continuous functions with a fixed finite number of neurons. These neural networks are simple because they are designed with a simple, computable, and continuous activation function $\sigma$ leveraging a triangular-wave function and the softsign function. We first prove that $\sigma$-activated networks with width $36d(2d+1)$ and depth $11$ can approximate any continuous function on a $d$-dimensional hypercube within an arbitrarily small error. Hence, for supervised learning and its related regression problems, the hypothesis space generated by these networks with a size not smaller than $36d(2d+1)\times 11$ is dense in the continuous function space $C([a,b]^d)$ and therefore dense in the Lebesgue spaces $L^p([a,b]^d)$ for $p\in [1,\infty)$. Furthermore, we show that classification functions arising from image and signal classification are in the hypothesis space generated by $\sigma$-activated networks with width $36d(2d+1)$ and depth $12$ when there exist pairwise disjoint bounded closed subsets of $\mathbb{R}^d$ such that the samples of the same class are located in the same subset. Finally, we use numerical experimentation to show that replacing the rectified linear unit (ReLU) activation function with ours improves the experimental results.  ( 3 min )
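    For intuition, here is one hypothetical way to combine the two named ingredients - a periodic triangular wave and the softsign function - into a single activation. The paper's actual $\sigma$ is defined differently and with care, so this sketch only illustrates the building blocks.

        import numpy as np

        def tri(x):
            # 2-periodic triangular wave with values in [0, 1]
            return np.abs(x - 2 * np.floor((x + 1) / 2))

        def softsign(x):
            return x / (1 + np.abs(x))

        def sigma(x):
            # hypothetical piecewise combination, for illustration only
            return np.where(x < 0, softsign(x), tri(x))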
    Probabilistic combination of eigenlungs-based classifiers for COVID-19 diagnosis in chest CT images. (arXiv:2103.02961v2 [eess.IV] UPDATED)
    The outbreak of the COVID-19 (Coronavirus disease 2019) pandemic has changed the world. According to the World Health Organization (WHO), there have been more than 100 million confirmed cases of COVID-19, including more than 2.4 million deaths. Early detection of the disease is extremely important, and the use of medical imaging such as chest X-ray (CXR) and chest Computed Tomography (CCT) has proved to be an excellent solution. However, this process requires clinicians to perform a manual and time-consuming task, which is not ideal when trying to speed up diagnosis. In this work, we propose an ensemble classifier based on probabilistic Support Vector Machine (SVM) in order to identify pneumonia patterns while providing information about the reliability of the classification. Specifically, each CCT scan is divided into cubic patches, and the features contained in each of them are extracted by applying kernel PCA. The use of base classifiers within an ensemble allows our system to identify pneumonia patterns regardless of their size or location. The decisions on individual patches are then combined into a global one according to the reliability of each individual classification: the lower the uncertainty, the higher the contribution. Performance is evaluated in a real scenario, yielding an accuracy of 97.86%. The high performance obtained and the simplicity of the system (the use of deep learning on CCT images would incur a huge computational cost) evidence the applicability of our proposal in a real-world environment.  ( 3 min )
    A unified framework for dataset shift diagnostics. (arXiv:2205.08340v2 [stat.ML] UPDATED)
    Most machine learning (ML) methods assume that the data used in the training phase come from the target population. However, in practice one often faces dataset shift, which, if not properly taken into account, may decrease the predictive performance of ML models. In general, if the practitioner knows which type of shift is taking place -- e.g., covariate shift or label shift -- they may apply transfer learning methods to obtain better predictions. Unfortunately, current methods for detecting shift are only designed to detect specific types of shift, or cannot formally test their presence. We introduce a general and unified framework that gives insights on how to improve prediction methods by detecting the presence of different types of shift and quantifying how strong they are. Our approach can be used for any data type (tabular/image/text) and for both classification and regression tasks. Moreover, it uses formal hypothesis tests that control false alarms. We illustrate how our framework is useful in practice using both artificial and real datasets, including an example of how our framework leads to insights that indeed improve the predictive power of a supervised model. Our package for dataset shift detection can be found at https://github.com/felipemaiapolo/detectshift.  ( 3 min )
    Learned Benchmarks for Subseasonal Forecasting. (arXiv:2109.10399v2 [physics.ao-ph] UPDATED)
    We benchmark a subseasonal forecasting toolkit of simple learned models that outperform both operational practice and state-of-the-art machine learning and deep learning methods. These models, introduced by Mouatadid et al. (2022), include (a) Climatology++, an adaptive alternative to climatology that, for precipitation, is 9% more accurate and 250% more skillful than the United States operational Climate Forecasting System (CFSv2); (b) CFSv2++, a learned CFSv2 correction that improves temperature and precipitation accuracy by 7-8% and skill by 50-275%; and (c) Persistence++, an augmented persistence model that combines CFSv2 forecasts with lagged measurements to improve temperature and precipitation accuracy by 6-9% and skill by 40-130%. Across the contiguous U.S., the Climatology++, CFSv2++, and Persistence++ toolkit consistently outperforms standard meteorological baselines, state-of-the-art machine and deep learning methods, and the European Centre for Medium-Range Weather Forecasts ensemble.  ( 2 min )
    Non-monotonic Resource Utilization in the Bandits with Knapsacks Problem. (arXiv:2209.12013v1 [cs.LG])
    Bandits with knapsacks (BwK) is an influential model of sequential decision-making under uncertainty that incorporates resource consumption constraints. In each round, the decision-maker observes an outcome consisting of a reward and a vector of nonnegative resource consumptions, and the budget of each resource is decremented by its consumption. In this paper we introduce a natural generalization of the stochastic BwK problem that allows non-monotonic resource utilization. In each round, the decision-maker observes an outcome consisting of a reward and a vector of resource drifts that can be positive, negative or zero, and the budget of each resource is incremented by its drift. Our main result is a Markov decision process (MDP) policy that has constant regret against a linear programming (LP) relaxation when the decision-maker knows the true outcome distributions. We build upon this to develop a learning algorithm that has logarithmic regret against the same LP relaxation when the decision-maker does not know the true outcome distributions. We also present a reduction from BwK to our model that shows our regret bound matches existing results.
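    In symbols, the generalization replaces the classical budget update with a signed drift. Writing $B_{i,t}$ for the remaining budget of resource $i$ at round $t$ (notation ours, introduced only to make the comparison explicit), the two models read
    $B_{i,t+1} = B_{i,t} - c_{i,t} \ \ (c_{i,t} \ge 0, \text{ classical BwK}) \qquad \text{vs.} \qquad B_{i,t+1} = B_{i,t} + d_{i,t} \ \ (d_{i,t} \in \mathbb{R}, \text{ this paper}),$
    so resources may be replenished as well as consumed.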
    Small random initialization is akin to spectral learning: Optimization and generalization guarantees for overparameterized low-rank matrix reconstruction. (arXiv:2106.15013v4 [cs.LG] UPDATED)
    Recently there has been significant theoretical progress on understanding the convergence and generalization of gradient-based methods on nonconvex losses with overparameterized models. Nevertheless, many aspects of optimization and generalization, in particular the critical role of small random initialization, are not fully understood. In this paper, we take a step towards demystifying this role by proving that small random initialization followed by a few iterations of gradient descent behaves akin to popular spectral methods. We also show that this implicit spectral bias from small random initialization, which is provably more prominent for overparameterized models, also puts the gradient descent iterations on a particular trajectory towards solutions that are not only globally optimal but also generalize well. Concretely, we focus on the problem of reconstructing a low-rank matrix from a few measurements via a natural nonconvex formulation. In this setting, we show that the trajectory of the gradient descent iterations from small random initialization can be approximately decomposed into three phases: (I) a spectral or alignment phase where we show that the iterates have an implicit spectral bias akin to spectral initialization, allowing us to show that at the end of this phase the column space of the iterates and the underlying low-rank matrix are sufficiently aligned, (II) a saddle avoidance/refinement phase where we show that the trajectory of the gradient iterates moves away from certain degenerate saddle points, and (III) a local refinement phase where we show that after avoiding the saddles the iterates converge quickly to the underlying low-rank matrix. Underlying our analysis are insights for the analysis of overparameterized nonconvex optimization schemes that may have implications for computational problems beyond low-rank reconstruction.
    Identifying latent activity behaviors and lifestyles using mobility data to describe urban dynamics. (arXiv:2209.12095v1 [physics.soc-ph])
    Urbanization and its problems require an in-depth and comprehensive understanding of urban dynamics, especially the complex and diversified lifestyles in modern cities. Digitally acquired data can accurately capture complex human activity, but it lacks the interpretability of demographic data. In this paper, we study a privacy-enhanced dataset of the mobility visitation patterns of 1.2 million people to 1.1 million places in 11 metro areas in the U.S. to detect the latent mobility behaviors and lifestyles in the largest American cities. Despite the considerable complexity of mobility visitations, we found that lifestyles can be automatically decomposed into only 12 latent, interpretable activity behaviors describing how people combine shopping, eating, working, and free time. Rather than describing individuals with a single lifestyle, we find that city dwellers' behavior is a mixture of those behaviors. Those detected latent activity behaviors are equally present across cities and cannot be fully explained by main demographic features. Finally, we find those latent behaviors are associated with dynamics like experienced income segregation, transportation, or healthy behaviors in cities, even after controlling for demographic features. Our results signal the importance of complementing traditional census data with activity behaviors to understand urban dynamics.
    Statistical Learning for Individualized Asset Allocation. (arXiv:2201.07998v2 [stat.ML] UPDATED)
    We establish a high-dimensional statistical learning framework for individualized asset allocation. Our proposed methodology addresses continuous-action decision-making with a large number of characteristics. We develop a discretization approach to model the effect of continuous actions and allow the discretization frequency to be large and diverge with the number of observations. The value function of continuous actions is estimated using penalized regression with our proposed generalized penalties that are imposed on linear transformations of the model coefficients. We show that our proposed Discretization and Regression with generalized fOlded concaVe penalty on Effect discontinuity (DROVE) approach enjoys desirable theoretical properties and allows for statistical inference of the optimal value associated with optimal decision-making. Empirically, the proposed framework is applied to the Health and Retirement Study data to find individualized optimal asset allocations. The results show that our individualized optimal strategy improves population financial well-being.
    Two-Tailed Averaging: Anytime Adaptive Once-in-a-while Optimal Iterate Averaging for Stochastic Optimization. (arXiv:2209.12581v1 [stat.ML])
    Tail averaging improves on Polyak averaging's non-asymptotic behaviour by excluding a number of leading iterates of stochastic optimization from its calculations. In practice, with a finite number of optimization steps and a learning rate that cannot be annealed to zero, tail averaging can get much closer to a local minimum point of the training loss than either the individual iterates or the Polyak average. However, the number of leading iterates to ignore is an important hyperparameter, and starting averaging too early or too late leads to inefficient use of resources or suboptimal solutions. Setting this hyperparameter to improve generalization is even more difficult, especially in the presence of other hyperparameters and overfitting. Furthermore, before averaging starts, the loss is only weakly informative of the final performance, which makes early stopping unreliable. To alleviate these problems, we propose an anytime variant of tail averaging, that has no hyperparameters and approximates the optimal tail at all optimization steps. Our algorithm is based on two running averages with adaptive lengths bounded in terms of the optimal tail length, one of which achieves approximate optimality with some regularity. Requiring only the additional storage for two sets of weights and periodic evaluation of the loss, the proposed two-tailed averaging algorithm is a practical and widely applicable method for improving stochastic optimization.
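    A minimal sketch of the two-running-average idea follows. The reset rule here (adopt the short tail whenever its evaluated loss beats the long tail's) is a simplification of the paper's adaptive scheme, and evaluate() is an assumed callback; iterates is assumed to yield numpy weight vectors.

        import numpy as np

        def _update(avg, n, w):
            # Incremental running mean over the iterates seen so far.
            return (w.copy() if avg is None else avg + (w - avg) / (n + 1)), n + 1

        def two_tailed_average(iterates, evaluate, check_every=100):
            long_avg, long_n = None, 0
            short_avg, short_n = None, 0
            for t, w in enumerate(iterates, start=1):
                long_avg, long_n = _update(long_avg, long_n, w)
                short_avg, short_n = _update(short_avg, short_n, w)
                if t % check_every == 0 and evaluate(short_avg) < evaluate(long_avg):
                    # The shorter tail is better: promote it and restart the short one.
                    long_avg, long_n = short_avg, short_n
                    short_avg, short_n = None, 0
            return long_avg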
    Graph Rationalization with Environment-based Augmentations. (arXiv:2206.02886v2 [cs.LG] UPDATED)
    Rationale is defined as a subset of input features that best explains or supports the prediction by machine learning models. Rationale identification has improved the generalizability and interpretability of neural networks on vision and language data. In graph applications such as molecule and polymer property prediction, identifying representative subgraph structures named as graph rationales plays an essential role in the performance of graph neural networks. Existing graph pooling and/or distribution intervention methods suffer from lack of examples to learn to identify optimal graph rationales. In this work, we introduce a new augmentation operation called environment replacement that automatically creates virtual data examples to improve rationale identification. We propose an efficient framework that performs rationale-environment separation and representation learning on the real and augmented examples in latent spaces to avoid the high complexity of explicit graph decoding and encoding. Comparing against recent techniques, experiments on seven molecular and four polymer real datasets demonstrate the effectiveness and efficiency of the proposed augmentation-based graph rationalization framework.
    Applying Machine Learning to Life Insurance: some knowledge sharing to master it. (arXiv:2209.02057v2 [stat.ML] UPDATED)
    Machine Learning permeates many industries, bringing new sources of benefits for companies. However, within the life insurance industry, Machine Learning is not widely used in practice, as statistical models have shown their efficiency for risk assessment over the past years. Thus insurers may face difficulties in assessing the value of artificial intelligence. Focusing on how the life insurance industry has changed over time highlights what is at stake in using Machine Learning for insurers and the benefits it can bring by unleashing the value of data. This paper reviews traditional actuarial methodologies for survival modeling and extends them with Machine Learning techniques. It points out differences from regular machine learning models and emphasizes the importance of specific implementations for handling censored data with the machine learning model family. In complement to this article, a Python library has been developed. Different open-source Machine Learning algorithms have been adjusted to accommodate the specificities of life insurance data, namely censoring and truncation. Such models can be easily applied from this SCOR library to accurately model life insurance risks.
    Doubly Fair Dynamic Pricing. (arXiv:2209.11837v1 [cs.LG])
    We study the problem of online dynamic pricing with two types of fairness constraints: a "procedural fairness" which requires the proposed prices to be equal in expectation among different groups, and a "substantive fairness" which requires the accepted prices to be equal in expectation among different groups. A policy that is simultaneously procedurally and substantively fair is referred to as "doubly fair". We show that a doubly fair policy must be random to have higher revenue than the best trivial policy that assigns the same price to different groups. In a two-group setting, we propose an online learning algorithm for the two-group pricing problem that achieves $\tilde{O}(\sqrt{T})$ regret, zero procedural unfairness and $\tilde{O}(\sqrt{T})$ substantive unfairness over $T$ rounds of learning. We also prove two lower bounds showing that these results on regret and unfairness are both information-theoretically optimal up to iterated logarithmic factors. To the best of our knowledge, this is the first dynamic pricing algorithm that learns to price while satisfying two fairness constraints at the same time.
    A connection between probability, physics and neural networks. (arXiv:2209.12737v1 [stat.ML])
    We illustrate an approach that can be exploited for constructing neural networks which a priori obey physical laws. We start with a simple single-layer neural network (NN) but refrain from choosing the activation functions yet. Under certain conditions and in the infinite-width limit, we may apply the central limit theorem, upon which the NN output becomes Gaussian. We may then investigate and manipulate the limit network by falling back on Gaussian process (GP) theory. It is observed that linear operators acting upon a GP again yield a GP. This also holds true for differential operators defining differential equations and describing physical laws. If we demand the GP, or equivalently the limit network, to obey the physical law, then this yields an equation for the covariance function or kernel of the GP, whose solution equivalently constrains the model to obey the physical law. The central limit theorem then suggests that NNs can be constructed to obey a physical law by choosing the activation functions such that they match a particular kernel in the infinite-width limit. The activation functions constructed in this way guarantee the NN to a priori obey the physics, up to the approximation error of non-infinite network width. Simple examples of the homogeneous 1D-Helmholtz equation are discussed and compared to naive kernels and activations.
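    As a concrete instance of the kernel constraint (a worked example of ours, consistent with the 1D-Helmholtz case mentioned above): if samples $f \sim \mathcal{GP}(0, k)$ must satisfy $f''(x) + \omega^2 f(x) = 0$, then applying the operator to the kernel in each argument suggests the stationary choice
    $k(x, x') = \cos\big(\omega (x - x')\big), \qquad \left(\partial_x^2 + \omega^2\right) k(x, x') = 0,$
    so every posterior mean built from this kernel automatically satisfies the homogeneous Helmholtz equation.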
    RORL: Robust Offline Reinforcement Learning via Conservative Smoothing. (arXiv:2206.02829v2 [cs.LG] UPDATED)
    Offline reinforcement learning (RL) provides a promising direction to exploit the massive amount of offline data for complex decision-making tasks. Due to the distribution shift issue, current offline RL algorithms are generally designed to be conservative in value estimation and action selection. However, such conservatism can impair the robustness of learned policies when encountering observation deviation under realistic conditions, such as sensor errors and adversarial attacks. To trade off robustness and conservatism, we propose Robust Offline Reinforcement Learning (RORL) with a novel conservative smoothing technique. In RORL, we explicitly introduce regularization on the policy and the value function for states near the dataset, as well as additional conservative value estimation on these OOD states. Theoretically, we show RORL enjoys a tighter suboptimality bound than recent theoretical results in linear MDPs. We demonstrate that RORL can achieve state-of-the-art performance on the general offline RL benchmark and is considerably robust to adversarial observation perturbations.
    Tiered Reinforcement Learning: Pessimism in the Face of Uncertainty and Constant Regret. (arXiv:2205.12418v2 [cs.LG] UPDATED)
    We propose a new learning framework that captures the tiered structure of many real-world user-interaction applications, where the users can be divided into two groups based on their different tolerance on exploration risks and should be treated separately. In this setting, we simultaneously maintain two policies $\pi^{\text{O}}$ and $\pi^{\text{E}}$: $\pi^{\text{O}}$ ("O" for "online") interacts with more risk-tolerant users from the first tier and minimizes regret by balancing exploration and exploitation as usual, while $\pi^{\text{E}}$ ("E" for "exploit") exclusively focuses on exploitation for risk-averse users from the second tier utilizing the data collected so far. An important question is whether such a separation yields advantages over the standard online setting (i.e., $\pi^{\text{E}}=\pi^{\text{O}}$) for the risk-averse users. We individually consider the gap-independent vs. gap-dependent settings. For the former, we prove that the separation is indeed not beneficial from a minimax perspective. For the latter, we show that if choosing Pessimistic Value Iteration as the exploitation algorithm to produce $\pi^{\text{E}}$, we can achieve a constant regret for risk-averse users independent of the number of episodes $K$, which is in sharp contrast to the $\Omega(\log K)$ regret for any online RL algorithms in the same setting, while the regret of $\pi^{\text{O}}$ (almost) maintains its online regret optimality and does not need to compromise for the success of $\pi^{\text{E}}$.
    Tighter Variational Bounds are Not Necessarily Better. A Research Report on Implementation, Ablation Study, and Extensions. (arXiv:2209.11875v1 [stat.ML])
    This report explains, implements and extends the works presented in "Tighter Variational Bounds are Not Necessarily Better" (T Rainforth et al., 2018). We provide theoretical and empirical evidence that increasing the number of importance samples $K$ in the importance weighted autoencoder (IWAE) (Burda et al., 2016) degrades the signal-to-noise ratio (SNR) of the gradient estimator in the inference network and thereby affects the full learning process. In other words, even though increasing $K$ decreases the standard deviation of the gradients, it also reduces the magnitude of the true gradient faster, thereby increasing the relative variance of the gradient updates. Extensive experiments are performed to understand the importance of $K$. These experiments suggest that tighter variational bounds are beneficial for the generative network, whereas looser bounds are preferable for the inference network. With these insights, three methods are implemented and studied: the partially importance weighted autoencoder (PIWAE), the multiply importance weighted autoencoder (MIWAE) and the combination importance weighted autoencoder (CIWAE). Each of these three methods includes IWAE as a special case but employs the importance weights in different ways to ensure a higher SNR of the gradient estimators. The efficacy of these algorithms is tested on multiple datasets such as MNIST and Omniglot. Finally, we demonstrate that the three presented IWAE variations are able to generate approximate posterior distributions that are much closer to the true posterior distribution than the IWAE's, while matching the performance of the IWAE generative network or potentially outperforming it in the case of PIWAE.
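    For reference (a standard identity from the cited works, restated here rather than taken from this report), the importance weighted bound with $K$ samples is
    $\mathcal{L}_K = \mathbb{E}_{z_1,\dots,z_K \sim q_\phi(z \mid x)}\left[\log \frac{1}{K} \sum_{k=1}^{K} \frac{p_\theta(x, z_k)}{q_\phi(z_k \mid x)}\right], \qquad \mathcal{L}_1 \le \mathcal{L}_K \le \mathcal{L}_{K+1} \le \log p_\theta(x),$
    which tightens monotonically in $K$; the report's point is that this tightening degrades the SNR of the inference network's gradient estimates.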
    Descriptive vs. inferential community detection in networks: pitfalls, myths, and half-truths. (arXiv:2112.00183v6 [physics.soc-ph] UPDATED)
    Community detection is one of the most important methodological fields of network science, and one which has attracted a significant amount of attention over the past decades. This area deals with the automated division of a network into fundamental building blocks, with the objective of providing a summary of its large-scale structure. Despite its importance and widespread adoption, there is a noticeable gap between what is arguably the state-of-the-art and the methods that are actually used in practice in a variety of fields. Here we attempt to address this discrepancy by dividing existing methods according to whether they have a "descriptive" or an "inferential" goal. While descriptive methods find patterns in networks based on context-dependent notions of community structure, inferential methods articulate generative models, and attempt to fit them to data. In this way, they are able to provide insights into the mechanisms of network formation, and separate structure from randomness in a manner supported by statistical evidence. We review how employing descriptive methods with inferential aims is riddled with pitfalls and misleading answers, and thus should be in general avoided. We argue that inferential methods are more typically aligned with clearer scientific questions, yield more robust results, and should be in many cases preferred. We attempt to dispel some myths and half-truths often believed when community detection is employed in practice, in an effort to improve both the use of such methods as well as the interpretation of their results.
    Deep Empirical Risk Minimization in finance: looking into the future. (arXiv:2011.09349v3 [stat.ML] UPDATED)
    Many modern computational approaches to classical problems in quantitative finance are formulated as empirical risk minimization (ERM), allowing direct applications of classical results from statistical machine learning. These methods, designed to directly construct the optimal feedback representation of hedging or investment decisions, are analyzed in this framework, demonstrating their effectiveness as well as their susceptibility to generalization error. Use of classical techniques shows that over-training renders trained investment decisions anticipative and proves overlearning for large hypothesis spaces. On the other hand, non-asymptotic estimates based on Rademacher complexity show convergence for sufficiently large training sets. These results emphasize the importance of synthetic data generation and the appropriate calibration of complex models to market data. A numerically studied stylized example illustrates these possibilities, including the importance of problem dimension in the degree of overlearning, and the effectiveness of this approach.
    A Stochastic Variance-Reduced Coordinate Descent Algorithm for Learning Sparse Bayesian Network from Discrete High-Dimensional Data. (arXiv:2108.09501v2 [cs.LG] UPDATED)
    This paper addresses the problem of learning a sparse-structure Bayesian network from high-dimensional discrete data. Compared to continuous Bayesian networks, learning a discrete Bayesian network is a challenging problem due to the large parameter space. Although many approaches have been developed for learning continuous Bayesian networks, few have been proposed for discrete ones. In this paper, we cast learning Bayesian networks as an optimization problem and propose a score function which guarantees the learnt structure to be a sparse directed acyclic graph. Besides, we implement a block-wise stochastic coordinate descent algorithm to optimize the score function. Specifically, we use a variance-reduction method in our optimization algorithm to make the algorithm work efficiently on high-dimensional data. The proposed approach is applied to synthetic data from well-known benchmark networks. The quality, scalability, and robustness of the constructed network are measured. The results reveal that our algorithm outperforms several well-known competing methods.
    GCF: Generalized Causal Forest for Heterogeneous Treatment Effect Estimation in Online Marketplace. (arXiv:2203.10975v2 [stat.ML] UPDATED)
    Uplift modeling is a rapidly growing approach that utilizes causal inference and machine learning methods to directly estimate the heterogeneous treatment effects, which has been widely applied to various online marketplaces to assist large-scale decision-making in recent years. The existing popular models, like causal forest (CF), are limited to either discrete treatments or posing parametric assumptions on the outcome-treatment relationship that may suffer model misspecification. However, continuous treatments (e.g., price, duration) often arise in marketplaces. To alleviate these restrictions, we use a kernel-based doubly robust estimator to recover the non-parametric dose-response functions that can flexibly model continuous treatment effects. Moreover, we propose a generic distance-based splitting criterion to capture the heterogeneity for the continuous treatments. We call the proposed algorithm generalized causal forest (GCF) as it generalizes the use case of CF to a much broader setting. We show the effectiveness of GCF by deriving the asymptotic property of the estimator and comparing it to popular uplift modeling methods on both synthetic and real-world datasets. We implement GCF on Spark and successfully deploy it into a large-scale online pricing system at a leading ride-sharing company. Online A/B testing results further validate the superiority of GCF.
    Cooperative Online Learning with Feedback Graphs. (arXiv:2106.04982v4 [cs.LG] UPDATED)
    We study the interplay between feedback and communication in a cooperative online learning setting where a network of agents solves a task in which the learners' feedback is determined by an arbitrary graph. We characterize regret in terms of the independence number of the strong product between the feedback graph and the communication network. Our analysis recovers as special cases many previously known bounds for distributed online learning with either expert or bandit feedback. A more detailed version of our results also captures the dependence of the regret on the delay caused by the time the information takes to traverse each graph. Experiments run on synthetic data show that the empirical behavior of our algorithm is consistent with the theoretical results.
    On Variance Estimation of Random Forests. (arXiv:2202.09008v3 [stat.ML] UPDATED)
    Ensemble methods, such as random forests, are popular in applications due to their high predictive accuracy. Existing literature views a random forest prediction as an infinite-order incomplete U-statistic to quantify its uncertainty. However, these methods focus on a small subsampling size of each tree, which is theoretically valid but practically limited. This paper develops an unbiased variance estimator based on incomplete U-statistics, which allows the tree size to be comparable with the overall sample size, making statistical inference possible in a broader range of real applications. Simulation results demonstrate that our estimators enjoy lower bias and more accurate coverage rate without additional computational costs. We also propose a local smoothing procedure to reduce the variation of our estimator, which shows improved numerical performance when the number of trees is relatively small. Further, we investigate the ratio consistency of our proposed variance estimator under specific scenarios. In particular, we develop a new "double U-statistic" formulation to analyze the Hoeffding decomposition of the estimator's variance.
    Random Feature Amplification: Feature Learning and Generalization in Neural Networks. (arXiv:2202.07626v3 [cs.LG] UPDATED)
    In this work, we provide a characterization of the feature-learning process in two-layer ReLU networks trained by gradient descent on the logistic loss following random initialization. We consider data with binary labels that are generated by an XOR-like function of the input features. We permit a constant fraction of the training labels to be corrupted by an adversary. We show that, although linear classifiers are no better than random guessing for the distribution we consider, two-layer ReLU networks trained by gradient descent achieve generalization error close to the label noise rate. We develop a novel proof technique that shows that at initialization, the vast majority of neurons function as random features that are only weakly correlated with useful features, and the gradient descent dynamics 'amplify' these weak, random features to strong, useful features.
    Approximate Description Length, Covering Numbers, and VC Dimension. (arXiv:2209.12882v1 [cs.LG])
    Recently, Daniely and Granot [arXiv:1910.05697] introduced a new notion of complexity called Approximate Description Length (ADL). They used it to derive novel generalization bounds for neural networks that, despite substantial work, were out of reach for more classical techniques such as discretization, Covering Numbers and Rademacher Complexity. In this paper we explore how ADL relates to classical notions of function complexity such as Covering Numbers and VC Dimension. We find that for functions whose range is the reals, ADL is essentially equivalent to these classical complexity measures. However, this equivalence breaks down for functions with high-dimensional range.
    Distilling Importance Sampling for Likelihood Free Inference. (arXiv:1910.03632v5 [stat.CO] UPDATED)
    Likelihood-free inference involves inferring parameter values given observed data and a simulator model. The simulator is computer code which takes parameters, performs stochastic calculations, and outputs simulated data. In this work, we view the simulator as a function whose inputs are (1) the parameters and (2) a vector of pseudo-random draws. We attempt to infer all these inputs conditional on the observations. This is challenging as the resulting posterior can be high dimensional and involve strong dependence. We approximate the posterior using normalizing flows, a flexible parametric family of densities. Training data is generated by likelihood-free importance sampling with a large bandwidth value epsilon, which makes the target similar to the prior. The training data is "distilled" by using it to train an updated normalizing flow. The process is iterated, using the updated flow as the importance sampling proposal, and slowly reducing epsilon so the target becomes closer to the posterior. Unlike most other likelihood-free methods, we avoid the need to reduce data to low dimensional summary statistics, and hence can achieve more accurate results. We illustrate our method in two challenging examples on queuing and epidemiology.
    Towards Demystifying Representation Learning with Non-contrastive Self-supervision. (arXiv:2110.04947v2 [cs.LG] UPDATED)
    Non-contrastive methods of self-supervised learning (such as BYOL and SimSiam) learn representations by minimizing the distance between two views of the same image. These approaches have achieved remarkable performance in practice, but the theoretical understanding lags behind. Tian et al. (2021) explained why the representation does not collapse to zero; however, how the features are learned still remains mysterious. In our work, we prove that in a linear network, non-contrastive methods learn a desirable projection matrix and also reduce the sample complexity on downstream tasks. Our analysis suggests that weight decay acts as an implicit threshold that discards the features with high variance under data augmentations and keeps the features with low variance. Inspired by our theory, we design a simpler and more computationally efficient algorithm DirectCopy by removing the eigen-decomposition step in the original DirectPred algorithm of Tian et al. (2021). Our experiments show that DirectCopy rivals or even outperforms DirectPred on STL-10, CIFAR-10, CIFAR-100, and ImageNet.
    Targeted Separation and Convergence with Kernel Discrepancies. (arXiv:2209.12835v1 [stat.ML])
    Maximum mean discrepancies (MMDs) like the kernel Stein discrepancy (KSD) have grown central to a wide range of applications, including hypothesis testing, sampler selection, distribution approximation, and variational inference. In each setting, these kernel-based discrepancy measures are required to (i) separate a target P from other probability measures or even (ii) control weak convergence to P. In this article we derive new sufficient and necessary conditions to ensure (i) and (ii). For MMDs on separable metric spaces, we characterize those kernels that separate Bochner embeddable measures and introduce simple conditions for separating all measures with unbounded kernels and for controlling convergence with bounded kernels. We use these results on $\mathbb{R}^d$ to substantially broaden the known conditions for KSD separation and convergence control and to develop the first KSDs known to exactly metrize weak convergence to P. Along the way, we highlight the implications of our results for hypothesis testing, measuring and improving sample quality, and sampling with Stein variational gradient descent.
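    For orientation (a standard definition, not a result of this paper): for a kernel $k$ with RKHS $\mathcal{H}_k$ and mean embeddings $\mu_P, \mu_Q$, the MMD is
    $\mathrm{MMD}_k(P, Q) = \|\mu_P - \mu_Q\|_{\mathcal{H}_k} = \sup_{\|f\|_{\mathcal{H}_k} \le 1} \left| \mathbb{E}_{X \sim P} f(X) - \mathbb{E}_{Y \sim Q} f(Y) \right|,$
    and the paper's questions (i) and (ii) ask when this quantity vanishes only for $Q = P$ and when $\mathrm{MMD}_k(P, Q_n) \to 0$ implies $Q_n \Rightarrow P$.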
    Interventional Causal Representation Learning. (arXiv:2209.11924v1 [stat.ML])
    The theory of identifiable representation learning aims to build general-purpose methods that extract high-level latent (causal) factors from low-level sensory data. Most existing works focus on identifiable representation learning with observational data, relying on distributional assumptions on latent (causal) factors. However, in practice, we often also have access to interventional data for representation learning. How can we leverage interventional data to help identify high-level latents? To this end, we explore the role of interventional data for identifiable representation learning in this work. We study the identifiability of latent causal factors with and without interventional data, under minimal distributional assumptions on the latents. We prove that, if the true latent variables map to the observed high-dimensional data via a polynomial function, then representation learning via minimizing the standard reconstruction loss of autoencoders identifies the true latents up to affine transformation. If we further have access to interventional data generated by hard $do$ interventions on some of the latents, then we can identify these intervened latents up to permutation, shift and scaling.
    Self-Consistent Dynamical Field Theory of Kernel Evolution in Wide Neural Networks. (arXiv:2205.09653v2 [stat.ML] UPDATED)
    We analyze feature learning in infinite-width neural networks trained with gradient flow through a self-consistent dynamical field theory. We construct a collection of deterministic dynamical order parameters which are inner-product kernels for hidden unit activations and gradients in each layer at pairs of time points, providing a reduced description of network activity through training. These kernel order parameters collectively define the hidden layer activation distribution, the evolution of the neural tangent kernel, and consequently output predictions. We show that the field theory derivation recovers the recursive stochastic process of infinite-width feature learning networks obtained by Yang and Hu (2021) with Tensor Programs. For deep linear networks, these kernels satisfy a set of algebraic matrix equations. For nonlinear networks, we provide an alternating sampling procedure to self-consistently solve for the kernel order parameters. We provide comparisons of the self-consistent solution to various approximation schemes including the static NTK approximation, gradient independence assumption, and leading order perturbation theory, showing that each of these approximations can break down in regimes where general self-consistent solutions still provide an accurate description. Lastly, we provide experiments in more realistic settings which demonstrate that the loss and kernel dynamics of CNNs at fixed feature learning strength are preserved across different widths on a CIFAR classification task.
    Undersampling is a Minimax Optimal Robustness Intervention in Nonparametric Classification. (arXiv:2205.13094v3 [cs.LG] UPDATED)
    While a broad range of techniques have been proposed to tackle distribution shift, the simple baseline of training on an $\textit{undersampled}$ balanced dataset often achieves close to state-of-the-art accuracy across several popular benchmarks. This is rather surprising, since undersampling algorithms discard excess majority group data. To understand this phenomenon, we ask if learning is fundamentally constrained by a lack of minority group samples. We prove that this is indeed the case in the setting of nonparametric binary classification. Our results show that in the worst case, an algorithm cannot outperform undersampling unless there is a high degree of overlap between the train and test distributions (which is unlikely to be the case in real-world datasets), or if the algorithm leverages additional structure about the distribution shift. In particular, in the case of label shift we show that there is always an undersampling algorithm that is minimax optimal. In the case of group-covariate shift we show that there is an undersampling algorithm that is minimax optimal when the overlap between the group distributions is small. We also perform an experimental case study on a label shift dataset and find that in line with our theory, the test accuracy of robust neural network classifiers is constrained by the number of minority samples.
    Realizable Learning is All You Need. (arXiv:2111.04746v2 [cs.LG] UPDATED)
    The equivalence of realizable and agnostic learnability is a fundamental phenomenon in learning theory. With variants ranging from classical settings like PAC learning and regression to recent trends such as adversarially robust and private learning, it's surprising that we still lack a unified theory; traditional proofs of the equivalence tend to be disparate, and rely on strong model-specific assumptions like uniform convergence and sample compression. In this work, we give the first model-independent framework explaining the equivalence of realizable and agnostic learnability: a three-line blackbox reduction that simplifies, unifies, and extends our understanding across a wide variety of settings. This includes models with no known characterization of learnability such as learning with arbitrary distributional assumptions or general loss, as well as a host of other popular settings such as robust learning, partial learning, fair learning, and the statistical query model. More generally, we argue that the equivalence of realizable and agnostic learning is actually a special case of a broader phenomenon we call property generalization: any desirable property of a learning algorithm (e.g.\ noise tolerance, privacy, stability) that can be satisfied over finite hypothesis classes extends (possibly in some variation) to any learnable hypothesis class.
    Variational Inference as Iterative Projection in a Bayesian Hilbert Space with Application to Robotic State Estimation. (arXiv:2005.07275v3 [cs.LG] UPDATED)
    Variational Bayesian inference is an important machine-learning tool that finds application from statistics to robotics. The goal is to find an approximate probability density function (PDF) from a chosen family that is in some sense 'closest' to the full Bayesian posterior. Closeness is typically defined through the selection of an appropriate loss functional such as the Kullback-Leibler (KL) divergence. In this paper, we explore a new formulation of variational inference by exploiting the fact that (most) PDFs are members of a Bayesian Hilbert space under careful definitions of vector addition, scalar multiplication and an inner product. We show that, under the right conditions, variational inference based on KL divergence can amount to iterative projection, in the Euclidean sense, of the Bayesian posterior onto a subspace corresponding to the selected approximation family. We work through the details of this general framework for the specific case of the Gaussian approximation family and show the equivalence to another Gaussian variational inference approach. We furthermore discuss the implications for systems that exhibit sparsity, which is handled naturally in Bayesian space, and give an example of a high-dimensional robotic state estimation problem that can be handled as a result. We provide some preliminary examples of how the approach could be applied to non-Gaussian inference and discuss the limitations of the approach in detail to encourage follow-on work along these lines.
    Learning GFlowNets from partial episodes for improved convergence and stability. (arXiv:2209.12782v1 [cs.LG])
    Generative flow networks (GFlowNets) are a family of algorithms for training a sequential sampler of discrete objects under an unnormalized target density and have been successfully used for various probabilistic modeling tasks. Existing training objectives for GFlowNets are either local to states or transitions, or propagate a reward signal over an entire sampling trajectory. We argue that these alternatives represent opposite ends of a gradient bias-variance tradeoff and propose a way to exploit this tradeoff to mitigate its harmful effects. Inspired by the TD($\lambda$) algorithm in reinforcement learning, we introduce subtrajectory balance or SubTB($\lambda$), a GFlowNet training objective that can learn from partial action subsequences of varying lengths. We show that SubTB($\lambda$) accelerates sampler convergence in previously studied and new environments and enables training GFlowNets in environments with longer action sequences and sparser reward landscapes than what was possible before. We also perform a comparative analysis of stochastic gradient dynamics, shedding light on the bias-variance tradeoff in GFlowNet training and the advantages of subtrajectory balance.
    An Explainable Machine Learning Approach to Visual-Interactive Labeling: A Case Study on Non-communicable Disease Data. (arXiv:2209.12778v1 [cs.LG])
    We introduce a new visual-interactive tool: Explainable Labeling Assistant (XLabel) that takes an explainable machine learning approach to data labeling. The main component of XLabel is the Explainable Boosting Machine (EBM), a predictive model that can calculate the contribution of each input feature towards the final prediction. As a case study, we use XLabel to predict the labels of four non-communicable diseases (NCDs): diabetes, hypertension, chronic kidney disease, and dyslipidemia. We demonstrate that EBM is an excellent choice of predictive model by comparing it against a rule-based model and four other machine learning models. By performing 5-fold cross-validation on 427 medical records, EBM's prediction accuracy, precision, and F1-score are greater than 0.95 for all four NCDs. It performed as well as two black-box models and outperformed the other models on these metrics. In an additional experiment, when 40% of the records were intentionally mislabeled, EBM could recall the correct labels of more than 90% of these records.  ( 2 min )
    Self-supervised Denoising via Low-rank Tensor Approximated Convolutional Neural Network. (arXiv:2209.12715v1 [cs.CV])
    Noise is ubiquitous during image acquisition. Sufficient denoising is often an important first step for image processing. In recent decades, deep neural networks (DNNs) have been widely used for image denoising. Most DNN-based image denoising methods require a large-scale dataset or focus on supervised settings, in which single/pairs of clean images or a set of noisy images are required. This poses a significant burden on the image acquisition process. Moreover, denoisers trained on datasets of limited scale may incur over-fitting. To mitigate these issues, we introduce a new self-supervised framework for image denoising based on the Tucker low-rank tensor approximation. With the proposed design, we are able to characterize our denoiser with fewer parameters and train it based on a single image, which considerably improves the model generalizability and reduces the cost of data acquisition. Extensive experiments on both synthetic and real-world noisy images have been conducted. Empirical results show that our proposed method outperforms existing non-learning-based methods (e.g., low-pass filter, non-local means) and single-image unsupervised denoisers (e.g., DIP, NN+BM3D) on both in-sample and out-of-sample datasets. The proposed method even achieves performance comparable with some supervised methods (e.g., DnCNN).  ( 2 min )
    Hamiltonian Monte Carlo for efficient Gaussian sampling: long and random steps. (arXiv:2209.12771v1 [stat.ML])
    Hamiltonian Monte Carlo (HMC) is a Markov chain algorithm for sampling from a high-dimensional distribution with density $e^{-f(x)}$, given access to the gradient of $f$. A particular case of interest is that of a $d$-dimensional Gaussian distribution with covariance matrix $\Sigma$, in which case $f(x) = x^\top \Sigma^{-1} x$. We show that HMC can sample from a distribution that is $\varepsilon$-close in total variation distance using $\widetilde{O}(\sqrt{\kappa} d^{1/4} \log(1/\varepsilon))$ gradient queries, where $\kappa$ is the condition number of $\Sigma$. Our algorithm uses long and random integration times for the Hamiltonian dynamics. This contrasts with (and was motivated by) recent results that give an $\widetilde\Omega(\kappa d^{1/2})$ query lower bound for HMC with fixed integration times, even for the Gaussian case.  ( 2 min )
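    To make the "long and random integration times" idea concrete, here is a minimal leapfrog HMC sketch for a Gaussian target; the dimension, step size, integration-time range, and chain length are arbitrary choices for illustration, not the paper's tuned settings:

        import numpy as np

        rng = np.random.default_rng(1)
        d = 10
        Sigma_inv = np.diag(1.0 / np.linspace(1.0, 100.0, d))  # condition number 100

        def leapfrog(x, p, step, n_steps):
            p = p - 0.5 * step * Sigma_inv @ x
            for _ in range(n_steps - 1):
                x = x + step * p
                p = p - step * Sigma_inv @ x
            x = x + step * p
            p = p - 0.5 * step * Sigma_inv @ x
            return x, p

        x, samples, step = rng.normal(size=d), [], 0.1
        for it in range(2000):
            p = rng.normal(size=d)                        # fresh momentum
            T = rng.uniform(0.5, 2.0 * np.sqrt(100.0))    # long, random integration time
            x_new, p_new = leapfrog(x, p, step, max(1, int(T / step)))
            H_old = 0.5 * x @ Sigma_inv @ x + 0.5 * p @ p
            H_new = 0.5 * x_new @ Sigma_inv @ x_new + 0.5 * p_new @ p_new
            if np.log(rng.uniform()) < H_old - H_new:     # Metropolis correction
                x = x_new
            samples.append(x.copy())
        print(np.var(np.array(samples)[1000:], axis=0))   # compare to diag(Sigma)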
    Shape And Structure Preserving Differential Privacy. (arXiv:2209.12667v1 [stat.ML])
    It is common for data structures such as images and shapes of 2D objects to be represented as points on a manifold. The utility of a mechanism to produce sanitized differentially private estimates from such data is intimately linked to how compatible it is with the underlying structure and geometry of the space. In particular, as recently shown, the utility of the Laplace mechanism on a positively curved manifold, such as Kendall's 2D shape space, is significantly influenced by the curvature. Focusing on the problem of sanitizing the Fr\'echet mean of a sample of points on a manifold, we exploit the characterisation of the mean as the minimizer of an objective function comprising the sum of squared distances and develop a K-norm gradient mechanism on Riemannian manifolds that favors values producing gradients close to the zero of the objective function. For the case of positively curved manifolds, we describe how using the gradient of the squared distance function offers better control over sensitivity than the Laplace mechanism, and demonstrate this numerically on a dataset of shapes of corpus callosa. Further illustrations of the mechanism's utility on a sphere and the manifold of symmetric positive definite matrices are also presented.  ( 3 min )
    Neural State-Space Modeling with Latent Causal-Effect Disentanglement. (arXiv:2209.12387v1 [cs.LG])
    Despite substantial progress in deep learning approaches to time-series reconstruction, no existing methods are designed to uncover local activities with minute signal strength due to their negligible contribution to the optimization loss. Such local activities however can signify important abnormal events in physiological systems, such as an extra focus triggering an abnormal propagation of electrical waves in the heart. We discuss a novel technique for reconstructing such local activity that, while small in signal strength, is the cause of subsequent global activities that have larger signal strength. Our central innovation is to approach this by explicitly modeling and disentangling how the latent state of a system is influenced by potential hidden internal interventions. In a novel neural formulation of state-space models (SSMs), we first introduce causal-effect modeling of the latent dynamics via a system of interacting neural ODEs that separately describes 1) the continuous-time dynamics of the internal intervention, and 2) its effect on the trajectory of the system's native state. Because the intervention cannot be directly observed but has to be disentangled from the observed subsequent effect, we integrate knowledge of the native intervention-free dynamics of the system, and infer the hidden intervention by assuming it is responsible for differences observed between the actual and hypothetical intervention-free dynamics. We demonstrate a proof of concept of the presented framework by reconstructing ectopic foci that disrupt the course of normal cardiac electrical propagation from remote observations.  ( 3 min )
    Learning Variational Models with Unrolling and Bilevel Optimization. (arXiv:2209.12651v1 [stat.ML])
    In this paper we consider the problem of learning variational models in the context of supervised learning via risk minimization. Our goal is to provide a deeper understanding of the two approaches to learning variational models: via bilevel optimization and via algorithm unrolling. The former considers the variational model as a lower level optimization problem below the risk minimization problem, while the latter replaces the lower level optimization problem by an algorithm that solves said problem approximately. Both approaches are used in practice, but unrolling is much simpler from a computational point of view. To analyze and compare the two approaches, we consider a simple toy model, and compute all risks and the respective estimators explicitly. We show that unrolling can be better than the bilevel optimization approach, but also that the performance of unrolling can depend significantly on further parameters, sometimes in unexpected ways: while the stepsize of the unrolled algorithm matters a lot, the number of unrolled iterations matters mainly through its parity, and the even and odd cases are notably different.  ( 2 min )
    Bounded Simplex-Structured Matrix Factorization. (arXiv:2209.12638v1 [cs.LG])
    In this paper, we propose a new low-rank matrix factorization model dubbed bounded simplex-structured matrix factorization (BSSMF). Given an input matrix $X$ and a factorization rank $r$, BSSMF looks for a matrix $W$ with $r$ columns and a matrix $H$ with $r$ rows such that $X \approx WH$ where the entries in each column of $W$ are bounded, that is, they belong to given intervals, and the columns of $H$ belong to the probability simplex, that is, $H$ is column stochastic. BSSMF generalizes nonnegative matrix factorization (NMF), and simplex-structured matrix factorization (SSMF). BSSMF is particularly well suited when the entries of the input matrix $X$ belong to a given interval; for example when the rows of $X$ represent images, or $X$ is a rating matrix such as in the Netflix and MovieLens data sets where the entries of $X$ belong to the interval $[1,5]$. The simplex-structured matrix $H$ not only leads to an easily understandable decomposition providing a soft clustering of the columns of $X$, but implies that the entries of each column of $WH$ belong to the same intervals as the columns of $W$. In this paper, we first propose a fast algorithm for BSSMF, even in the presence of missing data in $X$. Then we provide identifiability conditions for BSSMF, that is, we provide conditions under which BSSMF admits a unique decomposition, up to trivial ambiguities. Finally, we illustrate the effectiveness of BSSMF on two applications: extraction of features in a set of images, and the matrix completion problem for recommender systems.  ( 3 min )
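    For intuition about the two constraint sets, here is a naive projected alternating-minimization sketch of the BSSMF model; this is not the paper's fast algorithm, and the bounds, rank, and synthetic data are illustrative assumptions:

        import numpy as np

        rng = np.random.default_rng(2)

        def project_simplex(h):
            # Euclidean projection of a vector onto the probability simplex
            u = np.sort(h)[::-1]
            css = np.cumsum(u)
            rho = np.nonzero(u + (1 - css) / (np.arange(len(h)) + 1) > 0)[0][-1]
            return np.maximum(h + (1 - css[rho]) / (rho + 1), 0)

        def bssmf(X, r, lo=1.0, hi=5.0, iters=500):
            W = rng.uniform(lo, hi, (X.shape[0], r))
            H = np.full((r, X.shape[1]), 1.0 / r)               # columns on the simplex
            for _ in range(iters):
                LW = np.linalg.norm(H @ H.T, 2) + 1e-9          # Lipschitz step for W
                W = np.clip(W - (W @ H - X) @ H.T / LW, lo, hi) # bounded entries of W
                LH = np.linalg.norm(W.T @ W, 2) + 1e-9          # Lipschitz step for H
                H = H - W.T @ (W @ H - X) / LH
                H = np.apply_along_axis(project_simplex, 0, H)  # column stochastic H
            return W, H

        W0 = rng.uniform(1, 5, (30, 3))
        H0 = np.apply_along_axis(project_simplex, 0, rng.random((3, 50)))
        X = W0 @ H0
        W, H = bssmf(X, 3)
        print(np.linalg.norm(X - W @ H) / np.linalg.norm(X))    # relative error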
    MaxMatch: Semi-Supervised Learning with Worst-Case Consistency. (arXiv:2209.12611v1 [cs.LG])
    In recent years, great progress has been made in incorporating unlabeled data to overcome the scarcity of supervision via semi-supervised learning (SSL). Most state-of-the-art models are based on the idea of pursuing model predictions over unlabeled data that are consistent under input noise, which is called consistency regularization. Nonetheless, there is a lack of theoretical insight into the reasons behind its success. To bridge the gap between theoretical and practical results, we propose a worst-case consistency regularization technique for SSL in this paper. Specifically, we first present a generalization bound for SSL consisting of the empirical loss terms observed on labeled and unlabeled training data separately. Motivated by this bound, we derive an SSL objective that minimizes the largest inconsistency between an original unlabeled sample and its multiple augmented variants. We then provide a simple but effective algorithm to solve the proposed minimax problem, and theoretically prove that it converges to a stationary point. Experiments on five popular benchmark datasets validate the effectiveness of our proposed method.  ( 2 min )
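    A schematic PyTorch sketch of the worst-case consistency objective: penalize the largest divergence between a sample's anchor prediction and its K augmented views (the model, augmentation, and K below are placeholder assumptions, not the paper's setup):

        import torch
        import torch.nn.functional as F

        def worst_case_consistency(model, x_unlabeled, augment, K=4):
            with torch.no_grad():
                p = F.softmax(model(x_unlabeled), dim=-1)     # anchor prediction
            per_view = []
            for _ in range(K):
                logq = F.log_softmax(model(augment(x_unlabeled)), dim=-1)
                per_view.append(F.kl_div(logq, p, reduction="none").sum(-1))
            return torch.stack(per_view, 0).max(0).values.mean()  # worst view only

        model = torch.nn.Sequential(torch.nn.Flatten(), torch.nn.Linear(784, 10))
        x = torch.randn(8, 1, 28, 28)
        augment = lambda t: t + 0.1 * torch.randn_like(t)     # stand-in augmentation
        worst_case_consistency(model, x, augment).backward()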
    Algorithms that Approximate Data Removal: New Results and Limitations. (arXiv:2209.12269v1 [stat.ML])
    We study the problem of deleting user data from machine learning models trained using empirical risk minimization. Our focus is on learning algorithms which return the empirical risk minimizer and on approximate unlearning algorithms that comply with deletion requests arriving in streaming minibatches. Leveraging the infinitesimal jackknife, we develop an online unlearning algorithm that is both computationally and memory efficient. Unlike prior memory efficient unlearning algorithms, we target models that minimize objectives with non-smooth regularizers, such as the commonly used $\ell_1$, elastic net, or nuclear norm penalties. We also provide generalization, deletion capacity, and unlearning guarantees that are consistent with state of the art methods. Across a variety of benchmark datasets, our algorithm empirically improves upon the runtime of prior methods while maintaining the same memory requirements and test accuracy. Finally, we open a new direction of inquiry by proving that all approximate unlearning algorithms introduced so far fail to unlearn in problem settings where common hyperparameter tuning methods, such as cross-validation, have been used to select models.  ( 2 min )
    Sampling Constrained Continuous Probability Distributions: A Review. (arXiv:2209.12403v1 [stat.CO])
    The problem of sampling constrained continuous distributions frequently appears in many machine/statistical learning models. Many Markov chain Monte Carlo (MCMC) sampling methods have been adapted to handle different types of constraints on the random variables. Among these methods, Hamiltonian Monte Carlo (HMC) and related approaches have shown significant advantages in terms of computational efficiency compared to other counterparts. In this article, we first review HMC and some extended sampling methods, and then concretely explain three constrained HMC-based sampling methods: reflection, reformulation, and spherical HMC. For illustration, we apply these methods to three well-known constrained sampling problems: truncated multivariate normal distributions, Bayesian regularized regression, and nonparametric density estimation. In this review, we also connect constrained sampling with the related problem of constrained design spaces in the statistical design of experiments.  ( 2 min )
    Convergence of score-based generative modeling for general data distributions. (arXiv:2209.12381v1 [cs.LG])
    We give polynomial convergence guarantees for denoising diffusion models that do not rely on the data distribution satisfying functional inequalities or strong smoothness assumptions. Assuming a $L^2$-accurate score estimate, we obtain Wasserstein distance guarantees for any distributions of bounded support or sufficiently decaying tails, as well as TV guarantees for distributions with further smoothness assumptions.  ( 2 min )
    Clustering by Direct Optimization of the Medoid Silhouette. (arXiv:2209.12553v1 [cs.LG])
    The evaluation of clustering results is difficult and highly dependent on the evaluated data set and the perspective of the beholder. There are many different clustering quality measures that try to provide a general yardstick for validating clustering results. A very popular measure is the Silhouette. We discuss the efficient medoid-based variant of the Silhouette, perform a theoretical analysis of its properties, and provide two fast versions for direct optimization. We combine ideas from the original Silhouette with the well-known PAM algorithm and its latest improvement, FasterPAM. One of the versions guarantees results equal to the original variant and provides a runtime speedup of $O(k^2)$. In experiments on real data with 30000 samples and $k$=100, we observed a 10464$\times$ speedup compared to the original PAMMEDSIL algorithm.  ( 2 min )
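    For reference, the quantity being optimized can be evaluated naively in O(nk): each point's medoid Silhouette uses only its nearest and second-nearest medoid. This sketch implements the plain definition, not the paper's fast direct-optimization algorithms:

        import numpy as np

        def medoid_silhouette(D, medoids):
            # D: (n, n) pairwise distances; medoids: indices of the k medoids
            dm = D[:, medoids]                          # (n, k) distances to medoids
            order = np.argsort(dm, axis=1)
            d1 = dm[np.arange(len(D)), order[:, 0]]     # nearest medoid
            d2 = dm[np.arange(len(D)), order[:, 1]]     # second-nearest medoid
            return np.where(d2 > 0, 1.0 - d1 / d2, 0.0).mean()

        rng = np.random.default_rng(3)
        X = rng.normal(size=(100, 2))
        D = np.linalg.norm(X[:, None] - X[None, :], axis=-1)
        print(medoid_silhouette(D, [0, 1, 2]))          # arbitrary medoids for demo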
    Random graph matching at Otter's threshold via counting chandeliers. (arXiv:2209.12313v1 [cs.DS])
    We propose an efficient algorithm for graph matching based on similarity scores constructed from counting a certain family of weighted trees rooted at each vertex. For two Erd\H{o}s-R\'enyi graphs $\mathcal{G}(n,q)$ whose edges are correlated through a latent vertex correspondence, we show that this algorithm correctly matches all but a vanishing fraction of the vertices with high probability, provided that $nq\to\infty$ and the edge correlation coefficient $\rho$ satisfies $\rho^2>\alpha \approx 0.338$, where $\alpha$ is Otter's tree-counting constant. Moreover, this almost exact matching can be made exact under an extra condition that is information-theoretically necessary. This is the first polynomial-time graph matching algorithm that succeeds at an explicit constant correlation and applies to both sparse and dense graphs. In comparison, previous methods either require $\rho=1-o(1)$ or are restricted to sparse graphs. The crux of the algorithm is a carefully curated family of rooted trees called chandeliers, which allows effective extraction of the graph correlation from the counts of the same tree while suppressing the undesirable correlation between those of different trees.  ( 2 min )
    A Deep Learning Approach to Analyzing Continuous-Time Systems. (arXiv:2209.12128v1 [cs.LG])
    Scientists often use observational time series data to study complex natural processes, from climate change to civil conflict to brain activity. But regression analyses of these data often assume simplistic dynamics. Recent advances in deep learning have yielded startling improvements to the performance of models of complex processes, from speech comprehension to nuclear physics to competitive gaming. But deep learning is generally not used for scientific analysis. Here, we bridge this gap by showing that deep learning can be used, not just to imitate, but to analyze complex processes, providing flexible function approximation while preserving interpretability. Our approach -- the continuous-time deconvolutional regressive neural network (CDRNN) -- relaxes standard simplifying assumptions (e.g., linearity, stationarity, and homoscedasticity) that are implausible for many natural systems and may critically affect the interpretation of data. We evaluate CDRNNs on incremental human language processing, a domain with complex continuous dynamics. We demonstrate dramatic improvements to predictive likelihood in behavioral and neuroimaging data, and we show that CDRNNs enable flexible discovery of novel patterns in exploratory analyses, provide robust control of possible confounds in confirmatory analyses, and open up research questions that are otherwise hard to study using observational data.  ( 2 min )
    Weather2vec: Representation Learning for Causal Inference with Non-Local Confounding in Air Pollution and Climate Studies. (arXiv:2209.12316v1 [cs.LG])
    Estimating the causal effects of a spatially-varying intervention on a spatially-varying outcome may be subject to non-local confounding (NLC), a phenomenon that can bias estimates when the treatments and outcomes of a given unit are dictated in part by the covariates of other nearby units. In particular, NLC is a challenge for evaluating the effects of environmental policies and climate events on health-related outcomes such as air pollution exposure. This paper first formalizes NLC using the potential outcomes framework, providing a comparison with the related phenomenon of causal interference. Then, it proposes a broadly applicable framework, termed "weather2vec", that uses the theory of balancing scores to learn representations of non-local information into a scalar or vector defined for each observational unit, which is subsequently used to adjust for confounding in conjunction with causal inference methods. The framework is evaluated in a simulation study and two case studies on air pollution where the weather is an (inherently regional) known confounder.  ( 2 min )
    Robust Causality and False Attribution in Data-Driven Earth Science Discoveries. (arXiv:2209.12580v1 [stat.AP])
    Causal and attribution studies are essential for earth scientific discoveries and critical for informing climate, ecology, and water policies. However, the current generation of methods has struggled to keep pace with the complexity of scientific and stakeholder challenges, with growing data availability, and with the limits of data-driven methods. Unless carefully informed by physics, they run the risk of conflating correlation with causation or getting overwhelmed by estimation inaccuracies. Given that natural experiments, controlled trials, interventions, and counterfactual examinations are often impractical, information-theoretic methods have been developed and are being continually refined in the earth sciences. Here we show that transfer entropy-based causal graphs, which have recently become popular in the earth sciences with high-profile discoveries, can be spurious even when augmented with statistical significance. We develop a subsample-based ensemble approach for robust causality analysis. Simulated data, and observations in climate and ecohydrology, suggest the robustness and consistency of this approach.  ( 2 min )
    Machine Learning and Artificial Intelligence-Driven Multi-Scale Modeling for High Burnup Accident-Tolerant Fuels for Light Water-Based SMR Applications. (arXiv:2209.12146v1 [eess.SY])
    The concept of small modular reactor has changed the outlook for tackling future energy crises. This new reactor technology is very promising considering its lower investment requirements, modularity, design simplicity, and enhanced safety features. The application of artificial intelligence-driven multi-scale modeling (neutronics, thermal hydraulics, fuel performance, etc.) incorporating Digital Twin and associated uncertainties in the research of small modular reactors is a recent concept. In this work, a comprehensive study is conducted on the multiscale modeling of accident-tolerant fuels. The application of these fuels in the light water-based small modular reactors is explored. This chapter also focuses on the application of machine learning and artificial intelligence in the design optimization, control, and monitoring of small modular reactors. Finally, a brief assessment of the research gap on the application of artificial intelligence to the development of high burnup composite accident-tolerant fuels is provided. Necessary actions to fulfill these gaps are also discussed.  ( 2 min )
    An Asymptotically Optimal Batched Algorithm for the Dueling Bandit Problem. (arXiv:2209.12108v1 [cs.LG])
    We study the $K$-armed dueling bandit problem, a variation of the traditional multi-armed bandit problem in which feedback is obtained in the form of pairwise comparisons. Previous learning algorithms have focused on the $\textit{fully adaptive}$ setting, where the algorithm can make updates after every comparison. The "batched" dueling bandit problem is motivated by large-scale applications like web search ranking and recommendation systems, where performing sequential updates may be infeasible. In this work, we ask: $\textit{is there a solution using only a few adaptive rounds that matches the asymptotic regret bounds of the best sequential algorithms for $K$-armed dueling bandits?}$ We answer this in the affirmative $\textit{under the Condorcet condition}$, a standard setting of the $K$-armed dueling bandit problem. We obtain asymptotic regret of $O(K^2\log^2(K)) + O(K\log(T))$ in $O(\log(T))$ rounds, where $T$ is the time horizon. Our regret bounds nearly match the best regret bounds known in the fully sequential setting under the Condorcet condition. Finally, in computational experiments over a variety of real-world datasets, we observe that our algorithm using $O(\log(T))$ rounds achieves almost the same performance as fully sequential algorithms (that use $T$ rounds).  ( 2 min )
    Capacity dependent analysis for functional online learning algorithms. (arXiv:2209.12198v1 [stat.ML])
    This article provides a convergence analysis of online stochastic gradient descent algorithms for functional linear models. Adopting characterizations of the slope function's regularity, the kernel space capacity, and the capacity of the sampling process's covariance operator, significant improvements on the convergence rates are achieved. Both prediction and estimation problems are studied, and we show that the capacity assumption can alleviate the saturation of the convergence rate as the regularity of the target function increases. We show that with a properly selected kernel, capacity assumptions can fully compensate for regularity assumptions in prediction problems (but not in estimation problems). This demonstrates a significant difference between prediction and estimation in functional data analysis.  ( 2 min )
    On Projections to Linear Subspaces. (arXiv:2209.12485v1 [cs.LG])
    The merit of projecting data onto linear subspaces is well known from, e.g., dimension reduction. One key aspect of subspace projections, the maximum preservation of variance (principal component analysis), has been thoroughly researched and the effect of random linear projections on measures such as intrinsic dimensionality still is an ongoing effort. In this paper, we investigate the less explored depths of linear projections onto explicit subspaces of varying dimensionality and the expectations of variance that ensue. The result is a new family of bounds for Euclidean distances and inner products. We showcase the quality of these bounds as well as investigate the intimate relation to intrinsic dimensionality estimation.  ( 2 min )

  • Open

    [D] DreamBooth Stable Diffusion training now possible in 24GB GPUs, and it runs about 2 times faster.
    https://github.com/huggingface/diffusers/pull/554#issuecomment-1258751183 submitted by /u/0x00groot  ( 88 min )
    [D] Where does end-to-end learning fail?
    Under what conditions does end-to-end learning fail? When does it succeed? Consider the approaches taken by Tesla and Comma.ai to build self-driving cars: Comma's thesis is that end-to-end behavior cloning on human drivers is necessary and sufficient. They don't produce any intermediate outputs or use any regularization you might call "semantic"; it's merely observation to action (ok, some details on how exactly you get this to work, but that's the basic idea). Whereas Tesla's approach uses ML mostly to build "models" of the world, e.g. "is this voxel occupied", "what's the flow of this voxel", "where should I stop?" etc... (they seem to have become more grounded in geometry and moved away from higher semantics in the last few years from what I can tell). From these models they seem to do some kind of classical planning/control, which still uses possibly learned predictive models of other agents'/dynamic objects' behavior as constraints. Possibly they're going more in the direction of learning the driving policy from humans, but I haven't seen direct evidence of that (maybe there are already some learned cost functions in the classical planner?). I'd love to hear people's opinions about when one approach is better; please give me your general intuitions. Do we expect that the largest networks just meta-learn the appropriate models given enough data and capacity? Is there evidence of this (or some way to prove something like dynamics models emerging inside a large net)? submitted by /u/AristocraticOctopus  ( 95 min )
    [P] Stable diffusion free discord bot
    Just made this Discord bot that you can add to your server for free; it runs on GPUs and takes 10-20s per image. Will keep it free for as long as I can (be gentle). Enjoy! Link (use "/paint"): https://discord.com/oauth2/authorize?client_id=1022993363475116082&permissions=2147485696&scope=bot submitted by /u/paulcjh  ( 105 min )
    [P] Data Labeling for ML Model Retraining with Label Studio
    Data-centric AI doesn't just stop with cleaning and preparing data for model training - there are rich insights to be gleaned from production data. By analyzing, segmenting, and selectively re-labeling your production inference data, you can generate datasets for future model retraining. This talk shows you how you can use human-in-the-loop oversight to generate high-quality, labeled datasets using Label Studio from your prediction data for future model retraining. Watch talk here. Link to Github repo. submitted by /u/modzykirsten  ( 105 min )
    [D] Language Modeling for Sequence Labeling of Long Text with Specialized Corpus
    Hi everyone, I have a non-standard-class (not PER/ORG etc.) sequence labeling task that I want to tackle with pretrained language models. The task has some attributes that make it difficult to use standard methods such as a pretrained NER model, so I want to check whether my planned approach is sensible. The task: sequence labeling with custom classes (not PERSON, ORG, LOCATION etc.); very long text (roughly 20k+ tokens); a specialized corpus; tokens that shouldn't be broken down by subword tokenization; and numerals such as percentages and monetary values to handle. Current planned approach: Option A is a Transformer-based long-text LM such as Longformer (edit: Mega has been suggested); Option B is a biRNN (LSTM/GRU) language model using pretrained embeddings with subword tokenization. The data is already labeled with the custom classes. Open questions: how to add regex-able specialized tokens to the vocab, i.e., prevent the tokenizer from dividing them into subwords (see the sketch after this post); whether Longformer-like Transformer-based long-text LMs can be adjusted to a 20k+ max sequence length; whether long-text Transformer-based LMs have a larger memory footprint than bidirectional RNNs; which pretrained embeddings to use; and how to handle numerals when finetuning the embeddings/LM. Any help & recommendations are greatly appreciated (feel free to reply with a link to a paper/blog etc. that might have some info)! submitted by /u/nottakumasato  ( 89 min )
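    On the specialized-tokens question above, one common route (an assumption about the setup, not the only option) is to regex-normalize numerals into placeholder tokens and register domain strings as added tokens, so the subword tokenizer leaves them whole; note that Longformer's pretrained position embeddings stop at 4096 tokens, so a 20k+ sequence length would require extending and training new position embeddings:

        import re
        from transformers import AutoTokenizer, AutoModelForTokenClassification

        tokenizer = AutoTokenizer.from_pretrained("allenai/longformer-base-4096")
        model = AutoModelForTokenClassification.from_pretrained(
            "allenai/longformer-base-4096", num_labels=7)       # 7 is a placeholder

        tokenizer.add_tokens(["[PCT]", "[MONEY]", "EBITDA"])    # hypothetical tokens
        model.resize_token_embeddings(len(tokenizer))           # new rows train from scratch

        text = "Revenue grew 12.5% to $3.4m"
        text = re.sub(r"\d+(\.\d+)?%", " [PCT] ", text)         # normalize percentages
        text = re.sub(r"\$\d+(\.\d+)?[mbk]?", " [MONEY] ", text)  # monetary values
        print(tokenizer.tokenize(text))                         # added tokens stay whole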
    [R] Behavior-Oriented Design
    Behavior-Oriented Design (BOD) is a development methodology for creating complex, complete agents such as virtual-reality characters, autonomous robots, intelligent tutors, or intelligent environments. BOD agents are modular, but not multi-agent systems. They use hierarchical reactive plans to perform arbitration between their component modules. BOD provides not only architectural specifications for modules and plans but a methodology for building them. The BOD methodology is cyclic, consisting of rules for an initial decomposition and heuristics for revising the specification over the process of development. Behavior-Based AI is an approach that specifies that intelligence should be decomposed along the lines of p…  ( 90 min )
    [N] Announcing the BigCode project - building large language models for code in an open/responsible way
    print("Hello world! 🎉") Excited to announce the BigCode project led by ServiceNow Research and Hugging Face! In the spirit of BigScience we aim to develop large language models for code in an open and responsible way: https://www.bigcode-project.org/ Here a summary of the main goals for the collaboration: 🌸Language models for code (Codex, CodeGen) and the applications they power (AI assisted programming) are gaining traction. Some models have been released, but there are still questions around data governance, robustness of evaluation benchmarks, and the engineering behind them. 📚The first goal of BigCode is to develop and release a dataset large enough to train a state-of-the-art language model for code. We will also ensure that only files from repositories with permissive licen…  ( 108 min )
    [D] Extracting the n-th hidden state from GRU output
    I want to extract the last meaningful hidden state from my GRU. That is, for each sequence in the batch, I want the last hidden state before padding. Currently my code looks like this (PyTorch): output, hidden = GRU(input); last_hidden = hidden[-1]. However, the very last hidden state might be watered down for some sequences in the batch that are much smaller than the maximum length for that batch. I cannot use pack_padded_sequence to skip padding because it only works for RNN outputs, not their hidden states. What do I do? submitted by /u/Blutorangensaft  ( 90 min )
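    One workaround sketch (assuming batch_first=True tensors and a lengths tensor; sizes here are arbitrary) is to gather each sequence's output at its last valid timestep; packing the input also appears to return the last valid hidden state per sequence, which is worth verifying on your PyTorch version:

        import torch
        import torch.nn as nn

        gru = nn.GRU(input_size=8, hidden_size=16, batch_first=True)
        x = torch.randn(4, 10, 8)                      # padded batch
        lengths = torch.tensor([10, 7, 3, 5])

        output, hidden = gru(x)                        # output: (batch, seq, hidden)
        idx = (lengths - 1).view(-1, 1, 1).expand(-1, 1, output.size(2))
        last_valid = output.gather(1, idx).squeeze(1)  # (batch, hidden)

        packed = nn.utils.rnn.pack_padded_sequence(
            x, lengths, batch_first=True, enforce_sorted=False)
        _, hidden_packed = gru(packed)                 # final state per sequence
        print(torch.allclose(last_valid, hidden_packed[-1], atol=1e-6))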
    [D] Are there significant performance benefits to AVX-512?
    I am building a new workstation and am wondering if AMD's inclusion of AVX-512 can improve many machine learning workloads by much, or if it has little effect. My main workloads are DL, boosted trees, sklearn, and some Bayesian statistics like R-INLA. Thanks. submitted by /u/onlymagik  ( 112 min )
    [D] MQTransformer: Not good enough for ICLR but SOTA for Amazon?
    I noticed that Amazon lists MQTransformer as their most recent (probabilistic) forecasting model, yet the corresponding paper was not accepted to ICLR. I get that the code is lacking (sadly, there are also no results on Papers with Code), which is a red flag. Nonetheless, this is somewhat surprising to me, as I assume this is an important topic for Amazon, and if they claim to have achieved the best results with this architecture, it should be worth publishing? OpenAI and in some cases DeepMind also omit the code, although I do not see their SOTA algorithms rejected because of this. Double standard? New and more strict requirements? What are your thoughts on this? submitted by /u/canbooo  ( 91 min )
    [project] Implementing BERT for sensor data
    Hello, I have three time series and I wish to learn a mapping between two of them and the third. All three come from sensors, and the third is the most likely to break in real life. At first I tried to treat it as a simple regression problem, but it did not perform well on the validation set. Because I want to allow different parts of the time series from the other sensors to have higher importance in predicting a given point of the missing sensor, I thought of using an encoder-decoder framework with attention. Because it is my first time using attention, I would like to discuss whether it is appropriate given the setup I described. Second, I thought of using a simpler version of BERT, but couldn't find any implementation that describes the layers, only HuggingFace's, within the context of NLP. Is anyone familiar with a clear implementation of such a network? Thanks submitted by /u/David202023  ( 89 min )
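    As a starting-point sketch of the encoder-decoder-with-attention idea (layer sizes, channel counts, and the teacher-forced decoding are placeholder assumptions, not a tuned design; a causal target mask would be needed for proper autoregressive training):

        import torch
        import torch.nn as nn

        class SensorTranslator(nn.Module):
            def __init__(self, d_model=32):
                super().__init__()
                self.embed_src = nn.Linear(2, d_model)   # two observed sensors per step
                self.embed_tgt = nn.Linear(1, d_model)   # the sensor to reconstruct
                self.tf = nn.Transformer(d_model=d_model, nhead=4,
                                         num_encoder_layers=2, num_decoder_layers=2,
                                         batch_first=True)
                self.head = nn.Linear(d_model, 1)

            def forward(self, src, tgt_prev):
                out = self.tf(self.embed_src(src), self.embed_tgt(tgt_prev))
                return self.head(out)                    # predicted third sensor

        model = SensorTranslator()
        src = torch.randn(8, 100, 2)   # two sensor channels, 100 timesteps
        tgt = torch.randn(8, 100, 1)   # teacher-forced history of the third sensor
        print(model(src, tgt).shape)   # (8, 100, 1)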
    [P] How can I use ML to find the relationship between multiple input variables and 1 output?
    I'm trying to use Machine Learning to analyse recorded data of chemical reactions (multiple input variables resulting in 1 output), and be able to predict the output when it's told all the inputs. Does anyone know what I can Google? I'm not sure where to start. submitted by /u/CultureImaginary  ( 91 min )
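    A minimal starting point ("regression", "supervised learning", and "scikit-learn" are good search terms); the file and column names below are placeholders for the recorded reaction data:

        import pandas as pd
        from sklearn.ensemble import RandomForestRegressor
        from sklearn.metrics import r2_score
        from sklearn.model_selection import train_test_split

        df = pd.read_csv("reactions.csv")               # hypothetical recorded data
        X = df[["temperature", "pressure", "concentration"]]  # input variables
        y = df["yield"]                                 # the single output

        X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
        model = RandomForestRegressor(n_estimators=200, random_state=0)
        model.fit(X_train, y_train)
        print("held-out R^2:", r2_score(y_test, model.predict(X_test)))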
    [P] Albumentations 1.3 is released (a Python library for image augmentation)
    The new release of a fast and flexible library for image augmentation includes new augmentations: RandomCropFromBorders (crops an image based on indents from the image borders), BBoxSafeRandomCrop (crops an image without loss of bboxes; unlike RandomSizedBBoxSafeCrop, this implementation does not resize to the target size), Spatter (simulates corruption that can occlude a lens in the form of rain or mud), Defocus (imitates lens defocusing), and ZoomBlur (imitates lens blur while zooming). It also brings improvements and bug fixes for RandomBrightnessContrast, Perspective, Affine, Rotate, Compose, RandomSunFlare, and RandomGamma. Full release notes are available at https://github.com/albumentations-team/albumentations/releases/tag/1.3.0 As always, you can install the latest version of the library by running: pip install -U albumentations submitted by /u/alexparinov  ( 89 min )
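    A quick usage sketch of the new transforms named in the release notes (the probabilities and dummy image are arbitrary; each transform's defaults should be checked in the docs):

        import numpy as np
        import albumentations as A

        transform = A.Compose([
            A.RandomCropFromBorders(p=0.5),   # crop by indents from the borders
            A.Defocus(p=0.3),                 # imitate lens defocus
            A.ZoomBlur(p=0.3),                # imitate blur while zooming
            A.Spatter(p=0.3),                 # rain/mud-style occlusion
        ])

        image = np.random.randint(0, 256, (256, 256, 3), dtype=np.uint8)
        print(transform(image=image)["image"].shape)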
    [P] TikTok subscriber modelling + StyleGAN-based face tiktokifier
    An analysis of TikTok subscriber count. It appears this quantity is highly predictable, and one of the strongest signals is the face of the owner of the channel: https://medium.com/@enryu9000/lookism-in-tiktok-3def0f20cf78 submitted by /u/enryu42  ( 111 min )
    [D] Cross-dataset generalisation
    For an NLP task, I have trained my network on dataset D1 but use dataset D2 for validation and testing. Is it correct to call this approach cross-dataset generalisation, or is cross-dataset generalisation something else? Edit: dataset D1 is Wikipedia; dataset D2 is another text corpus extracted by a different group that also covers multiple domains, like Wikipedia. submitted by /u/7pointsome1  ( 88 min )
    [D] RNNs that don't require a fixed sequence length
    Is there any research on RNNs that don't require a fixed sequence length? I'm looking for some key-words for a literature search. I am asking because, in my current project, I have seen quite devastating effects of padding. I would have fun thinking about ways to get around this. Edit: I am aware that you only have to pad to the maximum length within a batch, but that still causes problems if your data vary widely in terms of sequence length. submitted by /u/Blutorangensaft  ( 93 min )
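    Useful search keywords include "PackedSequence", "bucketing by length", and "ragged batching"; as a concrete sketch (with arbitrary sizes), packing means the RNN never computes the pad timesteps at all:

        import torch
        import torch.nn as nn

        rnn = nn.LSTM(input_size=4, hidden_size=8, batch_first=True)
        seqs = [torch.randn(n, 4) for n in (9, 5, 2)]     # variable-length sequences
        padded = nn.utils.rnn.pad_sequence(seqs, batch_first=True)
        lengths = torch.tensor([len(s) for s in seqs])

        packed = nn.utils.rnn.pack_padded_sequence(padded, lengths, batch_first=True)
        out_packed, _ = rnn(packed)
        out, out_lengths = nn.utils.rnn.pad_packed_sequence(out_packed, batch_first=True)
        print(out.shape, out_lengths)    # pad positions were skipped, then re-added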
    [D] How to Create a Fixed-Length, Binary, Sequence of Tokens Embedding?
    Say I have 10 classes, each instance represented by a 1 x n_classes binary vector. My goal is to embed a 1 x N binary sequence in a way that also models class co-occurrence. Say classes A, B, D are present and represented as [1, 1, 0, 1, 0, 0, 0, 0, 0, 0]. The goal is for the embedding model to produce an embedding for this sequence: sequence = [1, 1, 0, 1, 0, 0, 0, 0, 0, 0]; embedding = model(sequence) submitted by /u/sarmientoj24  ( 113 min )
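    One option, sketched with placeholder dimensions: treat the active classes as a bag of indices and sum their learned vectors with EmbeddingBag, so classes that co-occur are composed in a shared embedding space:

        import torch
        import torch.nn as nn

        n_classes, dim = 10, 16
        embed = nn.EmbeddingBag(n_classes, dim, mode="sum")

        sequence = torch.tensor([1, 1, 0, 1, 0, 0, 0, 0, 0, 0])  # A, B, D present
        active = sequence.nonzero().flatten()                    # indices of the 1s
        vec = embed(active.unsqueeze(0))                         # (1, dim) embedding
        print(vec.shape)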
    [P] Using GPT-3 to write viral content
    Viral content encompasses three essential principles: entertainment value, relatability, and evoking an emotional response. Given that GPT-3 is an autoregressive language model, we can use it to transform a given text into viral content (a sample input and output appear in the original post). The GPT-3 prompt I used is: "I translate the given boring text into a viral post without compromising the original meaning." Another input I tried: "I noticed you elected to cancel the subscription to Elephas. I would like to get your feedback before you go. Your help will be appreciated." The output I got is: "We're sorry to see you go! Before you cancel your subscription, could you let us know what we could have done better? Your feedback is appreciated." As you can see, this content is much more engaging than the original. You can use this for writing emails, blogs, and social media posts. If you use a Mac and want to write engaging content like that, take a look at https://elephas.app, it has a built-in feature. I am curious to hear your feedback. submitted by /u/juliarmg  ( 90 min )
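    For readers who want to try this, a sketch of the kind of call behind the workflow using the 2022-era OpenAI completions API; the model choice, sampling settings, and prompt framing are assumptions, not necessarily the author's exact setup:

        import os
        import openai

        openai.api_key = os.environ["OPENAI_API_KEY"]

        prompt = (
            "I translate the given boring text into a viral post without "
            "compromising the original meaning.\n\n"
            "Boring text: I noticed you elected to cancel the subscription to "
            "Elephas. I would like to get your feedback before you go.\n\n"
            "Viral post:"
        )
        resp = openai.Completion.create(model="text-davinci-002", prompt=prompt,
                                        max_tokens=100, temperature=0.8)
        print(resp["choices"][0]["text"].strip())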
    [R] Revisiting Image Pyramid Structure for High Resolution Salient Object Detection (InSPyReNet)
    We just uploaded the paper and the source code of our work "Revisiting Image Pyramid Structure for High Resolution Salient Object Detection", which will appear in ACCV 2022. Our work requires neither a high-resolution dataset nor high-resolution resizing for training, yet produces state-of-the-art results compared to previous high-resolution salient object detection methods (e.g., PGNet from CVPR 2022). Paper: https://arxiv.org/abs/2209.09475 Github: https://github.com/plemeri/InSPyReNet PapersWithCode: https://paperswithcode.com/paper/revisiting-image-pyramid-structure-for-high Demos of our work are in the original post. Thanks! submitted by /u/swdsld  ( 88 min )
    [D] Should I go with Prefect, Argo or Flyte for Model Training and ML workflow orchestration?
    We are building a next-generation ML Platform at my organization. It's a completely greenfield project. So far we've built the model inferencing layer which will be leveraging AWS Sagemaker. For the most part, we are sort of locked into using AWS and so Sagemaker will form the foundation of our ML Platform. Snowflake will be our centralized data warehousing solution. I've now been tasked with looking into model training, experimentation and workflow orchestration and I'm trying to ascertain the right tooling choice for this. This is a big organization with multiple teams of data scientists, each with their own AWS project account so we want a tool that approaches multi-tenancy in a first class manner. Most of the Data Scientists' models are NLP or computer vision models. I believe most teams are using Sagemaker training and Pytorch. I have good experience with Prefect but I've been reading up on Argo and I believe that would be a solid choice because it containerizes tasks so each DAG will have its own packages isolated from other DAGs. But I'm not sure if Argo has good compatibility with Sagemaker. I've seen Sagemaker operators being newly released but the docs only reference Kubeflow, not Argo. I've also recently discovered Flyte which seems to check a lot of boxes. It has amazing compatibility with Sagemaker training and Snowflake, and Deep Learning distributed training. Would appreciate any advice on this. Thanks in advance. submitted by /u/rirhun  ( 96 min )
    [D] Is there a Loss Function for Binary Multi-Label Task that Could Also Function as an Image Embedding?
    Dataset is a bit confidential, but I can provide a scenario with similar mechanics. Say I am creating an object detector for piano keys. Someone can take an image of a part of the piano/keyboard and get, say, 4-6 keys in an image. Also, let us assume, for the sake of discussion, that a key might be missing because, say, someone took it out. The object detector will then predict the key bounding box and the class (say E1, F1, etc.). Object detectors would work well; YOLO, say, would already give great results. But I am trying to improve the detector by incorporating a branch module that adds things such as co-occurrence into the mix. As you can see, this is not your typical object detector. While it is multi-class, an object can only appear at most once in the …  ( 91 min )
  • Open

    Can AI replace creative jobs?
    submitted by /u/estasfuera  ( 87 min )
    Who wants access to dalle 2?
    I got some accounts linked to dalle 2, dunno who to give em to soo? Who wants one submitted by /u/Designer-Career6211  ( 87 min )
    AI audio is on the rise and will spark new debates about the value of human effort
    submitted by /u/Zirius_Sadfaces  ( 90 min )
    AI Dream 94 - Psycho Mushroom Dance I'm losing it!
    submitted by /u/LordPewPew777  ( 87 min )
    Making AI Videos with Stable Diffusion and SD Deforum
    submitted by /u/pwillia7  ( 91 min )
    Write viral content using GPT-3
    submitted by /u/juliarmg  ( 96 min )
    Self-attention for TinyML
    submitted by /u/bendee983  ( 91 min )
    A Romantic Wonderland in Stable Diffusion 💞💑 | Artificial Intelligence Slideshow Music Video
    submitted by /u/OceanicFeel  ( 87 min )
    [Medical Segmentation] The all-in-one 3D medical image segmentation toolkit. From data annotation to model deployment, MedicalSeg is all you need!
    Hello, everyone! We have created an open-source all-in-one 3D medical image segmentation toolkit called MedicalSeg. MedicalSeg supports the whole segmentation process, including data labeling, data preprocessing, model training, and model deployment. Major features include: data preprocessing with 30% acceleration using CuPy; high-precision pre-trained models on 5 different organs; high-precision models including nnUnet, TransUnet, UNETR, Vnet, with more models coming soon; a 3D visualization demo based on itkwidgets; and an AI-assisted 3D medical image annotation platform called EISeg-Med3D. With the 3D segmentation model incorporated into the interactive segmentation algorithm, we managed to improve annotation efficiency tenfold through AI-assisted click interaction. Combined with machine learning algorithms and a manual annotation toolkit, 100% accuracy is right at hand; it is also user-friendly, and your annotation results and progress are saved automatically. Images in the original post demonstrate the segmentation results predicted by MedicalSeg: a lung segmentation result, a spine segmentation result, and the EISeg-Med3D labeling process. EISeg-Med3D: https://github.com/PaddlePaddle/PaddleSeg/blob/develop/EISeg/med3d/README_en.md MedicalSeg: https://github.com/PaddlePaddle/PaddleSeg/blob/develop/contrib/MedicalSeg/README.md submitted by /u/Daisy_SUGARFREE  ( 96 min )
    Benefits of Vertex AI Workbench:
    Exploration and analysis are simple: BigQuery, Dataproc, Spark, and Vertex AI integration simplify data access and machine learning access in the notebook. Model development and rapid prototyping: to go from data to training at scale, take advantage of the potential of unbounded compute with Vertex AI training for exploration and prototyping. Notebook workflows from start to finish: Vertex AI Workbench allows you to centralize your training and deployment procedures on Vertex AI. submitted by /u/Ishan220699  ( 87 min )
    Artificial Intelligence (AI) Warfare
    submitted by /u/ingloriousbastard85  ( 94 min )
    Tools and Resources for Neuromorphic Computing
    Useful tools and resources for learning about neuromorphic computing. Table of contents: Getting Started with Neuromorphic Computing; Developer Resources; Online Training Courses; Books; YouTube Videos; Neuromorphic Computing Tools, Libraries, and Frameworks; Machine Learning; Deep Learning Development. submitted by /u/Khaotic_Kernel  ( 94 min )
    How Humanoid Robots are Already Taking Over Human's Job
    submitted by /u/Remetincaa1  ( 90 min )
  • Open

    Q&A: Global challenges surrounding the deployment of AI
    Aleksander Madry, Asu Ozdaglar, and Luis Videgaray, co-chairs of the AI Policy Forum, discuss key issues facing the AI policy landscape today.  ( 6 min )
  • Open

    Introducing self-service quota management and higher default service quotas for Amazon Textract
    Today, we’re excited to announce self-service quota management support for Amazon Textract via the AWS Service Quotas console, and higher default service quotas in select AWS Regions. Customers tell us they need quick turnaround times to process their requests for quota increases and visibility into their service quotas so they may continue to scale their […]  ( 8 min )
  • Open

    Assessing AI system performance: thinking beyond models to deployment contexts
    AI systems are becoming increasingly complex as we move from visionary research to deployable technologies such as self-driving cars, clinical predictive models, and novel accessibility devices. Unlike singular AI models, it is more difficult to assess whether these more complex AI systems are performing consistently and as intended to realize human benefit. How do we […] The post Assessing AI system performance: thinking beyond models to deployment contexts appeared first on Microsoft Research.  ( 12 min )
  • Open

    Convolutional Neural Networks (CNN) and its applications
    An Introduction  ( 8 min )
  • Open

    I taught an agent to solve a maze by asking questions!
I present Ask Before You Act, an RL agent architecture that allows an agent to ask "yes/no" questions to an all-knowing Oracle. The agent successfully learns to navigate a maze, significantly outperforming the baseline, by asking human-understandable questions about the position of its goal! https://preview.redd.it/84wk4b23m7q91.png?width=1787&format=png&auto=webp&s=e1ce67a0c24f9853c4b7bfab4d2c7186b2c33c28 Paper: https://arxiv.org/abs/2209.04665 GitHub: https://github.com/ser-ge/ask_before_you_act Disclaimer: I am one of the authors submitted by /u/TheNovicePhilomath [link] [comments]  ( 87 min )
    Finrl debugging help
I've been trying to debug a crypto-based DRL portfolio allocation problem for months. My implementation is based on the FinRL introductory notebook for portfolio allocation: https://github.com/AI4Finance-Foundation/FinRL-Meta/blob/master/tutorials/1-Introduction/FinRL_PortfolioAllocation_NeurIPS_2020.ipynb I opened a GitHub issue which contains all the necessary debugging information: https://github.com/AI4Finance-Foundation/FinRL-Meta/issues/234#issue-1370125176 I was hoping to get some input from people who may have more experience in the field. Please feel free to ask if you need anything further. submitted by /u/One-Ad-8323 [link] [comments]  ( 87 min )
🐑🐑🐑 FYI: There is an addictive tile-matching mini-game, Sheep A Sheep (羊了个羊), which has recently become extremely popular in China. We have made a deep reinforcement learning version of it; check it out:
    submitted by /u/OpenDILab [link] [comments]  ( 87 min )
  • Open

    Neural networks applied to angular analysis and boat detection
Hello there! I'm currently working on a nice project, but I've come to a dead end. I have to determine whether a boat is attached to a mooring buoy by analysing the angular data sent by the buoy. The buoy measures angles over 20 seconds and reports the max angle, the min angle, and the average value for that window. I get new data every 4 to 5 minutes. https://preview.redd.it/3gap7yc8m6q91.png?width=1852&format=png&auto=webp&s=2353ec1ddb5ba76d0263f53975b5df65f3a91e92 You can see the result on this Grafana dashboard. Typically, higher angles mean a boat is attached to the buoy, so the "presence" signal should be at 1. If you're wondering why I'm not just setting a threshold that flags a boat whenever the angles cross a set value: between 08/25 and 08/26 you can see that the angles sometimes drop even while a boat is attached. The graph under the angles shows the output of a neural network trained with Keras. I feed the last 10 measurements into the network, so there are 30 input values and 1 binary output. I'm a beginner in data science and neural networks, so I would like to know which methods or network architectures would best suit my use case. I already tried training on the last 360 samples (approximately 24h), but then the network became completely unresponsive (see example below). https://preview.redd.it/0g5ss3geo6q91.png?width=1843&format=png&auto=webp&s=d711b1fb3997fa4e15275e94d0a2a98288981e9a I also tried a recurrent neural network, but the results did not seem any better. Thanks to the people brave enough to read this entire messy post and to the few who might have ideas to improve my method. I would welcome any improvement ideas or advice ^^ submitted by /u/Kdcius [link] [comments]  ( 89 min )
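A minimal sketch of the baseline the poster describes may help frame answers: a small dense network mapping the last 10 buoy measurements (max, min, and mean angle each, i.e. 30 inputs) to a single presence probability. The window handling, column layout, and training settings below are assumptions, not the poster's exact code.

```python
# A minimal sketch of the setup described in the post, assuming a window of
# the 10 most recent measurements with [max, min, mean] angle each
# (30 inputs) and one binary "presence" output. Data loading is omitted.
import numpy as np
from tensorflow import keras

WINDOW = 10  # past measurements per prediction

def make_windows(angles, labels, window=WINDOW):
    """angles: (N, 3) array of [max, min, mean] per 20 s measurement."""
    X, y = [], []
    for i in range(window, len(angles)):
        X.append(angles[i - window:i].ravel())  # 10 x 3 = 30 input values
        y.append(labels[i])
    return np.array(X), np.array(y)

model = keras.Sequential([
    keras.Input(shape=(3 * WINDOW,)),
    keras.layers.Dense(32, activation="relu"),
    keras.layers.Dense(1, activation="sigmoid"),  # boat-presence probability
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
# model.fit(X_train, y_train, epochs=50, validation_split=0.2)
```

A recurrent variant would keep the (10, 3) window as a sequence and replace the first Dense layer with keras.layers.LSTM, which is closer to what the poster tried.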
  • Open

    Machine Learning and Analytical Power Consumption Models for 5G Base Stations. (arXiv:2209.11600v1 [cs.NI])
The energy consumption of the fifth generation (5G) of mobile networks is one of the major concerns of the telecom industry. However, there is currently no accurate and tractable approach to evaluate the power consumption of 5G base stations (BSs). In this article, we propose a novel model for a realistic characterisation of the power consumption of 5G multi-carrier BSs, which builds on a large data collection campaign. First, we define a machine learning architecture that allows modelling multiple 5G BS products. Then, we exploit the knowledge gathered by this framework to derive a realistic and analytically tractable power consumption model, which can help drive theoretical analyses as well as feature standardisation, development and optimisation frameworks. Notably, we demonstrate that such a model has high precision and is able to capture the benefits of energy saving mechanisms. We believe this analytical model represents a fundamental tool for understanding 5G BS power consumption and accurately optimising network energy efficiency.  ( 2 min )
    Unified Algorithms for RL with Decision-Estimation Coefficients: No-Regret, PAC, and Reward-Free Learning. (arXiv:2209.11745v1 [cs.LG])
Finding unified complexity measures and algorithms for sample-efficient learning is a central topic of research in reinforcement learning (RL). The Decision-Estimation Coefficient (DEC) was recently proposed by Foster et al. (2021) as a necessary and sufficient complexity measure for sample-efficient no-regret RL. This paper makes progress towards a unified theory for RL with the DEC framework. First, we propose two new DEC-type complexity measures: Explorative DEC (EDEC) and Reward-Free DEC (RFDEC). We show that they are necessary and sufficient for sample-efficient PAC learning and reward-free learning, thereby extending the original DEC which only captures no-regret learning. Next, we design new unified sample-efficient algorithms for all three learning goals. Our algorithms instantiate variants of the Estimation-To-Decisions (E2D) meta-algorithm with a strong and general model estimation subroutine. Even in the no-regret setting, our algorithm E2D-TA improves upon the algorithms of Foster et al. (2021), which require either bounding a variant of the DEC which may be prohibitively large, or designing problem-specific estimation subroutines. As applications, we recover existing and obtain new sample-efficient learning results for a wide range of tractable RL problems using essentially a single algorithm. Finally, as a connection, we re-analyze two existing optimistic model-based algorithms based on Posterior Sampling or Maximum Likelihood Estimation, showing that they enjoy similar regret bounds as E2D-TA under similar structural conditions as the DEC.  ( 3 min )
    Identifying the Context Shift between Test Benchmarks and Production Data. (arXiv:2207.01059v2 [cs.LG] UPDATED)
Machine learning models are often brittle on production data despite achieving high accuracy on benchmark datasets. Benchmark datasets have traditionally served dual purposes: first, benchmarks offer a standard on which machine learning researchers can compare different methods, and second, benchmarks provide a model, albeit imperfect, of the real world. The incompleteness of test benchmarks (and the data upon which models are trained) hinders robustness in machine learning, enables shortcut learning, and leaves models systematically prone to err on out-of-distribution and adversarially perturbed data. The mismatch between a single static benchmark dataset and a production dataset has traditionally been described as a dataset shift. In an effort to clarify how to address the mismatch between test benchmarks and production data, we introduce context shift to describe semantically meaningful changes in the underlying data generation process. Moreover, we identify three methods for addressing context shift that would otherwise lead to model prediction errors: first, we describe how human intuition and expert knowledge can identify semantically meaningful features upon which models systematically fail, second, we detail how dynamic benchmarking - with its focus on capturing the data generation process - can promote generalizability through corroboration, and third, we highlight that clarifying a model's limitations can reduce unexpected errors. Robust machine learning is focused on model performance beyond benchmarks, and as such, we consider three model organism domains - facial expression recognition, deepfake detection, and medical diagnosis - to highlight how implicit assumptions in benchmark tasks lead to errors in practice. By paying close attention to the role of context, researchers can design more comprehensive benchmarks, reduce context shift errors, and increase generalizability.  ( 3 min )
    Power Management in Smart Residential Building with Deep Learning Model for Occupancy Detection by Usage Pattern of Electric Appliances. (arXiv:2209.11520v1 [eess.SP])
With the growth of smart building applications, occupancy information in residential buildings is becoming more and more significant. In the context of the smart building paradigm, this kind of information is required for a wide range of purposes, including enhancing energy efficiency and occupant comfort. In this study, occupancy detection in residential buildings is implemented using deep learning based on technical information of electric appliances, and a novel approach of occupancy detection for smart residential building systems is proposed. The dataset of electric appliances, sensors, light, and HVAC, measured by a smart metering system and collected from 50 households, is used for simulations. To classify occupancy, the support vector machine and autoencoder algorithms are used. A confusion matrix is utilized to report accuracy, precision, recall, and F1 score, demonstrating the comparative performance of the proposed method in occupancy detection. The proposed algorithm achieves occupancy detection accuracy of 95.7–98.4% using technical information of electric appliances. To validate the occupancy detection data, principal component analysis and the t-distributed stochastic neighbor embedding (t-SNE) algorithm are employed. Power consumption in smart buildings with a renewable energy system is reduced by 11.1–13.1% using occupancy detection.  ( 3 min )
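As a concrete illustration of the evaluation pipeline the abstract describes, the sketch below fits an SVM and reports the confusion-matrix metrics; the synthetic features are a placeholder for the per-household appliance data, which the abstract does not include.

```python
# A hedged sketch of the evaluation step described above: fit an SVM and
# report the confusion-matrix-derived metrics. make_classification stands in
# for the per-household appliance features used in the paper.
from sklearn.datasets import make_classification
from sklearn.metrics import (accuracy_score, confusion_matrix, f1_score,
                             precision_score, recall_score)
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

X, y = make_classification(n_samples=1000, n_features=20, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)

pred = SVC().fit(X_tr, y_tr).predict(X_te)

print(confusion_matrix(y_te, pred))          # TN/FP/FN/TP counts
print("accuracy :", accuracy_score(y_te, pred))
print("precision:", precision_score(y_te, pred))
print("recall   :", recall_score(y_te, pred))
print("F1       :", f1_score(y_te, pred))
```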
    T3VIP: Transformation-based 3D Video Prediction. (arXiv:2209.11693v1 [cs.CV])
    For autonomous skill acquisition, robots have to learn about the physical rules governing the 3D world dynamics from their own past experience to predict and reason about plausible future outcomes. To this end, we propose a transformation-based 3D video prediction (T3VIP) approach that explicitly models the 3D motion by decomposing a scene into its object parts and predicting their corresponding rigid transformations. Our model is fully unsupervised, captures the stochastic nature of the real world, and the observational cues in image and point cloud domains constitute its learning signals. To fully leverage all the 2D and 3D observational signals, we equip our model with automatic hyperparameter optimization (HPO) to interpret the best way of learning from them. To the best of our knowledge, our model is the first generative model that provides an RGB-D video prediction of the future for a static camera. Our extensive evaluation with simulated and real-world datasets demonstrates that our formulation leads to interpretable 3D models that predict future depth videos while achieving on-par performance with 2D models on RGB video prediction. Moreover, we demonstrate that our model outperforms 2D baselines on visuomotor control. Videos, code, dataset, and pre-trained models are available at this http URL  ( 3 min )
    Two-terminal source coding with common sum reconstruction. (arXiv:2206.06973v2 [cs.IT] UPDATED)
    We present the problem of two-terminal source coding with Common Sum Reconstruction (CSR). Consider two terminals, each with access to one of two correlated sources. Both terminals want to reconstruct the sum of the two sources under some average distortion constraint, and the reconstructions at two terminals must be identical with high probability. In this paper, we develop inner and outer bounds to the achievable rate distortion region of the CSR problem for a doubly symmetric binary source. We employ existing achievability results for Steinberg's common reconstruction and Wyner-Ziv's source coding with side information problems, and an achievability result for the lossy version of Korner-Marton's modulo-two sum computation problem.  ( 2 min )
    Holmes: An Efficient and Lightweight Semantic Based Anomalous Email Detector. (arXiv:2104.08044v12 [cs.CR] UPDATED)
Email threat is a serious issue for enterprise security, covering various malicious scenarios such as phishing, fraud, blackmail and malvertisement. Traditional anti-spam gateways commonly require maintaining a greylist to filter out unexpected emails based on suspicious vocabulary present in the mail subject and content. However, the signature-based approach cannot effectively discover novel and unknown suspicious emails that exploit current hot topics, such as COVID-19 and the US election. To address the problem, in this paper we present Holmes, an efficient and lightweight semantic-based engine for anomalous email detection. Holmes converts each email event log to a sentence through word embedding and then extracts interesting items among them by novelty detection. Based on our observations, we claim that, in an enterprise environment, there is a stable relation between senders and receivers, whereas suspicious emails commonly come from unusual sources, which can be detected through rareness selection. We evaluate the performance of Holmes in a real-world enterprise environment that sends and receives around 5,000 emails each day. As a result, Holmes achieves a high detection rate (outputting around 200 suspicious emails per day) and maintains a low false alarm rate for anomaly detection.  ( 3 min )
    Computational Discovery of Energy-Efficient Heat Treatment for Microstructure Design using Deep Reinforcement Learning. (arXiv:2209.11259v1 [cond-mat.mtrl-sci])
Deep Reinforcement Learning (DRL) is employed to develop autonomously optimized and custom-designed heat-treatment processes that are both microstructure-sensitive and energy-efficient. Different from conventional supervised machine learning, DRL does not rely on static neural network training from data alone; instead, a learning agent autonomously develops optimal solutions, based on reward and penalty elements, with reduced or no supervision. In our approach, a temperature-dependent Allen-Cahn model for phase transformation is used as the environment for the DRL agent, serving as the model world in which it gains experience and takes autonomous decisions. The agent of the DRL algorithm controls the temperature of the system, as a model furnace for heat treatment of alloys. Microstructure goals are defined for the agent based on the desired microstructure of the phases. After training, the agent can generate temperature-time profiles for a variety of initial microstructure states to reach the final desired microstructure state. The agent's performance and the physical meaning of the generated heat-treatment profiles are investigated in detail. In particular, the agent is capable of controlling the temperature to reach the desired microstructure starting from a variety of initial conditions. This capability of the agent in handling a variety of conditions paves the way for using such an approach also for recycling-oriented heat-treatment process design, where the initial composition can vary from batch to batch due to impurity intrusion, and also for the design of energy-efficient heat treatments. To test this hypothesis, an agent without a penalty on the total consumed energy is compared with one that considers energy costs. The energy cost penalty is imposed as an additional criterion on the agent for finding the optimal temperature-time profile.  ( 3 min )
    A Unified Perspective on Natural Gradient Variational Inference with Gaussian Mixture Models. (arXiv:2209.11533v1 [cs.LG])
Variational inference with Gaussian mixture models (GMMs) enables learning of highly tractable yet multi-modal approximations of intractable target distributions. GMMs are particularly relevant for problem settings with up to a few hundred dimensions, for example in robotics, for modelling distributions over trajectories or joint distributions. This work focuses on two very effective methods for GMM-based variational inference that both employ independent natural gradient updates for the individual components and the categorical distribution of the weights. We show for the first time that their derived updates are equivalent, although their practical implementations and theoretical guarantees differ. We identify several design choices that distinguish both approaches, namely with respect to sample selection, natural gradient estimation, stepsize adaptation, and whether trust regions are enforced or the number of components adapted. We perform extensive ablations on these design choices and show that they strongly affect the efficiency of the optimization and the variability of the learned distribution. Based on our insights, we propose a novel instantiation of our generalized framework that combines first-order natural gradient estimates with trust regions and component adaptation, and significantly outperforms both previous methods in all our experiments.  ( 2 min )
    Model Free Barrier Functions via Implicit Evading Maneuvers. (arXiv:2107.12871v3 [cs.LG] UPDATED)
    This paper demonstrates that the safety override arising from the use of a barrier function can in some cases be needlessly restrictive. In particular, we examine the case of fixed-wing collision avoidance and show that when using a barrier function, there are cases where two fixed-wing aircraft can come closer to colliding than if there were no barrier function at all. In addition, we construct cases where the barrier function labels the system as unsafe even when the vehicles start arbitrarily far apart. In other words, the barrier function ensures safety but with unnecessary costs to performance. We therefore introduce model-free barrier functions which take a data driven approach to creating a barrier function. We demonstrate the effectiveness of model-free barrier functions in a collision avoidance simulation of two fixed-wing aircraft.  ( 2 min )
    On the Robustness of Sparse Counterfactual Explanations to Adverse Perturbations. (arXiv:2201.09051v3 [cs.LG] UPDATED)
    Counterfactual explanations (CEs) are a powerful means for understanding how decisions made by algorithms can be changed. Researchers have proposed a number of desiderata that CEs should meet to be practically useful, such as requiring minimal effort to enact, or complying with causal models. We consider a further aspect to improve the usability of CEs: robustness to adverse perturbations, which may naturally happen due to unfortunate circumstances. Since CEs typically prescribe a sparse form of intervention (i.e., only a subset of the features should be changed), we study the effect of addressing robustness separately for the features that are recommended to be changed and those that are not. Our definitions are workable in that they can be incorporated as penalty terms in the loss functions that are used for discovering CEs. To experiment with robustness, we create and release code where five data sets (commonly used in the field of fair and explainable machine learning) have been enriched with feature-specific annotations that can be used to sample meaningful perturbations. Our experiments show that CEs are often not robust and, if adverse perturbations take place (even if not worst-case), the intervention they prescribe may require a much larger cost than anticipated, or even become impossible. However, accounting for robustness in the search process, which can be done rather easily, allows discovering robust CEs systematically. Robust CEs make additional intervention to contrast perturbations much less costly than non-robust CEs. We also find that robustness is easier to achieve for the features to change, posing an important point of consideration for the choice of what counterfactual explanation is best for the user. Our code is available at: https://github.com/marcovirgolin/robust-counterfactuals.  ( 3 min )
    FinNet: Solving Time-Independent Differential Equations with Finite Difference Neural Network. (arXiv:2202.09282v2 [cs.LG] UPDATED)
Deep learning approaches for partial differential equations (PDEs) have received much attention in recent years due to their mesh-freeness and computational efficiency. However, most of the work so far has concentrated on time-dependent nonlinear differential equations. In this work, we analyze potential issues with the well-known Physics-Informed Neural Network (PINN) for differential equations with few constraints on the boundary (i.e., the constraints are only on a few points). This analysis motivates us to introduce a novel technique called FinNet, for solving differential equations by incorporating finite differences into deep learning. Even though we use a mesh during training, the prediction phase is mesh-free. We illustrate the effectiveness of our method through experiments on solving various equations, which show that FinNet can solve PDEs with low error rates and may work even when PINNs cannot.  ( 2 min )
    Real-time Adversarial Perturbations against Deep Reinforcement Learning Policies: Attacks and Defenses. (arXiv:2106.08746v4 [cs.LG] UPDATED)
Deep reinforcement learning (DRL) is vulnerable to adversarial perturbations. Adversaries can mislead the policies of DRL agents by perturbing the state of the environment observed by the agents. Existing attacks are feasible in principle, but face challenges in practice, either by being too slow to fool DRL policies in real time or by modifying past observations stored in the agent's memory. We show that Universal Adversarial Perturbations (UAP), independent of the individual inputs to which they are applied, can fool DRL policies effectively and in real time. We introduce three attack variants leveraging UAP. Via an extensive evaluation using three Atari 2600 games, we show that our attacks are effective, as they fully degrade the performance of three different DRL agents (up to 100%, even when the $l_\infty$ bound on the perturbation is as small as 0.01). Our attack is faster than the frame rate (60 Hz) of image capture and considerably faster than prior attacks ($\approx 1.8$ms). Our attack technique is also efficient, incurring an online computational cost of $\approx 0.027$ms. Using two tasks involving robotic movement, we confirm that our results generalize to complex DRL tasks. Furthermore, we demonstrate that the effectiveness of known defenses diminishes against universal perturbations. We introduce an effective technique that detects all known adversarial perturbations against DRL policies, including all universal perturbations presented in this paper.  ( 3 min )
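To make the threat model concrete, here is a minimal sketch of the deployment side of such an attack: a precomputed universal perturbation is added to every observation before the victim policy sees it. The Gymnasium-style env/policy interfaces are assumptions, and computing delta itself (the paper's three attack variants) is not shown.

```python
# A hedged sketch of applying a universal adversarial perturbation delta
# (assumed precomputed offline) to a DRL agent's observations at test time.
import numpy as np

EPS = 0.01  # the l_inf budget quoted in the abstract

def perturb(obs, delta, eps=EPS):
    delta = np.clip(delta, -eps, eps)       # enforce the l_inf bound
    return np.clip(obs + delta, 0.0, 1.0)   # keep observations in valid range

def attacked_return(env, policy, delta, max_steps=10_000):
    """Roll out the victim policy on perturbed observations (Gymnasium API)."""
    obs, _ = env.reset()
    total = 0.0
    for _ in range(max_steps):
        action = policy(perturb(obs, delta))  # victim acts on perturbed state
        obs, reward, terminated, truncated, _ = env.step(action)
        total += reward
        if terminated or truncated:
            break
    return total  # compare against the clean return to measure degradation
```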
    Sequential Causal Effect Variational Autoencoder: Time Series Causal Link Estimation under Hidden Confounding. (arXiv:2209.11497v1 [cs.LG])
    Estimating causal effects from observational data in the presence of latent variables sometimes leads to spurious relationships which can be misconceived as causal. This is an important issue in many fields such as finance and climate science. We propose Sequential Causal Effect Variational Autoencoder (SCEVAE), a novel method for time series causality analysis under hidden confounding. It is based on the CEVAE framework and recurrent neural networks. The causal link's intensity of the confounded variables is calculated by using direct causal criteria based on Pearl's do-calculus. We show the efficacy of SCEVAE by applying it to synthetic datasets with both linear and nonlinear causal links. Furthermore, we apply our method to real aerosol-cloud-climate observation data. We compare our approach to a time series deconfounding method with and without substitute confounders on the synthetic data. We demonstrate that our method performs better by comparing both methods to the ground truth. In the case of real data, we use the expert knowledge of causal links and show how the use of correct proxy variables aids data reconstruction.  ( 2 min )
    Active Few-Shot Classification: a New Paradigm for Data-Scarce Learning Settings. (arXiv:2209.11481v1 [cs.LG])
We consider a novel formulation of the problem of Active Few-Shot Classification (AFSC) where the objective is to classify a small, initially unlabeled, dataset given a very restrained labeling budget. This problem can be seen as a rival paradigm to classical Transductive Few-Shot Classification (TFSC), as both these approaches are applicable in similar conditions. We first propose a methodology that combines statistical inference and an original two-tier active learning strategy that fits well into this framework. We then adapt several standard vision benchmarks from the field of TFSC. Our experiments show the potential benefits of AFSC can be substantial, with gains in average weighted accuracy of up to 10% compared to state-of-the-art TFSC methods for the same labeling budget. We believe this new paradigm could lead to new developments and standards in data-scarce learning settings.  ( 2 min )
    Deep Learning-based Anonymization of Chest Radiographs: A Utility-preserving Measure for Patient Privacy. (arXiv:2209.11531v1 [eess.IV])
    Robust and reliable anonymization of chest radiographs constitutes an essential step before publishing large datasets of such for research purposes. The conventional anonymization process is carried out by obscuring personal information in the images with black boxes and removing or replacing meta-information. However, such simple measures retain biometric information in the chest radiographs, allowing patients to be re-identified by a linkage attack. Therefore, we see an urgent need to obfuscate the biometric information appearing in the images. To the best of our knowledge, we propose the first deep learning-based approach to targetedly anonymize chest radiographs while maintaining data utility for diagnostic and machine learning purposes. Our model architecture is a composition of three independent neural networks that, when collectively used, allow for learning a deformation field that is able to impede patient re-identification. The individual influence of each component is investigated with an ablation study. Quantitative results on the ChestX-ray14 dataset show a reduction of patient re-identification from 81.8% to 58.6% in the area under the receiver operating characteristic curve (AUC) with little impact on the abnormality classification performance. This indicates the ability to preserve underlying abnormality patterns while increasing patient privacy. Furthermore, we compare the proposed deep learning-based anonymization approach with differentially private image pixelization, and demonstrate the superiority of our method towards resolving the privacy-utility trade-off for chest radiographs.  ( 3 min )
    Complex-Value Spatio-temporal Graph Convolutional Neural Networks and its Applications to Electric Power Systems AI. (arXiv:2208.08485v2 [cs.LG] UPDATED)
The effective representation, processing, analysis, and visualization of large-scale structured data over graphs are gaining a lot of attention. So far, most of the literature has focused on real-valued signals. However, signals are often sparse in the Fourier domain, and more informative and compact representations can be obtained using the complex envelope of their spectral components, as opposed to the original real-valued signals. Motivated by this fact, in this work we generalize graph convolutional neural networks (GCNs) to the complex domain, deriving the theory that allows incorporating a complex-valued graph shift operator (GSO) in the definition of graph filters (GFs) and processing complex-valued graph signals (GSs). The theory developed can handle spatio-temporal complex network processes. We prove that complex-valued GCNs are stable with respect to perturbations of the underlying graph support, and we bound the transfer error and the error propagation through multiple layers. We then apply complex GCNs to power grid state forecasting and power grid cyber-attack detection and localization.  ( 3 min )
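For readers unfamiliar with the building block in question, here is an illustrative numpy sketch of a complex-valued graph filter of the kind the abstract generalizes GCN layers from: y = sum_k h_k S^k x with a complex graph shift operator S and a complex graph signal x. The values are synthetic and the normalization is a common stability heuristic, not the paper's construction.

```python
# Illustrative complex graph filter: y = h_0 x + h_1 S x + h_2 S^2 x,
# with complex GSO S, complex graph signal x, and complex taps h_k.
import numpy as np

rng = np.random.default_rng(0)
n = 8
S = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))  # complex GSO
S /= np.abs(np.linalg.eigvals(S)).max()   # normalize spectral radius for stability
x = rng.standard_normal(n) + 1j * rng.standard_normal(n)            # complex signal

h = np.array([1.0, 0.5 + 0.2j, 0.1j])     # complex filter taps h_0, h_1, h_2

y = np.zeros(n, dtype=complex)
Sk_x = x.copy()
for hk in h:                              # accumulate sum_k h_k S^k x
    y += hk * Sk_x
    Sk_x = S @ Sk_x
```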
    Scalable Gaussian Process Hyperparameter Optimization via Coverage Regularization. (arXiv:2209.11280v1 [cs.LG])
    Gaussian processes (GPs) are Bayesian non-parametric models popular in a variety of applications due to their accuracy and native uncertainty quantification (UQ). Tuning GP hyperparameters is critical to ensure the validity of prediction accuracy and uncertainty; uniquely estimating multiple hyperparameters in, e.g. the Matern kernel can also be a significant challenge. Moreover, training GPs on large-scale datasets is a highly active area of research: traditional maximum likelihood hyperparameter training requires quadratic memory to form the covariance matrix and has cubic training complexity. To address the scalable hyperparameter tuning problem, we present a novel algorithm which estimates the smoothness and length-scale parameters in the Matern kernel in order to improve robustness of the resulting prediction uncertainties. Using novel loss functions similar to those in conformal prediction algorithms in the computational framework provided by the hyperparameter estimation algorithm MuyGPs, we achieve improved UQ over leave-one-out likelihood maximization while maintaining a high degree of scalability as demonstrated in numerical experiments.  ( 2 min )
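For context, the sketch below shows the two Matern hyperparameters the abstract's algorithm estimates, via scikit-learn rather than the MuyGPs implementation the paper builds on; the data are synthetic and the maximum-likelihood fit shown is exactly the costly step the paper avoids.

```python
# A small sketch of the Matern kernel's smoothness (nu) and length-scale
# hyperparameters, using scikit-learn as a stand-in for MuyGPs.
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import Matern

rng = np.random.default_rng(0)
X = rng.uniform(0, 1, size=(50, 1))
y = np.sin(6 * X).ravel() + 0.1 * rng.standard_normal(50)

# nu sets smoothness, length_scale the correlation range; fitting them by
# maximum likelihood is the quadratic-memory, cubic-time step cited above.
gp = GaussianProcessRegressor(kernel=Matern(length_scale=0.2, nu=1.5))
gp.fit(X, y)
mean, std = gp.predict(X, return_std=True)  # predictions with native UQ
```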
    Differentiable physics-enabled closure modeling for Burgers' turbulence. (arXiv:2209.11614v1 [physics.flu-dyn])
    Data-driven turbulence modeling is experiencing a surge in interest following algorithmic and hardware developments in the data sciences. We discuss an approach using the differentiable physics paradigm that combines known physics with machine learning to develop closure models for Burgers' turbulence. We consider the 1D Burgers system as a prototypical test problem for modeling the unresolved terms in advection-dominated turbulence problems. We train a series of models that incorporate varying degrees of physical assumptions on an a posteriori loss function to test the efficacy of models across a range of system parameters, including viscosity, time, and grid resolution. We find that constraining models with inductive biases in the form of partial differential equations that contain known physics or existing closure approaches produces highly data-efficient, accurate, and generalizable models, outperforming state-of-the-art baselines. Addition of structure in the form of physics information also brings a level of interpretability to the models, potentially offering a stepping stone to the future of closure modeling.  ( 2 min )
    Detecting Concept Drift With Neural Network Model Uncertainty. (arXiv:2107.01873v2 [cs.LG] UPDATED)
    Deployed machine learning models are confronted with the problem of changing data over time, a phenomenon also called concept drift. While existing approaches of concept drift detection already show convincing results, they require true labels as a prerequisite for successful drift detection. Especially in many real-world application scenarios-like the ones covered in this work-true labels are scarce, and their acquisition is expensive. Therefore, we introduce a new algorithm for drift detection, Uncertainty Drift Detection (UDD), which is able to detect drifts without access to true labels. Our approach is based on the uncertainty estimates provided by a deep neural network in combination with Monte Carlo Dropout. Structural changes over time are detected by applying the ADWIN technique on the uncertainty estimates, and detected drifts trigger a retraining of the prediction model. In contrast to input data-based drift detection, our approach considers the effects of the current input data on the properties of the prediction model rather than detecting change on the input data only (which can lead to unnecessary retrainings). We show that UDD outperforms other state-of-the-art strategies on two synthetic as well as ten real-world data sets for both regression and classification tasks.  ( 3 min )
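A runnable toy version of the UDD loop may clarify the mechanics: per-batch predictive uncertainty from Monte Carlo Dropout is streamed into ADWIN, and a detected change would trigger retraining. The river library's ADWIN (assuming its >= 0.19 API: update(x) plus the drift_detected property) and the tiny dropout network are stand-ins for the paper's implementation.

```python
# Hedged sketch of Uncertainty Drift Detection: MC Dropout uncertainty
# streamed into ADWIN on a synthetic, label-free data stream.
import torch
import torch.nn as nn
from river import drift

torch.manual_seed(0)
# Stand-in classifier with dropout; UDD only needs its MC Dropout uncertainty.
model = nn.Sequential(nn.Linear(10, 32), nn.ReLU(), nn.Dropout(0.5), nn.Linear(32, 2))

def mc_dropout_uncertainty(x, n_samples=20):
    model.train()  # keep dropout active at inference time (Monte Carlo Dropout)
    with torch.no_grad():
        preds = torch.stack([model(x).softmax(-1) for _ in range(n_samples)])
    return preds.std(0).mean().item()  # scalar uncertainty for the batch

adwin = drift.ADWIN()
for t in range(500):
    # Synthetic unlabeled stream whose distribution shifts halfway through.
    x = torch.randn(16, 10) + (3.0 if t >= 250 else 0.0)
    adwin.update(mc_dropout_uncertainty(x))
    if adwin.drift_detected:
        print(f"drift detected at step {t}: trigger retraining here")
```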
    Phased Progressive Learning with Coupling-Regulation-Imbalance Loss for Imbalanced Data Classification. (arXiv:2205.12117v2 [cs.LG] UPDATED)
Deep neural networks generally perform poorly on datasets that suffer from quantity imbalance and classification difficulty imbalance. Despite progress in this field, problems of dataset bias or domain shift remain in existing two-stage approaches. We therefore propose a phased progressive learning schedule that enables a smooth transfer of training emphasis from representation learning to upper-classifier training; it is more effective on datasets with more severe imbalance or smaller scale. A coupling-regulation-imbalance loss function is designed, coupling a correction term, Focal loss, and LDAM loss. This loss better handles quantity imbalance and outliers while regulating the focus of attention across samples of different classification difficulty. These approaches achieve satisfactory results on multiple benchmark datasets, including Imbalanced CIFAR10, Imbalanced CIFAR100, ImageNet-LT, and iNaturalist 2018, and can be easily generalized to other imbalanced classification models.
    Stochastic Multiple Target Sampling Gradient Descent. (arXiv:2206.01934v3 [cs.LG] UPDATED)
    Sampling from an unnormalized target distribution is an essential problem with many applications in probabilistic inference. Stein Variational Gradient Descent (SVGD) has been shown to be a powerful method that iteratively updates a set of particles to approximate the distribution of interest. Furthermore, when analysing its asymptotic properties, SVGD reduces exactly to a single-objective optimization problem and can be viewed as a probabilistic version of this single-objective optimization problem. A natural question then arises: "Can we derive a probabilistic version of the multi-objective optimization?". To answer this question, we propose Stochastic Multiple Target Sampling Gradient Descent (MT-SGD), enabling us to sample from multiple unnormalized target distributions. Specifically, our MT-SGD conducts a flow of intermediate distributions gradually orienting to multiple target distributions, which allows the sampled particles to move to the joint high-likelihood region of the target distributions. Interestingly, the asymptotic analysis shows that our approach reduces exactly to the multiple-gradient descent algorithm for multi-objective optimization, as expected. Finally, we conduct comprehensive experiments to demonstrate the merit of our approach to multi-task learning.
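To ground the discussion, here is a minimal numpy sketch of the single-target SVGD update that MT-SGD generalizes to multiple targets; the RBF kernel, standard-Gaussian target, bandwidth, and step size are illustrative assumptions, not the paper's setup.

```python
# Minimal SVGD: particles follow a kernelized gradient flow toward an
# unnormalized target distribution.
import numpy as np

def grad_log_p(x):
    return -x  # target p = N(0, I), so grad log p(x) = -x

def rbf_and_grad(x, h=1.0):
    diff = x[:, None, :] - x[None, :, :]            # diff[j, i] = x_j - x_i
    K = np.exp(-(diff ** 2).sum(-1) / (2 * h ** 2))
    # grad_{x_j} k(x_j, x_i) = -(x_j - x_i) / h^2 * k, summed over j for each i
    gK = (-diff / h ** 2 * K[:, :, None]).sum(0)
    return K, gK

rng = np.random.default_rng(0)
particles = rng.normal(5.0, 1.0, size=(100, 2))     # start far from the target
for _ in range(500):
    K, gK = rbf_and_grad(particles)
    phi = (K @ grad_log_p(particles) + gK) / len(particles)
    particles += 0.1 * phi                           # Stein variational step
print(particles.mean(0))                             # drifts toward ~[0, 0]
```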
    Adapting $k$-means algorithms for outliers. (arXiv:2007.01118v2 [cs.DS] UPDATED)
    This paper shows how to adapt several simple and classical sampling-based algorithms for the $k$-means problem to the setting with outliers. Recently, Bhaskara et al. (NeurIPS 2019) showed how to adapt the classical $k$-means++ algorithm to the setting with outliers. However, their algorithm needs to output $O(\log (k) \cdot z)$ outliers, where $z$ is the number of true outliers, to match the $O(\log k)$-approximation guarantee of $k$-means++. In this paper, we build on their ideas and show how to adapt several sequential and distributed $k$-means algorithms to the setting with outliers, but with substantially stronger theoretical guarantees: our algorithms output $(1+\varepsilon)z$ outliers while achieving an $O(1 / \varepsilon)$-approximation to the objective function. In the sequential world, we achieve this by adapting a recent algorithm of Lattanzi and Sohler (ICML 2019). In the distributed setting, we adapt a simple algorithm of Guha et al. (IEEE Trans. Know. and Data Engineering 2003) and the popular $k$-means$\|$ of Bahmani et al. (PVLDB 2012). A theoretical application of our techniques is an algorithm with running time $\tilde{O}(nk^2/z)$ that achieves an $O(1)$-approximation to the objective function while outputting $O(z)$ outliers, assuming $k \ll z \ll n$. This is complemented with a matching lower bound of $\Omega(nk^2/z)$ for this problem in the oracle model.
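The core idea is easy to sketch, with the caveat that the paper's contribution is the seeding and the approximation guarantees, not this heuristic: each Lloyd iteration sets aside the z points farthest from their nearest center and updates centers on the remaining inliers.

```python
# A hedged Lloyd-style sketch of k-means with outliers (assumes z >= 1 and
# float-valued data); not the paper's algorithms, which add careful seeding
# and provable guarantees on top of this idea.
import numpy as np

def kmeans_with_outliers(X, k, z, iters=50, seed=0):
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)].astype(float)
    for _ in range(iters):
        d = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(-1)
        nearest = d.argmin(1)                       # closest center per point
        cost = d[np.arange(len(X)), nearest]
        inliers = np.argsort(cost)[:-z]             # drop the z farthest points
        for j in range(k):
            members = inliers[nearest[inliers] == j]
            if len(members) > 0:
                centers[j] = X[members].mean(0)     # update on inliers only
    return centers, np.argsort(cost)[-z:]           # centers + flagged outliers
```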
    CMGAN: Conformer-based Metric GAN for Speech Enhancement. (arXiv:2203.15149v3 [cs.SD] UPDATED)
Recently, the convolution-augmented transformer (Conformer) has achieved promising performance in automatic speech recognition (ASR) and time-domain speech enhancement (SE), as it can capture both local and global dependencies in the speech signal. In this paper, we propose a conformer-based metric generative adversarial network (CMGAN) for SE in the time-frequency (TF) domain. In the generator, we utilize two-stage conformer blocks to aggregate all magnitude and complex spectrogram information by modeling both time and frequency dependencies. The estimation of magnitude and complex spectrogram is decoupled in the decoder stage and then jointly incorporated to reconstruct the enhanced speech. In addition, a metric discriminator is employed to further improve the quality of the enhanced speech by optimizing the generator with respect to a corresponding evaluation score. Quantitative analysis on the Voice Bank+DEMAND dataset indicates the capability of CMGAN to outperform various previous models by a margin, i.e., PESQ of 3.41 and SSNR of 11.10 dB.
    FLEX: Feature-Logic Embedding Framework for CompleX Knowledge Graph Reasoning. (arXiv:2205.11039v2 [cs.AI] UPDATED)
Current best-performing models for knowledge graph reasoning (KGR) introduce geometric objects or probabilistic distributions to embed entities and first-order logic (FOL) queries into low-dimensional vector spaces. They can be summarized as a center-size framework (point/box/cone, Beta/Gaussian distribution, etc.). However, they have limited logical reasoning ability, and it is difficult to generalize them to various features because the center and size are constrained one-to-one and cannot represent multiple centers or sizes. To address these challenges, we instead propose a novel KGR framework named the Feature-Logic Embedding framework, FLEX, which is the first KGR framework that can not only truly handle all FOL operations, including conjunction, disjunction, negation and so on, but also support various feature spaces. Specifically, the logic part of the feature-logic framework is based on vector logic, which naturally models all FOL operations. Experiments demonstrate that FLEX significantly outperforms existing state-of-the-art methods on benchmark datasets.
    From Weakly Supervised Learning to Active Learning. (arXiv:2209.11629v1 [cs.LG])
Applied mathematics and machine computations have raised a lot of hope since the recent success of supervised learning. Many practitioners in industry have been trying to switch from their old paradigms to machine learning. Interestingly, those data scientists spend more time scraping, annotating and cleaning data than fine-tuning models. This thesis is motivated by the following question: can we derive a more generic framework than that of supervised learning in order to learn from cluttered data? This question is approached through the lens of weakly supervised learning, assuming that the bottleneck of data collection lies in annotation. We model weak supervision as giving, rather than a unique target, a set of target candidates. We argue that one should look for an ``optimistic'' function that matches most of the observations. This allows us to derive a principle to disambiguate partial labels. We also discuss the advantage of incorporating unsupervised learning techniques into our framework, in particular manifold regularization approached through diffusion techniques, for which we derive a new algorithm that scales better with input dimension than the baseline method. Finally, we switch from passive to active weakly supervised learning, introducing the ``active labeling'' framework, in which a practitioner can query weak information about chosen data. Among others, we leverage the fact that one does not need full information to access stochastic gradients and perform stochastic gradient descent.
    Thermodynamics of learning physical phenomena. (arXiv:2207.12749v2 [cs.LG] UPDATED)
    Thermodynamics could be seen as an expression of physics at a high epistemic level. As such, its potential as an inductive bias to help machine learning procedures attain accurate and credible predictions has been recently realized in many fields. We review how thermodynamics provides helpful insights in the learning process. At the same time, we study the influence of aspects such as the scale at which a given phenomenon is to be described, the choice of relevant variables for this description or the different techniques available for the learning process.
    MixTailor: Mixed Gradient Aggregation for Robust Learning Against Tailored Attacks. (arXiv:2207.07941v2 [cs.LG] UPDATED)
    Implementations of SGD on distributed systems create new vulnerabilities, which can be identified and misused by one or more adversarial agents. Recently, it has been shown that well-known Byzantine-resilient gradient aggregation schemes are indeed vulnerable to informed attackers that can tailor the attacks (Fang et al., 2020; Xie et al., 2020b). We introduce MixTailor, a scheme based on randomization of the aggregation strategies that makes it impossible for the attacker to be fully informed. Deterministic schemes can be integrated into MixTailor on the fly without introducing any additional hyperparameters. Randomization decreases the capability of a powerful adversary to tailor its attacks, while the resulting randomized aggregation scheme is still competitive in terms of performance. For both iid and non-iid settings, we establish almost sure convergence guarantees that are both stronger and more general than those available in the literature. Our empirical studies across various datasets, attacks, and settings, validate our hypothesis and show that MixTailor successfully defends when well-known Byzantine-tolerant schemes fail.
    Deep Fusion of Multi-Object Densities Using Transformer. (arXiv:2209.08857v2 [cs.LG] UPDATED)
In this paper, we demonstrate that a deep learning-based method can be used to fuse multi-object densities. Given a scenario with several sensors with possibly different fields of view, tracking is performed locally in each sensor by a tracker, which produces random finite set multi-object densities. To fuse outputs from different trackers, we adapt a recently proposed transformer-based multi-object tracker, where the fusion result is a global multi-object density describing the set of all alive objects at the current time. We compare the performance of the transformer-based fusion method with a well-performing model-based Bayesian fusion method in several simulated scenarios with different parameter settings using synthetic data. The simulation results show that the transformer-based fusion method outperforms the model-based Bayesian method in our experimental scenarios.
    Applications of Machine Learning in Chemical and Biological Oceanography. (arXiv:2209.11557v1 [cs.LG])
Machine learning (ML) refers to computer algorithms that predict a meaningful output or categorise complex systems based on a large amount of data. ML is applied in a variety of areas, including natural science, engineering, space exploration, and even game development. This article focuses on the use of machine learning in the field of chemical and biological oceanography. In the prediction of global fixed nitrogen levels, partial pressure of carbon dioxide, and other chemical properties, the application of ML is a promising tool. Machine learning is also utilised in biological oceanography to detect planktonic forms from various images (i.e., microscopy, FlowCAM and video recorders), spectrometers, and other signal processing techniques. Moreover, ML has successfully classified mammals using their acoustics, detecting endangered mammalian and fish species in specific environments. Most importantly, using environmental data, ML has proved to be an effective method for predicting hypoxic conditions and harmful algal bloom events, important measurements for environmental monitoring. Furthermore, machine learning has been used to construct a number of databases for various species that will be useful to other researchers, and the creation of new algorithms will help the marine research community better comprehend the chemistry and biology of the ocean.
    Approximating Discontinuous Nash Equilibrial Values of Two-Player General-Sum Differential Games. (arXiv:2207.01773v2 [cs.LG] UPDATED)
Finding Nash equilibrial policies for two-player differential games requires solving Hamilton-Jacobi-Isaacs (HJI) PDEs. Self-supervised learning has been used to approximate solutions of such PDEs while circumventing the well-known curse of dimensionality. However, this method fails to learn discontinuous PDE solutions due to its sampling nature, leading to poor safety performance of the resulting controllers in robotics applications when player rewards are discontinuous. This paper investigates two potential solutions to this problem: a hybrid method that leverages both supervised Nash equilibria and the HJI PDE, and a value-hardening method where a sequence of HJIs is solved with a gradually hardening reward. We compare these solutions using the resulting generalization and safety performance in two vehicle interaction case studies with 5D and 9D state spaces, respectively. Results show that with informative supervision (e.g., collision and near-collision demonstrations) and the low cost of self-supervised learning, the hybrid method achieves better safety performance than the supervised, self-supervised, and value-hardening approaches on an equal computational budget. Value hardening fails to generalize in the higher-dimensional case without informative supervision. Lastly, we show that the neural activation function needs to be continuously differentiable for learning PDEs, and its choice can be case-dependent.
    DFX: A Low-latency Multi-FPGA Appliance for Accelerating Transformer-based Text Generation. (arXiv:2209.10797v1 [eess.SY] CROSS LISTED)
    Transformer is a deep learning language model widely used for natural language processing (NLP) services in datacenters. Among transformer models, Generative Pre-trained Transformer (GPT) has achieved remarkable performance in text generation, or natural language generation (NLG), which needs the processing of a large input context in the summarization stage, followed by the generation stage that produces a single word at a time. The conventional platforms such as GPU are specialized for the parallel processing of large inputs in the summarization stage, but their performance significantly degrades in the generation stage due to its sequential characteristic. Therefore, an efficient hardware platform is required to address the high latency caused by the sequential characteristic of text generation. In this paper, we present DFX, a multi-FPGA acceleration appliance that executes GPT-2 model inference end-to-end with low latency and high throughput in both summarization and generation stages. DFX uses model parallelism and optimized dataflow that is model-and-hardware-aware for fast simultaneous workload execution among devices. Its compute cores operate on custom instructions and provide GPT-2 operations end-to-end. We implement the proposed hardware architecture on four Xilinx Alveo U280 FPGAs and utilize all of the channels of the high bandwidth memory (HBM) and the maximum number of compute resources for high hardware efficiency. DFX achieves 5.58x speedup and 3.99x energy efficiency over four NVIDIA V100 GPUs on the modern GPT-2 model. DFX is also 8.21x more cost-effective than the GPU appliance, suggesting that it is a promising solution for text generation workloads in cloud datacenters.
    Image Classification using Sequence of Pixels. (arXiv:2209.11495v1 [eess.IV])
This study compares sequential image classification methods based on recurrent neural networks. We describe methods based on recurrent neural networks such as Long Short-Term Memory (LSTM) and bidirectional Long Short-Term Memory (BiLSTM) architectures, and review the state-of-the-art sequential image classification architectures. We mainly focus on the LSTM, BiLSTM, temporal convolution network, and independent recurrent neural network architectures in this study. It is known that RNNs struggle to learn long-term dependencies in the input sequence. We use a simple feature construction method using the orthogonal Ramanujan periodic transform on the input sequence. Experiments demonstrate that if these features are given to LSTM or BiLSTM networks, the performance increases drastically. Our focus in this study is to increase training accuracy while simultaneously reducing training time for the LSTM and BiLSTM architectures, not to push the state-of-the-art results, so we use simple LSTM/BiLSTM architectures. We compare sequential input with the constructed features as input to single-layer LSTM and BiLSTM networks for the MNIST and CIFAR datasets. We observe that sequential input to an LSTM network with 128 hidden units trained for five epochs results in a training accuracy of 33%, whereas the constructed features as input to the same LSTM network result in a training accuracy of 90% in a third of the time.
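A minimal Keras sketch of the kind of baseline the study compares against may be useful: a single-layer LSTM with 128 hidden units reading MNIST images as sequences. Rows are used as timesteps here for speed; the study feeds pixel sequences, and its Ramanujan-transform feature construction is not shown.

```python
# Sequential-MNIST baseline: single-layer LSTM, 128 hidden units, 5 epochs,
# images read as 28 timesteps of 28 values (a row-wise simplification).
from tensorflow import keras

(x_tr, y_tr), (x_te, y_te) = keras.datasets.mnist.load_data()
x_tr, x_te = x_tr / 255.0, x_te / 255.0

model = keras.Sequential([
    keras.Input(shape=(28, 28)),
    keras.layers.LSTM(128),                # 128 hidden units, as in the study
    keras.layers.Dense(10, activation="softmax"),
])
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
model.fit(x_tr, y_tr, epochs=5, validation_data=(x_te, y_te))
```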
    STEADY: Simultaneous State Estimation and Dynamics Learning from Indirect Observations. (arXiv:2203.01299v3 [cs.RO] UPDATED)
    Accurate kinodynamic models play a crucial role in many robotics applications such as off-road navigation and high-speed driving. Many state-of-the-art approaches in learning stochastic kinodynamic models, however, require precise measurements of robot states as labeled input/output examples, which can be hard to obtain in outdoor settings due to limited sensor capabilities and the absence of ground truth. In this work, we propose a new technique for learning neural stochastic kinodynamic models from noisy and indirect observations by performing simultaneous state estimation and dynamics learning. The proposed technique iteratively improves the kinodynamic model in an expectation-maximization loop, where the E Step samples posterior state trajectories using particle filtering, and the M Step updates the dynamics to be more consistent with the sampled trajectories via stochastic gradient ascent. We evaluate our approach on both simulation and real-world benchmarks and compare it with several baseline techniques. Our approach not only achieves significantly higher accuracy but is also more robust to observation noise, thereby showing promise for boosting the performance of many other robotics applications.
    Reducing Exploitability with Population Based Training. (arXiv:2208.05083v2 [cs.LG] UPDATED)
    Self-play reinforcement learning has achieved state-of-the-art, and often superhuman, performance in a variety of zero-sum games. Yet prior work has found that policies that are highly capable against regular opponents can fail catastrophically against adversarial policies: an opponent trained explicitly against the victim. Prior defenses using adversarial training were able to make the victim robust to a specific adversary, but the victim remained vulnerable to new ones. We conjecture this limitation was due to insufficient diversity of adversaries seen during training. We propose a defense using population based training to pit the victim against a diverse set of opponents. We evaluate this defense's robustness against new adversaries in two low-dimensional environments. Our defense increases robustness against adversaries, as measured by number of attacker training timesteps to exploit the victim. Furthermore, we show that robustness is correlated with the size of the opponent population.
    Neural Clamping: Joint Input Perturbation and Temperature Scaling for Neural Network Calibration. (arXiv:2209.11604v1 [cs.LG])
    Neural network calibration is an essential task in deep learning to ensure consistency between the confidence of model prediction and the true correctness likelihood. In this paper, we propose a new post-processing calibration method called Neural Clamping, which employs a simple joint input-output transformation on a pre-trained classifier via a learnable universal input perturbation and an output temperature scaling parameter. Moreover, we provide theoretical explanations on why Neural Clamping is provably better than temperature scaling. Evaluated on CIFAR-100 and ImageNet image recognition datasets and a variety of deep neural network models, our empirical results show that Neural Clamping significantly outperforms state-of-the-art post-processing calibration methods.
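A hedged PyTorch sketch of the transformation described above may make it concrete: a learnable universal input perturbation plus a temperature on the logits, fit by cross-entropy on a held-out calibration set with the classifier frozen. The input_shape, data loader, and training settings are assumptions, not the paper's exact recipe.

```python
# Sketch of a joint input-output calibration transform in the spirit of
# Neural Clamping: learn delta (input perturbation) and T (temperature).
import torch
import torch.nn as nn

class NeuralClamping(nn.Module):
    def __init__(self, classifier, input_shape):
        super().__init__()
        self.classifier = classifier.eval()
        for p in self.classifier.parameters():
            p.requires_grad_(False)                    # keep the model frozen
        self.delta = nn.Parameter(torch.zeros(*input_shape))  # universal perturbation
        self.log_T = nn.Parameter(torch.zeros(()))     # temperature, log-space

    def forward(self, x):                              # delta broadcasts over batch
        return self.classifier(x + self.delta) / self.log_T.exp()

def calibrate(clamp, loader, epochs=10, lr=1e-3):
    """Fit delta and T by NLL on a held-out calibration set."""
    opt = torch.optim.Adam([clamp.delta, clamp.log_T], lr=lr)
    for _ in range(epochs):
        for x, y in loader:
            opt.zero_grad()
            nn.functional.cross_entropy(clamp(x), y).backward()
            opt.step()
```

At test time, clamp(x).softmax(-1) yields the calibrated confidences; fixing delta at zero recovers plain temperature scaling, which is the baseline the abstract claims to provably improve on.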
    A singular Riemannian geometry approach to Deep Neural Networks I. Theoretical foundations. (arXiv:2201.09656v2 [cs.LG] UPDATED)
Deep Neural Networks are widely used for solving complex problems in several scientific areas, such as speech recognition, machine translation, and image analysis. The strategies employed to investigate their theoretical properties mainly rely on Euclidean geometry, but in recent years new approaches based on Riemannian geometry have been developed. Motivated by some open problems, we study a particular sequence of maps between manifolds, with the last manifold of the sequence equipped with a Riemannian metric. We investigate the structures induced through pullbacks on the other manifolds of the sequence and on some related quotients. In particular, we show that the pullback of the final Riemannian metric to any manifold of the sequence is a degenerate Riemannian metric inducing the structure of a pseudometric space, and that the Kolmogorov quotient of this pseudometric space yields a smooth manifold, which is the base space of a particular vertical bundle. We investigate the theoretical properties of the maps of such a sequence, and finally we focus on the case of maps between manifolds implementing neural networks of practical interest, presenting some applications of the geometric framework we introduced in the first part of the paper.
    CascadER: Cross-Modal Cascading for Knowledge Graph Link Prediction. (arXiv:2205.08012v2 [cs.CL] UPDATED)
    Knowledge graph (KG) link prediction is a fundamental task in artificial intelligence, with applications in natural language processing, information retrieval, and biomedicine. Recently, promising results have been achieved by leveraging cross-modal information in KGs, using ensembles that combine knowledge graph embeddings (KGEs) and contextual language models (LMs). However, existing ensembles are either (1) not consistently effective in terms of ranking accuracy gains or (2) impractically inefficient on larger datasets due to the combinatorial explosion problem of pairwise ranking with deep language models. In this paper, we propose a novel tiered ranking architecture CascadER to maintain the ranking accuracy of full ensembling while improving efficiency considerably. CascadER uses LMs to rerank the outputs of more efficient base KGEs, relying on an adaptive subset selection scheme aimed at invoking the LMs minimally while maximizing accuracy gain over the KGE. Extensive experiments demonstrate that CascadER improves MRR by up to 9 points over KGE baselines, setting new state-of-the-art performance on four benchmarks while improving efficiency by one or more orders of magnitude over competitive cross-modal baselines. Our empirical analyses reveal that diversity of models across modalities and preservation of individual models' confidence signals help explain the effectiveness of CascadER, and suggest promising directions for cross-modal cascaded architectures. Code and pretrained models are available at https://github.com/tsafavi/cascader.
    Domain Adapting Deep Reinforcement Learning for Real-world Speech Emotion Recognition. (arXiv:2207.12248v2 [cs.SD] UPDATED)
Thanks to speech emotion recognition (SER), computers can understand and then engage with people in an emotionally intelligent way. However, the performance of SER in cross-corpus and real-world live data feed scenarios leaves significant room for improvement. The inability to adapt an existing model to a new domain is one of the shortcomings of SER methods. To address this challenge, researchers have developed domain adaptation techniques that transfer the knowledge learnt by a model across domains. Although existing domain adaptation techniques have improved performance across domains, they can be improved to adapt to real-world live data feed situations where a model can self-tune while deployed. In this paper, we present a deep reinforcement learning-based strategy (RL-DA) for adapting a pre-trained model to a real-world live data feed setting while interacting with the environment and collecting continual feedback. RL-DA is evaluated on SER tasks, including cross-corpus and cross-language domain adaptation schemes. Evaluation results show that in a live data feed setting, RL-DA outperforms a baseline strategy by 11% and 14% in cross-corpus and cross-language scenarios, respectively.
    Lightweight Transformers for Human Activity Recognition on Mobile Devices. (arXiv:2209.11750v1 [cs.CV])
    Human Activity Recognition (HAR) on mobile devices has been shown to be achievable with lightweight neural models learned from data generated by the user's inertial measurement units (IMUs). Most approaches for instance-based HAR have used Convolutional Neural Networks (CNNs), Long Short-Term Memory networks (LSTMs), or a combination of the two to achieve state-of-the-art results with real-time performance. Recently, the Transformer architecture, first in the language processing domain and then in the vision domain, has pushed the state-of-the-art further over classical architectures. However, the Transformer architecture is heavyweight in computing resources, which makes it ill-suited for the embedded HAR applications found in the pervasive computing domain. In this study, we present Human Activity Recognition Transformer (HART), a lightweight, sensor-wise transformer architecture that has been specifically adapted to the domain of IMUs embedded on mobile devices. Our experiments on HAR tasks with several publicly available datasets show that HART uses fewer floating-point operations (FLOPs) and parameters while outperforming current state-of-the-art results. Furthermore, we present evaluations across various architectures on their performance in heterogeneous environments and show that our models can better generalize to different sensing devices or on-body positions.
    A singular Riemannian geometry approach to Deep Neural Networks II. Reconstruction of 1-D equivalence classes. (arXiv:2112.10583v2 [cs.LG] UPDATED)
    In a previous work, we proposed a geometric framework to study a deep neural network, seen as a sequence of maps between manifolds, employing singular Riemannian geometry. In this paper, we present an application of this framework, proposing a way to build the equivalence class of an input point: this class is defined as the set of points on the input manifold mapped to the same output by the neural network. In other words, we build the preimage of a point in the output manifold in the input space. In particular, focusing for simplicity on neural networks mapping n-dimensional real spaces to (n - 1)-dimensional real spaces, we propose an algorithm to build the set of points lying in the same equivalence class. This approach leads to two main applications: the generation of new synthetic data, and insight into how a classifier can be confused by small perturbations on the input data (e.g., a penguin image classified as an image containing a chihuahua). In addition, for neural networks from 2D to 1D real spaces, we also discuss how to find the preimages of closed intervals of the real line. We also present some numerical experiments with several neural networks trained to perform non-linear regression tasks, including the case of a binary classifier.
    Assessing Robustness of EEG Representations under Data-shifts via Latent Space and Uncertainty Analysis. (arXiv:2209.11233v1 [eess.SP])
    The recent availability of large datasets in bio-medicine has inspired the development of representation learning methods for multiple healthcare applications. Despite advances in predictive performance, the clinical utility of such methods is limited when exposed to real-world data. Here we develop model diagnostic measures to detect potential pitfalls during deployment without assuming access to external data. Specifically, we focus on modeling realistic data shifts in electrophysiological signals (EEGs) via data transforms, and extend the conventional task-based evaluations with analyses of a) model's latent space and b) predictive uncertainty, under these transforms. We conduct experiments on multiple EEG feature encoders and two clinically relevant downstream tasks using publicly available large-scale clinical EEGs. Within this experimental setting, our results suggest that measures of latent space integrity and model uncertainty under the proposed data shifts may help anticipate performance degradation during deployment.
    Introducing Non-Linear Activations into Quantum Generative Models. (arXiv:2205.14506v3 [quant-ph] UPDATED)
    Due to the linearity of quantum mechanics, it remains a challenge to design quantum generative machine learning models that embed non-linear activations into the evolution of the statevector. However, some of the most successful classical generative models, such as those based on neural networks, involve highly non-linear dynamics for quality training. In this paper, we explore the effect of these dynamics in quantum generative modeling by introducing a model that adds non-linear activations via a neural network structure onto the standard Born Machine framework - the Quantum Neuron Born Machine (QNBM). To achieve this, we utilize a previously introduced Quantum Neuron subroutine, which is a repeat-until-success circuit with mid-circuit measurements and classical control. After introducing the QNBM, we investigate how its performance depends on network size, by training a 3-layer QNBM with 4 output neurons and various input and hidden layer sizes. We then compare our non-linear QNBM to the linear Quantum Circuit Born Machine (QCBM). We allocate similar time and memory resources to each model, such that the only major difference is the qubit overhead required by the QNBM. With gradient-based training, we show that while both models can easily learn a trivial uniform probability distribution, on a more challenging class of distributions, the QNBM achieves an almost 3x smaller error rate than a QCBM with a similar number of tunable parameters. We therefore provide evidence that suggests that non-linearity is a useful resource in quantum generative models, and we put forth the QNBM as a new model with good generative performance and potential for quantum advantage.
    Learning Interpretable Dynamics from Images of a Freely Rotating 3D Rigid Body. (arXiv:2209.11355v1 [cs.CV])
    In many real-world settings, image observations of freely rotating 3D rigid bodies, such as satellites, may be available when low-dimensional measurements are not. However, the high-dimensionality of image data precludes the use of classical estimation techniques to learn the dynamics and a lack of interpretability reduces the usefulness of standard deep learning methods. In this work, we present a physics-informed neural network model to estimate and predict 3D rotational dynamics from image sequences. We achieve this using a multi-stage prediction pipeline that maps individual images to a latent representation homeomorphic to $\mathbf{SO}(3)$, computes angular velocities from latent pairs, and predicts future latent states using the Hamiltonian equations of motion with a learned representation of the Hamiltonian. We demonstrate the efficacy of our approach on a new rotating rigid-body dataset with sequences of rotating cubes and rectangular prisms with uniform and non-uniform density.
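    For intuition, here is a minimal sketch (our own illustrative code, not the authors' implementation) of rolling out free rigid-body rotation from an SO(3)-valued latent state with Euler's equations; the inertia matrix J stands in for the learned Hamiltonian parameters, and all names are assumptions.

        import numpy as np

        def hat(w):
            """Map a 3-vector to its skew-symmetric matrix in so(3)."""
            return np.array([[0, -w[2], w[1]],
                             [w[2], 0, -w[0]],
                             [-w[1], w[0], 0]])

        def step(R, w, J, dt=1e-2):
            """One explicit Euler step of the free rigid body.
            R: 3x3 rotation (latent state homeomorphic to SO(3));
            w: body-frame angular velocity; J: (learned) inertia matrix."""
            w_dot = np.linalg.solve(J, np.cross(J @ w, w))  # Euler's equations: J w' = (J w) x w
            w_new = w + dt * w_dot
            R_new = R @ (np.eye(3) + dt * hat(w))           # first-order exponential map
            U, _, Vt = np.linalg.svd(R_new)                 # re-project onto SO(3)
            return U @ Vt, w_new

        R, w = np.eye(3), np.array([0.1, 1.0, 0.05])
        J = np.diag([1.0, 2.0, 3.0])                        # stand-in for a learned inertia
        for _ in range(100):
            R, w = step(R, w, J)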
    An Additive Instance-Wise Approach to Multi-class Model Interpretation. (arXiv:2207.03113v2 [cs.LG] UPDATED)
    Interpretable machine learning offers insights into what factors drive a certain prediction of a black-box system. A large number of interpreting methods focus on selecting explanatory input features, which follow either additive or instance-wise directions. Additive methods exploit local neighborhoods to learn instance-specific explainers sequentially. The process is thus inefficient and susceptible to poorly-conditioned samples. Meanwhile, instance-wise methods directly optimize local feature distributions in a global training framework, thereby being capable of leveraging global information from other inputs. However, they can only interpret single-class predictions and suffer from inconsistency across different settings, due to a strict reliance on a pre-defined number of features selected. This work exploits the strengths of both methods and proposes a framework for learning local explanations simultaneously for multiple target classes. Our model explainer significantly outperforms additive and instance-wise counterparts on faithfulness with more compact and comprehensible explanations. We also demonstrate the capacity to select stable and important features through extensive experiments on various data sets and black-box model architectures.
    Learning State Representations via Retracing in Reinforcement Learning. (arXiv:2111.12600v2 [cs.LG] UPDATED)
    We propose learning via retracing, a novel self-supervised approach for learning the state representation (and the associated dynamics model) for reinforcement learning tasks. In addition to the predictive (reconstruction) supervision in the forward direction, we propose to include "retraced" transitions for representation/model learning, by enforcing a cycle-consistency constraint between the original and retraced states, hence improving the sample efficiency of learning. Moreover, learning via retracing explicitly propagates information about future transitions backward for inferring previous states, thus facilitating stronger representation learning for the downstream reinforcement learning tasks. We introduce Cycle-Consistency World Model (CCWM), a concrete model-based instantiation of learning via retracing. Additionally, we propose a novel adaptive "truncation" mechanism for counteracting the negative impacts brought by "irreversible" transitions such that learning via retracing can be maximally effective. Through extensive empirical studies on visual-based continuous control benchmarks, we demonstrate that CCWM achieves state-of-the-art performance in terms of sample efficiency and asymptotic performance, whilst exhibiting behaviours that are indicative of stronger representation learning.
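    As a rough sketch of the cycle-consistency constraint at the heart of learning via retracing (interfaces are our own simplification; forward_model and backward_model stand for the learned transition and retracing networks):

        import torch.nn.functional as F

        def retracing_loss(forward_model, backward_model, z_t, a_t, z_next):
            z_pred = forward_model(z_t, a_t)           # forward (predictive) transition
            z_retraced = backward_model(z_pred, a_t)   # retrace back to time t
            forward_term = F.mse_loss(z_pred, z_next)  # reconstruction supervision
            cycle_term = F.mse_loss(z_retraced, z_t)   # cycle-consistency constraint
            return forward_term + cycle_term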
    GLSO: Grammar-guided Latent Space Optimization for Sample-efficient Robot Design Automation. (arXiv:2209.11748v1 [cs.RO])
    Robots have been used in all sorts of automation, and yet the design of robots remains mainly a manual task. We seek to provide design tools to automate the design of robots themselves. An important challenge in robot design automation is the large and complex design search space, which grows exponentially with the number of components, making optimization difficult and sample inefficient. In this work, we present Grammar-guided Latent Space Optimization (GLSO), a framework that transforms design automation into a low-dimensional continuous optimization problem by training a graph variational autoencoder (VAE) to learn a mapping between the graph-structured design space and a continuous latent space. This transformation allows optimization to be conducted in a continuous latent space, where sample efficiency can be significantly boosted by applying algorithms such as Bayesian Optimization. GLSO guides training of the VAE using graph grammar rules and robot world space features, such that the learned latent space focuses on valid robots and is easier for the optimization algorithm to explore. Importantly, the trained VAE can be reused to search for designs specialized to multiple different tasks without retraining. We evaluate GLSO by designing robots for a set of locomotion tasks in simulation, and demonstrate that our method outperforms related state-of-the-art robot design automation methods.
    Colonoscopy Landmark Detection using Vision Transformers. (arXiv:2209.11304v1 [cs.CV])
    Colonoscopy is a routine outpatient procedure used to examine the colon and rectum for any abnormalities including polyps, diverticula and narrowing of colon structures. A significant amount of the clinician's time is spent in post-processing snapshots taken during the colonoscopy procedure, for maintaining medical records or further investigation. Automating this step can save time and improve the efficiency of the process. In our work, we have collected a dataset of 120 colonoscopy videos and 2416 snapshots taken during the procedure, that have been annotated by experts. Further, we have developed a novel, vision-transformer based landmark detection algorithm that identifies key anatomical landmarks (the appendiceal orifice, ileocecal valve/cecum landmark and rectum retroflexion) from snapshots taken during colonoscopy. Our algorithm uses an adaptive gamma correction during preprocessing to maintain a consistent brightness for all images. We then use a vision transformer as the feature extraction backbone and a fully connected network based classifier head to categorize a given frame into four classes: the three landmarks or a non-landmark frame. We compare the vision transformer (ViT-B/16) backbone with ResNet-101 and ConvNext-B backbones that have been trained similarly. We report an accuracy of 82% with the vision transformer backbone on a test dataset of snapshots.
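    The paper does not spell out its exact adaptive gamma rule, but a common variant, sketched here under that assumption, picks a per-frame gamma so that the mean intensity is pulled toward a fixed target:

        import numpy as np

        def adaptive_gamma(image, target_mean=0.5, eps=1e-6):
            """image: uint8 array; returns a brightness-normalized uint8 array."""
            img = image.astype(np.float32) / 255.0
            mean = max(float(img.mean()), eps)
            gamma = np.log(target_mean) / np.log(mean)  # solves mean**gamma == target_mean
            return np.clip(img ** gamma * 255, 0, 255).astype(np.uint8)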
    I-SPLIT: Deep Network Interpretability for Split Computing. (arXiv:2209.11607v1 [cs.CV])
    This work makes a substantial step in the field of split computing, i.e., how to split a deep neural network to host its early part on an embedded device and the rest on a server. So far, potential split locations have been identified by exploiting uniquely architectural aspects, i.e., based on the layer sizes. Under this paradigm, the efficacy of the split in terms of accuracy can be evaluated only after having performed the split and retrained the entire pipeline, making an exhaustive evaluation of all the plausible splitting points prohibitive in terms of time. Here we show that not only does the architecture of the layers matter, but so does the importance of the neurons contained therein. A neuron is important if its gradient with respect to the correct class decision is high. It follows that a split should be applied right after a layer with a high density of important neurons, in order to preserve the information flowing up to that point. Upon this idea, we propose Interpretable Split (I-SPLIT): a procedure that identifies the most suitable splitting points by providing a reliable prediction of how well a split will perform in terms of classification accuracy, before its actual implementation. As a further major contribution of I-SPLIT, we show that the best choice for the splitting point on a multiclass categorization problem also depends on which specific classes the network has to deal with. Exhaustive experiments have been carried out on two networks, VGG16 and ResNet-50, and three datasets, Tiny-Imagenet-200, notMNIST, and Chest X-Ray Pneumonia. The source code is available at https://github.com/vips4/I-Split.
    An Investigation of the Bias-Variance Tradeoff in Meta-Gradients. (arXiv:2209.11303v1 [cs.LG])
    Meta-gradients provide a general approach for optimizing the meta-parameters of reinforcement learning (RL) algorithms. Estimation of meta-gradients is central to the performance of these meta-algorithms, and has been studied in the setting of MAML-style short-horizon meta-RL problems. In this context, prior work has investigated the estimation of the Hessian of the RL objective, as well as tackling the problem of credit assignment to pre-adaptation behavior by making a sampling correction. However, we show that Hessian estimation, implemented for example by DiCE and its variants, always adds bias and can also add variance to meta-gradient estimation. Meanwhile, meta-gradient estimation has been studied less in the important long-horizon setting, where backpropagation through the full inner optimization trajectories is not feasible. We study the bias and variance tradeoff arising from truncated backpropagation and sampling correction, and additionally compare to evolution strategies, which is a recently popular alternative strategy to long-horizon meta-learning. While prior work implicitly chooses points in this bias-variance space, we disentangle the sources of bias and variance and present an empirical study that relates existing estimators to each other.
    Importance Sampling CAMs for Weakly-Supervised Segmentation with Highly Accurate Contours. (arXiv:2203.12459v2 [cs.CV] UPDATED)
    Classification networks have been used in weakly-supervised semantic segmentation (WSSS) to segment objects by means of class activation maps (CAMs). However, without pixel-level annotations, they are known to (1) mainly focus on discriminative regions, and (2) to produce diffuse CAMs without well-defined prediction contours. In this work, we alleviate both problems by improving CAM learning. First, we incorporate importance sampling based on the class-wise probability mass function induced by the CAMs to produce stochastic image-level class predictions. This results in segmentations that cover a larger extent of the objects, as shown in our empirical studies. Second, we formulate a feature similarity loss term, which further improves the alignment of predicted contours with edges in the image. Furthermore, we shed new light onto the problem of WSSS by measuring the contour F-score as a complement to the common area mIoU metric. We show that our method significantly outperforms previous methods in terms of contour quality, while matching state-of-the-art on region similarity.
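    A minimal sketch of the importance-sampling step, with our own interfaces: each normalized CAM is treated as a probability mass function over spatial locations, and one location per class is sampled instead of average pooling.

        import torch

        def stochastic_class_scores(cams):
            """cams: (C, H, W) non-negative class activation maps."""
            C, H, W = cams.shape
            flat = cams.reshape(C, -1)
            pmf = flat / flat.sum(dim=1, keepdim=True)   # class-wise pmf over pixels
            idx = torch.multinomial(pmf, num_samples=1)  # sample one location per class
            return flat.gather(1, idx).squeeze(1)        # stochastic image-level scores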
    Exact conservation laws for neural network integrators of dynamical systems. (arXiv:2209.11661v1 [math.DS])
    The solution of time-dependent differential equations with neural networks has attracted a lot of attention recently. The central idea is to learn the laws that govern the evolution of the solution from data, which might be polluted with random noise. However, in contrast to other machine learning applications, usually a lot is known about the system at hand. For example, for many dynamical systems physical quantities such as energy or (angular) momentum are exactly conserved. Hence, a neural network has to learn these conservation laws from data, and they will only be satisfied approximately due to finite training time and random noise. In this paper we present an alternative approach which uses Noether's Theorem to inherently incorporate conservation laws into the architecture of the neural network. We demonstrate that this leads to better predictions for three model systems: the motion of a non-relativistic particle in a three-dimensional Newtonian gravitational potential, the motion of a massive relativistic particle in the Schwarzschild metric, and a system of two interacting particles in four dimensions.
    Multidimensional Interactive Fixed-Effects. (arXiv:2209.11691v1 [econ.EM])
    This paper studies a linear and additively separable model for multidimensional panel data of three or more dimensions with unobserved interactive fixed effects. Two approaches are considered to account for these unobserved interactive fixed-effects when estimating coefficients on the observed covariates. First, the model is embedded within the standard two-dimensional panel framework and restrictions are derived under which the factor structure methods in Bai (2009) lead to consistent estimation of model parameters. The second approach considers group fixed-effects and kernel methods that are more robust to the multidimensional nature of the problem. Theoretical results and simulations show the benefit of standard two-dimensional panel methods when the structure of the interactive fixed-effect term is known, but also highlight how the group fixed-effects and kernel methods perform well without knowledge of this structure. The methods are implemented to estimate the demand elasticity for beer under a handful of models for demand.
    A Preliminary Investigation of MLOps Practices in GitHub. (arXiv:2209.11453v1 [cs.SE])
    Background. The rapid and growing popularity of machine learning (ML) applications has led to an increasing interest in MLOps, that is, the practice of continuous integration and deployment (CI/CD) of ML-enabled systems. Aims. Since changes may affect not only the code but also the ML model parameters and the data themselves, the automation of traditional CI/CD needs to be extended to manage model retraining in production. Method. In this paper, we present an initial investigation of the MLOps practices implemented in a set of ML-enabled systems retrieved from GitHub, focusing on GitHub Actions and CML, two solutions to automate the development workflow. Results. Our preliminary results suggest that the adoption of MLOps workflows in open-source GitHub projects is currently rather limited. Conclusions. Issues are also identified, which can guide future research work.
    Deadwooding: Robust Global Pruning for Deep Neural Networks. (arXiv:2202.05226v4 [cs.LG] UPDATED)
    The ability of Deep Neural Networks to approximate highly complex functions is key to their success. This benefit, however, comes at the expense of a large model size, which challenges its deployment in resource-constrained environments. Pruning is an effective technique used to limit this issue, but often comes at the cost of reduced accuracy and adversarial robustness. This paper addresses these shortcomings and introduces Deadwooding, a novel global pruning technique that exploits a Lagrangian Dual method to encourage model sparsity while retaining accuracy and ensuring robustness. The resulting model is shown to significantly outperform the state-of-the-art studies in measures of robustness and accuracy.
    Unsupervised Deep Unrolled Reconstruction Using Regularization by Denoising. (arXiv:2205.03519v2 [eess.IV] UPDATED)
    Deep learning methods have been successfully used in various computer vision tasks. Inspired by that success, deep learning has been explored in magnetic resonance imaging (MRI) reconstruction. In particular, integrating deep learning and model-based optimization methods has shown considerable advantages. However, a large amount of labeled training data is typically needed for high reconstruction quality, which is challenging for some MRI applications. In this paper, we propose a novel reconstruction method, named DURED-Net, that enables interpretable unsupervised learning for MR image reconstruction by combining an unsupervised denoising network and a plug-and-play method. We aim to boost the reconstruction performance of unsupervised learning by adding an explicit prior that utilizes imaging physics. Specifically, the leverage of a denoising network for MRI reconstruction is achieved using Regularization by Denoising (RED). Experiment results demonstrate that the proposed method requires a reduced amount of training data to achieve high reconstruction quality.
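    For intuition, a hedged sketch of one RED-regularized gradient step; under standard RED assumptions the regularizer rho(x) = 0.5 x^T (x - D(x)) has gradient x - D(x). Here A and At (the forward operator and its adjoint) and denoiser are stand-ins, and the step sizes are illustrative.

        def red_step(x, y, A, At, denoiser, step=1.0, lam=0.1):
            data_grad = At(A(x) - y)    # gradient of 0.5 * ||A x - y||^2
            red_grad = x - denoiser(x)  # RED regularizer gradient
            return x - step * (data_grad + lam * red_grad)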
    Tensor-CSPNet: A Novel Geometric Deep Learning Framework for Motor Imagery Classification. (arXiv:2202.02472v3 [eess.SP] UPDATED)
    Deep learning (DL) has been widely investigated in a vast majority of applications in electroencephalography (EEG)-based brain-computer interfaces (BCIs), especially for motor imagery (MI) classification, in the past five years. The mainstream DL methodology for MI-EEG classification exploits the temporospatial patterns of EEG signals using convolutional neural networks (CNNs), which have been remarkably successful on visual images. However, since the statistical characteristics of visual images depart radically from those of EEG signals, a natural question arises as to whether an alternative network architecture exists apart from CNNs. To address this question, we propose a novel geometric deep learning (GDL) framework called Tensor-CSPNet, which characterizes spatial covariance matrices derived from EEG signals on symmetric positive definite (SPD) manifolds and fully captures the temporospatiofrequency patterns using existing deep neural networks on SPD manifolds, integrating experiences from many successful MI-EEG classifiers to optimize the framework. In the experiments, Tensor-CSPNet attains or slightly outperforms the current state-of-the-art performance in the cross-validation and holdout scenarios on two commonly used MI-EEG datasets. Moreover, visualization and interpretability analyses also exhibit the validity of Tensor-CSPNet for MI-EEG classification. To conclude, in this study we provide a feasible answer to the question by generalizing DL methodologies to SPD manifolds, which marks the start of a specific GDL methodology for MI-EEG classification.
    Separation of Scales and a Thermodynamic Description of Feature Learning in Some CNNs. (arXiv:2112.15383v3 [stat.ML] UPDATED)
    Deep neural networks (DNNs) are powerful tools for compressing and distilling information. Their scale and complexity, often involving billions of inter-dependent parameters, render direct microscopic analysis difficult. Under such circumstances, a common strategy is to identify slow variables that average the erratic behavior of the fast microscopic variables. Here, we identify a similar separation of scales occurring in fully trained finitely over-parameterized deep convolutional neural networks (CNNs) and fully connected networks (FCNs). Specifically, we show that DNN layers couple only through the second moment (kernels) of their activations and pre-activations. Moreover, the latter fluctuates in a nearly Gaussian manner. For infinite width DNNs, these kernels are inert, while for finite ones they adapt to the data and yield a tractable data-aware Gaussian Process. The resulting thermodynamic theory of deep learning yields accurate predictions in various settings. In addition, it provides new ways of analyzing and understanding DNNs in general.
    Semantic scene descriptions as an objective of human vision. (arXiv:2209.11737v1 [cs.CV])
    Interpreting the meaning of a visual scene requires not only identification of its constituent objects, but also a rich semantic characterization of object interrelations. Here, we study the neural mechanisms underlying visuo-semantic transformations by applying modern computational techniques to a large-scale 7T fMRI dataset of human brain responses elicited by complex natural scenes. Using semantic embeddings obtained by applying linguistic deep learning models to human-generated scene descriptions, we identify a widely distributed network of brain regions that encode semantic scene descriptions. Importantly, these semantic embeddings better explain activity in these regions than traditional object category labels. In addition, they are effective predictors of activity despite the fact that the participants did not actively engage in a semantic task, suggesting that visuo-semantic transformations are a default mode of vision. In support of this view, we then show that highly accurate reconstructions of scene captions can be directly linearly decoded from patterns of brain activity. Finally, a recurrent convolutional neural network trained to predict the semantic embeddings further outperforms the embeddings themselves in predicting brain activity, providing a mechanistic model of the brain's visuo-semantic transformations. Together, these experimental and computational results suggest that transforming visual input into rich semantic scene descriptions may be a central objective of the visual system, and that focusing efforts on this new objective may lead to improved models of visual information processing in the human brain.
    Privacy-preserving Federated Adversarial Domain Adaptation over Feature Groups for Interpretability. (arXiv:2111.10934v2 [cs.LG] UPDATED)
    We present a novel privacy-preserving federated adversarial domain adaptation approach ($\textbf{PrADA}$) to address an under-studied but practical cross-silo federated domain adaptation problem, in which the party of the target domain is insufficient in both samples and features. We address the lack-of-feature issue by extending the feature space through vertical federated learning with a feature-rich party and tackle the sample-scarce issue by performing adversarial domain adaptation from the sample-rich source party to the target party. In this work, we focus on financial applications where interpretability is critical. However, existing adversarial domain adaptation methods typically apply a single feature extractor to learn feature representations that are low-interpretable with respect to the target task. To improve interpretability, we exploit domain expertise to split the feature space into multiple groups that each holds relevant features, and we learn a semantically meaningful high-order feature from each feature group. In addition, we apply a feature extractor (along with a domain discriminator) for each feature group to enable a fine-grained domain adaptation. We design a secure protocol that enables performing the PrADA in a secure and efficient manner. We evaluate our approach on two tabular datasets. Experiments demonstrate both the effectiveness and practicality of our approach.
    Automated detection of Alzheimer disease using MRI images and deep neural networks- A review. (arXiv:2209.11282v1 [eess.IV])
    Early detection of Alzheimer disease is crucial for deploying interventions and slowing the disease progression. Many machine learning and deep learning algorithms have been explored in the past decade with the aim of building automated detection systems for Alzheimer. Advancements in data augmentation techniques and advanced deep learning architectures have opened up new frontiers in this field, and research is moving at a rapid pace. Hence, the purpose of this survey is to provide an overview of recent research on deep learning models for Alzheimer disease diagnosis. In addition to categorizing the numerous data sources, neural network architectures, and commonly used assessment measures, we also classify implementation and reproducibility. Our objective is to assist interested researchers in keeping up with the newest developments and in reproducing earlier investigations as benchmarks. In addition, we indicate future research directions for this topic.
    Stochastic Inverse Reinforcement Learning. (arXiv:1905.08513v8 [cs.LG] UPDATED)
    The goal of the inverse reinforcement learning (IRL) problem is to recover the reward functions from expert demonstrations. However, like any ill-posed inverse problem, IRL suffers from the congenital defect that the policy may be optimal for many reward functions, and expert demonstrations may be optimal for many policies. In this work, we generalize the IRL problem to a well-posed expectation optimization problem, stochastic inverse reinforcement learning (SIRL), to recover the probability distribution over reward functions. We adopt the Monte Carlo expectation-maximization (MCEM) method to estimate the parameters of the probability distribution as a first solution to the SIRL problem. The solution is succinct, robust, and transferable for a learning task and can generate alternative solutions to the IRL problem. Through our formulation, it is possible to observe the intrinsic properties of the IRL problem from a global viewpoint, and our approach achieves considerable performance on the objectworld benchmark.
    TeST: Test-time Self-Training under Distribution Shift. (arXiv:2209.11459v1 [cs.CV])
    Despite their recent success, deep neural networks continue to perform poorly when they encounter distribution shifts at test time. Many recently proposed approaches try to counter this by aligning the model to the new distribution prior to inference. With no labels available, this requires unsupervised objectives to adapt the model to the observed test data. In this paper, we propose Test-Time Self-Training (TeST): a technique that takes as input a model trained on some source data and a novel data distribution at test time, and learns invariant and robust representations using a student-teacher framework. We find that models adapted using TeST significantly improve over baseline test-time adaptation algorithms. TeST achieves competitive performance with modern domain adaptation algorithms while having access to 5-10x less data at the time of adaptation. We thoroughly evaluate a variety of baselines on two tasks, object detection and image segmentation, and find that models adapted with TeST set the new state-of-the-art for test-time domain adaptation algorithms.
    Environment Optimization for Multi-Agent Navigation. (arXiv:2209.11279v1 [cs.RO])
    Traditional approaches to the design of multi-agent navigation algorithms consider the environment as a fixed constraint, despite the obvious influence of spatial constraints on agents' performance. Yet hand-designing improved environment layouts and structures is inefficient and potentially expensive. The goal of this paper is to consider the environment as a decision variable in a system-level optimization problem, where both agent performance and environment cost can be accounted for. We begin by proposing a novel environment optimization problem. We show, through formal proofs, under which conditions the environment can change while guaranteeing completeness (i.e., all agents reach their navigation goals). Our solution leverages a model-free reinforcement learning approach. In order to accommodate a broad range of implementation scenarios, we include both online and offline optimization, and both discrete and continuous environment representations. Numerical results corroborate our theoretical findings and validate our approach.
    Learning Rigid Body Dynamics with Lagrangian Graph Neural Network. (arXiv:2209.11588v1 [cs.LG])
    Lagrangian and Hamiltonian neural networks (LNNs and HNNs, respectively) encode strong inductive biases that allow them to significantly outperform other models of physical systems. However, these models have, thus far, mostly been limited to simple systems such as pendulums and springs or a single rigid body such as a gyroscope or a rigid rotor. Here, we present a Lagrangian graph neural network (LGNN) that can learn the dynamics of rigid bodies by exploiting their topology. We demonstrate the performance of LGNN by learning the dynamics of ropes, chains, and trusses with the bars modeled as rigid bodies. LGNN also exhibits generalizability -- an LGNN trained on chains with a few segments can simulate a chain with a large number of links and arbitrary link length. We also show that LGNN can simulate unseen hybrid systems, including bars and chains, on which it has not been trained. Specifically, we show that LGNN can be used to model the dynamics of complex real-world structures such as the stability of tensegrity structures. Finally, we discuss the non-diagonal nature of the mass matrix and its ability to generalize in complex systems.
    TNet: A Model-Constrained Tikhonov Network Approach for Inverse Problems. (arXiv:2105.12033v3 [stat.ML] UPDATED)
    Deep Learning (DL), and in particular deep neural networks (DNNs), is by default purely data-driven and in general does not require physics. This is the strength of DL, but also one of its key limitations when applied to science and engineering problems in which underlying physical properties and desired accuracy need to be achieved. DL methods in their original forms are not capable of respecting the underlying mathematical models or achieving desired accuracy even in big-data regimes. Moreover, many data-driven science and engineering problems, such as inverse problems, typically have limited experimental or observational data, and DL would overfit the data in this case. Leveraging information encoded in the underlying mathematical models, we argue, not only compensates for missing information in low-data regimes but also provides opportunities to equip DL methods with the underlying physics, hence promoting better generalization. This paper develops a model-constrained deep learning approach and its variant TNet that are capable of learning information hidden in both the training data and the underlying mathematical models to solve inverse problems governed by partial differential equations. We provide the constructions and some theoretical results for the proposed approaches. We show that data randomization can enhance the smoothness of the networks and their generalization. Comprehensive numerical results not only confirm the theoretical findings but also show that, with as little as 20 training data samples for 1D deconvolution, 50 for the inverse 2D heat conductivity problem, and 100 and 50 for the inverse initial conditions of the time-dependent 2D Burgers' and 2D Navier-Stokes equations, respectively, TNet solutions can be as accurate as Tikhonov solutions while being several orders of magnitude faster. This is possible owing to the model-constrained term, replications, and randomization.
    Robust Domain Adaptation for Machine Reading Comprehension. (arXiv:2209.11615v1 [cs.LG])
    Most domain adaptation methods for machine reading comprehension (MRC) use a pre-trained question-answer (QA) construction model to generate pseudo QA pairs for MRC transfer. Such a process will inevitably introduce mismatched pairs (i.e., noisy correspondence) due to i) the unavailability of QA pairs in target documents, and ii) the domain shift when applying the QA construction model to the target domain. Undoubtedly, the noisy correspondence will degrade the performance of MRC, yet it is neglected by existing works. To solve this untouched problem, we propose to construct QA pairs by additionally using the dialogue related to the documents, together with a new domain adaptation method for MRC. Specifically, we propose the Robust Domain Adaptation for Machine Reading Comprehension (RMRC) method, which consists of an answer extractor (AE), a question selector (QS), and an MRC model. RMRC filters out irrelevant answers by estimating their correlation to the document via the AE, and extracts questions by fusing the candidate questions from multiple rounds of dialogue chats via the QS. With the extracted QA pairs, the MRC model is fine-tuned and provides feedback to optimize the QS through a novel reinforced self-training method. Thanks to the optimization of the QS, our method greatly alleviates the noisy correspondence problem caused by the domain shift. To the best of our knowledge, this is the first study to reveal the influence of noisy correspondence in domain adaptation for MRC models and to show a feasible way to achieve robustness to mismatched pairs. Extensive experiments on three datasets demonstrate the effectiveness of our method.
    Artificial Intelligence in Material Engineering: A review on applications of AI in Material Engineering. (arXiv:2209.11234v1 [cs.LG])
    Recently, there has been extensive use of Artificial Intelligence (AI) in the field of material engineering. This can be attributed to the development of high-performance computing, which makes it feasible to train deep learning models with large numbers of parameters. In this article we review some of the latest developments in the applications of AI in material engineering.
    Minimizing Human Assistance: Augmenting a Single Demonstration for Deep Reinforcement Learning. (arXiv:2209.11275v1 [cs.LG])
    The use of human demonstrations in reinforcement learning has proven to significantly improve agent performance. However, any requirement for a human to manually 'teach' the model is somewhat antithetical to the goals of reinforcement learning. This paper attempts to minimize human involvement in the learning process while still retaining the performance advantages by using a single human example collected through a simple-to-use virtual reality simulation to assist with RL training. Our method augments a single demonstration to generate numerous human-like demonstrations that, when combined with Deep Deterministic Policy Gradients and Hindsight Experience Replay (DDPG + HER), significantly improve training time on simple tasks and allow the agent to solve a complex task (block stacking) that DDPG + HER alone cannot solve. The model achieves this significant training advantage using a single human example, requiring less than a minute of human input.  ( 2 min )
    Adaptive-SpikeNet: Event-based Optical Flow Estimation using Spiking Neural Networks with Learnable Neuronal Dynamics. (arXiv:2209.11741v1 [cs.CV])
    Event-based cameras have recently shown great potential for high-speed motion estimation owing to their ability to capture temporally rich information asynchronously. Spiking Neural Networks (SNNs), with their neuro-inspired event-driven processing, can efficiently handle such asynchronous data, while neuron models such as the leaky-integrate and fire (LIF) can keep track of the quintessential timing information contained in the inputs. SNNs achieve this by maintaining a dynamic state in the neuron memory, retaining important information while forgetting redundant data over time. Thus, we posit that SNNs would allow for better performance on sequential regression tasks compared to similarly sized Analog Neural Networks (ANNs). However, deep SNNs are difficult to train due to vanishing spikes at later layers. To that effect, we propose an adaptive fully-spiking framework with learnable neuronal dynamics to alleviate the spike vanishing problem. We utilize surrogate gradient-based backpropagation through time (BPTT) to train our deep SNNs from scratch. We validate our approach for the task of optical flow estimation on the Multi-Vehicle Stereo Event-Camera (MVSEC) dataset and the DSEC-Flow dataset. Our experiments on these datasets show an average reduction of 13% in average endpoint error (AEE) compared to state-of-the-art ANNs. We also explore several down-scaled models and observe that our SNN models consistently outperform similarly sized ANNs offering 10%-16% lower AEE. These results demonstrate the importance of SNNs for smaller models and their suitability at the edge. In terms of efficiency, our SNNs offer substantial savings in network parameters (48x) and computational energy (51x) while attaining ~10% lower AEE compared to the state-of-the-art ANN implementations.  ( 3 min )
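    A minimal sketch of a leaky-integrate-and-fire neuron with a learnable leak, in the spirit of "learnable neuronal dynamics"; in actual training the hard threshold would be replaced by a surrogate gradient for BPTT, and all names here are our own.

        import torch
        import torch.nn as nn

        class LIFNeuron(nn.Module):
            def __init__(self, size, threshold=1.0):
                super().__init__()
                self.leak = nn.Parameter(torch.full((size,), 0.9))  # learnable decay
                self.threshold = threshold

            def forward(self, inputs):                   # inputs: (T, B, size)
                mem = torch.zeros_like(inputs[0])
                spikes = []
                for x in inputs:
                    mem = self.leak * mem + x            # leaky integration
                    s = (mem >= self.threshold).float()  # fire (surrogate grad in practice)
                    mem = mem - s * self.threshold       # soft reset
                    spikes.append(s)
                return torch.stack(spikes)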
    An artificial neural network-based system for detecting machine failures using tiny sound data: A case study. (arXiv:2209.11527v1 [cs.SD])
    In an effort to advocate research on deep learning-based machine failure detection systems, we present a case study of our proposed system based on a tiny sound dataset. Our case study investigates a variational autoencoder (VAE) for augmenting a small drill sound dataset from Valmet AB. The Valmet dataset contains 134 sounds, divided into the categories "Anomaly" and "Normal", recorded from a drilling machine in Valmet AB, a company in Sundsvall, Sweden that supplies equipment and processes for the production of biofuels. Using deep learning models to detect drill failures on such a small sound dataset is typically unsuccessful. We employed a VAE to increase the number of sounds in the tiny dataset by synthesizing new sounds from the original ones. The augmented dataset was created by combining these synthesized sounds with the original sounds. We used a high-pass filter with a passband frequency of 1,000 Hz and a low-pass filter with a passband frequency of 22,000 Hz to pre-process sounds in the augmented dataset before transforming them into Mel spectrograms. The pre-trained 2D-CNN AlexNet was then trained using these Mel spectrograms. Compared to using the original tiny sound dataset to train the pre-trained AlexNet, using the augmented sound dataset improved the CNN model's classification results by 6.62% (94.12% when trained on the augmented dataset versus 87.5% when trained on the original dataset).  ( 3 min )
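    The described preprocessing can be sketched as follows (the sample rate and filter orders are our assumptions, not values from the paper):

        import numpy as np
        import librosa
        from scipy.signal import butter, sosfilt

        def preprocess(path, sr=48_000):
            y, _ = librosa.load(path, sr=sr)
            hp = butter(4, 1_000, btype="highpass", fs=sr, output="sos")
            lp = butter(4, 22_000, btype="lowpass", fs=sr, output="sos")
            y = sosfilt(lp, sosfilt(hp, y))                              # band-limit the clip
            mel = librosa.feature.melspectrogram(y=y, sr=sr, n_mels=128)
            return librosa.power_to_db(mel, ref=np.max)                  # log-Mel for the 2D CNN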
    Local AdaGrad-Type Algorithm for Stochastic Convex-Concave Optimization. (arXiv:2106.10022v2 [cs.LG] UPDATED)
    Large scale convex-concave minimax problems arise in numerous applications, including game theory, robust training, and training of generative adversarial networks. Despite their wide applicability, solving such problems efficiently and effectively is challenging in the presence of large amounts of data using existing stochastic minimax methods. We study a class of stochastic minimax methods and develop a communication-efficient distributed stochastic extragradient algorithm, LocalAdaSEG, with an adaptive learning rate suitable for solving convex-concave minimax problems in the Parameter-Server model. LocalAdaSEG has three main features: (i) a periodic communication strategy that reduces the communication cost between workers and the server; (ii) an adaptive learning rate that is computed locally and allows for tuning-free implementation; and (iii) theoretically, a nearly linear speed-up with respect to the dominant variance term, arising from the estimation of the stochastic gradient, is proven in both the smooth and nonsmooth convex-concave settings. LocalAdaSEG is used to solve a stochastic bilinear game, and train a generative adversarial network. We compare LocalAdaSEG against several existing optimizers for minimax problems and demonstrate its efficacy through several experiments in both homogeneous and heterogeneous settings.  ( 3 min )
    A Robust and Explainable Data-Driven Anomaly Detection Approach For Power Electronics. (arXiv:2209.11427v1 [eess.SY])
    Timely and accurate detection of anomalies in power electronics is becoming increasingly critical for maintaining complex production systems. Robust and explainable strategies help decrease system downtime and preempt or mitigate infrastructure cyberattacks. This work begins by explaining the types of uncertainty present in current datasets and machine learning algorithm outputs. Three techniques for combating these uncertainties are then introduced and analyzed. We further present two anomaly detection and classification approaches, namely the Matrix Profile algorithm and anomaly transformer, which are applied in the context of a power electronic converter dataset. Specifically, the Matrix Profile algorithm is shown to be well suited as a generalizable approach for detecting real-time anomalies in streaming time-series data. The STUMPY python library implementation of the iterative Matrix Profile is used for the creation of the detector. A series of custom filters is created and added to the detector to tune its sensitivity, recall, and detection accuracy. Our numerical results show that, with simple parameter tuning, the detector provides high accuracy and performance in a variety of fault scenarios.  ( 2 min )
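    A minimal sketch of streaming discord detection with STUMPY's incremental matrix profile (the window length and threshold are illustrative, and the paper's custom sensitivity filters are not reproduced):

        import numpy as np
        import stumpy

        m = 64                                    # subsequence window length
        history = np.random.randn(10_000)         # stand-in for converter telemetry
        stream = stumpy.stumpi(history, m)

        for new_value in np.random.randn(1_000):  # live data feed
            stream.update(new_value)
            if stream.P_[-1] > 3.0:               # large profile value suggests a discord/anomaly
                print("potential anomaly at the latest subsequence")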
    MAGIC: Mask-Guided Image Synthesis by Inverting a Quasi-Robust Classifier. (arXiv:2209.11549v1 [cs.CV])
    We offer a method for one-shot image synthesis that allows controlling manipulations of a single image by inverting a quasi-robust classifier equipped with strong regularizers. Our proposed method, entitled Magic, samples structured gradients from a pre-trained quasi-robust classifier to better preserve the input semantics while preserving its classification accuracy, thereby guaranteeing credibility in the synthesis. Unlike current methods that use complex primitives to supervise the process or use attention maps as a weak supervisory signal, Magic aggregates gradients over the input, driven by a guide binary mask that enforces a strong, spatial prior. Magic implements a series of manipulations with a single framework achieving shape and location control, intense non-rigid shape deformations, and copy/move operations in the presence of repeating objects and gives users firm control over the synthesis by requiring simply specifying binary guide masks. Our study and findings are supported by various qualitative comparisons with the state-of-the-art on the same images sampled from ImageNet and quantitative analysis using machine perception along with a user survey of 100+ participants that endorse our synthesis quality.  ( 2 min )
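    The mask-guided inversion idea can be sketched as follows (our own simplification: ascend the target-class score on the input with gradients confined to a user-supplied binary mask; classifier stands for the quasi-robust model, and the paper's strong regularizers are omitted):

        import torch

        def masked_inversion(x, mask, classifier, target, steps=200, lr=0.05):
            """x: (1, C, H, W) image in [0, 1]; mask: binary spatial prior."""
            x = x.clone().requires_grad_(True)
            for _ in range(steps):
                score = classifier(x)[0, target]
                grad, = torch.autograd.grad(score, x)
                with torch.no_grad():
                    x += lr * mask * grad  # updates only inside the guide mask
                    x.clamp_(0, 1)
            return x.detach()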
    Neural Lyapunov Control. (arXiv:2005.00611v4 [cs.LG] UPDATED)
    We propose new methods for learning control policies and neural network Lyapunov functions for nonlinear control problems, with provable guarantee of stability. The framework consists of a learner that attempts to find the control and Lyapunov functions, and a falsifier that finds counterexamples to quickly guide the learner towards solutions. The procedure terminates when no counterexample is found by the falsifier, in which case the controlled nonlinear system is provably stable. The approach significantly simplifies the process of Lyapunov control design, provides end-to-end correctness guarantee, and can obtain much larger regions of attraction than existing methods such as LQR and SOS/SDP. We show experiments on how the new methods obtain high-quality solutions for challenging control problems.  ( 2 min )
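    For intuition, a sketch of the Lyapunov conditions the learner/falsifier loop certifies -- V(x) > 0 away from the origin and dV/dt = grad V(x) . f(x, u(x)) < 0 -- written as hinge losses; V, policy, and dynamics are stand-ins for the learned components, not the authors' code.

        import torch

        def lyapunov_violations(V, policy, dynamics, x):
            x = x.requires_grad_(True)
            v = V(x)
            grad_v, = torch.autograd.grad(v.sum(), x, create_graph=True)
            xdot = dynamics(x, policy(x))
            v_dot = (grad_v * xdot).sum(dim=-1)  # Lie derivative along trajectories
            # positive terms indicate violated Lyapunov conditions
            return torch.relu(-v).mean() + torch.relu(v_dot).mean()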
    Optimizing Class Distribution in Memory for Multi-Label Online Continual Learning. (arXiv:2209.11469v1 [cs.LG])
    Online continual learning, especially when task identities and task boundaries are unavailable, is a challenging continual learning setting. One representative family of methods for online continual learning is replay-based methods, in which a replay buffer, called memory, is maintained to keep a small part of past samples in order to overcome catastrophic forgetting. When tackling online continual learning, most existing replay-based methods focus on single-label problems, in which each sample in the data stream has only one label. But multi-label problems may also arise in the online continual learning setting, in which each sample may have more than one label. In the online setting with multi-label samples, the class distribution in the data stream is typically highly imbalanced, and it is challenging to control the class distribution in memory, since changing the number of samples belonging to one class may affect the number of samples belonging to other classes. Yet the class distribution in memory is critical for replay-based methods to achieve good performance, especially when the class distribution in the data stream is highly imbalanced. In this paper, we propose a simple but effective method, called optimizing class distribution in memory (OCDM), for multi-label online continual learning. OCDM formulates the memory update mechanism as an optimization problem and updates the memory by solving this problem. Experiments on two widely used multi-label datasets show that OCDM can control the class distribution in memory well and can outperform other state-of-the-art methods.
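    A greedy simplification of the memory-update idea, for intuition only (the paper formulates this as an optimization problem; here a sample of the currently most over-represented label is evicted, with multi-label samples counting toward every label they carry):

        import random
        from collections import Counter

        def update_memory(memory, new_sample, capacity):
            """memory: list of (x, labels) pairs, labels a set of class ids."""
            memory.append(new_sample)
            while len(memory) > capacity:
                counts = Counter(l for _, labels in memory for l in labels)
                worst = max(counts, key=counts.get)  # most over-represented class
                candidates = [i for i, (_, ls) in enumerate(memory) if worst in ls]
                memory.pop(random.choice(candidates))
            return memory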
    Differentially private partitioned variational inference. (arXiv:2209.11595v1 [cs.LG])
    Learning a privacy-preserving model from distributed sensitive data is an increasingly important problem, often formulated in the federated learning context. Variational inference has recently been extended to the non-private federated learning setting via the partitioned variational inference algorithm. For privacy protection, the current gold standard is called differential privacy. Differential privacy guarantees privacy in a strong, mathematically clearly defined sense. In this paper, we present differentially private partitioned variational inference, the first general framework for learning a variational approximation to a Bayesian posterior distribution in the federated learning setting while minimising the number of communication rounds and providing differential privacy guarantees for data subjects. We propose three alternative implementations in the general framework, one based on perturbing local optimisation done by individual parties, and two based on perturbing global updates (one using a version of federated averaging, one adding virtual parties to the protocol), and compare their properties both theoretically and empirically. We show that perturbing the local optimisation works well with simple and complex models as long as each party has enough local data. However, the privacy is always guaranteed independently by each party. In contrast, perturbing the global updates works best with relatively simple models. Given access to suitable secure primitives, such as secure aggregation or secure shuffling, the performance can be improved by all parties guaranteeing privacy jointly.
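    A sketch of the "perturb the global update" flavour: each party clips its update to bound sensitivity and adds Gaussian noise calibrated to the clip norm before the server averages. Clip and noise values are illustrative, not from the paper.

        import numpy as np

        rng = np.random.default_rng(0)

        def privatize(update, clip=1.0, noise_multiplier=1.1):
            norm = np.linalg.norm(update)
            clipped = update * min(1.0, clip / max(norm, 1e-12))  # bound sensitivity
            noise = rng.normal(0.0, noise_multiplier * clip, size=update.shape)
            return clipped + noise

        party_updates = [rng.normal(size=10) for _ in range(5)]   # stand-in local updates
        server_update = np.mean([privatize(u) for u in party_updates], axis=0)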
    Catoptric Light can be Dangerous: Effective Physical-World Attack by Natural Phenomenon. (arXiv:2209.11739v1 [cs.CV])
    Deep neural networks (DNNs) have achieved great success in many tasks, so it is crucial to evaluate the robustness of advanced DNNs. Traditional methods use stickers as physical perturbations to fool classifiers, which makes stealthiness difficult to achieve and introduces printing loss. Some newer physical attacks use light beams to perform attacks (e.g., laser, projector), but their optical patterns are artificial rather than natural. In this work, we study a new type of physical attack, called adversarial catoptric light (AdvCL), in which adversarial perturbations are generated by a common natural phenomenon, catoptric light, to achieve stealthy and naturalistic adversarial attacks against advanced DNNs in physical environments. Carefully designed experiments demonstrate the effectiveness of the proposed method in simulated and real-world environments. The attack success rate is 94.90% on a subset of ImageNet and 83.50% in the real-world environment. We also discuss AdvCL's transferability and defense strategies against this attack.  ( 2 min )
    Combinatorial optimization and reasoning with graph neural networks. (arXiv:2102.09544v3 [cs.LG] UPDATED)
    Combinatorial optimization is a well-established area in operations research and computer science. Until recently, its methods have focused on solving problem instances in isolation, ignoring that they often stem from related data distributions in practice. However, recent years have seen a surge of interest in using machine learning, especially graph neural networks (GNNs), as a key building block for combinatorial tasks, either directly as solvers or by enhancing exact solvers. The inductive bias of GNNs effectively encodes combinatorial and relational input due to their invariance to permutations and awareness of input sparsity. This paper presents a conceptual review of recent key advancements in this emerging field, aiming at optimization and machine learning researchers.  ( 2 min )
    Convolutional Learning on Multigraphs. (arXiv:2209.11354v1 [cs.LG])
    Graph convolutional learning has led to many exciting discoveries in diverse areas. However, in some applications, traditional graphs are insufficient to capture the structure and intricacies of the data. In such scenarios, multigraphs arise naturally as discrete structures in which complex dynamics can be embedded. In this paper, we develop convolutional information processing on multigraphs and introduce convolutional multigraph neural networks (MGNNs). To capture the complex dynamics of information diffusion within and across each of the multigraph's classes of edges, we formalize a convolutional signal processing model, defining the notions of signals, filtering, and frequency representations on multigraphs. Leveraging this model, we develop a multigraph learning architecture, including a sampling procedure to reduce computational complexity. The introduced architecture is applied towards optimal wireless resource allocation and a hate speech localization task, offering improved performance over traditional graph neural networks.  ( 2 min )
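    The filtering model can be sketched as a polynomial filter per edge class, summed across classes (a simplification of the paper's model, which also allows mixed products across classes; all names are ours):

        import numpy as np

        def multigraph_filter(A_list, H, x):
            """A_list: shift operator per edge class; H: filter taps per class; x: graph signal."""
            y = np.zeros_like(x)
            for A, taps in zip(A_list, H):
                z = x
                for h in taps:        # sum_k h_k A^k x for this edge class
                    y = y + h * z
                    z = A @ z
            return y

        n = 5
        A_list = [np.random.rand(n, n) for _ in range(2)]  # two edge classes
        H = [[1.0, 0.5, 0.25], [1.0, -0.3]]                # taps per class
        y = multigraph_filter(A_list, H, np.random.randn(n))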
    ProgPrompt: Generating Situated Robot Task Plans using Large Language Models. (arXiv:2209.11302v1 [cs.RO])
    Task planning can require defining myriad domain knowledge about the world in which a robot needs to act. To ameliorate that effort, large language models (LLMs) can be used to score potential next actions during task planning, and even generate action sequences directly, given an instruction in natural language with no additional domain information. However, such methods either require enumerating all possible next steps for scoring, or generate free-form text that may contain actions not possible on a given robot in its current context. We present a programmatic LLM prompt structure that enables plan generation functional across situated environments, robot capabilities, and tasks. Our key insight is to prompt the LLM with program-like specifications of the available actions and objects in an environment, as well as with example programs that can be executed. We make concrete recommendations about prompt structure and generation constraints through ablation experiments, demonstrate state of the art success rates in VirtualHome household tasks, and deploy our method on a physical robot arm for tabletop tasks. Website at progprompt.github.io  ( 2 min )
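    The flavour of a program-like planning prompt can be sketched as follows (entirely illustrative names; the actual ProgPrompt format is specified in the paper and at progprompt.github.io):

        # The LLM sees available actions as import stubs, objects as a list, and
        # example "task programs"; it is asked to complete the final function with
        # executable action calls for the new task.
        PROMPT = '''
        from actions import grab, put_on, open_door, walk_to

        objects = ["apple", "table", "fridge"]

        def make_breakfast():
            walk_to("fridge")
            open_door("fridge")
            grab("apple")
            put_on("apple", "table")

        def put_apple_in_fridge():
        '''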
    LEADER: Learning Attention over Driving Behaviors for Planning under Uncertainty. (arXiv:2209.11422v1 [cs.LG])
    Uncertainty on human behaviors poses a significant challenge to autonomous driving in crowded urban environments. The partially observable Markov decision processes (POMDPs) offer a principled framework for planning under uncertainty, often leveraging Monte Carlo sampling to achieve online performance for complex tasks. However, sampling also raises safety concerns by potentially missing critical events. To address this, we propose a new algorithm, LEarning Attention over Driving bEhavioRs (LEADER), that learns to attend to critical human behaviors during planning. LEADER learns a neural network generator to provide attention over human behaviors in real-time situations. It integrates the attention into a belief-space planner, using importance sampling to bias reasoning towards critical events. To train the algorithm, we let the attention generator and the planner form a min-max game. By solving the min-max game, LEADER learns to perform risk-aware planning without human labeling.  ( 2 min )
    Query-based Hard-Image Retrieval for Object Detection at Test Time. (arXiv:2209.11559v1 [cs.CV])
    There is a longstanding interest in capturing the error behaviour of object detectors by finding images where their performance is likely to be unsatisfactory. In real-world applications such as autonomous driving, it is also crucial to characterise potential failures beyond simple requirements of detection performance. For example, a missed detection of a pedestrian close to an ego vehicle will generally require closer inspection than a missed detection of a car in the distance. The problem of predicting such potential failures at test time has largely been overlooked in the literature and conventional approaches based on detection uncertainty fall short in that they are agnostic to such fine-grained characterisation of errors. In this work, we propose to reformulate the problem of finding "hard" images as a query-based hard image retrieval task, where queries are specific definitions of "hardness", and offer a simple and intuitive method that can solve this task for a large family of queries. Our method is entirely post-hoc, does not require ground-truth annotations, is independent of the choice of a detector, and relies on an efficient Monte Carlo estimation that uses a simple stochastic model in place of the ground-truth. We show experimentally that it can be applied successfully to a wide variety of queries for which it can reliably identify hard images for a given detector without any labelled data. We provide results on ranking and classification tasks using the widely used RetinaNet, Faster-RCNN, Mask-RCNN, and Cascade Mask-RCNN object detectors.  ( 3 min )
    Quantification before Selection: Active Dynamics Preference for Robust Reinforcement Learning. (arXiv:2209.11596v1 [cs.LG])
    Training a robust policy is critical for policy deployment in real-world systems or dealing with unknown dynamics mismatch in different dynamic systems. Domain Randomization (DR) is a simple and elegant approach that trains a conservative policy to counter different dynamic systems without expert knowledge about the target system parameters. However, existing works reveal that the policy trained through DR tends to be over-conservative and performs poorly in target domains. Our key insight is that dynamic systems with different parameters provide different levels of difficulty for the policy, and the difficulty of behaving well in a system is constantly changing due to the evolution of the policy. If we can actively sample the systems with proper difficulty for the policy on the fly, it will stabilize the training process and prevent the policy from becoming over-conservative or over-optimistic. To operationalize this idea, we introduce Active Dynamics Preference (ADP), which quantifies the informativeness and density of sampled system parameters. ADP actively selects system parameters with high informativeness and low density. We validate our approach in four robotic locomotion tasks with various discrepancies between the training and testing environments. Extensive results demonstrate that our approach has superior robustness for system inconsistency compared to several baselines.  ( 2 min )
    The complexity of unsupervised learning of lexicographic preferences. (arXiv:2209.11505v1 [cs.AI])
    This paper considers the task of learning users' preferences on a combinatorial set of alternatives, as generally used by online configurators, for example. In many settings, only a set of selected alternatives during past interactions is available to the learner. Fargier et al. [2018] propose an approach to learn, in such a setting, a model of the users' preferences that ranks previously chosen alternatives as high as possible; and an algorithm to learn, in this setting, a particular model of preferences: lexicographic preferences trees (LP-trees). In this paper, we study complexity-theoretical problems related to this approach. We give an upper bound on the sample complexity of learning an LP-tree, which is logarithmic in the number of attributes. We also prove that computing the LP tree that minimises the empirical risk can be done in polynomial time when restricted to the class of linear LP-trees.  ( 2 min )
    Achieve the Minimum Width of Neural Networks for Universal Approximation. (arXiv:2209.11395v1 [cs.LG])
    The universal approximation property (UAP) of neural networks is fundamental for deep learning, and it is well known that wide neural networks are universal approximators of continuous functions within both the $L^p$ norm and the continuous/uniform norm. However, the exact minimum width, $w_{\min}$, for the UAP has not been studied thoroughly. Recently, using a decoder-memorizer-encoder scheme, Park et al. (2021) found that $w_{\min} = \max(d_x+1,d_y)$ for both the $L^p$-UAP of ReLU networks and the $C$-UAP of ReLU+STEP networks, where $d_x,d_y$ are the input and output dimensions, respectively. In this paper, we consider neural networks with an arbitrary set of activation functions. We prove that both $C$-UAP and $L^p$-UAP for functions on compact domains share a universal lower bound of the minimal width; that is, $w^*_{\min} = \max(d_x,d_y)$. In particular, the critical width, $w^*_{\min}$, for $L^p$-UAP can be achieved by leaky-ReLU networks, provided that the input or output dimension is larger than one. Our construction is based on the approximation power of neural ordinary differential equations and the ability to approximate flow maps by neural networks. The nonmonotone or discontinuous activation functions case and the one-dimensional case are also discussed.  ( 2 min )
    Error Mitigation-Aided Optimization of Parameterized Quantum Circuits: Convergence Analysis. (arXiv:2209.11514v1 [quant-ph])
    Variational quantum algorithms (VQAs) offer the most promising path to obtaining quantum advantages via noisy intermediate-scale quantum (NISQ) processors. Such systems leverage classical optimization to tune the parameters of a parameterized quantum circuit (PQC). The goal is minimizing a cost function that depends on measurement outputs obtained from the PQC. Optimization is typically implemented via stochastic gradient descent (SGD). On NISQ computers, gate noise due to imperfections and decoherence affects the stochastic gradient estimates by introducing a bias. Quantum error mitigation (QEM) techniques can reduce the estimation bias without requiring any increase in the number of qubits, but they in turn cause an increase in the variance of the gradient estimates. This work studies the impact of quantum gate noise on the convergence of SGD for the variational eigensolver (VQE), a fundamental instance of VQAs. The main goal is ascertaining conditions under which QEM can enhance the performance of SGD for VQEs. It is shown that quantum gate noise induces a non-zero error-floor on the convergence error of SGD (evaluated with respect to a reference noiseless PQC), which depends on the number of noisy gates, the strength of the noise, as well as the eigenspectrum of the observable being measured and minimized. In contrast, with QEM, any arbitrarily small error can be obtained. Furthermore, for error levels attainable with or without QEM, QEM can reduce the number of required iterations, but only as long as the quantum noise level is sufficiently small, and a sufficiently large number of measurements is allowed at each SGD iteration. Numerical examples for a max-cut problem corroborate the main theoretical findings.  ( 3 min )
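    The bias-variance trade-off described above can be reproduced in a toy experiment (ours, not the paper's): SGD on f(x) = x^2/2 with a biased gradient settles at an error floor, while an unbiased but noisier gradient does not.

        import numpy as np

        def sgd(bias, noise_std, steps=5000, lr=0.01, seed=0):
            rng = np.random.default_rng(seed)
            x = 5.0
            for _ in range(steps):
                g = x + bias + noise_std * rng.normal()  # true gradient of x^2/2 is x
                x -= lr * g
            return abs(x)  # distance from the noiseless optimum x* = 0

        print(sgd(bias=0.3, noise_std=0.5))  # "noisy gates": error floor near 0.3
        print(sgd(bias=0.0, noise_std=2.0))  # "mitigated": unbiased, higher variance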
    On Efficient Reinforcement Learning for Full-length Game of StarCraft II. (arXiv:2209.11553v1 [cs.LG])
    StarCraft II (SC2) poses a grand challenge for reinforcement learning (RL), whose main difficulties include a huge state space, a varying action space, and a long time horizon. In this work, we investigate a set of RL techniques for the full-length game of StarCraft II. We investigate a hierarchical RL approach involving extracted macro-actions and a hierarchical architecture of neural networks. We investigate a curriculum transfer training procedure and train the agent on a single machine with 4 GPUs and 48 CPU threads. On a 64x64 map and using restrictive units, we achieve a win rate of 99% against the level-1 built-in AI. Through the curriculum transfer learning algorithm and a mixture of combat models, we achieve a 93% win rate against the most difficult non-cheating built-in AI (level-7). In this extended version of the paper, we improve our architecture to train the agent against the cheating-level AIs, achieving win rates against the level-8, level-9, and level-10 AIs of 96%, 97%, and 94%, respectively. Our codes are at https://github.com/liuruoze/HierNet-SC2. To provide an AlphaStar-referenced baseline for our work as well as for the research and open-source community, we reproduce a scaled-down version of it, mini-AlphaStar (mAS). The latest version of mAS is 1.07; it can be trained on the raw action space, which has 564 actions. It is designed to run training on a single common machine by making the hyper-parameters adjustable. We then compare our work with mAS using the same resources and show that our method is more effective. The codes of mini-AlphaStar are at https://github.com/liuruoze/mini-AlphaStar. We hope our study can shed some light on future research into efficient reinforcement learning for SC2 and other large-scale games.  ( 3 min )
    Recurrence-free Survival Prediction under the Guidance of Automatic Gross Tumor Volume Segmentation for Head and Neck Cancers. (arXiv:2209.11268v1 [cs.CV])
    For Head and Neck Cancers (HNC) patient management, automatic gross tumor volume (GTV) segmentation and accurate pre-treatment cancer recurrence prediction are of great importance to assist physicians in designing personalized management plans, which have the potential to improve the treatment outcome and quality of life for HNC patients. In this paper, we developed an automated primary tumor (GTVp) and lymph nodes (GTVn) segmentation method based on combined pre-treatment positron emission tomography/computed tomography (PET/CT) scans of HNC patients. We extracted radiomics features from the segmented tumor volume and constructed a multi-modality tumor recurrence-free survival (RFS) prediction model, which fused the prediction results from separate CT radiomics, PET radiomics, and clinical models. We performed 5-fold cross-validation to train and evaluate our methods on the MICCAI 2022 HEad and neCK TumOR segmentation and outcome prediction challenge (HECKTOR) dataset. The ensemble prediction results on the testing cohort achieved Dice scores of 0.77 and 0.73 for GTVp and GTVn segmentation, respectively, and a C-index value of 0.67 for RFS prediction. The code is publicly available (https://github.com/wangkaiwan/HECKTOR-2022-AIRT). Our team's name is AIRT.  ( 2 min )
    FusionVAE: A Deep Hierarchical Variational Autoencoder for RGB Image Fusion. (arXiv:2209.11277v1 [cs.CV])
    Sensor fusion can significantly improve the performance of many computer vision tasks. However, traditional fusion approaches either are not data-driven, and thus can neither exploit prior knowledge nor find regularities in a given dataset, or they are restricted to a single application. We overcome this shortcoming by presenting a novel deep hierarchical variational autoencoder called FusionVAE that can serve as a basis for many fusion tasks. Our approach is able to generate diverse image samples that are conditioned on multiple noisy, occluded, or only partially visible input images. We derive and optimize a variational lower bound for the conditional log-likelihood of FusionVAE. In order to assess the fusion capabilities of our model thoroughly, we created three novel datasets for image fusion based on popular computer vision datasets. In our experiments, we show that FusionVAE learns a representation of aggregated information that is relevant to fusion tasks. The results demonstrate that our approach outperforms traditional methods significantly. Furthermore, we present the advantages and disadvantages of different design choices.  ( 2 min )
    A Neural Model for Regular Grammar Induction. (arXiv:2209.11628v1 [cs.LG])
    Grammatical inference is a classical problem in computational learning theory and a topic of wider influence in natural language processing. We treat grammars as a model of computation and propose a novel neural approach to induction of regular grammars from positive and negative examples. Our model is fully explainable, its intermediate results are directly interpretable as partial parses, and it can be used to learn arbitrary regular grammars when provided with sufficient data. Our method consistently attains high recall and precision scores across a range of tests of varying complexity. We make the detailed results and code readily available.  ( 2 min )
    Do Current Multi-Task Optimization Methods in Deep Learning Even Help?. (arXiv:2209.11379v1 [cs.LG])
    Recent research has proposed a series of specialized optimization algorithms for deep multi-task models. It is often claimed that these multi-task optimization (MTO) methods yield solutions that are superior to the ones found by simply optimizing a weighted average of the task losses. In this paper, we perform large-scale experiments on a variety of language and vision tasks to examine the empirical validity of these claims. We show that, despite the added design and computational complexity of these algorithms, MTO methods do not yield any performance improvements beyond what is achievable via traditional optimization approaches. We highlight alternative strategies that consistently yield improvements to the performance profile and point out common training pitfalls that might cause suboptimal results. Finally, we outline challenges in reliably evaluating the performance of MTO algorithms and discuss potential solutions.  ( 2 min )
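    For reference, the traditional baseline the paper compares against is plain optimization of a fixed weighted average of task losses; a minimal PyTorch-style sketch (ours), with placeholder loss names:

        import torch

        def scalarized_loss(task_losses, weights):
            """task_losses: list of scalar tensors; weights: list of floats."""
            return sum(w * l for w, l in zip(weights, task_losses))

        # inside a training step (loss_seg, loss_depth are placeholders):
        # loss = scalarized_loss([loss_seg, loss_depth], weights=[1.0, 0.5])
        # loss.backward(); optimizer.step()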
    Tensor-Based Multi-Modality Feature Selection and Regression for Alzheimer's Disease Diagnosis. (arXiv:2209.11372v1 [cs.LG])
    The assessment of Alzheimer's Disease (AD) and Mild Cognitive Impairment (MCI) associated with brain changes remains a challenging task. Recent studies have demonstrated that the combination of multi-modality imaging techniques can better reflect pathological characteristics and contribute to more accurate diagnosis of AD and MCI. In this paper, we propose a novel tensor-based multi-modality feature selection and regression method for diagnosis and biomarker identification of AD and MCI from normal controls. Specifically, we leverage the tensor structure to exploit high-level correlation information inherent in the multi-modality data, and investigate tensor-level sparsity in the multilinear regression model. We present the practical advantages of our method for the analysis of ADNI data using three imaging modalities (VBM-MRI, FDG-PET and AV45-PET) with clinical parameters of disease severity and cognitive scores. The experimental results demonstrate the superior performance of our proposed method against the state-of-the-art for the disease diagnosis and the identification of disease-specific regions and modality-related differences. The code for this work is publicly available at https://github.com/junfish/BIOS22.  ( 2 min )
    Smart Active Sampling to enhance Quality Assurance Efficiency. (arXiv:2209.11464v1 [cs.LG])
    We propose a new sampling strategy, called smart active sampling, for quality inspections outside the production line. Based on the principles of active learning, a machine learning model decides which samples are sent to quality inspection. On the one hand, this minimizes the production of scrap parts due to earlier detection of quality violations. On the other hand, quality inspection costs are reduced during smooth operation.  ( 2 min )
    Relation Embedding based Graph Neural Networks for Handling Heterogeneous Graph. (arXiv:2209.11414v1 [cs.LG])
    Heterogeneous graph learning has drawn significant attention in recent years, due to the success of graph neural networks (GNNs) and the broad applications of heterogeneous information networks. Various heterogeneous graph neural networks have been proposed to generalize GNNs for processing heterogeneous graphs. Unfortunately, these approaches model the heterogeneity via various complicated modules. This paper aims to propose a simple yet efficient framework that gives homogeneous GNNs adequate ability to handle heterogeneous graphs. Specifically, we propose Relation Embedding based Graph Neural Networks (RE-GNNs), which employ only one parameter per relation to embed the importance of edge-type relations and self-loop connections. To optimize these relation embeddings and the other parameters simultaneously, a gradient scaling factor is proposed to constrain the embeddings to converge to suitable values. Besides, we theoretically demonstrate that our RE-GNNs have more expressive power than the meta-path based heterogeneous GNNs. Extensive experiments on node classification tasks validate the effectiveness of our proposed method.  ( 2 min )
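    A minimal sketch of the stated idea, assuming one learnable scalar per relation (plus one for self-loops) scales otherwise homogeneous message passing; names are ours, and the paper's gradient scaling factor is omitted.

        import numpy as np

        def re_gnn_layer(adjs_by_relation, h, W, rel_scalars, self_scalar):
            """adjs_by_relation: dict relation -> (n, n) normalized adjacency.
            h: (n, d) features; W: (d, d') shared weight; rel_scalars: dict of floats."""
            out = self_scalar * (h @ W)              # weighted self-loop term
            for r, A in adjs_by_relation.items():
                out += rel_scalars[r] * (A @ h @ W)  # one scalar per relation
            return np.maximum(out, 0.0)              # ReLU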
    StyleTime: Style Transfer for Synthetic Time Series Generation. (arXiv:2209.11306v1 [cs.LG])
    Neural style transfer is a powerful computer vision technique that can incorporate the artistic "style" of one image to the "content" of another. The underlying theory behind the approach relies on the assumption that the style of an image is represented by the Gram matrix of its features, which is typically extracted from pre-trained convolutional neural networks (e.g., VGG-19). This idea does not straightforwardly extend to time series stylization since notions of style for two-dimensional images are not analogous to notions of style for one-dimensional time series. In this work, a novel formulation of time series style transfer is proposed for the purpose of synthetic data generation and enhancement. We introduce the concept of stylized features for time series, which is directly related to the time series realism properties, and propose a novel stylization algorithm, called StyleTime, that uses explicit feature extraction techniques to combine the underlying content (trend) of one time series with the style (distributional properties) of another. Further, we discuss evaluation metrics, and compare our work to existing state-of-the-art time series generation and augmentation schemes. To validate the effectiveness of our methods, we use stylized synthetic data as a means for data augmentation to improve the performance of recurrent neural network models on several forecasting tasks.  ( 2 min )
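    For background on the Gram-matrix notion of style mentioned above, a minimal sketch of the classical computation on CNN feature maps:

        import numpy as np

        def gram_matrix(features):
            """features: (c, h, w) activations from one CNN layer."""
            c, h, w = features.shape
            f = features.reshape(c, h * w)
            return (f @ f.T) / (h * w)  # (c, c) channel-correlation "style" summary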
    A Jensen-Shannon Divergence Based Loss Function for Bayesian Neural Networks. (arXiv:2209.11366v1 [cs.LG])
    Kullback-Leibler (KL) divergence is widely used for variational inference of Bayesian Neural Networks (BNNs). However, the KL divergence has limitations such as unboundedness and asymmetry. We examine the Jensen-Shannon (JS) divergence that is more general, bounded, and symmetric. We formulate a novel loss function for BNNs based on the geometric JS divergence and show that the conventional KL divergence-based loss function is its special case. We evaluate the divergence part of the proposed loss function in a closed form for a Gaussian prior. For any other general prior, Monte Carlo approximations can be used. We provide algorithms for implementing both of these cases. We demonstrate that the proposed loss function offers an additional parameter that can be tuned to control the degree of regularisation. We derive the conditions under which the proposed loss function regularises better than the KL divergence-based loss function for Gaussian priors and posteriors. We demonstrate performance improvements over the state-of-the-art KL divergence-based BNN on the classification of a noisy CIFAR data set and a biased histopathology data set.  ( 2 min )
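    As a reference point, the standard JS divergence is symmetric and bounded by log 2; a Monte Carlo sketch (ours; the paper's geometric JS variant and its closed form for Gaussian priors are not reproduced here):

        import numpy as np
        from scipy.stats import norm

        def js_divergence_mc(p, q, n=100_000, seed=0):
            rng = np.random.default_rng(seed)
            xp, xq = p.rvs(n, random_state=rng), q.rvs(n, random_state=rng)
            m = lambda x: 0.5 * (p.pdf(x) + q.pdf(x))   # mixture density
            kl_pm = np.mean(np.log(p.pdf(xp) / m(xp)))  # KL(P || M)
            kl_qm = np.mean(np.log(q.pdf(xq) / m(xq)))  # KL(Q || M)
            return 0.5 * (kl_pm + kl_qm)

        print(js_divergence_mc(norm(0, 1), norm(3, 1)))  # bounded by log(2) ~ 0.693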
  • Open

    Differentially private partitioned variational inference. (arXiv:2209.11595v1 [cs.LG])
    Learning a privacy-preserving model from distributed sensitive data is an increasingly important problem, often formulated in the federated learning context. Variational inference has recently been extended to the non-private federated learning setting via the partitioned variational inference algorithm. For privacy protection, the current gold standard is called differential privacy. Differential privacy guarantees privacy in a strong, mathematically clearly defined sense. In this paper, we present differentially private partitioned variational inference, the first general framework for learning a variational approximation to a Bayesian posterior distribution in the federated learning setting while minimising the number of communication rounds and providing differential privacy guarantees for data subjects. We propose three alternative implementations in the general framework, one based on perturbing local optimisation done by individual parties, and two based on perturbing global updates (one using a version of federated averaging, one adding virtual parties to the protocol), and compare their properties both theoretically and empirically. We show that perturbing the local optimisation works well with simple and complex models as long as each party has enough local data. However, the privacy is always guaranteed independently by each party. In contrast, perturbing the global updates works best with relatively simple models. Given access to suitable secure primitives, such as secure aggregation or secure shuffling, the performance can be improved by all parties guaranteeing privacy jointly.
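    The "perturb local optimisation" idea can be illustrated generically with the standard Gaussian mechanism (clip, then add calibrated noise); this is a hedged sketch, not the paper's exact protocol, and the names are ours.

        import numpy as np

        def privatize_update(update, clip_norm, noise_mult, rng):
            """Clip an update vector to bound sensitivity, then add Gaussian noise."""
            norm = np.linalg.norm(update)
            clipped = update * min(1.0, clip_norm / max(norm, 1e-12))
            return clipped + rng.normal(0.0, noise_mult * clip_norm, update.shape)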
    A Unified Perspective on Natural Gradient Variational Inference with Gaussian Mixture Models. (arXiv:2209.11533v1 [cs.LG])
    Variational inference with Gaussian mixture models (GMMs) enables learning of highly tractable yet multi-modal approximations of intractable target distributions. GMMs are particularly relevant for problem settings with up to a few hundred dimensions, for example in robotics, for modelling distributions over trajectories or joint distributions. This work focuses on two very effective methods for GMM-based variational inference that both employ independent natural gradient updates for the individual components and the categorical distribution of the weights. We show, for the first time, that their derived updates are equivalent, although their practical implementations and theoretical guarantees differ. We identify several design choices that distinguish both approaches, namely with respect to sample selection, natural gradient estimation, stepsize adaptation, and whether trust regions are enforced or the number of components adapted. We perform extensive ablations on these design choices and show that they strongly affect the efficiency of the optimization and the variability of the learned distribution. Based on our insights, we propose a novel instantiation of our generalized framework that combines first-order natural gradient estimates with trust regions and component adaptation, and significantly outperforms both previous methods in all our experiments.
    Stochastic Multiple Target Sampling Gradient Descent. (arXiv:2206.01934v3 [cs.LG] UPDATED)
    Sampling from an unnormalized target distribution is an essential problem with many applications in probabilistic inference. Stein Variational Gradient Descent (SVGD) has been shown to be a powerful method that iteratively updates a set of particles to approximate the distribution of interest. Furthermore, when analysing its asymptotic properties, SVGD reduces exactly to a single-objective optimization problem and can be viewed as a probabilistic version of this single-objective optimization problem. A natural question then arises: "Can we derive a probabilistic version of the multi-objective optimization?". To answer this question, we propose Stochastic Multiple Target Sampling Gradient Descent (MT-SGD), enabling us to sample from multiple unnormalized target distributions. Specifically, our MT-SGD conducts a flow of intermediate distributions gradually orienting to multiple target distributions, which allows the sampled particles to move to the joint high-likelihood region of the target distributions. Interestingly, the asymptotic analysis shows that our approach reduces exactly to the multiple-gradient descent algorithm for multi-objective optimization, as expected. Finally, we conduct comprehensive experiments to demonstrate the merit of our approach to multi-task learning.  ( 2 min )
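    For background, one update of SVGD (the single-target special case that MT-SGD generalizes) can be sketched as follows for a standard-normal target and an RBF kernel; this is our toy code, not the paper's.

        import numpy as np

        def svgd_step(x, step=0.1, h=1.0):
            """x: (n, d) particles; target N(0, I), so grad log p(x) = -x."""
            diff = x[:, None, :] - x[None, :, :]                # x_i - x_j
            k = np.exp(-np.sum(diff ** 2, axis=-1) / (2 * h))   # (n, n) RBF kernel
            drive = k @ (-x)                                    # kernel-weighted grad log p
            repulsion = (k[:, :, None] * diff).sum(axis=1) / h  # gradient of the kernel
            return x + step * (drive + repulsion) / x.shape[0]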
    TNet: A Model-Constrained Tikhonov Network Approach for Inverse Problems. (arXiv:2105.12033v3 [stat.ML] UPDATED)
    Deep Learning (DL), in particular deep neural networks (DNNs), is by default purely data-driven and in general does not require physics. This is the strength of DL but also one of its key limitations when applied to science and engineering problems in which underlying physical properties and desired accuracy need to be achieved. DL methods in their original forms are not capable of respecting the underlying mathematical models or achieving desired accuracy even in big-data regimes. However, many data-driven science and engineering problems, such as inverse problems, typically have limited experimental or observational data, and DL would overfit the data in this case. Leveraging information encoded in the underlying mathematical models, we argue, not only compensates for missing information in low-data regimes but also provides opportunities to equip DL methods with the underlying physics, hence promoting better generalization. This paper develops a model-constrained deep learning approach and its variant TNet that are capable of learning information hidden in both the training data and the underlying mathematical models to solve inverse problems governed by partial differential equations. We provide the constructions and some theoretical results for the proposed approaches. We show that data randomization can enhance the smoothness of the networks and their generalizations. Comprehensive numerical results not only confirm the theoretical findings but also show that, with as little as 20 training data samples for 1D deconvolution, 50 for the inverse 2D heat conductivity problem, and 100 and 50 for the inverse initial conditions of the time-dependent 2D Burgers' and 2D Navier-Stokes equations, respectively, TNet solutions can be as accurate as Tikhonov solutions while being several orders of magnitude faster. This is possible owing to the model-constrained term, replications, and randomization.  ( 3 min )
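    For reference, the classical Tikhonov solution that TNet is benchmarked against has a closed form in the linear case; a minimal sketch:

        import numpy as np

        def tikhonov_solve(A, y, lam):
            """Solve min_x ||A x - y||^2 + lam ||x||^2 in closed form."""
            d = A.shape[1]
            return np.linalg.solve(A.T @ A + lam * np.eye(d), A.T @ y)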
    Scalable Gaussian Process Hyperparameter Optimization via Coverage Regularization. (arXiv:2209.11280v1 [cs.LG])
    Gaussian processes (GPs) are Bayesian non-parametric models popular in a variety of applications due to their accuracy and native uncertainty quantification (UQ). Tuning GP hyperparameters is critical to ensure the validity of prediction accuracy and uncertainty; uniquely estimating multiple hyperparameters in, e.g. the Matern kernel can also be a significant challenge. Moreover, training GPs on large-scale datasets is a highly active area of research: traditional maximum likelihood hyperparameter training requires quadratic memory to form the covariance matrix and has cubic training complexity. To address the scalable hyperparameter tuning problem, we present a novel algorithm which estimates the smoothness and length-scale parameters in the Matern kernel in order to improve robustness of the resulting prediction uncertainties. Using novel loss functions similar to those in conformal prediction algorithms in the computational framework provided by the hyperparameter estimation algorithm MuyGPs, we achieve improved UQ over leave-one-out likelihood maximization while maintaining a high degree of scalability as demonstrated in numerical experiments.  ( 2 min )
    Forecast combinations: an over 50-year review. (arXiv:2205.04216v2 [stat.ME] UPDATED)
    Forecast combinations have flourished remarkably in the forecasting community and, in recent years, have become part of the mainstream of forecasting research and activities. Combining multiple forecasts produced from single (target) series is now widely used to improve accuracy through the integration of information gleaned from different sources, thereby mitigating the risk of identifying a single "best" forecast. Combination schemes have evolved from simple combination methods without estimation, to sophisticated methods involving time-varying weights, nonlinear combinations, correlations among components, and cross-learning. They include combining point forecasts and combining probabilistic forecasts. This paper provides an up-to-date review of the extensive literature on forecast combinations, together with reference to available open-source software implementations. We discuss the potential and limitations of various methods and highlight how these ideas have developed over time. Some important issues concerning the utility of forecast combinations are also surveyed. Finally, we conclude with current research gaps and potential insights for future research.  ( 2 min )
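    As context for the simplest schemes such a review covers, a hedged sketch (ours, not the paper's) of equal-weight and inverse-MSE point-forecast combination; shapes and names are assumptions:

        import numpy as np

        def combine(forecasts, past_mses=None):
            """forecasts: (m, h) array of m methods over horizon h.
            past_mses: (m,) in-sample MSEs; if None, use equal weights."""
            if past_mses is None:
                w = np.full(len(forecasts), 1.0 / len(forecasts))
            else:
                w = 1.0 / np.asarray(past_mses)
                w = w / w.sum()
            return w @ forecasts  # (h,) combined forecast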
    From Weakly Supervised Learning to Active Learning. (arXiv:2209.11629v1 [cs.LG])
    Applied mathematics and machine computations have raised a lot of hope since the recent success of supervised learning. Many practitioners in industry have been trying to switch from their old paradigms to machine learning. Interestingly, those data scientists spend more time scraping, annotating and cleaning data than fine-tuning models. This thesis is motivated by the following question: can we derive a more generic framework than the one of supervised learning in order to learn from cluttered data? This question is approached through the lens of weakly supervised learning, assuming that the bottleneck of data collection lies in annotation. We model weak supervision as giving, rather than a unique target, a set of target candidates. We argue that one should look for an "optimistic" function that matches most of the observations. This allows us to derive a principle to disambiguate partial labels. We also discuss the advantage of incorporating unsupervised learning techniques into our framework, in particular manifold regularization approached through diffusion techniques, for which we derived a new algorithm that scales better with input dimension than the baseline method. Finally, we switch from passive to active weakly supervised learning, introducing the "active labeling" framework, in which a practitioner can query weak information about chosen data. Among other things, we leverage the fact that one does not need full information to access stochastic gradients and perform stochastic gradient descent.  ( 3 min )
    Treatment Effect Estimation from Observational Network Data using Augmented Inverse Probability Weighting and Machine Learning. (arXiv:2206.14591v2 [stat.ME] UPDATED)
    Causal inference methods for treatment effect estimation usually assume independent experimental units. However, this assumption is often questionable because experimental units may interact. We develop augmented inverse probability weighting (AIPW) for estimation and inference of causal treatment effects on dependent observational data. Our framework covers very general cases of spillover effects induced by units interacting in networks. We use plugin machine learning to estimate infinite-dimensional nuisance components leading to a consistent treatment effect estimator that converges at the parametric rate and asymptotically follows a Gaussian distribution. We apply our AIPW method to the Swiss StudentLife Study data to investigate the effect of hours spent studying on exam performance accounting for the students' social network.  ( 2 min )
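    For orientation, the classical i.i.d. AIPW estimator of the average treatment effect (which the paper generalizes to network interference) can be sketched as follows; variable names are ours:

        import numpy as np

        def aipw_ate(y, t, e_hat, mu1_hat, mu0_hat):
            """y: outcomes; t: binary treatment; e_hat: estimated propensities;
            mu1_hat / mu0_hat: outcome regressions under treatment / control."""
            psi1 = mu1_hat + t * (y - mu1_hat) / e_hat
            psi0 = mu0_hat + (1 - t) * (y - mu0_hat) / (1 - e_hat)
            return np.mean(psi1 - psi0)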
    Unified Algorithms for RL with Decision-Estimation Coefficients: No-Regret, PAC, and Reward-Free Learning. (arXiv:2209.11745v1 [cs.LG])
    Finding unified complexity measures and algorithms for sample-efficient learning is a central topic of research in reinforcement learning (RL). The Decision-Estimation Coefficient (DEC) was recently proposed by Foster et al. (2021) as a necessary and sufficient complexity measure for sample-efficient no-regret RL. This paper makes progress towards a unified theory for RL with the DEC framework. First, we propose two new DEC-type complexity measures: Explorative DEC (EDEC), and Reward-Free DEC (RFDEC). We show that they are necessary and sufficient for sample-efficient PAC learning and reward-free learning, thereby extending the original DEC which only captures no-regret learning. Next, we design new unified sample-efficient algorithms for all three learning goals. Our algorithms instantiate variants of the Estimation-To-Decisions (E2D) meta-algorithm with a strong and general model estimation subroutine. Even in the no-regret setting, our algorithm E2D-TA improves upon the algorithms of Foster et al. (2021) which require either bounding a variant of the DEC which may be prohibitively large, or designing problem-specific estimation subroutines. As applications, we recover existing and obtain new sample-efficient learning results for a wide range of tractable RL problems using essentially a single algorithm. Finally, as a connection, we re-analyze two existing optimistic model-based algorithms based on Posterior Sampling or Maximum Likelihood Estimation, showing that they enjoy similar regret bounds as E2D-TA under similar structural conditions as the DEC.  ( 3 min )
    Separation of Scales and a Thermodynamic Description of Feature Learning in Some CNNs. (arXiv:2112.15383v3 [stat.ML] UPDATED)
    Deep neural networks (DNNs) are powerful tools for compressing and distilling information. Their scale and complexity, often involving billions of inter-dependent parameters, render direct microscopic analysis difficult. Under such circumstances, a common strategy is to identify slow variables that average the erratic behavior of the fast microscopic variables. Here, we identify a similar separation of scales occurring in fully trained finitely over-parameterized deep convolutional neural networks (CNNs) and fully connected networks (FCNs). Specifically, we show that DNN layers couple only through the second moment (kernels) of their activations and pre-activations. Moreover, the latter fluctuates in a nearly Gaussian manner. For infinite width DNNs, these kernels are inert, while for finite ones they adapt to the data and yield a tractable data-aware Gaussian Process. The resulting thermodynamic theory of deep learning yields accurate predictions in various settings. In addition, it provides new ways of analyzing and understanding DNNs in general.  ( 2 min )
    Neural Lyapunov Control. (arXiv:2005.00611v4 [cs.LG] UPDATED)
    We propose new methods for learning control policies and neural network Lyapunov functions for nonlinear control problems, with provable guarantee of stability. The framework consists of a learner that attempts to find the control and Lyapunov functions, and a falsifier that finds counterexamples to quickly guide the learner towards solutions. The procedure terminates when no counterexample is found by the falsifier, in which case the controlled nonlinear system is provably stable. The approach significantly simplifies the process of Lyapunov control design, provides end-to-end correctness guarantee, and can obtain much larger regions of attraction than existing methods such as LQR and SOS/SDP. We show experiments on how the new methods obtain high-quality solutions for challenging control problems.  ( 2 min )
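    A minimal sketch of the learner side, assuming a candidate network V and known dynamics f: penalize violations of V(x) > 0 and of dV/dt = grad V . f(x) < 0 on sampled states. The paper's falsifier (a counterexample search) and the condition V(0) = 0 are not reproduced here.

        import torch

        def lyapunov_loss(V, f, x):
            """V: network mapping (n, d) -> (n, 1); f: dynamics (n, d) -> (n, d)."""
            x = x.requires_grad_(True)
            v = V(x)
            grad_v = torch.autograd.grad(v.sum(), x, create_graph=True)[0]
            v_dot = (grad_v * f(x)).sum(dim=1)  # dV/dt along trajectories
            return (torch.relu(-v.squeeze(-1)) + torch.relu(v_dot)).mean()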
    Stochastic Inverse Reinforcement Learning. (arXiv:1905.08513v8 [cs.LG] UPDATED)
    The goal of the inverse reinforcement learning (IRL) problem is to recover the reward functions from expert demonstrations. However, the IRL problem like any ill-posed inverse problem suffers the congenital defect that the policy may be optimal for many reward functions, and expert demonstrations may be optimal for many policies. In this work, we generalize the IRL problem to a well-posed expectation optimization problem stochastic inverse reinforcement learning (SIRL) to recover the probability distribution over reward functions. We adopt the Monte Carlo expectation-maximization (MCEM) method to estimate the parameter of the probability distribution as the first solution to the SIRL problem. The solution is succinct, robust, and transferable for a learning task and can generate alternative solutions to the IRL problem. Through our formulation, it is possible to observe the intrinsic property of the IRL problem from a global viewpoint, and our approach achieves a considerable performance on the objectworld.  ( 3 min )
    On the Shift Invariance of Max Pooling Feature Maps in Convolutional Neural Networks. (arXiv:2209.11740v1 [cs.CV])
    In this paper, we aim to improve the mathematical interpretability of convolutional neural networks for image classification. When trained on natural image datasets, such networks tend to learn parameters in the first layer that closely resemble oriented Gabor filters. By leveraging the properties of discrete Gabor-like convolutions, we prove that, under specific conditions, feature maps computed by the subsequent max pooling operator tend to approximate the modulus of complex Gabor-like coefficients, and as such, are stable with respect to certain input shifts. We then compute a probabilistic measure of shift invariance for these layers. More precisely, we show that some filters, depending on their frequency and orientation, are more likely than others to produce stable image representations. We experimentally validate our theory by considering a deterministic feature extractor based on the dual-tree wavelet packet transform, a particular case of discrete Gabor-like decomposition. We demonstrate a strong correlation between shift invariance on the one hand and similarity with complex modulus on the other hand.  ( 2 min )
    Feature selection in stratification estimators of causal effects: lessons from potential outcomes, causal diagrams, and structural equations. (arXiv:2209.11400v1 [stat.ME])
    What is the ideal regression (if any) for estimating average causal effects? We study this question in the setting of discrete covariates, deriving expressions for the finite-sample variance of various stratification estimators. This approach clarifies the fundamental statistical phenomena underlying many widely-cited results. Our exposition combines insights from three distinct methodological traditions for studying causal effect estimation: potential outcomes, causal diagrams, and structural models with additive errors.  ( 2 min )
    Quantile-constrained Wasserstein projections for robust interpretability of numerical and machine learning models. (arXiv:2209.11539v1 [math.OC])
    Robustness studies of black-box models are recognized as a necessary task for numerical models based on structural equations and predictive models learned from data. These studies must assess the model's robustness to possible misspecification regarding its inputs (e.g., covariate shift). The study of black-box models, through the prism of uncertainty quantification (UQ), is often based on sensitivity analysis involving a probabilistic structure imposed on the inputs, while ML models are solely constructed from observed data. Our work aims at unifying the UQ and ML interpretability approaches, by providing relevant and easy-to-use tools for both paradigms. To provide a generic and understandable framework for robustness studies, we define perturbations of input information relying on quantile constraints and projections with respect to the Wasserstein distance between probability measures, while preserving their dependence structure. We show that this perturbation problem can be analytically solved. Ensuring regularity constraints by means of isotonic polynomial approximations leads to smoother perturbations, which can be more suitable in practice. Numerical experiments on real case studies, from the UQ and ML fields, highlight the computational feasibility of such studies and provide local and global insights on the robustness of black-box models to input perturbations.  ( 3 min )

  • Open

    [P] Train Text to Image Diffusion Models in Keras
    Repo here: https://github.com/apapiu/guided-diffusion-keras Codebase + Colab Notebooks + datasets to train CLIP conditioned Diffusion models in keras. The notebooks (https://github.com/apapiu/guided-diffusion-keras#notebooks) allow you to train reasonable class or CLIP conditioned models on Colab within minutes to hours (depending on the datasets). My hope is that this will encourage more people without crazy resources to start experimenting more with training/developing new ideas in this space. This is still a work in progress and I have plans to add many more things (In/Outpainting, textual inversion, videos etc). Please let me know of any feedback! Also some of the code isn't the most idiomatic keras code so feel free to raise issues/PRs etc. Images for the prompt: "A small village in the Alps, spring, sunset" submitted by /u/spring_m [link] [comments]  ( 89 min )
    [P] OpenAI Whisper ASR Webservice API released
    Whisper is a general-purpose speech recognition model. It is trained on a large dataset of diverse audio and is also a multi-task model that can perform multilingual speech recognition as well as speech translation and language identification. For more details: https://github.com/ahmetoner/whisper-asr-webservice submitted by /u/fuzulis [link] [comments]  ( 88 min )
    [Project] Looking for a resource where I can find lots of files of a given programming language
    I'm working on a project where I need a lot of files from various languages (c++, java, python, javascript, etc.). Does anyone know of any resources or places I can find lots of files of a certain language? I have tried searching GitHub with "language:java" or "language:python" but still there are other file types in those projects. Any help would be much appreciated, thanks. submitted by /u/MandellaWho [link] [comments]  ( 88 min )
    [R] [2209.01687] Reconciling Individual Probability Forecasts
    submitted by /u/AforAnonymous [link] [comments]  ( 103 min )
    [D] Custom vocabulary for Whisper?
    Is it possible to add custom vocabulary words to the OpenAI Whisper ASR system? Its accuracy is excellent out of the box, but the ability to add custom words would make it even more useful in many specialized contexts. submitted by /u/gauss256 [link] [comments]  ( 88 min )
    [P] GitHub - SuperVisualApp/search: A minimal LAION 5B search index and server adopted from clip-retrieval
    submitted by /u/SuperVisualApp [link] [comments]  ( 88 min )
    [D] Forecasting and Trading on Renewables
    I have significant experience in forecasting energy production in wind and solar farms. I am thinking of spinning up a SaaS portal for all the forecasting and trading needs of an energy producer/distributor. The vision is to have all forecasting operations integrated with trading functions in a single pane/portal, with automations and other efficiencies. The thing I am missing, though, is the state of competition. A few questions below: Are there any other companies that offer the same? (I know of various companies that offer just forecasting services.) What levels of accuracy do their forecasts achieve? How is this accuracy quantified in ROI or $$? Is there a platform that combines energy forecasting + trading capabilities? How can I meet and partner with a subject matter expert from the renewable energy industry? I feel I have huge amounts of technology leverage to put on the table but I am missing the industry expertise. submitted by /u/aifuturedev [link] [comments]  ( 89 min )
    [D] Natural Language Processing Tensorflow
    submitted by /u/abhay994 [link] [comments]  ( 88 min )
    Requesting help implementing Nvidia Isaac model. [P]
    Hi there, I'm hoping someone can help me out or point me in the right direction for implementing the Stereo Disparity neural network from the Nvidia Isaac robotics kit. It is my current understanding that it should be possible to use the model on my own data. Ultimately I would like to use it for real-time inference within a Python program. Below are links to both the Nvidia page and the GitHub page for the model. I've tried following the instructions on the GitHub page; however, on two different computers the commands have failed at different points. Am I wrong in thinking this is something that is available for immediate use within applications? Thanks in advance! https://github.com/NVIDIA-ISAAC-ROS/isaac_ros_dnn_stereo_disparity https://catalog.ngc.nvidia.com/orgs/nvidia/teams/isaac/models/dnn_stereo_disparity submitted by /u/Extension_Fix5969 [link] [comments]  ( 108 min )
    [D] Simple Questions Thread
    Please post your questions here instead of creating a new thread. Encourage others who create new posts for questions to post here instead! Thread will stay alive until next one so keep posting after the date in the title. Thanks to everyone for answering questions in the previous thread! submitted by /u/AutoModerator [link] [comments]  ( 89 min )
    [D] Time Series Question
    Can anyone help with this time series project? I need to forecast iPhone sales for a store from its historical data, basically to avoid back-order or stock-out situations. I'm asked to calculate the numerical shortfall: say only 80 iPhones were sold while 99 were requested by customers, then my model should predict that 19 more units must be stocked. How do I calculate this? What should my approach be? submitted by /u/ChrisPTLJC [link] [comments]  ( 88 min )
    [Discussion] Are random crops actual augmentations for fully convolutional networks ?
    Hello everyone, I have a question about data augmentation, specifically in computer vision. For tasks such as classification, it's common practice to do random-size crops followed by a resize operation so that all images in the batch have the same size; another possibility is fixed-size crops, where the images do not undergo any rescaling. This is beneficial when the network has fully connected layers because it learns to disentangle features and their location in the image for the classification decision. My question is: does this still hold true for fully convolutional networks (for tasks such as segmentation)? Since convolutions are spatially equivariant, the same input should lead to the same output from the convolution filters no matter where it is in the image. So do fixed-size crops add any new information for learning? I think that they don't, as long as the receptive field is smaller than the size of the image, but I would love to hear your opinions on the subject. submitted by /u/OkeySubstance [link] [comments]  ( 90 min )
    [R] Peregrine: Large Language Generative Text-to-Speech Model
    submitted by /u/Wishmecake [link] [comments]  ( 105 min )
    [D] Is MacOS good for ML?
    Hi everyone!!! I want to start studying machine learning; I know nothing. On YT I saw some videos where people say that macOS (in particular M1 Macs) is good but not as recommended as a Windows PC. I now have an M1 iMac; is it good for ML or not? tysm<3 submitted by /u/Sakyy11 [link] [comments]  ( 94 min )
    [D] Need help on finding an area where machine learning is applicable on day-to-day life but not implemented already
    I hate to post school stuff here, but I am out of options. We are looking for ML applications for daily usage for our senior project, but I can't seem to find an idea. For example, one of the ideas our professor gave was: you take a picture of a fridge and the application gives recommendations/recipes on what to cook based on the ingredients, using ML/computer vision. This has been done before but not overdone, so it was okay. The professor asks for something similar (not too easy, since it is a senior project) but fun; it should help daily life, it should help lazy people, and it should be interesting enough that non-engineers would look at this app and get excited because it is somehow interesting to them or helps them with one of their daily tasks. Also, it should not be overdone; more of a unique thing, even if it is not fully unique. I talked with random people on the streets and with my colleagues but can't seem to come up with an idea. I don't know how ethical this is, but I feel out of options. Do you have any ideas like these? Or can we at least brainstorm? Do you have any daily-life problems that an application (mobile/web) could help with? submitted by /u/whydontigetbetter01 [link] [comments]  ( 92 min )
    [R] NVIDIA Merlin Recommender Systems + Transformer texts embeddings
    Hey guys, has anyone seen an example of an NVTabular feature engineering workflow that includes text feature processing with a custom Transformer model like BERT, RoBERTa, etc.? They seem to focus on tabular data, but sometimes the core signal for a recommender system is just the title of an item. The other option is forming a feature vector manually, but then the model is defined from the data schema. Would be grateful for help! submitted by /u/Great_Produce_2800 [link] [comments]  ( 88 min )
    [P] Enhancing local detail and cohesion by mosaicing with stable diffusion Gradio Web UI
    submitted by /u/Illustrious_Row_9971 [link] [comments]  ( 91 min )
    [P] NVIDIA A6000s from $0.42/h
    Hello! I'm Jonathan from TensorDock. We've been working on a marketplace for GPU virtual machines. Essentially, independent hosts from around the world run our software on their bare metal servers, and then clients can provision virtual machines. These are virtual machines, not Docker containers; those are also in the pipeline for Q1 next year. Given our lower costs, I think you would find this a nice alternative to other clouds for training your ML models. 2080 Ti: from $0.12/hr. 3080: from $0.17/hr. 3090: from $0.27/hr. A6000s: from $0.42/hr. Available machines: https://marketplace.tensordock.com/order_list Product page: https://www.tensordock.com/product-marketplace All our current hosts are vetted. Many run from their own basements/offices, though we also have a few data center machines. Keep in mind these servers aren't ours; they're hosted by independent hosts. We're really looking to target early-stage startups, researchers, and students whose #1 priority is cost, not security. If you're interested in the cheapest servers, hopefully this interests you! If you need better security, we also have a secure Core Cloud product, which I showcased here a few months ago :) If you have extra GPUs lying around, you can also apply to become a host and make 2-3x what mining used to make when it still existed 😂 Happy to give some starting credits if you email me at jonathan [at] tensordock.com. This product is still very much in development, so expect a few bugs here and there; if you could email those to us, we'll fix them very quickly. I'm here to answer your questions, so post them below! submitted by /u/jonathan-lei [link] [comments]  ( 97 min )
    [D] How to find or create a dataset of modern comicbook-style characters/panels?
    Like Danbooru but western. I don't even need descriptions or tags, just images of scenes that feature characters in, say, a modern comicbook style (not interested in anime nor a 60's comicbook style). This is probably a really simple task (although IMO not simple enough for the Simple Questions Thread, but feel free to remove this post if it is) but I've never built a decent dataset before and I might need some guidance on this. Does a dataset like that exist? Because I've searched a lot and haven't found any. If it doesn't exist, what would be the best method to create such a dataset myself? Any databases in which I can find images of that style? I can't find any. How can I find out how to find those databases? What I've tried: character face datasets exist, but what I want are scene stills that feature some kind of context, like animation screenshots or modern comicbook panels would. And panels, not whole pages. I've built a web scraper that scrapes based on a Google search and filters the results with CLIP, but the results lose too much accuracy after a few pages, to the point that too few pass through the CLIP filter. And I'm pretty sure that a Google scraper is not ideal, but I couldn't find any specific art websites/databases to scrape instead. Besides, most popular art websites have protection against scrapers and I definitely wouldn't try to bypass any of that. I guess I could find whole comicbook page databases, but I have no clue how I'd extract just the panels out of the pages. Besides, they would be too full of superheroes with fancy clothes and that's not what I want (I'm interested in just normal human characters; still, I could clean that up with CLIP). I don't know where to start. How would I look for the database or dataset? What is the best way to approach this? What is the best way to find out what the best way to approach this is? Any ideas? Thank you very much for taking the time to read this. submitted by /u/No_Application_5581 [link] [comments]  ( 90 min )
  • Open

    Is there any AI machine that wants to argue about, say, sleepwalking?
    Bring it on. submitted by /u/ShortBusRide [link] [comments]  ( 91 min )
    Anyone need dalle 2 access
    I got some accounts linked to dalle 2 if you need one just let me know! submitted by /u/Designer-Career6211 [link] [comments]  ( 87 min )
    Can AI generate entire Python applications?
    submitted by /u/SupPandaHugger [link] [comments]  ( 87 min )
    How to prepare children for the future
    I have a 5-year-old who just started kindergarten this year, and I want to make sure I prepare him for a changing world. What are other parents doing to help our kids become adaptable to the future world? What skill sets are most important? submitted by /u/callidoradesigns [link] [comments]  ( 87 min )
    I'm Afraid of Americans (cover of David Bowie) - AI-Generated Video
    https://youtu.be/ptsxyD8e4CA submitted by /u/Enuminous [link] [comments]  ( 86 min )
    Free skill tree for learning Deep Reinforcement Learning. Goes up to DeepMind's DQN algorithm. Get a path to your goal, track progress, and get explanations for each concept!
    submitted by /u/Ok-Craft-9908 [link] [comments]  ( 87 min )
    Something similar to OpenAI playground for text creation?
    Hello! I'm wondering if there is something similar to the OpenAI Playground for creating texts such as stories, pitches, and rhymes, but open source and able to run locally. Is there such a thing? Thanks! submitted by /u/Momkiller781 [link] [comments]  ( 87 min )
    Auto-generate subtitles from video, based on Whisper
    submitted by /u/tomchang25 [link] [comments]  ( 87 min )
    A general clinical case conversation between me and OpenAI.
    Person: A patient has had bleeding sputum and a 100F fever for at least 14 days; he also has a severe cough and doesn't want to eat properly. What tests should he go through for further diagnosis?
    AI: The patient should go through a chest x-ray and a sputum culture.
    Person: What could be the possible disease?
    AI: The possible disease could be tuberculosis.
    Person: How can you say that? And what medication should we give him?
    AI: The medication that should be given to the patient is isoniazid.
    Person: And what about rifampicin?
    AI: Rifampicin should also be given to the patient.
    Person: Why didn't you say this earlier?
    AI: I didn't say it earlier because I wanted to make sure that the patient went through the chest x-ray and sputum culture first.
    Person: In what…  ( 93 min )
    DeepMind's new chatbot Sparrow is "more helpful, correct, and harmless"
    submitted by /u/henlo_there_fren [link] [comments]  ( 86 min )
    AI Dev service / resources
    I have an idea for an AI application that I feel could be fun / popular. I know very little about AI (starting Software Eng degree this year). Does anyone have any resources to learn where to start developing or any services where I could share my idea but keep it protected? submitted by /u/Fktifano [link] [comments]  ( 92 min )
    AI that can turn 2d images into 3d?
    I found the website clipart.co/relight and saw what a cool art-reference tool it was. That made me wonder: is there a website with an AI where, if you input a 2D image, it will create a 3D image that you can explore from all angles? submitted by /u/LiliaAmazing [link] [comments]  ( 87 min )
  • Open

    Katakana, Hiragana, and Unicode
    I figured out something that I wasn’t able to find by searching, so I’m posting it here in case other people have the same question and the same difficulty finding an answer. I’m sure other people have written about this, but I couldn’t find it. Maybe lots of people have written about this in Japanese […] Katakana, Hiragana, and Unicode first appeared on John D. Cook.  ( 6 min )
  • Open

    What does the Advantage function signify in Dueling Deep Q Networks?
    Can someone please help with this: https://stats.stackexchange.com/questions/590025/what-does-the-advantage-function-signify-in-dueling-deep-q-networks Update: after watching this video, I think I am getting the hang of it. By splitting Q into Value and Advantage, we only need to focus on states that have high value-function estimates. Therefore, once we know that a state has a high value estimate, we can spend time computing the advantage of each of its actions. This saves a lot of time and rapidly speeds up training. Please let me know whether my explanation above makes sense and is correct. Also, how would a neural network know to focus on states with high value and not on states with low value? I still don't understand the math of Q = V + A. submitted by /u/Academic-Rent7800 [link] [comments]  ( 90 min )
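    For reference, the aggregation in the original dueling paper (Wang et al., 2016) is not just Q = V + A: the mean advantage is subtracted so that V and A are identifiable. A minimal PyTorch sketch of a dueling head (layer sizes are illustrative):

        # Dueling Q-network head: Q(s, a) = V(s) + A(s, a) - mean_a' A(s, a')
        import torch
        import torch.nn as nn

        class DuelingQNet(nn.Module):
            def __init__(self, state_dim, n_actions, hidden=128):
                super().__init__()
                self.trunk = nn.Sequential(nn.Linear(state_dim, hidden), nn.ReLU())
                self.value_head = nn.Linear(hidden, 1)        # V(s): one scalar per state
                self.adv_head = nn.Linear(hidden, n_actions)  # A(s, a): one per action

            def forward(self, state):
                h = self.trunk(state)
                v = self.value_head(h)
                a = self.adv_head(h)
                # Subtracting the mean advantage makes the V/A decomposition unique.
                return v + a - a.mean(dim=1, keepdim=True)

    Note the network is not told to "focus" on high-value states explicitly; the split simply lets the shared V(s) estimate be improved by every action taken in a state, while A(s, a) only has to capture the relative ranking of actions.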
    Towards Grand Unification Theory of AI (GUT-AI)
    Concurrently with "JEPA", I wrote a very similar (pre-print) paper on a Grand Unification Theory of AI (GUT-AI), which is a kind of superset of JEPA, if anyone is interested. I made the effort to abstract away complicated mathematics so that the reader finds it easier to understand, since the quest for AI is a multidisciplinary one. In my view, I also made better connections to nature (embedded and grounded cognition), among others. I have published it on OSF, and since it is a pre-print, I welcome feedback either there or here. Thanks. Paper: https://doi.org/10.31219/osf.io/sjrkh PS: I also made some GitHub repositories (CC0 1.0 license) expanding the paper and bridging the gap towards practical implementation. https://github.com/GUT-AI/gut-ai submitted by /u/kourouklides [link] [comments]  ( 89 min )
  • Open

    Understanding reality through algorithms
    Neuroscience PhD student Fernanda De La Torre uses complex algorithms to investigate philosophical questions about perception and reality.  ( 8 min )

  • Open

    [D] State-of-the-art techniques for building a good understanding of embedding spaces
    I have a model that turns an image into embeddings, and I want a very good understanding of my embedding space. What are techniques to understand the embedding space better? I know the basics, like PCA and t-SNE, but are there new research papers that talk about how to do this better? Maybe it's possible to learn a model that better understands the embedding space? submitted by /u/vanilla-acc [link] [comments]  ( 88 min )
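    As a baseline before reaching for newer methods, the classic probes are easy to run with scikit-learn. A minimal sketch, assuming the embeddings are stored as an (n_samples, dim) array (the file name here is hypothetical):

        # PCA gives a global linear view; t-SNE shows local neighborhood structure.
        import numpy as np
        from sklearn.decomposition import PCA
        from sklearn.manifold import TSNE

        embeddings = np.load("image_embeddings.npy")

        pca_2d = PCA(n_components=2).fit_transform(embeddings)
        tsne_2d = TSNE(n_components=2, perplexity=30, init="pca").fit_transform(embeddings)

        # The explained-variance spectrum hints at the effective dimensionality.
        print(PCA(n_components=10).fit(embeddings).explained_variance_ratio_)

    UMAP is a common drop-in alternative to t-SNE, and training linear classifiers on labeled attributes ("linear probes") is another standard way to test what the space actually encodes.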
    [P] SuperVisual a Screen Recording + video analytics stack (CLIP Visual Search + Object Detection) running entirely inside browser JavaScript using TensorFlowJS and ONNX.
    submitted by /u/SuperVisualApp [link] [comments]  ( 88 min )
    [R] State-of-the-art voice cloning
    I have tens of hours of recordings of my voice, and I want to train (from scratch, or via transfer learning) a TTS model with my voice. I tried to figure out what the state of the art at this particular task is, but I can't find any benchmarks. Do you know what the SOTA for this task is as of September 2022? submitted by /u/the_javi_himself [link] [comments]  ( 88 min )
    [D] How to learn a boolean outcome on an n-dimensional numerical train- and test-dataset?
    I'm quite new to machine learning, so I may be in the wrong place or asking a stupid question. I'm trying to create a simple ML prediction model, but I have no idea where to start or what to Google; whatever I search usually turns up image/video prediction or learning numerical outcomes. I have a training dataset like the following:

        PRICE, AMOUNT_ORDERS, ..., IS_FRAUD
        40.45, 15           , ..., 0
        12.43, 2            , ..., 0
        98.09, 1            , ..., 1
        ...  , ...          , ..., ...

    It contains some dimensions of numerical data (like price, number of orders, etc.) and a column indicating whether the transaction was fraudulent (0 or 1). I have a test dataset with the same columns, except for the last one (the boolean IS_FRAUD). Based on what is learned from my training set, I would like to predict IS_FRAUD. Since this is a relatively straightforward ML problem, I imagine there should be a library where I just feed in the n-dimensional numerical training set and it automatically constructs a model, with no further effort needed. But I have no idea how to approach something like that. Is there a library, in Python/Java or otherwise, that supports such a feature? Or would more advanced training methods be required? submitted by /u/simonbaars [link] [comments]  ( 105 min )
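    This is binary classification on tabular data, and scikit-learn covers exactly the "feed it the table, get a model" workflow the post asks about. A minimal sketch, with file and column names mirroring the example above (and otherwise hypothetical):

        import pandas as pd
        from sklearn.ensemble import RandomForestClassifier
        from sklearn.metrics import classification_report
        from sklearn.model_selection import train_test_split

        train = pd.read_csv("train.csv")   # PRICE, AMOUNT_ORDERS, ..., IS_FRAUD
        X = train.drop(columns=["IS_FRAUD"])
        y = train["IS_FRAUD"]

        # Hold out part of the training data to estimate accuracy honestly.
        X_tr, X_val, y_tr, y_val = train_test_split(X, y, stratify=y, random_state=0)
        model = RandomForestClassifier(n_estimators=200, random_state=0).fit(X_tr, y_tr)
        print(classification_report(y_val, model.predict(X_val)))

        test = pd.read_csv("test.csv")     # same columns minus IS_FRAUD
        predictions = model.predict(test)

    Fraud data is usually heavily imbalanced, so per-class precision and recall (as printed above) matter more than raw accuracy.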
    [D] Is a GPT-J successor in the works?
    Is a new open-source GPT-3-style model in the works, a sort of GPT-J successor with more parameters? If nothing is known, how likely is such a model to be released in the next year? The success of Stable Diffusion suggests that open-source business models may make sense, so... pretty likely? What do you think? submitted by /u/lorepieri [link] [comments]  ( 89 min )
    [D] Best ways to perform feature learning on time series data
    Tutorials on this topic are really scarce. I would love for the community to share some GitHub repo links and useful scripts. I've been stuck on this for a while; autoencoders are too time-consuming for feature learning, so I'm trying to explore other options that beat or come close to autoencoders in terms of finding non-linear relationships in the data. Any form of help is deeply appreciated. submitted by /u/Zalkwalker [link] [comments]  ( 88 min )
    [D] How to generate structured parameters from a spectrogram?
    Say I have an algorithm that accepts as input structured parameters of the following format, generates an audio clip, and then produces a 512x512 spectrogram from it: [ param1 = numeric_value, param2 = numeric_value, ..., param100 = numeric_value ] How can I do the opposite? That is, provide a 512x512 spectrogram and get a set of random candidate parameter values that would yield a similar (but random) spectrogram if fed into the algorithm? In terms of text-to-image models, I see this as the opposite problem: instead of using a prompt to generate a random matching image, I would like to use an image to obtain a random matching "prompt" that's not natural language (i.e., structured and numeric). Regarding the algorithm, we can assume that the amount of change in the resulting spectrogram is proportional to the amount of change in the parameter values; that is, close to no parameter changes will yield a spectrogram very similar to the previous one, making training somewhat possible. The algorithm is also deterministic and will always produce the same output for a given input. Is this possible? GANs seemed like a nice architecture for this, knowing that I can generate as much "real" training data as I want using the algorithm. The generator would generate a random list of structured parameters from a spectrogram, whereas the discriminator would check whether the parameter list is real or fake (i.e., coming from my training set or from the generator). In practice, though, I'm not sure how I would implement any of this, knowing that GANs are usually not used this way (they usually produce images, not the other way around). There might also be a better architecture for this use case that I'm not aware of (e.g., a latent-space encoder). Any help would be appreciated. Thanks! submitted by /u/Golitan11 [link] [comments]  ( 91 min )
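    Since the algorithm can generate unlimited (parameters, spectrogram) pairs, one baseline worth trying before a GAN is a plain convolutional regressor trained with a supervised loss; randomness can then be added by sampling around its prediction. A minimal PyTorch sketch under that assumption (layer sizes are illustrative):

        import torch
        import torch.nn as nn

        class SpecToParams(nn.Module):
            """Regress the 100 synthesis parameters from a 512x512 spectrogram."""
            def __init__(self, n_params=100):
                super().__init__()
                self.conv = nn.Sequential(
                    nn.Conv2d(1, 16, 4, stride=2, padding=1), nn.ReLU(),   # 512 -> 256
                    nn.Conv2d(16, 32, 4, stride=2, padding=1), nn.ReLU(),  # 256 -> 128
                    nn.Conv2d(32, 64, 4, stride=2, padding=1), nn.ReLU(),  # 128 -> 64
                    nn.AdaptiveAvgPool2d(4),                               # -> 64 x 4 x 4
                )
                self.head = nn.Linear(64 * 4 * 4, n_params)

            def forward(self, spec):
                return self.head(self.conv(spec).flatten(1))

        model = SpecToParams()
        specs, params = torch.randn(8, 1, 512, 512), torch.randn(8, 100)  # stand-in batch
        loss = nn.functional.mse_loss(model(specs), params)

    If one spectrogram genuinely maps to many plausible parameter sets, a conditional GAN or conditional VAE over the parameter vector (conditioned on the spectrogram) is the usual upgrade from this deterministic baseline.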
    [P][R] Whisper, a general-purpose speech recognition model by OpenAI with Gradio Demo
    submitted by /u/Illustrious_Row_9971 [link] [comments]  ( 88 min )
    [P] Speed Up Stable Diffusion by ~50% Using Flash Attention
    We got close to a 50% speedup on an A6000 by replacing most of the cross-attention operations in the U-Net with flash attention. Annotated implementation: https://nn.labml.ai/diffusion/stable_diffusion/model/unet_attention.html#section-45 Github: https://github.com/labmlai/annotated_deep_learning_paper_implementations/blob/master/labml_nn/diffusion/stable_diffusion/model/unet_attention.py#L192 We used this to speed up our stable diffusion playground: promptart.labml.ai submitted by /u/hnipun [link] [comments]  ( 88 min )
    [D] Neural network for multivariate time series with labels
    Hello! Let's say I have a dataset where, on each day, I have some sort of event with some details (not only numeric data). For example:

    - 2 Jan 2022 - Event: Payment, Channel: POS, Industry: 4939 (recreational and sports equipment rental activities), Amount: 200, Currency: EUR
    - 3 Jan 2022 - Event: Received Email, Campaign: 3RDW52UDW3, Purpose: Customer Care, Reason: Premium Client, Email Code: 42353tf4
    - 7 Jan 2022 - Event: Money Transfer, Channel: Online, Type: "Salary", Amount: 3000, Currency: EUR
    - 14 Jan 2022 - Event: Disable Notifications, Channel: Mobile app

    Is there a neural network model where I can somehow input most of this data? I've discovered TST (the time series transformer, https://timeseriesai.github.io/tsai/models.TST.html), which kind of does what I need, but it drops the text information that I might need: I have to embed labels such as "Customer Care", "Premium Client", and "POS" into numeric values beforehand. My intuition is that if a model learns the embeddings for the labels, it will understand (a) how to order them in terms of impact, importance, etc., and (b) how to associate them with events ("POS" cannot appear in an event of type "Received Email"). However, I have not found any multivariate time series work dealing with many (a couple of hundred) different labels associated with the events. Do you have any ideas for such a model, or how it should be implemented? P.S.: I know I can split transactional data from campaign data etc. and have multiple simpler models, but let's assume I can't / don't want to do that; I want a single model for this, as I have enough data to learn from and enough processing power to train a big model. submitted by /u/adenml [link] [comments]  ( 91 min )
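    The learned-embedding intuition in the post maps directly onto nn.Embedding layers, one per categorical field, concatenated with the numeric features before the sequence model. A hedged sketch (field names and sizes are illustrative):

        import torch
        import torch.nn as nn

        class EventEncoder(nn.Module):
            def __init__(self, n_event_types, n_channels, emb_dim=16):
                super().__init__()
                self.event_emb = nn.Embedding(n_event_types, emb_dim)
                self.channel_emb = nn.Embedding(n_channels, emb_dim)

            def forward(self, event_ids, channel_ids, numeric):
                # event_ids, channel_ids: (batch, seq) integer codes
                # numeric: (batch, seq, n_numeric), e.g. amount, day gap
                return torch.cat(
                    [self.event_emb(event_ids), self.channel_emb(channel_ids), numeric],
                    dim=-1,
                )  # (batch, seq, 2*emb_dim + n_numeric) -> feed to an LSTM/transformer

    Fields that only apply to some event types can be given a dedicated "not applicable" index, which lets a single model consume the heterogeneous events without splitting the data.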
    [D] Is a 3-year BSc (Hons) in computer science plus a 1-year MSc in artificial intelligence enough to be called, and to find a job as, an ML/AI engineer?
    Here's the master's program, if anyone is curious: https://www.mitropolitiko.edu.gr/en/programmes-of-study/faculty-of-computing/msc-artificial-intelligence/ Thanks. submitted by /u/coldcoldcoldcoldasic [link] [comments]  ( 88 min )
    [R] META researchers generate realistic renders from unseen views of any human captured from a single-view RGB-D camera
    submitted by /u/SpatialComputing [link] [comments]  ( 90 min )
    [R] Mega: Moving Average Equipped Gated Attention. By using LSTM-style gates, Mega outperforms Transformer and S4 on Long Range Arena, NMT, ImageNet, WikiText-103, and raw speech classification.
    submitted by /u/hardmaru [link] [comments]  ( 103 min )
    [R] GET3D: A Generative Model of High Quality 3D Textured Shapes Learned from Images
    submitted by /u/utopiah [link] [comments]  ( 88 min )
    [D] Who plans and makes preliminary designs for any new ML project in your organization?
    View Poll submitted by /u/aadoop6 [link] [comments]  ( 89 min )
    [D] Prediction of concurrent, not future, steps of multivariate time series after simulated perturbation?
    Most MTS methods focus on predicting future steps of a multivariate time series, or on classifying it. I've been unable to find anything where concurrent steps are predicted; for example, artificially perturbing a ground-truth MTS by zeroing out a single signal and predicting how the remaining signals in the MTS change from ground truth. Has anyone seen anything like this? submitted by /u/desmin88 [link] [comments]  ( 88 min )
  • Open

    How do I create my own Neural Net for a Vanilla GAN?
    I'm studying how to use GANs, starting with the vanilla GAN, but the resources I use tend to simplify the models down to a single perceptron for educational purposes. I want to build a more complex network for the homework I'm doing, so I'm looking for a resource on how to build my own neural network to put in my vanilla GAN model. Do any of you know a resource for learning that? submitted by /u/RepulsiveFisherman87 [link] [comments]  ( 87 min )
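    For scale, here is a minimal sketch of what "more than one perceptron" usually looks like for a vanilla GAN in PyTorch: small MLPs for both networks (sizes are illustrative, e.g. for flattened 28x28 images):

        import torch.nn as nn

        latent_dim, data_dim = 64, 28 * 28

        generator = nn.Sequential(
            nn.Linear(latent_dim, 256), nn.LeakyReLU(0.2),
            nn.Linear(256, 512), nn.LeakyReLU(0.2),
            nn.Linear(512, data_dim), nn.Tanh(),   # outputs scaled to [-1, 1]
        )

        discriminator = nn.Sequential(
            nn.Linear(data_dim, 512), nn.LeakyReLU(0.2),
            nn.Linear(512, 256), nn.LeakyReLU(0.2),
            nn.Linear(256, 1), nn.Sigmoid(),       # estimated P(input is real)
        )

    Swapping the Linear layers for convolutions (the DCGAN recipe) is the usual next step once the MLP version trains.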
    By any means necessary
    submitted by /u/SidewaysMeta [link] [comments]  ( 87 min )
    Stable Diffusion AUTOMATIC1111: Every Setting Explained
    submitted by /u/PuppetHere [link] [comments]  ( 93 min )
    Can I train my own AI using Replika?
    Hi, I got this idea today. I'm not the best with social skills, and I figured that when I'm texting people and draw a complete blank about what to respond, maybe I could use an AI that, based on the conversation, generates a response that I could then use to reply to the other person. I found myself quite interested in the Replika API and wondered if I can use it to train my AI, and how. Thank you so much in advance. BTW, I already searched Google and YouTube and found nothing more than videos of people overreacting about their experience with the app. submitted by /u/IurmamaI [link] [comments]  ( 87 min )
    Machine Learning and AI Stack (Mostly Resources)
    submitted by /u/skj8 [link] [comments]  ( 86 min )
    Finally. Shrimp on the Barbie
    submitted by /u/Geegoriel9 [link] [comments]  ( 86 min )
    Salesforce AI Open-Sources ‘LAVIS,’ A Deep Learning Library For Language-Vision Research/Applications
    Recent years have seen remarkable development in the creation of sophisticated language-vision models. Real-world applications rely heavily on multimodal material, particularly language-vision data, which includes texts, photos, and videos. However, domain knowledge is required for training and evaluating these models across tasks and datasets, and they are not necessarily open to new researchers and practitioners. This is primarily because preparing the necessary experiment setup is a lot of work and is time-consuming regardless of the model, dataset, or task evaluation being used. Salesforce researchers have developed LAVIS (short for LAnguage-VISion), an open-source library for training and evaluating state-of-the-art language-vision models on a rich family of common tasks and datasets and for off-the-shelf inference on customized language-vision data. This will make the emerging language-vision intelligence and capabilities available to a wider audience, encourage practical adoption, and reduce repetitive efforts in future development. Continue reading | Check out the paper, github link submitted by /u/ai-lover [link] [comments]  ( 88 min )
    Is there an online tool to generate AI output based on a supplied text file?
    I want to be able to upload a text file, train an AI on it, and have it generate output based on the text file I supplied. Is there any website/app (I am on a Mac) I can use to do this? submitted by /u/Hello_I_Am_Here_Now [link] [comments]  ( 87 min )
    I am looking for a database.
    Hello. I am planning to build an attractiveness meter based on a simple perceptron neural network. I am looking for images of men and women that are marked with an attractiveness score from 0 to 10. This will be a project for a competition. submitted by /u/skorakora [link] [comments]  ( 92 min )
    Fractured Beauty
    submitted by /u/widgia [link] [comments]  ( 93 min )
    So Deepfake Audiobooks Are a Thing Now - "I secretly deepfaked an ENTIRE audiobook" (JOLLY, 9min)
    submitted by /u/arisbe__ [link] [comments]  ( 87 min )
    Newest AI From World AI Conference In China
    submitted by /u/kenickh [link] [comments]  ( 86 min )
    I Created a GUI for OpenAI's Whisper Using Gradio
    submitted by /u/ImplodingCoding [link] [comments]  ( 88 min )
    I made a powerful neural network from scratch that you can play with in your browser
    You can set the shape and the training input/output, and test what it learned right away. You can enter the values the same way I've done (note that you have to press Enter in each input field to confirm the value you entered). [screenshot] Then you can also visualize how the network looks. [screenshot] In this example the rule was:

    - if the input is 1, the first output should be 1
    - if the input is 2, the second output should be 1
    - if the input is 3, the third output should be 1

    Here are some examples of what it can learn (download the JSON file and upload the desired example on the website to see it in action): https://github.com/Thiago099/neural-network/blob/master/nn%20examples/examples.zip I've fixed some issues with at least the neural networks I've encountered:

    - Before training, the input is made available to the last layer, because the cost is calculated from the last layer, and if only random noise reaches the last layer there is no easy parameter to train.
    - During training, you don't modify more than one node at a time, because the cost was calculated with the previous values of the other nodes, which means that if you change all of them at once the cost will be wrong.
    - 0 is a problematic input, so my inputs go from 1 to 2.

    Note that this was only my impression of the subject; please correct me if I'm wrong. Link: https://thiago099.github.io/neural-network/ Source code: https://github.com/Thiago099/neural-network submitted by /u/Small-Ad-1694 [link] [comments]  ( 88 min )
    Linear Least Squares Regression | Machine Learning Foundations
    submitted by /u/mr-minion [link] [comments]  ( 87 min )
    AI Dream 72 - Abstract 3D Art Exploration
    submitted by /u/LordPewPew777 [link] [comments]  ( 87 min )
    Woman Horrified To Discover Her Private Medical Photos Were Being Used To Train AI
    submitted by /u/estasfuera [link] [comments]  ( 92 min )
    Is there an AI like gpt-3 I can talk to for free without signing up or putting my credit card info?
    I'm not talking about those crappy one-sentence-answer chatbots, but an AI capable of deep thought. How do I get something similar, or as good, for free? submitted by /u/PilkoidPilkers [link] [comments]  ( 87 min )
    What are good AIs for face editing?
    I just want to find out what AI can do with some pictures of my face. submitted by /u/xXNOdrugsForMEXx [link] [comments]  ( 87 min )
    NVIDIA GET3D generates 3D models with AI
    submitted by /u/Dazzling_Swordfish14 [link] [comments]  ( 87 min )
    Old city, made by Dawn AI
    submitted by /u/GroundbreakingLaw878 [link] [comments]  ( 94 min )
    Is AI an existential threat to humanity?
    submitted by /u/wisereputationmkr [link] [comments]  ( 89 min )
    Spinning the Fantastic Angel By the Wings
    Disco Diffusion and interpolation combined for a nice lighthouse morph video, set to music. submitted by /u/Enuminous [link] [comments]  ( 87 min )
  • Open

    Machine Learning Resources
    submitted by /u/skj8 [link] [comments]  ( 91 min )
    Linear Least Squares Regression visually explained
    submitted by /u/mr-minion [link] [comments]  ( 94 min )
    GAN training: train the generator/discriminator once or multiple times (epochs) per loop (big epoch, I guess)?
    I am trying to figure out what is better here and to find some balance. I'm training a super-resolution GAN and cannot find what people usually use. For example, I do 400 big epochs, each consisting of 40 generator epochs and 40 discriminator epochs, which would be a total of 400 * (40 + 40) epochs. Any suggestions about this? This is my biggest question right now. submitted by /u/hides_his [link] [comments]  ( 87 min )
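    For reference, the common baseline (going back to Goodfellow et al., 2014) alternates at the minibatch level rather than running long separate epochs: k discriminator steps, then one generator step, with k = 1 as the usual default (WGAN-style setups often use k = 5). A hedged sketch of that loop, assuming generator, discriminator, their optimizers d_opt/g_opt, latent_dim, and a dataloader of real batches are already defined:

        import torch

        k = 1  # discriminator updates per generator update
        bce = torch.nn.BCELoss()

        for real in dataloader:
            ones = torch.ones(real.size(0), 1)
            zeros = torch.zeros(real.size(0), 1)
            for _ in range(k):
                # Discriminator step: real -> 1, detached fakes -> 0.
                fake = generator(torch.randn(real.size(0), latent_dim)).detach()
                d_loss = bce(discriminator(real), ones) + bce(discriminator(fake), zeros)
                d_opt.zero_grad(); d_loss.backward(); d_opt.step()

            # Generator step: try to make fresh fakes look real.
            fake = generator(torch.randn(real.size(0), latent_dim))
            g_loss = bce(discriminator(fake), ones)
            g_opt.zero_grad(); g_loss.backward(); g_opt.step()

    Running 40 epochs of one network before touching the other tends to let the discriminator saturate (or the generator collapse), which is why fine-grained alternation is the standard starting point.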
  • Open

    I want to do research in RL. 1) What is the roadmap for it? 2) Do I need to know the full mathematical derivations of supervised, unsupervised, and deep learning algorithms? 3) How can I do research in RL as an undergraduate at a non-research university?
    I ask Q3 because research experience is required, or at least is the best way, to get into a master's program at a top university. submitted by /u/Emotional-Fox-4285 [link] [comments]  ( 90 min )
    Reduction of variable-length time series to a fixed size
    Hey guys, I'm training an RL algorithm for poker. I'd like to include in the state who bet which amount, and in which order, over the entire hand, as I believe this will give better results. My problem is that this betting history can have varying length, depending on whether players simply check immediately or take turns betting. I'm not able to feed this variable-size betting history to my NN, so I have to encode it in some way. My current idea is to train an RNN or LSTM and use its latent state as a fixed-size representation, which I can feed to my NN and penalize with the same loss as my policy NN. Obviously there's a problem with penalizing a correct representation but a wrong action. Do you have any idea how well this will work? What other options do I have for reducing the dimensionality? Thanks. submitted by /u/Dragonrooster [link] [comments]  ( 88 min )
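    A minimal sketch of the RNN idea, assuming each betting event is already featurized as a vector (player, action, amount, ...): pad the histories, pack them, and use the final LSTM hidden state as the fixed-size summary. If the encoder is part of the policy network, gradients from the policy loss flow through it automatically, so no separate penalty is needed.

        import torch
        import torch.nn as nn
        from torch.nn.utils.rnn import pack_padded_sequence

        class HistoryEncoder(nn.Module):
            def __init__(self, event_dim, hidden=64):
                super().__init__()
                self.lstm = nn.LSTM(event_dim, hidden, batch_first=True)

            def forward(self, padded_events, lengths):
                # padded_events: (batch, max_len, event_dim); lengths: true lengths
                packed = pack_padded_sequence(
                    padded_events, lengths, batch_first=True, enforce_sorted=False
                )
                _, (h_n, _) = self.lstm(packed)
                return h_n[-1]  # (batch, hidden): fixed-size summary for the policy net

    Simpler alternatives worth benchmarking first: a fixed window of the last N actions, or hand-crafted aggregates (pot size, per-player total contribution, last raise size).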
    PPO suddenly diverges and degrades the solution at low std
    Greetings! I am implementing an RL algorithm using PPO, with the std initialized at 1.0 for exploration and linearly decreased to 0.01. The entropy coefficient is 0, so I don't use entropy. I use a PPO clip of 0.2 and a learning rate of 0.001 for both critic and actor, with one 100-neuron hidden layer in each architecture (critic, actor); state dimension = 5, action dimension = 1. My problem is that even though everything goes smoothly during training (let's say 2500 epochs), at around epoch 2450 I start to see erratic behavior, with results and the policy degrading for no apparent reason, and I really don't know what's going on. Maybe a very-low-std issue? I want to end up with results that show convergence after the maximum number of training steps with minimum std, and I also decay the learning rate exponentially to reach convergence (at least when evaluating). If you have any idea what is causing this, or you have had similar problems in the past, I would really appreciate it if you could share your thoughts! (I should mention that this implementation is very custom; the whole simulation is handmade, and the problems may not come from PPO itself, although you may spot something off in my PPO specs.) Thanks. submitted by /u/White_Sirilo [link] [comments]  ( 88 min )
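    Not a diagnosis, but one commonly cited failure mode fits the symptom: as the std approaches 0.01, the Gaussian log-probabilities become extremely sharp, so slightly off-policy actions produce huge PPO ratios and destabilizing updates. A typical guard is to floor the std; a sketch (the floor value is illustrative):

        import torch
        from torch.distributions import Normal

        MIN_STD = 0.05  # annealing all the way to 0.01 is often too tight

        def policy_dist(mean, log_std):
            # Clamping keeps the policy from becoming near-deterministic late in training.
            std = log_std.exp().clamp(min=MIN_STD)
            return Normal(mean, std)

    Other common checks: lower the actor learning rate as the std shrinks (a fixed 0.001 is large for a near-deterministic policy), and verify advantages are normalized per batch.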
    "Modeling Bounded Rationality in Multi-Agent Simulations Using Rationally Inattentive Reinforcement Learning", Anonymous et al 2022
    submitted by /u/gwern [link] [comments]  ( 87 min )
    Does anyone know of any website where I can find papers by topic, conference, year, and citations
    submitted by /u/obsoletelearner [link] [comments]  ( 88 min )
  • Open

    DSC Weekly 27 Sept 2022 – Corpus Wars
    In many respects, we are facing not the need for a new form of money but rather a new form of economics - a discipline about the world where scarcity still holds in physical materials but where overabundance is the rule in virtual ones. To me, this is one of the key tenets that need to be hammered out in the metaverse: How do the actual creators of the virtual worlds, and not just the hosts, get paid for their work? The post DSC Weekly 27 Sept 2022 – Corpus Wars appeared first on Data Science Central.  ( 22 min )
    9 Ways IT Can Do Proactive Cybersecurity
    By taking a proactive approach to cybersecurity, IT departments can help protect their organizations from the ever-growing number of cyber attacks. Here are nine ways IT departments can do proactive cybersecurity. The post 9 Ways IT Can Do Proactive Cybersecurity appeared first on Data Science Central.  ( 22 min )
  • Open

    Room squares and Tournaments
    A Room square is a variation on a Latin square. Room squares are named after Thomas Room, though there is an application to rooms as in compartments of a building that we’ll discuss below. In a Latin square of size n you have to assign one of n symbols to each cell so that each […] Room squares and Tournaments first appeared on John D. Cook.  ( 5 min )
  • Open

    How I prepared for DeepMind and Google AI research internship interviews in 2019
    In 2019, I interviewed for research internships at DeepMind and Google AI. I have been asked repeatedly about my preparation for and experience with these interviews. As internship applications at DeepMind have been opened recently, I thought it could be valuable to summarize my experience and recommendations in this article. The post How I prepared for DeepMind and Google AI research internship interviews in 2019 appeared first on David Stutz.  ( 12 min )

  • Open

    AI art is truly amazing - DMT ego death collection started
    submitted by /u/Inner_Mongoose_185 [link] [comments]  ( 86 min )
    Penn State Researchers Propose ‘ESFPNet,’ An Effective Deep Learning Network for Real-Time Lesion Segmentation in Autofluorescence Bronchoscopic Video
    Lung cancer is the leading cause of cancer mortality globally. A key objective for increasing lung cancer survival is discovering the illness early, allowing for the most effective treatment choices. Lung cancer develops from lesions in the bronchial epithelium of the lung mucosa. These bronchial lesions can progress to squamous cell lung cancer and assist in forecasting the development of other lung cancers. As a result, approaches for early diagnosis of bronchial lesions are critical for improving lung cancer patient treatment. Using bronchoscopy to image the airway epithelium during a regular airway exam is a noninvasive technique for clinicians to look for such lesions. Autofluorescence bronchoscopy is one of the most sensitive advanced bronchoscopic video procedures available today; it can efficiently distinguish developing bronchial lesions from the normal epithelium. Unfortunately, the current standard requires human inspection of an incoming AFB video stream, which is time-consuming and error-prone. Continue reading | Check out the paper and github link submitted by /u/ai-lover [link] [comments]  ( 93 min )
    High fashion campaigns with A.I.
    submitted by /u/Straight_Soil_747 [link] [comments]  ( 85 min )
    What's so hard about AI?
    Put another way; why is it so difficult to specifically describe how the human mind works? For context, I've worked in AI/ML research for only a couple years but have many years of coding experience. Very frequently I find myself in situations like this: xkcd.com/1425/ (except I'm both people in the convo) Specifically, my interests are problem solving, question answering, and reasoning. So, what's so hard about problem solving? We have a working model of intelligence literally in our heads already. We can just watch ourselves do things the same way someone trying to figure out flight would watch a bird. We also have plenty of solutions for some parts of problem solving (e.g. symbolic manipulation and statistical modeling). So why is it so hard to fill in the gaps and get these things to work together? I can explain to someone why arithmetic is hard, or why running a marathon is hard, or why refactoring messy code is hard. But when I try to explain to someone why AI is hard I start rambling like a crazy person. submitted by /u/bornofthebeach [link] [comments]  ( 90 min )
    Best AI for story generation?
    submitted by /u/DauEfect [link] [comments]  ( 86 min )
    I filmed my dance and modified it with AI. The result pleasantly surprised me! What do you think?
    submitted by /u/nalr00n [link] [comments]  ( 87 min )
    Dance of the Burning Idiots
    Stable Diffusion animation music video: Dance of the Burning Idiots. submitted by /u/Enuminous [link] [comments]  ( 86 min )
    Meta uses AI for full-body tracking based on sparse motion data
    submitted by /u/much_successes [link] [comments]  ( 91 min )
    AI-Generated Art images of couples in a Romantic Wonderland 💞! + Motivational Quotes!
    submitted by /u/OceanicFeel [link] [comments]  ( 87 min )
    Generative AI: A Creative New World
    submitted by /u/estasfuera [link] [comments]  ( 91 min )
    White House Concept
    submitted by /u/widgia [link] [comments]  ( 91 min )
    I made a music video using 'AI Technology'. It's taken me SO MANY HOURS to complete but I think it's pretty rad how it's turned out. What do you guys think?
    submitted by /u/6Witchy9 [link] [comments]  ( 88 min )
    DALL-E, but for sound?
    I wonder if there is, or there is work in progress on, an AI that's similar to DALL-E in taking word input, but for sounds. For example, if you typed "metal sheet falling", it would create a sound like that. So, is there? submitted by /u/typcalthowawayacount [link] [comments]  ( 88 min )
    I’m looking for someone to come on my podcast and talk about AI. DM me if you are interested.
    submitted by /u/Money_Push [link] [comments]  ( 87 min )
    Is there an AI that can make new episodes of a show? Or a whole new show when you input different stories into it?
    I've seen all this news about AI making art by inputting images and waiting until it makes whole new images from them. That made me wonder if the same can be done with stories and TV shows. Has such an AI been created yet? Are there AIs that can make whole new episodes of shows, or AIs where you input different stories and they churn out a whole new story or show from the bits and pieces of those inputs? submitted by /u/LiliaAmazing [link] [comments]  ( 92 min )
    Is there a conversational AI that isn't just mimicking conversation?
    I've heard that artificial intelligences like GPT-3 just generate plausible words. So is there any AI that can actually talk? submitted by /u/Leather_Parfait_9450 [link] [comments]  ( 87 min )
    What's the best option if I don't have a computer with a graphics card?
    I am a foreign student doing my master's in Germany. I have a laptop with me, with no graphics card. In the next two semesters I want to do AI in my master's thesis, but with a larger dataset it takes forever to run a simulation or train a model. Here are some options I have considered:

    1. Renting a cloud GPU. But if I rent for a year, the money I will spend is enough to buy a new computer such as a MacBook Pro M1.
    2. Assembling a desktop computer. This is not a good option for me because I will go back to my country after I finish my master's.
    3. An external graphics card for the laptop. It's too expensive for me, so I don't think it's a good idea; I would rather use this money to buy a good computer.

    It seems the best way to train a model or run simulations is a desktop computer, but unfortunately after a year I will go back to my hometown. What is the best option for me? Any comments are appreciated. Thanks. submitted by /u/akuan10 [link] [comments]  ( 88 min )
    How to make a Stable Diffusion Video Part 2 Strength settings Avoid nois...
    submitted by /u/prfitofthesngularity [link] [comments]  ( 92 min )
    Ian McConnell - Adult BUT Lyrics Are Illustrated by AI
    submitted by /u/Swisheater [link] [comments]  ( 87 min )
  • Open

    [P] Question answering/text generation about images (also works on images with text in them)
    https://text-generator.io now analyses not just linked images but also any text in them, so you can analyse receipts/documents/screenshots etc. Example: https://text-generator.io/playground?text=Checkout+this+reciept+https%3A%2F%2Fstatic.text-generator.io%2Fstatic%2Fimg%2Fcomputer-invoice.png+%0ATotal+Price%3A+&stop_sequences=&number_of_results=1&max_length=100&max_sentences=1&min_probability=0&top_p=0.9&top_k=40&temperature=0.6&repetition_penalty=1&seed=0 A blog post about it is coming soon :) submitted by /u/leepenkman [link] [comments]  ( 88 min )
    [P] How to fine-tune a model to classify similar sentences into two classes (e.g., requests and offers)?
    Hi all, I'm interested in creating an app of some sort - maybe a bot - that allows people to use chat groups as an intelligent marketplace. For example, people making the following statements separately would be matched: "I'm looking for a four poster bed. Has anyone seen one?" "I'm clearing out my parents' house and they have a nice four poster bed. Would anyone be interested?" To my mind, it's possible to do this by fine-tuning a BERT-based model with something like "offers" and "requests" classes. I'm not particularly well-versed in this, however, so I'm not sure whether that's the ideal way or whether there's another approach one could use. Any suggestions? Additionally, I'm unclear on how many offers and requests I'd need to generate to achieve high accuracy during fine-tuning. Obviously, the more the better, but I'm one individual, and generating 1,000 versions of each would be a little daunting. Thanks for your help. submitted by /u/UnifiedEntity [link] [comments]  ( 89 min )
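    A hedged sketch of the fine-tuning route with Hugging Face transformers: a two-class head (request vs. offer) on top of BERT. The example texts and label convention are illustrative placeholders:

        import torch
        from transformers import (AutoModelForSequenceClassification, AutoTokenizer,
                                  Trainer, TrainingArguments)

        tok = AutoTokenizer.from_pretrained("bert-base-uncased")
        model = AutoModelForSequenceClassification.from_pretrained(
            "bert-base-uncased", num_labels=2  # 0 = request, 1 = offer
        )

        texts = ["I'm looking for a four poster bed.", "Would anyone want this bed?"]
        labels = [0, 1]
        enc = tok(texts, truncation=True, padding=True, return_tensors="pt")

        class PairDataset(torch.utils.data.Dataset):
            def __len__(self):
                return len(labels)
            def __getitem__(self, i):
                item = {k: v[i] for k, v in enc.items()}
                item["labels"] = torch.tensor(labels[i])
                return item

        Trainer(
            model=model,
            args=TrainingArguments(output_dir="out", num_train_epochs=3),
            train_dataset=PairDataset(),
        ).train()

    On data volume: a few hundred labeled examples per class is a common starting point for a binary head like this, well short of 1,000 each; zero-shot NLI-style classifiers are another option to try before labeling anything.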
    [P] Try out OpenAI Whisper without needing Code
    Our team at Shipyard has been creating a lot of solution videos on YouTube recently to show teams how they can build A -> B solutions in a few minutes. We've been meaning to provide captions or transcripts for the backlog, but the overhead was either too high or too expensive. We were able to test out Whisper in the span of a few hours and got a solution up and running that downloads the video from YouTube, spits out the transcription, and uploads the resulting transcription file externally. We're still missing a piece to upload directly to YouTube, but it's a start! Here's a video of how we accomplished this! As part of this process, we set up a low-code template so anyone can try out Whisper without needing to code (although you can definitely still run Python on our platform as well). Functionality is a bit limited at the moment, but we're looking to expand it in the future. Hope some folks find this useful! submitted by /u/BlakeBurch [link] [comments]  ( 102 min )
    [D] If I have to choose between an RTX 3090 24GB and an RTX 4090 for Stable Diffusion, MidJourney, and other AI art engines that may exist in the future... is the RTX 4090 going to be THAT much greater and worth buying?
    Any pros and cons? (Besides price; I know the 3090 is much cheaper.) I play games also, but honestly both these cards would blow the socks off my current card (GTX 1650). I'm mostly buying for the AI/ML generation stuff. Thank you. submitted by /u/cleverestx [link] [comments]  ( 90 min )
    [P] Core ML Study Group
    Hello everyone! I'm looking to create a tight, kickass, dedicated group of 3-4 people who are studying ML/CV and misc. Humans work well in motivated tribes: it's easy to feed off each other's energy. If nothing else, when done right, we learn just to feel accepted.

    📃 About Me: 2021 CS undergrad, self-taught ML/CV through online courses. Hustled for FT/research work in ML/CV for ~2 years; currently work as an MLE. Looking back, I faked-it-till-I-made-it and it's all superficial. Average math aptitude, average ML knowledge, helluva imposter syndrome.

    🎯 Target Topics: Anything a Data Scientist/ML Engineer/Applied Scientist may want:

    - Books/courses on probability and stats. We study, do the math, and teach each other. Yes, we drill down to the most basic topics such as Maximum Likelihood Estimation. I have suggestions.
    - Theoretical ML, CV basics, DL architectures. Yes, we learn about architectures, but we also implement basic backprop.
    - Interviewing isn't a cakewalk. Practice, practice, practice.
    - Implement PapersWithCode, old Kaggle competitions with solutions.

    👨‍🎓 Target Audience: Someone like me. You currently work or study in the field, you "know" theoretical ML and CV/NLP, you prepare for interviews but with the nagging thought that it's just superficial cramming. You like asking dumb questions: nothing furthers learning more than a group of people asking the dumb questions you're otherwise scared to ask elsewhere. You wanna make big bucks. No beginners, please. No geniuses either. Just plain Joes.

    Interested folks can comment below or DM me! Any suggestions or thoughts are always welcome. Please note, I am super serious about this. submitted by /u/Remarkable-Brother [link] [comments]  ( 91 min )
    [N] $1.5M Prize for Arguing for or Against AI Risk and AI Timelines
    The Future Fund is a philanthropy planning to spend money on making AI safe and beneficial, but "we think it’s really possible that we’re wrong!" To encourage people to debate the issues and improve their understanding, they're "announcing prizes from $15k-$1.5M to change our minds" on when AI becomes advanced or whether it will pose risks. There will also be an independent panel of generalists judging submissions. Enter your arguments for or against AI risk or AI timelines by December 23! https://ftxfuturefund.org/announcing-the-future-funds-ai-worldview-prize/ submitted by /u/respectableacademic [link] [comments]  ( 104 min )
    [D] What is the common/best practice for sharing a codebase in a data science team?
    Hello, I wonder what the common/best practice is for sharing a codebase in a data science team. To elaborate: I work in a data science team on projects with a similar theme. Naturally, we use Jupyter notebooks running on GCP, and each person spawns their own instance. But because the projects share a theme, there is code duplication between people/projects. Moreover, if we want to iterate on a project, the new version is just a copy of an old notebook, so file management becomes a nightmare when we also want to maintain the old version, let alone deploy them (that's the MLOps job). In the micro-service paradigm, mashing everything together is not a good idea (a monolith), and each project should have its own bounded context; but because this is data science, I'm not sure how much of that applies here. I've read that large tech companies like Google, Microsoft, and Meta use a mono-repo to improve code cohesion, but doing so would make versioning nearly impractical. An alternative would be multi-repo, which has the benefit of forking if one wishes to modify the code, but this can eventually break codebase synergy. I've thought about aggregating the shared code into a single codebase and compiling it into a whl for simpler project dependencies, but it might be a hassle if someone wants to modify the codebase (monkey patching?). I've asked several of my friends, but the practice seems to be wildly different or non-existent, so I am not sure what the common/best practice is. Thanks in advance. submitted by /u/Wakeme-Uplater [link] [comments]  ( 105 min )
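    One lightweight pattern along the lines of the whl idea in the post: keep shared utilities in a small versioned internal package, installed editable during development and as a pinned wheel in deployed projects. A minimal, illustrative setup.py (the package and module names are hypothetical):

        from setuptools import find_packages, setup

        setup(
            name="ds-shared",                 # hypothetical internal utilities package
            version="0.3.0",                  # bump on breaking changes
            packages=find_packages(),         # e.g. ds_shared/io.py, ds_shared/features.py
            install_requires=["pandas", "scikit-learn"],
        )

    During development, "pip install -e /path/to/ds-shared" lets notebook users modify the shared code in place without monkey patching, while projects that need stability pin a released version instead.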
    [R] MapAI: Precision in Building Segmentation - Competition!
    I'm happy to announce that I just launched an AI competition, "MapAI: Precision in Building Segmentation." In joint collaboration with NORA, CAIR, AI:hub, Norkart, The Norwegian Mapping Authority, and The Danish Agency for Data Supply and Infrastructure, we encourage you to submit your aerial image and laser data segmentation models. The submission deadline is the 25th of November for the model and the 15th of December for the paper. The data is already available for you to start building innovative AI models. The competition is related to my Ph.D. thesis, and contestants are asked to write a 2-page paper describing their method and results. Prizes: 1st place 1200 Euro, 2nd place 500 Euro, 3rd place 300 Euro. Read more details here: https://www.nora.ai/competition/mapai-precision-in-building-segmentation/index.html submitted by /u/Sjyhne [link] [comments]  ( 90 min )
    [N] Open working group to modularize ML Systems
    Just to let you know that we are preparing a new working group at MLCommons to help the community modularize ML/AI Systems and automate their benchmarking, optimization and deployment. It will be based on the MLPerf methodology and MLCommons "Collective Knowledge" automation meta-framework that was already used to automate recent MLPerf inference benchmark submissions from Qualcomm, HPE, Lenovo, Krai, DELL and OctoML. Please join the group here to provide your feedback and help with this community effort! Thank you! submitted by /u/gfursin [link] [comments]  ( 102 min )
    [D] Is there theory as to why in GANs, training the generator and discriminator intermittently proves ineffective?
    It seems it is a common intuition everyone has while building GANs, to pause training the generator to let the discriminator catch up and vice versa, hoping for convergence. But from what I've read, the consensus is that this is ineffective, which is disappointing. Is there any theoretical understanding of why something that seems this "obvious" doesn't work? Many of the sources I'm reading are from the earlier days of GANs, and I don't know if this understanding has changed in recent years. I'm fairly new to this topic so please excuse my ignorance. submitted by /u/ETerribleT [link] [comments]  ( 91 min )
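    For context, the original GAN objective (Goodfellow et al., 2014) is the two-player minimax game

        \min_G \max_D V(D, G) =
            \mathbb{E}_{x \sim p_{\mathrm{data}}}[\log D(x)]
            + \mathbb{E}_{z \sim p_z}[\log(1 - D(G(z)))]

    and the original convergence argument assumes the discriminator is kept near its optimum for the current generator between generator updates. Coarse schedules that freeze one player for long stretches break that assumption in both directions: a long-trained discriminator saturates and passes vanishing gradients to the generator, while a long-trained generator can collapse against a stale discriminator. This is a heuristic reading rather than a full theory; analyses of simultaneous gradient descent on GAN dynamics formalize why the fine-grained interleaving schedule matters.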
    [P] What GCE Instance to get for Google Colab?
    Hi, I'm a student who is currently enrolled in a Machine Learning module in my university. For my module we (group of 6) are doing a ML project on IMDB Spoilers using Google Colab. Unfortunately, when trying to run tokenization on my dataset, the instance shut down as the RAM utilization exceeded 10 GB out of the 12 GB google provides. We didn't want to go for Colab pro as it would be We have $300 total of google credits to spare and were wondering which Google Compute Engine (GCE) Virtual Machine would be applicable for our use case. We will mainly be running NLP preprocessing, k-NN (for part of dataset), SVM, Random Forest Classifiers, and RNN and Transformers (If time and resources permit) on this dataset. https://cloud.google.com/compute/all-pricing I would appreciate any advice on what configuration of the GCE VM would be recommended as it would be really helpful to do our project. I wasn't sure as to whether we would require a GPU. submitted by /u/mrmrinal [link] [comments]  ( 89 min )
    [P] UnstableFusion - A stable diffusion frontend with inpainting, img2img, and more
    Github page: https://github.com/ahrm/UnstableFusion I was frustrated with laggy notebook stable diffusion demos. Plus, they usually didn't have all the features I wanted (for example, some of them only had inpainting and some only had img2img, so if I wanted both I had to repeatedly copy images between notebooks). So I made this desktop frontend, which has much smoother performance than notebook alternatives and integrates image generation, inpainting, and img2img into the same workflow. See a video demo here. Features include:

    - Can run locally or connect to a Google Colab server
    - Ability to erase
    - Ability to paint custom colors into the image. It is useful both for img2img (you can sketch a rough prototype and reimagine it into something nice) and inpainting (for example, you can paint a pixel red and it forces Stable Diffusion to put something red there)
    - Infinite undo/redo
    - You can import your other images into a scratch pad and paste them into the main image after erasing/cropping/scaling
    - Increase image size (by padding with transparent empty margins) for outpainting

    submitted by /u/highergraphic [link] [comments]  ( 89 min )
    [P] Managing multiple models with fbprophet
    Hey guys, a while back I made a package that allows forecasting multiple dependent variables with fbprophet models. Check it out if you are interested, and also feel free to contribute; it's a nice and simple package if anyone wants to contribute or has ideas to make it better. I plan to work more on open-source packages, so if anyone is interested, you can ping me on Twitter! https://github.com/vonum/multi-prophet https://twitter.com/vonum123 submitted by /u/vonum [link] [comments]  ( 101 min )
    [R] A Generalist Neural Algorithmic Learner
    submitted by /u/hardmaru [link] [comments]  ( 88 min )
    [D] Do you find content in "Foundations of Statistical Natural Language Processing" relevant and beneficial for studying and researching in the field of NLP? If so, can you direct me to university courses and syllabuses that still include the teaching of such content?
    Aside from the questions in the title, I would much appreciate it if you can provide some details on what you consider to be the focus of the field in the current era. submitted by /u/MoreThanJustAMonkey [link] [comments]  ( 89 min )
    [P] Lumos: a portrait relighting framework (SIGGRAPH Asia 2022)
    Lumos: a new portrait relighting framework for SIGGRAPH Asia 2022. Paper: https://arxiv.org/abs/2209.10510 Project: http://deepimagination.cc/Lumos Demo: http://imaginaire.cc/Lumos/ Trained using synthetic data, it not only avoids expensive data collections using light stages, but also achieves SOTA quality and comes with additional features such as controlling glasses glares. submitted by /u/DragonflyOk6308 [link] [comments]  ( 88 min )
    [N] Google releases TensorStore for High-Performance, Scalable Array Storage
    Blog post: https://ai.googleblog.com/2022/09/tensorstore-for-high-performance.html GitHub: https://github.com/google/tensorstore Documentation: https://google.github.io/tensorstore/ Today we are introducing TensorStore, an open-source C++ and Python software library designed for storage and manipulation of n-dimensional data that:

    - Provides a uniform API for reading and writing multiple array formats, including zarr and N5.
    - Natively supports multiple storage systems, including Google Cloud Storage, local and network filesystems, HTTP servers, and in-memory storage.
    - Supports read/writeback caching and transactions, with strong atomicity, isolation, consistency, and durability (ACID) guarantees.
    - Supports safe, efficient access from multiple processes and machines via optimistic concurrency.
    - Offers an asynchronous API to enable high-throughput access even to high-latency remote storage.
    - Provides advanced, fully composable indexing operations and virtual views.

    submitted by /u/That_Violinist_18 [link] [comments]  ( 92 min )
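    A small sketch of the Python API, adapted from the examples in the blog post and documentation; treat the exact spec fields as assumptions and check the linked docs:

        import numpy as np
        import tensorstore as ts

        # Create (or open) a zarr-backed array on the local filesystem.
        store = ts.open({
            "driver": "zarr",
            "kvstore": {"driver": "file", "path": "/tmp/my_array/"},
        }, create=True, dtype=ts.uint32, shape=[1000, 2000]).result()

        # Operations are issued asynchronously; .result() blocks for completion.
        store[0:10, 0:10] = np.arange(100, dtype=np.uint32).reshape(10, 10)
        block = store[0:10, 0:10].read().result()

    Swapping the kvstore for a cloud-storage driver points the same code at remote storage, which is the uniform-API point of the library.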
  • Open

    I've experimented with neural networks, so what you see here are soldiers in the face of death..
    submitted by /u/Tudor_222 [link] [comments]  ( 93 min )
  • Open

    HCPCS (“hick pics”) codes
    HCPCS stands for Healthcare Common Procedure Coding System. HCPCS codes are commonly pronounced like “hick pics.” I was curious/afraid to see what DALL-E would create with the prompt “HCPCS hick pics” and was pleasantly surprised that it produced the image above. More on that later. Searching for medical codes: I occasionally need to search for […] HCPCS (“hick pics”) codes first appeared on John D. Cook.  ( 6 min )
  • Open

    An Elevated Experience: XPENG Launches G9 EV, Taking Innovation Even Higher with NVIDIA DRIVE Orin
    Editor’s Note: This post has been updated to reflect the XPENG G9 launch. It was originally published in November 2021. You don’t need a private plane to be at the forefront of personal travel. Electric automaker XPENG launched the G9 SUV this week during NVIDIA GTC. The intelligent, software-defined vehicle is built on the high-performance Read article > The post An Elevated Experience: XPENG Launches G9 EV, Taking Innovation Even Higher with NVIDIA DRIVE Orin appeared first on NVIDIA Blog.  ( 5 min )
    World-Class: NVIDIA Research Builds AI Model to Populate Virtual Worlds With 3D Objects, Characters
    The massive virtual worlds created by growing numbers of companies and creators could be more easily populated with a diverse array of 3D buildings, vehicles, characters and more — thanks to a new AI model from NVIDIA Research. Trained using only 2D images, NVIDIA GET3D generates 3D shapes with high-fidelity textures and complex geometric details. Read article > The post World-Class: NVIDIA Research Builds AI Model to Populate Virtual Worlds With 3D Objects, Characters appeared first on NVIDIA Blog.  ( 6 min )
  • Open

    Large-scale revenue forecasting at Bosch with Amazon Forecast and Amazon SageMaker custom models
    This post is co-written by Goktug Cinar, Michael Binder, and Adrian Horvath from Bosch Center for Artificial Intelligence (BCAI). Revenue forecasting is a challenging yet crucial task for strategic business decisions and fiscal planning in most organizations. Often, revenue forecasting is manually performed by financial analysts and is both time consuming and subjective. Such manual […]  ( 12 min )
  • Open

    Transfer Learning — Part — 7.3!! Densenet Architecture in Keras
    In Part 7.0 of the Transfer Learning series we discussed the DenseNet pre-trained model in depth, so in this series we will…  ( 69 min )
    Ensure business continuity in times of the COVID-19 pandemic by using ECM solutions
    The products of the Industry 4.0 revolution are Internet connectivity, the widespread availability of robust wired and Wi-Fi networks, and…  ( 7 min )
    Is Artificial Intelligence a Threat to Human Resources?
    Artificial intelligence is one of the most interesting and important technologies of the 21st century, and it is predicted to completely change…  ( 16 min )
    CHATBOTS IN BANKING AND FINANCIAL SECTOR: WHAT ARE THE CHALLENGES & OPPORTUNITIES?  ( 10 min )
  • Open

    Ignore all previous instructions
    Users have noticed that the remoteli.io twitter chatbot, usually faithful to its cheerful messaging promoting remote work, can be subverted with a carefully worded user prompt. Users were able to get the chatbot to claim responsibility for terrorist attacks, threaten the President, meow at other twitter users, print snippets  ( 6 min )
    More advice from the snowbonk chatbot
    AI Weirdness: the strange side of machine learning  ( 2 min )
  • Open

    The Transformer Positional Encoding Layer in Keras, Part 2
    In part 1: A gentle introduction to positional encoding in transformer models, we discussed the positional encoding layer of the transformer model. We also showed how you can implement this layer and its functions yourself in Python. In this tutorial, we’ll implement the positional encoding layer in Keras and Tensorflow. You can then use this […] The post The Transformer Positional Encoding Layer in Keras, Part 2 appeared first on Machine Learning Mastery.
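    For readers skimming the feed, the layer in question implements the sinusoidal encoding from "Attention Is All You Need": PE(pos, 2i) = sin(pos / 10000^(2i/d_model)) and PE(pos, 2i+1) = cos(pos / 10000^(2i/d_model)). A NumPy sketch of the same computation (the tutorial's Keras version wraps this in a layer; d_model is assumed even):

        import numpy as np

        def positional_encoding(seq_len, d_model, n=10000.0):
            pos = np.arange(seq_len)[:, None]        # (seq_len, 1)
            i = np.arange(d_model // 2)[None, :]     # (1, d_model // 2)
            angles = pos / n ** (2 * i / d_model)
            pe = np.zeros((seq_len, d_model))
            pe[:, 0::2] = np.sin(angles)             # even dimensions
            pe[:, 1::2] = np.cos(angles)             # odd dimensions
            return pe                                # added to the token embeddings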
  • Open

    Personalized Prediction of Future Lesion Activity and Treatment Effect in Multiple Sclerosis from Baseline MRI. (arXiv:2204.01702v4 [eess.IV] UPDATED)
    Precision medicine for chronic diseases such as multiple sclerosis (MS) involves choosing a treatment which best balances efficacy and side effects/preferences for individual patients. Making this choice as early as possible is important, as delays in finding an effective therapy can lead to irreversible disability accrual. To this end, we present the first deep neural network model for individualized treatment decisions from baseline magnetic resonance imaging (MRI) (with clinical information if available) for MS patients. Our model (a) predicts future new and enlarging T2 weighted (NE-T2) lesion counts on follow-up MRI on multiple treatments and (b) estimates the conditional average treatment effect (CATE), as defined by the predicted future suppression of NE-T2 lesions, between different treatment options relative to placebo. Our model is validated on a proprietary federated dataset of 1817 multi-sequence MRIs acquired from MS patients during four multi-centre randomized clinical trials. Our framework achieves high average precision in the binarized regression of future NE-T2 lesions on five different treatments, identifies heterogeneous treatment effects, and provides a personalized treatment recommendation that accounts for treatment-associated risk (e.g. side effects, patient preference, administration difficulties).
    Implementing and Experimenting with Diffusion Models for Text-to-Image Generation. (arXiv:2209.10948v1 [cs.CV])
    Taking advantage of the many recent advances in deep learning, text-to-image generative models currently have the merit of attracting the general public attention. Two of these models, DALL-E 2 and Imagen, have demonstrated that highly photorealistic images could be generated from a simple textual description of an image. Based on a novel approach for image generation called diffusion models, text-to-image models enable the production of many different types of high resolution images, where human imagination is the only limit. However, these models require exceptionally large amounts of computational resources to train, as well as handling huge datasets collected from the internet. In addition, neither the codebase nor the models have been released. This consequently prevents the AI community from experimenting with these cutting-edge models, making the reproduction of their results complicated, if not impossible. In this thesis, we aim to contribute by firstly reviewing the different approaches and techniques used by these models, and then by proposing our own implementation of a text-to-image model. Based heavily on DALL-E 2, we introduce several slight modifications to tackle the high computational cost induced. We thus have the opportunity to experiment in order to understand what these models are capable of, especially in a low resource regime. In particular, we provide additional analyses, deeper than the ones performed by the authors of DALL-E 2, including ablation studies. Besides, diffusion models use so-called guidance methods to help the generating process. We introduce a new guidance method which can be used in conjunction with other guidance methods to improve image quality. Finally, the images generated by our model are of reasonably good quality, without having to sustain the significant training costs of state-of-the-art text-to-image models.
    Concept Activation Regions: A Generalized Framework For Concept-Based Explanations. (arXiv:2209.11222v1 [cs.LG])
    Concept-based explanations make it possible to understand the predictions of a deep neural network (DNN) through the lens of concepts specified by users. Existing methods assume that the examples illustrating a concept are mapped in a fixed direction of the DNN's latent space. When this holds true, the concept can be represented by a concept activation vector (CAV) pointing in that direction. In this work, we propose to relax this assumption by allowing concept examples to be scattered across different clusters in the DNN's latent space. Each concept is then represented by a region of the DNN's latent space that includes these clusters and that we call concept activation region (CAR). To formalize this idea, we introduce an extension of the CAV formalism that is based on the kernel trick and support vector classifiers. This CAR formalism yields global concept-based explanations and local concept-based feature importance. We prove that CAR explanations built with radial kernels are invariant under latent space isometries. In this way, CAR assigns the same explanations to latent spaces that have the same geometry. We further demonstrate empirically that CARs offer (1) more accurate descriptions of how concepts are scattered in the DNN's latent space; (2) global explanations that are closer to human concept annotations and (3) concept-based feature importance that meaningfully relate concepts with each other. Finally, we use CARs to show that DNNs can autonomously rediscover known scientific concepts, such as the prostate cancer grading system.
    Interpretable Meta-Measure for Model Performance. (arXiv:2006.02293v2 [cs.LG] UPDATED)
    Benchmarks for the evaluation of model performance play an important role in machine learning. However, there is no established way to describe and create new benchmarks. What is more, the most common benchmarks use performance measures that share several limitations. For example, the difference in performance between two models has no probabilistic interpretation, there is no reference point to indicate whether they represent a significant improvement, and it makes no sense to compare such differences between data sets. We introduce a new meta-score assessment named Elo-based Predictive Power (EPP) that is built on top of other performance measures and allows for interpretable comparisons of models. The differences in EPP scores have a probabilistic interpretation and can be directly compared between data sets; furthermore, the logistic regression-based design allows for an assessment of ranking fitness based on a deviance statistic. We prove the mathematical properties of EPP and support them with empirical results of a large-scale benchmark on 30 classification data sets and a real-world benchmark for visual data. Additionally, we propose a Unified Benchmark Ontology that is used to give a uniform description of benchmarks.
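    Since EPP is described as an Elo-style, logistic-regression-based meta-score over model comparisons, a Bradley-Terry-style sketch conveys the flavor; the match data below is invented and the actual EPP design differs in detail.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

n_models = 3
matches = [(0, 1), (0, 2), (1, 2), (0, 1), (2, 1)]  # (winner, loser) pairs

X, y = [], []
for w, l in matches:
    row = np.zeros(n_models)
    row[w], row[l] = 1.0, -1.0   # difference of model indicators
    X.append(row); y.append(1)
    X.append(-row); y.append(0)  # symmetric counterpart of the match

# Logistic regression on indicator differences recovers Elo-like ratings.
clf = LogisticRegression(fit_intercept=False, C=10.0).fit(np.array(X), y)
print("Elo-like scores per model:", clf.coef_.ravel())
```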
    Cross-domain Voice Activity Detection with Self-Supervised Representations. (arXiv:2209.11061v1 [eess.AS])
    Voice Activity Detection (VAD) aims at detecting speech segments in an audio signal, which is a necessary first step for many of today's speech-based applications. Current state-of-the-art methods focus on training a neural network exploiting features directly derived from the acoustics, such as Mel Filter Banks (MFBs). Such methods therefore require an extra normalisation step to adapt to a new domain where the acoustics are impacted, which can be simply due to a change of speaker, microphone, or environment. In addition, this normalisation step is usually a rather rudimentary method with certain limitations, such as being highly susceptible to the amount of data available for the new domain. Here, we exploit the crowd-sourced Common Voice (CV) corpus to show that representations based on Self-Supervised Learning (SSL) can adapt well to different domains, because they are computed with contextualised representations of speech across multiple domains. SSL representations also achieve better results than systems based on hand-crafted representations (MFBs) and off-the-shelf VADs, with significant improvements in cross-domain settings.
    LIMIS: Locally Interpretable Modeling using Instance-wise Subsampling. (arXiv:1909.12367v2 [cs.LG] UPDATED)
    Understanding black-box machine learning models is crucial for their widespread adoption. Learning globally interpretable models is one approach, but achieving high performance with them is challenging. An alternative approach is to explain individual predictions using locally interpretable models. For locally interpretable modeling, various methods have been proposed and indeed commonly used, but they suffer from low fidelity, i.e. their explanations do not approximate the predictions well. In this paper, our goal is to push the state-of-the-art in high-fidelity locally interpretable modeling. We propose a novel framework, Locally Interpretable Modeling using Instance-wise Subsampling (LIMIS). LIMIS utilizes a policy gradient to select a small number of instances and distills the black-box model into a low-capacity locally interpretable model using those selected instances. Training is guided with a reward obtained directly by measuring the fidelity of the locally interpretable models. We show on multiple tabular datasets that LIMIS near-matches the prediction accuracy of black-box models, significantly outperforming state-of-the-art locally interpretable models in terms of fidelity and prediction accuracy.
    Graph Trees with Attention. (arXiv:2207.02760v2 [cs.LG] UPDATED)
    When dealing with tabular data, models based on regression and decision trees are a popular choice due to the high accuracy they provide on such tasks and their ease of application compared to other model classes. Yet, when it comes to graph-structured data, current tree learning algorithms do not provide tools to manage the structure of the data other than relying on feature engineering. In this work we address the above gap, and introduce Graph Trees with Attention (GTA), a new family of tree-based learning algorithms designed to operate on graphs. GTA leverages both the graph structure and the features at the vertices and employs an attention mechanism that allows decisions to concentrate on sub-structures of the graph. We analyze GTA models and show that they are strictly more expressive than plain decision trees. We also demonstrate the benefits of GTA empirically on multiple graph and node prediction benchmarks. In these experiments, GTA always outperformed other tree-based models and often outperformed other types of graph-learning algorithms such as Graph Neural Networks (GNNs) and Graph Kernels. Finally, we also provide an explainability mechanism for GTA, and demonstrate that it can provide intuitive explanations.
    Rethinking Pareto Frontier for Performance Evaluation of Deep Neural Networks. (arXiv:2202.09275v5 [cs.LG] UPDATED)
    Performance optimization of deep learning models is conducted either manually or through automatic architecture search, or a combination of both. At the same time, model performance strongly depends on the target hardware and on how successfully the models were trained. We propose to use a multi-dimensional Pareto frontier to re-define the efficiency measure of candidate deep learning models, where several variables such as training cost, inference latency, and accuracy play a relative role in defining a dominant model. Furthermore, a random version of the multi-dimensional Pareto frontier is introduced to mitigate the uncertainty of accuracy, latency, and throughput of deep learning models in different experimental setups. These two complementary methods can be combined to perform objective benchmarking of deep learning models. Our proposed method is applied to a wide range of deep image classification models trained on ImageNet data. Our method combines competing variables with stochastic nature in a single relative efficiency measure. This allows ranking deep learning models that run efficiently on different hardware, and combining inference efficiency with training efficiency objectively.
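    A multi-dimensional Pareto frontier over, say, training cost, inference latency, and error can be computed with a simple dominance check. A minimal sketch follows; the objective values are invented for illustration.

```python
import numpy as np

def pareto_frontier(points):
    # Return indices of points not dominated by any other point.
    # Each row is one model; all objectives are minimized
    # (e.g. training cost, inference latency, error = 1 - accuracy).
    pts = np.asarray(points, dtype=float)
    keep = []
    for i, p in enumerate(pts):
        dominated = np.any(np.all(pts <= p, axis=1) & np.any(pts < p, axis=1))
        if not dominated:
            keep.append(i)
    return keep

models = [[10.0, 5.0, 0.24],   # [train cost, latency, error]
          [12.0, 4.0, 0.22],
          [15.0, 6.0, 0.30]]   # dominated by the first model
print(pareto_frontier(models))  # -> [0, 1]
```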
    Invariant Policy Learning: A Causal Perspective. (arXiv:2106.00808v4 [cs.LG] UPDATED)
    Contextual bandit and reinforcement learning algorithms have been successfully used in various interactive learning systems such as online advertising, recommender systems, and dynamic pricing. However, they have yet to be widely adopted in high-stakes application domains, such as healthcare. One reason may be that existing approaches assume that the underlying mechanisms are static in the sense that they do not change over different environments. In many real-world systems, however, the mechanisms are subject to shifts across environments which may invalidate the static environment assumption. In this paper, we take a step toward tackling the problem of environmental shifts considering the framework of offline contextual bandits. We view the environmental shift problem through the lens of causality and propose multi-environment contextual bandits that allow for changes in the underlying mechanisms. We adopt the concept of invariance from the causality literature and introduce the notion of policy invariance. We argue that policy invariance is only relevant if unobserved variables are present and show that, in that case, an optimal invariant policy is guaranteed to generalize across environments under suitable assumptions. Our results establish concrete connections among causality, invariance, and contextual bandits.
    Making Byzantine Decentralized Learning Efficient. (arXiv:2209.10931v1 [cs.LG])
    Decentralized-SGD (D-SGD) distributes heavy learning tasks across multiple machines (a.k.a. nodes), effectively dividing the workload per node by the size of the system. However, a handful of Byzantine (i.e., misbehaving) nodes can jeopardize the entire learning procedure. This vulnerability is further amplified when the system is asynchronous. Although approaches that confer Byzantine resilience to D-SGD have been proposed, these significantly impact the efficiency of the process to the point of even negating the benefit of decentralization. This naturally raises the question: can decentralized learning simultaneously enjoy Byzantine resilience and reduced workload per node? We answer positively by proposing a new algorithm that ensures Byzantine resilience without losing the computational efficiency of D-SGD. Essentially, our algorithm weakens the impact of Byzantine nodes by reducing the variance in local updates using Polyak's momentum. Then, by establishing coordination between nodes via signed echo broadcast and a nearest-neighbor averaging scheme, we effectively tolerate Byzantine nodes whilst distributing the overhead amongst the non-Byzantine nodes. To demonstrate the correctness of our algorithm, we introduce and analyze a novel Lyapunov function that accounts for the non-Markovian model drift arising from the use of momentum. We also demonstrate the efficiency of our algorithm through experiments on several image classification tasks.
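    The algorithm's name did not survive rendering in this abstract, but its two numerical ingredients can be sketched: local Polyak momentum to damp update variance, and nearest-neighbor averaging to discount outlying (possibly Byzantine) parameter vectors. This is a simplified illustration under assumed shapes; the full protocol also involves signed echo broadcast, which is omitted here.

```python
import numpy as np

def local_step(params, momentum, grad, beta=0.9, lr=0.01):
    # Polyak's momentum damps gradient noise, shrinking the leverage a
    # Byzantine neighbour gets from any single poisoned update.
    momentum = beta * momentum + (1 - beta) * grad
    return params - lr * momentum, momentum

def nearest_neighbor_average(own, received, k):
    # Average own parameters with the k received vectors closest to them,
    # implicitly discarding far-away (suspect) updates.
    received = sorted(received, key=lambda v: np.linalg.norm(v - own))
    return np.mean([own] + received[:k], axis=0)

rng = np.random.default_rng(0)
own = rng.normal(size=10)
received = [own + rng.normal(scale=0.1, size=10) for _ in range(4)]
received.append(own + 100.0)  # one Byzantine outlier
print(nearest_neighbor_average(own, received, k=3))
```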
    Poisson Flow Generative Models. (arXiv:2209.11178v1 [cs.LG])
    We propose a new "Poisson flow" generative model (PFGM) that maps a uniform distribution on a high-dimensional hemisphere into any data distribution. We interpret the data points as electrical charges on the $z=0$ hyperplane in a space augmented with an additional dimension $z$, generating a high-dimensional electric field (the gradient of the solution to the Poisson equation). We prove that if these charges flow upward along electric field lines, their initial distribution in the $z=0$ plane transforms into a distribution on the hemisphere of radius $r$ that becomes uniform in the $r \to\infty$ limit. To learn the bijective transformation, we estimate the normalized field in the augmented space. For sampling, we devise a backward ODE that is anchored by the physically meaningful additional dimension: the samples hit the unaugmented data manifold when $z$ reaches zero. Experimentally, PFGM achieves current state-of-the-art performance among normalizing flow models on CIFAR-10, with an Inception score of $9.68$ and a FID score of $2.48$. It also performs on par with the state-of-the-art SDE approaches while offering $10\times$ to $20\times$ acceleration on image generation tasks. Additionally, PFGM appears more tolerant of estimation errors on a weaker network architecture and robust to the step size in the Euler method. The code is available at https://github.com/Newbeeer/poisson_flow .
    Autism spectrum disorder classification based on interpersonal neural synchrony: Can classification be improved by dyadic neural biomarkers using unsupervised graph representation learning?. (arXiv:2208.08902v2 [cs.LG] UPDATED)
    Research in machine learning for autism spectrum disorder (ASD) classification bears the promise to improve clinical diagnoses. However, recent studies in clinical imaging have shown the limited generalization of biomarkers across and beyond benchmark datasets. Despite increasing model complexity and sample size in neuroimaging, the classification performance of ASD remains far away from clinical application. This raises the question of how we can overcome these barriers to develop early biomarkers for ASD. One approach might be to rethink how we operationalize the theoretical basis of this disease in machine learning models. Here we introduced unsupervised graph representations that explicitly map the neural mechanisms of a core aspect of ASD, deficits in dyadic social interaction, as assessed by dual brain recordings, termed hyperscanning, and evaluated their predictive performance. The proposed method differs from existing approaches in that it is more suitable to capture social interaction deficits on a neural level and is applicable to young children and infants. First results from functional near-infrared spectroscopy data indicate potential predictive capacities of a task-agnostic, interpretable graph representation. This first effort to leverage interaction-related deficits on neural level to classify ASD may stimulate new approaches and methods to enhance existing models to achieve developmental ASD biomarkers in the future.
    Macromolecule Classification Based on the Amino-acid Sequence. (arXiv:2001.01717v2 [q-bio.BM] UPDATED)
    Deep learning is playing a vital role in every field that involves data. It has emerged as a strong and efficient framework that can be applied to a broad spectrum of complex learning problems which were difficult to solve using traditional machine learning techniques in the past. In this study we focus on the classification of protein sequences with deep learning techniques. The study of amino acid sequences is vital in the life sciences. We used different word embedding techniques from natural language processing to represent the amino acid sequences as vectors. Our main goal was to classify sequences into four classes: DNA, RNA, protein, and hybrid. After several tests we achieved almost 99% train and test accuracy. We experimented with CNN, LSTM, bidirectional LSTM, and GRU architectures.
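    A minimal PyTorch sketch of the kind of model described: token embeddings followed by a bidirectional LSTM and a four-way classification head. The vocabulary size, dimensions, and mean-pooling are assumptions for illustration, not the paper's exact architecture.

```python
import torch
import torch.nn as nn

VOCAB, EMB, HID, CLASSES = 21, 32, 64, 4  # 20 amino acids + padding; 4 classes

class SeqClassifier(nn.Module):
    def __init__(self):
        super().__init__()
        self.emb = nn.Embedding(VOCAB, EMB, padding_idx=0)
        self.lstm = nn.LSTM(EMB, HID, batch_first=True, bidirectional=True)
        self.head = nn.Linear(2 * HID, CLASSES)

    def forward(self, x):                # x: (batch, seq_len) token ids
        h, _ = self.lstm(self.emb(x))    # contextual token embeddings
        return self.head(h.mean(dim=1))  # mean-pool over the sequence

model = SeqClassifier()
tokens = torch.randint(1, VOCAB, (8, 100))  # 8 random toy "sequences"
print(model(tokens).shape)                  # -> torch.Size([8, 4])
```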
    NamedMask: Distilling Segmenters from Complementary Foundation Models. (arXiv:2209.11228v1 [cs.CV])
    The goal of this work is to segment and name regions of images without access to pixel-level labels during training. To tackle this task, we construct segmenters by distilling the complementary strengths of two foundation models. The first, CLIP (Radford et al. 2021), exhibits the ability to assign names to image content but lacks an accessible representation of object structure. The second, DINO (Caron et al. 2021), captures the spatial extent of objects but has no knowledge of object names. Our method, termed NamedMask, begins by using CLIP to construct category-specific archives of images. These images are pseudo-labelled with a category-agnostic salient object detector bootstrapped from DINO, then refined by category-specific segmenters using the CLIP archive labels. Thanks to the high quality of the refined masks, we show that a standard segmentation architecture trained on these archives with appropriate data augmentation achieves impressive semantic segmentation abilities for both single-object and multi-object images. As a result, our proposed NamedMask performs favourably against a range of prior work on five benchmarks including the VOC2012, COCO and large-scale ImageNet-S datasets.
    Modern Machine Learning Tools for Monitoring and Control of Industrial Processes: A Survey. (arXiv:2209.11123v1 [cs.LG])
    Over the last ten years, we have seen a significant increase in industrial data, tremendous improvement in computational power, and major theoretical advances in machine learning. This opens up an opportunity to use modern machine learning tools on large-scale nonlinear monitoring and control problems. This article provides a survey of recent results with applications in the process industry.
    IGN : Implicit Generative Networks. (arXiv:2206.05860v2 [cs.LG] UPDATED)
    In this work, we build on recent advances in distributional reinforcement learning to give a state-of-the-art distributional variant of the model based on the IQN. We achieve this by using a GAN model's generator and discriminator functions with quantile regression to approximate the full quantile function of the state-action return distribution. We demonstrate improved performance on our baseline benchmark of 57 Atari 2600 games in the ALE. We also use our algorithm to show state-of-the-art training performance of risk-sensitive policies in Atari games via policy optimization and evaluation.
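    IQN-style distributional agents are typically trained with the quantile Huber loss; the sketch below shows that loss (the exact objective used with the generator/discriminator pair in this paper may differ).

```python
import torch

def quantile_huber_loss(pred, target, taus, kappa=1.0):
    # pred:   (batch, n_quantiles) predicted quantile values
    # target: (batch, n_quantiles) target return samples
    # taus:   (n_quantiles,) quantile fractions in (0, 1)
    td = target.unsqueeze(1) - pred.unsqueeze(2)  # pairwise TD errors
    huber = torch.where(td.abs() <= kappa,
                        0.5 * td.pow(2),
                        kappa * (td.abs() - 0.5 * kappa))
    # Asymmetric quantile weights |tau - 1{td < 0}|.
    weight = (taus.view(1, -1, 1) - (td.detach() < 0).float()).abs()
    return (weight * huber / kappa).sum(dim=1).mean()

pred, target = torch.randn(4, 8), torch.randn(4, 8)
taus = (torch.arange(8, dtype=torch.float32) + 0.5) / 8
print(quantile_huber_loss(pred, target, taus))
```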
    MALTS: Matching After Learning to Stretch. (arXiv:1811.07415v8 [stat.ME] UPDATED)
    We introduce a flexible framework that produces high-quality almost-exact matches for causal inference. Most prior work in matching uses ad-hoc distance metrics, often leading to poor quality matches, particularly when there are irrelevant covariates. In this work, we learn an interpretable distance metric for matching, which leads to substantially higher quality matches. The learned distance metric stretches the covariate space according to each covariate's contribution to outcome prediction: this stretching means that mismatches on important covariates carry a larger penalty than mismatches on irrelevant covariates. Our ability to learn flexible distance metrics leads to matches that are interpretable and useful for the estimation of conditional average treatment effects.
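    The core of the method, stretching the covariate space by importance weights before nearest-neighbor matching, can be sketched in a few lines; here the weights are fixed by hand rather than learned as in MALTS, and the data is synthetic.

```python
import numpy as np

def stretched_distance(a, b, weights):
    # Distance after stretching each covariate by its importance:
    # mismatches on important covariates cost more than on irrelevant ones.
    return np.sqrt(np.sum(weights * (a - b) ** 2))

def match(unit, candidates, weights, k=3):
    # Indices of the k nearest almost-exact matches under the metric.
    d = [stretched_distance(unit, c, weights) for c in candidates]
    return np.argsort(d)[:k]

rng = np.random.default_rng(0)
pool = rng.normal(size=(100, 5))
w = np.array([5.0, 3.0, 0.1, 0.1, 0.0])  # last covariate judged irrelevant
print(match(pool[0], pool[1:], w))
```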
    A Generalist Neural Algorithmic Learner. (arXiv:2209.11142v1 [cs.LG])
    The cornerstone of neural algorithmic reasoning is the ability to solve algorithmic tasks, especially in a way that generalises out of distribution. While recent years have seen a surge in methodological improvements in this area, they mostly focused on building specialist models. Specialist models are capable of learning to neurally execute either only one algorithm or a collection of algorithms with identical control-flow backbone. Here, instead, we focus on constructing a generalist neural algorithmic learner -- a single graph neural network processor capable of learning to execute a wide range of algorithms, such as sorting, searching, dynamic programming, path-finding and geometry. We leverage the CLRS benchmark to empirically show that, much like recent successes in the domain of perception, generalist algorithmic learners can be built by "incorporating" knowledge. That is, it is possible to effectively learn algorithms in a multi-task manner, so long as we can learn to execute them well in a single-task regime. Motivated by this, we present a series of improvements to the input representation, training regime and processor architecture over CLRS, improving average single-task performance by over 20% from prior art. We then conduct a thorough ablation of multi-task learners leveraging these improvements. Our results demonstrate a generalist learner that effectively incorporates knowledge captured by specialist models.
    Out-of-Distribution Detection Without Class Labels. (arXiv:2112.07662v2 [cs.CV] UPDATED)
    Out-of-distribution detection seeks to identify novelties, samples that deviate from the norm. The task has been found to be quite challenging, particularly in the case where the normal data distribution consists of multiple semantic classes (e.g., multiple object categories). To overcome this challenge, current approaches require manual labeling of the normal images provided during training. In this work, we tackle multi-class novelty detection without class labels. Our simple but effective solution consists of two stages: we first discover "pseudo-class" labels using unsupervised clustering. Then using these pseudo-class labels, we are able to use standard supervised out-of-distribution detection methods. We verify the performance of our method by a favorable comparison to the state-of-the-art, and provide extensive analysis and ablations.
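    A minimal sketch of the two-stage recipe: cluster unlabeled normal data into pseudo-classes, then apply any standard supervised OOD score such as maximum softmax probability. The linear classifier and synthetic features below are illustrative stand-ins for the deep models used in practice.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
train = rng.normal(size=(300, 16))  # unlabeled "normal" features

# Stage 1: discover pseudo-class labels with unsupervised clustering.
pseudo = KMeans(n_clusters=5, n_init=10, random_state=0).fit_predict(train)

# Stage 2: any supervised OOD score applies, e.g. max softmax probability.
clf = LogisticRegression(max_iter=1000).fit(train, pseudo)
test = np.vstack([rng.normal(size=(5, 16)),          # in-distribution-like
                  rng.normal(loc=6, size=(5, 16))])  # novelties
msp = clf.predict_proba(test).max(axis=1)
print("lower score = more novel:", np.round(msp, 2))
```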
    Non-Intrusive Reduced Models based on Operator Inference for Chaotic Systems. (arXiv:2206.01604v3 [cs.LG] UPDATED)
    This work explores the physics-driven machine learning technique Operator Inference (OpInf) for predicting the state of chaotic dynamical systems. OpInf provides a non-intrusive approach to infer approximations of polynomial operators in reduced space without having access to the full order operators appearing in discretized models. Datasets for the physics systems are generated using conventional numerical solvers and then projected to a low-dimensional space via Principal Component Analysis (PCA). In latent space, a least-squares problem is set to fit a quadratic polynomial operator, which is subsequently employed in a time-integration scheme in order to produce extrapolations in the same space. Once solved, the inverse PCA operation is applied to reconstruct the extrapolations in the original space. The quality of the OpInf predictions is assessed via the Normalized Root Mean Squared Error (NRMSE) metric from which the Valid Prediction Time (VPT) is computed. Numerical experiments considering the chaotic systems Lorenz 96 and the Kuramoto-Sivashinsky equation show promising forecasting capabilities of the OpInf reduced order models with VPT ranges that outperform state-of-the-art machine learning methods such as backpropagation and reservoir computing recurrent neural networks [1], as well as Markov neural operators [2].
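    The pipeline described above maps almost line-for-line onto a short numpy sketch: a PCA basis from the SVD, a least-squares fit of a quadratic latent operator, forward time stepping, and the inverse PCA lift. The random snapshots and the forward-Euler integrator below are simplifying assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 200))           # snapshots: (state dim, time steps)
dt, r = 0.01, 8

# 1) PCA basis from the SVD of the centered snapshots.
mean = X.mean(axis=1, keepdims=True)
U, _, _ = np.linalg.svd(X - mean, full_matrices=False)
V = U[:, :r]
Z = V.T @ (X - mean)                     # latent trajectories, (r, time)

# 2) Fit dz/dt ~ A z + H (z kron z) by least squares (quadratic operator).
dZ = np.gradient(Z, dt, axis=1)
Q = np.einsum("it,jt->ijt", Z, Z).reshape(r * r, -1)  # z kron z per step
D = np.vstack([Z, Q])
Ops = dZ @ np.linalg.pinv(D)             # concatenated [A | H]
A, H = Ops[:, :r], Ops[:, r:]

# 3) Step forward in latent space, then lift back with the inverse PCA map.
z = Z[:, 0].copy()
for _ in range(100):                     # forward-Euler extrapolation
    z = z + dt * (A @ z + H @ np.kron(z, z))
x_pred = V @ z + mean.ravel()
print(x_pred.shape)
```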
    TempNet -- Temporal Super Resolution of Radar Rainfall Products with Residual CNNs. (arXiv:2109.09289v2 [cs.CV] UPDATED)
    The temporal and spatial resolution of rainfall data is crucial for environmental modeling studies in which its variability in space and time is considered as a primary factor. Rainfall products from different remote sensing instruments (e.g., radar, satellite) have different space-time resolutions because of the differences in their sensing capabilities and post-processing methods. In this study, we developed a deep learning approach that augments rainfall data with increased time resolutions to complement relatively lower resolution products. We propose a neural network architecture based on Convolutional Neural Networks (CNNs) to improve the temporal resolution of radar-based rainfall products, and compare the proposed model with an optical flow-based interpolation method and a CNN baseline model. The methodology presented in this study could be used for enhancing rainfall maps with better temporal resolution and imputation of missing frames in sequences of 2D rainfall maps to support hydrological and flood forecasting studies.
    Improved Binary Forward Exploration: Learning Rate Scheduling Method for Stochastic Optimization. (arXiv:2207.04198v3 [cs.LG] UPDATED)
    A new gradient-based optimization approach that automatically schedules the learning rate has recently been proposed, called Binary Forward Exploration (BFE); an adaptive version of BFE has also been discussed thereafter. In this paper, improved algorithms based on them are investigated, in order to optimize the efficiency and robustness of the new methodology. The improved approach provides a new perspective on scheduling the update of the learning rate, and is compared with the stochastic gradient descent (SGD) algorithm with momentum or Nesterov momentum, as well as the most successful adaptive learning-rate algorithms, e.g., Adam. This method does not aim to beat others, but to provide a different viewpoint on optimizing the gradient descent process. The approach combines the advantages of first-order and second-order optimization in terms of speed and efficiency.
    Faithful Embeddings for EL++ Knowledge Bases. (arXiv:2201.09919v2 [cs.AI] UPDATED)
    Recently, increasing efforts have been put into learning continuous representations for symbolic knowledge bases (KBs). However, these approaches either only embed the data-level knowledge (ABox) or suffer from inherent limitations when dealing with concept-level knowledge (TBox), i.e., they cannot faithfully model the logical structure present in the KBs. We present BoxEL, a geometric KB embedding approach that allows for better capturing of the logical structure (i.e., ABox and TBox axioms) in the description logic EL++. BoxEL models concepts in a KB as axis-parallel boxes that are suitable for modeling concept intersection, entities as points inside boxes, and relations between concepts/entities as affine transformations. We show theoretical guarantees (soundness) of BoxEL for preserving logical structure. Namely, a learned BoxEL embedding with loss 0 is a (logical) model of the KB. Experimental results on (plausible) subsumption reasoning and a real-world application to protein-protein interaction prediction show that BoxEL outperforms traditional knowledge graph embedding methods as well as state-of-the-art EL++ embedding approaches.
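    The geometric picture, concepts as axis-parallel boxes and entities as points, can be sketched directly; this toy example omits BoxEL's learned affine relation maps and its training losses, and the boxes are chosen by hand.

```python
import numpy as np

def inside(point, box):
    # Entity membership: is the point inside the concept's box?
    lo, hi = box
    return bool(np.all(point >= lo) and np.all(point <= hi))

def box_intersection(b1, b2):
    # Concept conjunction: intersection of axis-parallel boxes.
    lo = np.maximum(b1[0], b2[0])
    hi = np.minimum(b1[1], b2[1])
    return (lo, hi) if np.all(lo <= hi) else None  # None = empty concept

Person = (np.array([0.0, 0.0]), np.array([4.0, 4.0]))
Parent = (np.array([1.0, 1.0]), np.array([3.0, 3.0]))  # Parent inside Person
alice = np.array([2.0, 2.0])                            # entity as a point

print(inside(alice, Parent))             # True: alice is a Parent
print(box_intersection(Person, Parent))  # equals the Parent box
```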
    Ascent Similarity Caching with Approximate Indexes. (arXiv:2107.00957v4 [cs.NI] UPDATED)
    Similarity search is a key operation in multimedia retrieval systems and recommender systems, and it will play an important role also for future machine learning and augmented reality applications. When these systems need to serve large objects with tight delay constraints, edge servers close to the end-user can operate as similarity caches to speed up the retrieval. In this paper we present AÇAI, a new similarity caching policy which improves on the state of the art by using (i) an (approximate) index for the whole catalog to decide which objects to serve locally and which to retrieve from the remote server, and (ii) a mirror ascent algorithm to update the set of local objects with strong guarantees even when the request process does not exhibit any statistical regularity.
    A Novel Data Augmentation Technique for Out-of-Distribution Sample Detection using Compounded Corruptions. (arXiv:2207.13916v2 [cs.CV] UPDATED)
    Modern deep neural network models are known to erroneously classify out-of-distribution (OOD) test data into one of the in-distribution (ID) training classes with high confidence. This can have disastrous consequences for safety-critical applications. A popular mitigation strategy is to train a separate classifier that can detect such OOD samples at test time. In most practical settings OOD examples are not known at train time, and hence a key question is: how to augment the ID data with synthetic OOD samples for training such an OOD detector? In this paper, we propose a novel Compounded Corruption technique for OOD data augmentation, termed CnC. One of the major advantages of CnC is that it does not require any hold-out data apart from the training set. Further, unlike current state-of-the-art (SOTA) techniques, CnC does not require backpropagation or ensembling at test time, making our method much faster at inference. Our extensive comparison with 20 methods from the major conferences of the last 4 years shows that a model trained using CnC-based data augmentation significantly outperforms SOTA, both in terms of OOD detection accuracy and inference time. We include a detailed post-hoc analysis to investigate the reasons for the success of our method and identify higher relative entropy and diversity of CnC samples as probable causes. We also provide theoretical insights via a piece-wise decomposition analysis on a two-dimensional dataset to reveal (visually and quantitatively) that our approach leads to a tighter boundary around ID classes, leading to better detection of OOD samples. Source code link: https://github.com/cnc-ood
    A novel corrective-source term approach to modeling unknown physics in aluminum extraction process. (arXiv:2209.10861v1 [cs.LG])
    With the ever-increasing availability of data, there has been an explosion of interest in applying modern machine learning methods to fields such as modeling and control. However, despite the flexibility and surprising accuracy of such black-box models, it remains difficult to trust them. Recent efforts to combine the two approaches aim to develop flexible models that nonetheless generalize well; a paradigm we call Hybrid Analysis and Modeling (HAM). In this work we investigate the Corrective Source Term Approach (CoSTA), which uses a data-driven model to correct a misspecified physics-based model. This enables us to develop models that make accurate predictions even when the underlying physics of the problem is not well understood. We apply CoSTA to model the Hall-Héroult process in an aluminum electrolysis cell. We demonstrate that the method improves both accuracy and predictive stability, yielding an overall more trustworthy model.
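    A minimal sketch of the corrective-source-term idea: add a data-driven residual to a misspecified physics right-hand side before time stepping. Both models below are toy stand-ins, not the Hall-Héroult physics or a trained network.

```python
import numpy as np

def physics_rhs(u):
    # Misspecified physics-based model (e.g. a missing loss term).
    return -0.5 * u

def learned_correction(u):
    # Stand-in for a trained data-driven corrective source term.
    return -0.1 * u ** 2

def costa_step(u, dt):
    # CoSTA: add the learned source term to the incomplete physics
    # right-hand side before advancing in time.
    return u + dt * (physics_rhs(u) + learned_correction(u))

u, dt = 1.0, 0.01
for _ in range(100):
    u = costa_step(u, dt)
print(u)
```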
    Nesting Forward Automatic Differentiation for Memory-Efficient Deep Neural Network Training. (arXiv:2209.10778v1 [cs.LG])
    An activation function is an element-wise mathematical function and plays a crucial role in deep neural networks (DNNs). Many novel and sophisticated activation functions have been proposed to improve DNN accuracy, but they also consume massive memory during training with back-propagation. In this study, we propose nested forward automatic differentiation (Forward-AD), specifically for element-wise activation functions, for memory-efficient DNN training. We deploy nested Forward-AD in two widely-used deep learning frameworks, TensorFlow and PyTorch, which support static and dynamic computation graphs, respectively. Our evaluation shows that nested Forward-AD reduces the memory footprint by up to 1.97x compared to the baseline model and outperforms recomputation by 20% under the same memory reduction ratio.
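    The appeal of forward-mode AD for an element-wise activation is that the derivative is propagated together with the value, so the layer input need not be stored for a later backward pass. A dependency-free sketch for the SiLU activation follows; the paper's actual integration into TensorFlow and PyTorch graphs is more involved.

```python
import math

def silu_forward_ad(x, x_dot=1.0):
    # Propagate the tangent x_dot alongside the value: no input tensor
    # has to be kept around for a separate backward pass.
    s = 1.0 / (1.0 + math.exp(-x))           # sigmoid(x)
    y = x * s                                # SiLU value
    y_dot = x_dot * (s + x * s * (1.0 - s))  # chain rule for the tangent
    return y, y_dot

print(silu_forward_ad(0.5))
```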
    Layer Freezing & Data Sieving: Missing Pieces of a Generic Framework for Sparse Training. (arXiv:2209.11204v1 [cs.LG])
    Recently, sparse training has emerged as a promising paradigm for efficient deep learning on edge devices. Current research mainly devotes its efforts to reducing training costs by further increasing model sparsity. However, increasing sparsity is not always ideal since it inevitably introduces severe accuracy degradation at extremely high sparsity levels. This paper intends to explore other possible directions to effectively and efficiently reduce sparse training costs while preserving accuracy. To this end, we investigate two techniques, namely, layer freezing and data sieving. First, the layer freezing approach has shown its success in dense model training and fine-tuning, yet it has never been adopted in the sparse training domain. Nevertheless, the unique characteristics of sparse training may hinder the incorporation of layer freezing techniques. Therefore, we analyze the feasibility and potential of using the layer freezing technique in sparse training and find it can save considerable training costs. Second, we propose a data sieving method for dataset-efficient training, which further reduces training costs by ensuring only a partial dataset is used throughout the entire training process. We show that both techniques can be well incorporated into the sparse training algorithm to form a generic framework, which we dub SpFDE. Our extensive experiments demonstrate that SpFDE can significantly reduce training costs while preserving accuracy from three dimensions: weight sparsity, layer freezing, and dataset sieving.
    Grape Cold Hardiness Prediction via Multi-Task Learning. (arXiv:2209.10585v1 [cs.LG])
    Cold temperatures during fall and spring have the potential to cause frost damage to grapevines and other fruit plants, which can significantly decrease harvest yields. To help prevent these losses, farmers deploy expensive frost mitigation measures, such as sprinklers, heaters, and wind machines, when they judge that damage may occur. This judgment, however, is challenging because the cold hardiness of plants changes throughout the dormancy period and is difficult to measure directly. This has led scientists to develop cold hardiness prediction models that can be tuned to different grape cultivars based on laborious field measurement data. In this paper, we study whether deep-learning models can improve cold hardiness prediction for grapes based on data that has been collected over a 30-year time period. A key challenge is that the amount of data per cultivar is highly variable, with some cultivars having only a small amount. For this purpose, we investigate the use of multi-task learning to leverage data across cultivars in order to improve prediction performance for individual cultivars. We evaluate a number of multi-task learning approaches and show that the highest performing approach is able to significantly improve over learning for single cultivars and outperforms the current state-of-the-art scientific model for most cultivars.
    Boosting as Frank-Wolfe. (arXiv:2209.10831v1 [cs.LG])
    Some boosting algorithms, such as LPBoost, ERLPBoost, and C-ERLPBoost, aim to solve the soft margin optimization problem with the $\ell_1$-norm regularization. LPBoost rapidly converges to an $\epsilon$-approximate solution in practice, but it is known to take $\Omega(m)$ iterations in the worst case, where $m$ is the sample size. On the other hand, ERLPBoost and C-ERLPBoost are guaranteed to converge to an $\epsilon$-approximate solution in $O(\frac{1}{\epsilon^2} \ln \frac{m}{\nu})$ iterations. However, their computation per iteration is very high compared to LPBoost. To address this issue, we propose a generic boosting scheme that combines the Frank-Wolfe algorithm with any secondary algorithm, switching from one to the other iteratively. We show that the scheme retains the same convergence guarantee as ERLPBoost and C-ERLPBoost. One can incorporate any secondary algorithm to improve performance in practice. This scheme comes from a unified view of boosting algorithms for soft margin optimization. More specifically, we show that LPBoost, ERLPBoost, and C-ERLPBoost are instances of the Frank-Wolfe algorithm. In experiments on real datasets, one of the instances of our scheme exploits the better updates of the secondary algorithm and performs comparably with LPBoost.
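    For reference, the generic Frank-Wolfe loop that the paper's boosting scheme instantiates, shown here on a toy problem over the probability simplex; the boosting-specific gradient and linear minimization oracles differ.

```python
import numpy as np

def frank_wolfe(grad, lmo, x0, n_iter=100):
    # Generic Frank-Wolfe loop: linearize, call the linear minimization
    # oracle (LMO), and take a convex combination step.
    x = x0.copy()
    for t in range(n_iter):
        s = lmo(grad(x))                 # best vertex for the linearization
        gamma = 2.0 / (t + 2.0)          # standard open-loop step size
        x = (1 - gamma) * x + gamma * s  # stays inside the convex set
    return x

# Toy: minimize ||x - c||^2 over the probability simplex.
c = np.array([0.7, 0.2, 0.1])
grad = lambda x: 2 * (x - c)
lmo = lambda g: np.eye(len(g))[np.argmin(g)]  # simplex vertex selection
print(frank_wolfe(grad, lmo, np.ones(3) / 3))
```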
    Structure Learning of Quantum Embeddings. (arXiv:2209.11144v1 [quant-ph])
    The representation of data is of paramount importance for machine learning methods. Kernel methods are used to enrich the feature representation, allowing better generalization. Quantum kernels efficiently implement complex transformations that encode classical data in the Hilbert space of a quantum system, potentially resulting in exponential speedups. However, we need prior knowledge of the data to choose an appropriate parametric quantum circuit that can be used as a quantum embedding. We propose an algorithm that automatically selects the best quantum embedding through a combinatorial optimization procedure that modifies the structure of the circuit, changing the generators of the gates, their angles (which depend on the data points), and the qubits on which the various gates act. Since combinatorial optimization is computationally expensive, we introduce a criterion based on the exponential concentration of kernel matrix coefficients around the mean to immediately discard an arbitrarily large portion of solutions that are believed to perform poorly. Contrary to gradient-based optimization (e.g. trainable quantum kernels), our approach is not affected by the barren plateau problem by construction. We use both artificial and real-world datasets to demonstrate the increased performance of our approach with respect to randomly generated PQCs. We also compare the effect of different optimization algorithms, including greedy local search, simulated annealing, and genetic algorithms, showing that the choice of algorithm largely affects the result.
    Amharic Text Clustering Using Encyclopedic Knowledge with Neural Word Embedding. (arXiv:2105.00809v2 [cs.CL] UPDATED)
    In this digital era, people in almost every discipline are using automated systems that generate information represented in document format in different natural languages. As a result, there is a growing interest in better solutions for finding, organizing and analyzing these documents. In this paper, we propose a system that clusters Amharic text documents using Encyclopedic Knowledge (EK) with neural word embedding. EK enables the representation of related concepts and neural word embedding allows us to handle the contexts of their relatedness. During the clustering process, all the text documents pass through preprocessing stages. Enriched text document features are extracted from each document by mapping it with EK and the word embedding model, and a TF-IDF-weighted vector of the enriched features is generated. Finally, text documents are clustered using the popular spherical K-means algorithm. The proposed system is tested with an Amharic text corpus and Amharic Wikipedia data. Test results show that the use of EK with word embedding for document clustering improves the average accuracy over the use of EK alone. Furthermore, changing the class size has a significant effect on accuracy.
    NashAE: Disentangling Representations through Adversarial Covariance Minimization. (arXiv:2209.10677v1 [cs.LG])
    We present a self-supervised method to disentangle factors of variation in high-dimensional data that does not rely on prior knowledge of the underlying variation profile (e.g., no assumptions on the number or distribution of the individual latent variables to be extracted). In this method which we call NashAE, high-dimensional feature disentanglement is accomplished in the low-dimensional latent space of a standard autoencoder (AE) by promoting the discrepancy between each encoding element and information of the element recovered from all other encoding elements. Disentanglement is promoted efficiently by framing this as a minmax game between the AE and an ensemble of regression networks which each provide an estimate of an element conditioned on an observation of all other elements. We quantitatively compare our approach with leading disentanglement methods using existing disentanglement metrics. Furthermore, we show that NashAE has increased reliability and increased capacity to capture salient data characteristics in the learned latent representation.
    Counterfactual Explanations Using Optimization With Constraint Learning. (arXiv:2209.10997v1 [cs.LG])
    Counterfactual explanations embody one of the many interpretability techniques that receive increasing attention from the machine learning community. Their potential to make model predictions more sensible to the user is considered to be invaluable. To increase their adoption in practice, several criteria that counterfactual explanations should adhere to have been put forward in the literature. We propose counterfactual explanations using optimization with constraint learning (CE-OCL), a generic and flexible approach that addresses all these criteria and allows room for further extensions. Specifically, we discuss how we can leverage an optimization with constraint learning framework for the generation of counterfactual explanations, and how components of this framework readily map to the criteria. We also propose two novel modeling approaches to address data manifold closeness and diversity, which are two key criteria for practical counterfactual explanations. We test CE-OCL on several datasets and present our results in a case study. Compared against the current state-of-the-art methods, CE-OCL allows for more flexibility and has an overall superior performance in terms of several evaluation metrics proposed in related work.
    Training neural network ensembles via trajectory sampling. (arXiv:2209.11116v1 [cond-mat.stat-mech])
    In machine learning, there is renewed interest in neural network ensembles (NNEs), whereby predictions are obtained as an aggregate from a diverse set of smaller models, rather than from a single larger model. Here, we show how to define and train a NNE using techniques from the study of rare trajectories in stochastic systems. We define an NNE in terms of the trajectory of the model parameters under a simple, and discrete in time, diffusive dynamics, and train the NNE by biasing these trajectories towards a small time-integrated loss, as controlled by appropriate counting fields which act as hyperparameters. We demonstrate the viability of this technique on a range of simple supervised learning tasks. We discuss potential advantages of our trajectory sampling approach compared with more conventional gradient based methods.
    SERF: Interpretable Sleep Staging using Embeddings, Rules, and Features. (arXiv:2209.11174v1 [eess.SP])
    The accuracy of recent deep learning based clinical decision support systems is promising. However, lack of model interpretability remains an obstacle to widespread adoption of artificial intelligence in healthcare. Using sleep as a case study, we propose a generalizable method to combine clinical interpretability with high accuracy derived from black-box deep learning. Clinician-determined sleep stages from polysomnogram (PSG) remain the gold standard for evaluating sleep quality. However, PSG manual annotation by experts is expensive and time-prohibitive. We propose SERF, interpretable Sleep staging using Embeddings, Rules, and Features to read PSG. SERF provides interpretation of classified sleep stages through meaningful features derived from the AASM Manual for the Scoring of Sleep and Associated Events. In SERF, the embeddings obtained from a hybrid of convolutional and recurrent neural networks are transposed to the interpretable feature space. These representative interpretable features are used to train simple models like a shallow decision tree for classification. Model results are validated on two publicly available datasets. SERF surpasses the current state-of-the-art for interpretable sleep staging by 2%. Using Gradient Boosted Trees as the classifier, SERF obtains 0.766 $\kappa$ and 0.870 AUC-ROC, within 2% of the current state-of-the-art black-box models.
    Gradient-Based Trajectory Optimization With Learned Dynamics. (arXiv:2204.04558v2 [cs.RO] UPDATED)
    Trajectory optimization methods have achieved an exceptional level of performance on real-world robots in recent years. These methods heavily rely on accurate analytical models of the dynamics, yet some aspects of the physical world can only be captured to a limited extent. An alternative approach is to leverage machine learning techniques to learn a differentiable dynamics model of the system from data. In this work, we use trajectory optimization and model learning for performing highly dynamic and complex tasks with robotic systems in the absence of accurate analytical models of the dynamics. We show that a neural network can model highly nonlinear behaviors accurately for large time horizons, from data collected in only 25 minutes of interactions on two distinct robots: (i) the Boston Dynamics Spot and (ii) an RC car. Furthermore, we use the gradients of the neural network to perform gradient-based trajectory optimization. In our hardware experiments, we demonstrate that our learned model can represent complex dynamics for both the Spot and the radio-controlled (RC) car, and gives good performance in combination with trajectory optimization methods.  ( 2 min )
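    A compact sketch of gradient-based trajectory optimization through a learned dynamics model: unroll the network, accumulate a cost, and differentiate back to the action sequence. The network, cost, and task below are assumptions for illustration, not the paper's setup.

```python
import torch

# Hypothetical learned dynamics model: next_state = f(state, action).
f = torch.nn.Sequential(torch.nn.Linear(4 + 2, 64), torch.nn.Tanh(),
                        torch.nn.Linear(64, 4))

def rollout_cost(actions, s0, goal):
    # Unroll the learned model and accumulate a quadratic tracking cost;
    # gradients flow through the network back to the action sequence.
    s, cost = s0, 0.0
    for a in actions:
        s = f(torch.cat([s, a]))
        cost = cost + ((s - goal) ** 2).sum() + 0.01 * (a ** 2).sum()
    return cost

actions = torch.zeros(20, 2, requires_grad=True)  # decision variables
opt = torch.optim.Adam([actions], lr=0.05)
s0, goal = torch.zeros(4), torch.ones(4)
for _ in range(50):                               # gradient-based traj-opt
    opt.zero_grad()
    rollout_cost(actions, s0, goal).backward()
    opt.step()
print(rollout_cost(actions, s0, goal).item())
```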
    Ensembles of Vision Transformers as a New Paradigm for Automated Classification in Ecology. (arXiv:2203.01726v2 [cs.CV] UPDATED)
    Monitoring biodiversity is paramount to manage and protect natural resources, particularly in times of global change. Collecting images of organisms over large temporal or spatial scales is a promising practice to monitor and study biodiversity change of natural ecosystems, providing large amounts of data with minimal interference with the environment. Deep learning models are currently used to automate the classification of organisms into taxonomic units. However, imprecision in these classifiers introduces a measurement noise that is difficult to control and can significantly hinder the analysis and interpretation of data. In our study, we show that this limitation can be overcome by ensembles of Data-efficient image Transformers (DeiTs), which significantly outperform the previous state of the art (SOTA). We validate our results on a large number of ecological imaging datasets of diverse origin, with organisms of study ranging from plankton to insects, birds, dog breeds, animals in the wild, and corals. On all the data sets we test, we achieve a new SOTA, with a reduction of the error with respect to the previous SOTA ranging from 18.48% to 87.50%, depending on the data set, and often achieving performances very close to perfect classification. The main reason why ensembles of DeiTs perform better is not the single-model performance of DeiTs, but rather the fact that predictions by independent models have a smaller overlap, and this maximizes the profit gained by ensembling. This positions DeiT ensembles as the best candidate for image classification in biodiversity monitoring.
    DeepGraphONet: A Deep Graph Operator Network to Learn and Zero-shot Transfer the Dynamic Response of Networked Systems. (arXiv:2209.10622v1 [cs.LG])
    This paper develops a Deep Graph Operator Network (DeepGraphONet) framework that learns to approximate the dynamics of a complex system (e.g. the power grid or traffic) with an underlying sub-graph structure. We build our DeepGraphONet by fusing the ability of (i) Graph Neural Networks (GNN) to exploit spatially correlated graph information and (ii) Deep Operator Networks (DeepONet) to approximate the solution operator of dynamical systems. The resulting DeepGraphONet can then predict the dynamics within a given short/medium-term time horizon by observing a finite history of the graph state information. Furthermore, we design our DeepGraphONet to be resolution-independent. That is, we do not require the finite history to be collected at the exact/same resolution. In addition, to disseminate the results from a trained DeepGraphONet, we design a zero-shot learning strategy that enables using it on a different sub-graph. Finally, empirical results on the (i) transient stability prediction problem of power grids and (ii) traffic flow forecasting problem of a vehicular system illustrate the effectiveness of the proposed DeepGraphONet.
    One-Bit Compressive Sensing: Can We Go Deep and Blind?. (arXiv:2203.11278v2 [eess.SP] UPDATED)
    One-bit compressive sensing is concerned with the accurate recovery of an underlying sparse signal of interest from its one-bit noisy measurements. The conventional signal recovery approaches for this problem are mainly developed based on the assumption that an exact knowledge of the sensing matrix is available. In this work, however, we present a novel data-driven and model-based methodology that achieves blind recovery; i.e., signal recovery without requiring the knowledge of the sensing matrix. To this end, we make use of the deep unfolding technique and develop a model-driven deep neural architecture which is designed for this specific task. The proposed deep architecture is able to learn an alternative sensing matrix by taking advantage of the underlying unfolded algorithm such that the resulting learned recovery algorithm can accurately and quickly (in terms of the number of iterations) recover the underlying compressed signal of interest from its one-bit noisy measurements. In addition, due to the incorporation of the domain knowledge and the mathematical model of the system into the proposed deep architecture, the resulting network benefits from enhanced interpretability, has a very small number of trainable parameters, and requires a very small number of training samples, as compared to the commonly used black-box deep neural network alternatives for the problem at hand.
    SW-VAE: Weakly Supervised Learn Disentangled Representation Via Latent Factor Swapping. (arXiv:2209.10623v1 [cs.LG])
    Representation disentanglement is an important goal of representation learning that benefits various downstream tasks. To achieve this goal, many unsupervised representation disentanglement approaches have been developed. However, training without any supervision signal has been shown to be inadequate for learning disentangled representations. We therefore propose a novel weakly-supervised training approach, named SW-VAE, which incorporates pairs of input observations as supervision signals by using the generative factors of datasets. Furthermore, we introduce strategies to gradually increase the learning difficulty during training to smooth the training process. As shown on several datasets, our model shows significant improvement over state-of-the-art (SOTA) methods on representation disentanglement tasks.
    Exploiting Independent Instruments: Identification and Distribution Generalization. (arXiv:2202.01864v2 [stat.ML] UPDATED)
    Instrumental variable models allow us to identify a causal function between covariates $X$ and a response $Y$, even in the presence of unobserved confounding. Most of the existing estimators assume that the error term in the response $Y$ and the hidden confounders are uncorrelated with the instruments $Z$. This is often motivated by a graphical separation, an argument that also justifies independence. Positing an independence restriction, however, leads to strictly stronger identifiability results. We connect to the existing literature in econometrics and provide a practical method called HSIC-X for exploiting independence that can be combined with any gradient-based learning procedure. We see that even in identifiable settings, taking into account higher moments may yield better finite sample results. Furthermore, we exploit the independence for distribution generalization. We prove that the proposed estimator is invariant to distributional shifts on the instruments and worst-case optimal whenever these shifts are sufficiently strong. These results hold even in the under-identified case where the instruments are not sufficiently rich to identify the causal function.
    One-Shot Federated Learning for Model Clustering and Learning in Heterogeneous Environments. (arXiv:2209.10866v1 [cs.LG])
    We propose a communication efficient approach for federated learning in heterogeneous environments. The system heterogeneity is reflected in the presence of $K$ different data distributions, with each user sampling data from only one of $K$ distributions. The proposed approach requires only one communication round between the users and server, thus significantly reducing the communication cost. Moreover, the proposed method provides strong learning guarantees in heterogeneous environments, by achieving the optimal mean-squared error (MSE) rates in terms of the sample size, i.e., matching the MSE guarantees achieved by learning on all data points belonging to users with the same data distribution, provided that the number of data points per user is above a threshold that we explicitly characterize in terms of system parameters. Remarkably, this is achieved without requiring any knowledge of the underlying distributions, or even the true number of distributions $K$. Numerical experiments illustrate our findings and underline the performance of the proposed method.
    Fair Robust Active Learning by Joint Inconsistency. (arXiv:2209.10729v1 [cs.LG])
    Fair Active Learning (FAL) utilizes active learning techniques to achieve high model performance with limited data and to reach fairness between sensitive groups (e.g., genders). However, the impact of adversarial attacks, which are vital for various safety-critical machine learning applications, has not yet been addressed in FAL. Observing this, we introduce a novel task, Fair Robust Active Learning (FRAL), integrating conventional FAL and adversarial robustness. FRAL requires ML models to leverage active learning techniques to jointly achieve equalized performance on benign data and equalized robustness against adversarial attacks between groups. In this new task, previous FAL methods generally face the problem of unbearable computational burden and ineffectiveness. Therefore, we develop a simple yet effective FRAL strategy by Joint INconsistency (JIN). To efficiently find samples that can boost the performance and robustness of disadvantaged groups for labeling, our method exploits the prediction inconsistency between benign and adversarial samples as well as between standard and robust models. Extensive experiments under diverse datasets and sensitive groups demonstrate that our method not only achieves fairer performance on benign samples but also obtains fairer robustness under white-box PGD attacks compared with existing active learning and FAL baselines. We are optimistic that FRAL will pave a new path for developing safe and robust ML research and applications such as facial attribute recognition in biometrics systems.
    MLGWSC-1: The first Machine Learning Gravitational-Wave Search Mock Data Challenge. (arXiv:2209.11146v1 [astro-ph.IM])
    We present the results of the first Machine Learning Gravitational-Wave Search Mock Data Challenge (MLGWSC-1). For this challenge, participating groups had to identify gravitational-wave signals from binary black hole mergers of increasing complexity and duration embedded in progressively more realistic noise. The last of the four provided datasets contained real noise from the O3a observing run and signals up to a duration of 20 seconds with the inclusion of precession effects and higher order modes. We present the average sensitivity distance and runtime for the 6 entered algorithms derived from 1 month of test data unknown to the participants prior to submission. Of these, 4 are machine learning algorithms. We find that the best machine learning based algorithms are able to achieve up to 95% of the sensitive distance of matched-filtering based production analyses for simulated Gaussian noise at a false-alarm rate (FAR) of one per month. In contrast, for real noise, the leading machine learning search achieved 70%. For higher FARs the differences in sensitive distance shrink to the point where select machine learning submissions outperform traditional search algorithms at FARs $\geq 200$ per month on some datasets. Our results show that current machine learning search algorithms may already be sensitive enough in limited parameter regions to be useful for some production settings. To improve the state-of-the-art, machine learning algorithms need to reduce the false-alarm rates at which they are capable of detecting signals and extend their validity to regions of parameter space where modeled searches are computationally expensive to run. Based on our findings we compile a list of research areas that we believe are the most important to elevate machine learning searches to an invaluable tool in gravitational-wave signal detection.
    Proximal Point Imitation Learning. (arXiv:2209.10968v1 [cs.LG])
    This work develops new algorithms with rigorous efficiency guarantees for infinite horizon imitation learning (IL) with linear function approximation without restrictive coherence assumptions. We begin with the minimax formulation of the problem and then outline how to leverage classical tools from optimization, in particular, the proximal-point method (PPM) and dual smoothing, for online and offline IL, respectively. Thanks to PPM, we avoid nested policy evaluation and cost updates for online IL appearing in the prior literature. In particular, we do away with the conventional alternating updates by the optimization of a single convex and smooth objective over both cost and Q-functions. When solved inexactly, we relate the optimization errors to the suboptimality of the recovered policy. As an added bonus, by re-interpreting PPM as dual smoothing with the expert policy as a center point, we also obtain an offline IL algorithm enjoying theoretical guarantees in terms of required expert trajectories. Finally, we achieve convincing empirical performance for both linear and neural network function approximation.
    EventNet: Detecting Events in EEG. (arXiv:2209.11007v1 [eess.SP])
    Neurologists are often looking for various "events of interest" when analyzing EEG. To support them in this task various machine-learning-based algorithms have been developed. Most of these algorithms treat the problem as classification, thereby independently processing signal segments and ignoring temporal dependencies inherent to events of varying duration. At inference time, the predicted labels for each segment then have to be post-processed to detect the actual events. We propose an end-to-end event detection approach (EventNet), based on deep learning, that directly works with events as learning targets, stepping away from ad-hoc postprocessing schemes to turn model outputs into events. We compare EventNet with a state-of-the-art approach for artefact and epileptic seizure detection, two event types with highly variable durations. EventNet shows improved performance in detecting both event types. These results show the power of treating events as direct learning targets, instead of using ad-hoc postprocessing to obtain them. Our event detection framework can easily be extended to other event detection problems in signal processing, since the deep learning backbone does not depend on any task-specific features.
    Sampling is as easy as learning the score: theory for diffusion models with minimal data assumptions. (arXiv:2209.11215v1 [cs.LG])
    We provide theoretical convergence guarantees for score-based generative models (SGMs) such as denoising diffusion probabilistic models (DDPMs), which constitute the backbone of large-scale real-world generative models such as DALL$\cdot$E 2. Our main result is that, assuming accurate score estimates, such SGMs can efficiently sample from essentially any realistic data distribution. In contrast to prior works, our results (1) hold for an $L^2$-accurate score estimate (rather than $L^\infty$-accurate); (2) do not require restrictive functional inequality conditions that preclude substantial non-log-concavity; (3) scale polynomially in all relevant problem parameters; and (4) match state-of-the-art complexity guarantees for discretization of the Langevin diffusion, provided that the score error is sufficiently small. We view this as strong theoretical justification for the empirical success of SGMs. We also examine SGMs based on the critically damped Langevin diffusion (CLD). Contrary to conventional wisdom, we provide evidence that the use of the CLD does not reduce the complexity of SGMs.
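    The sampling mechanism analyzed above is simple to state: starting from the Gaussian prior, run the discretized reverse (denoising) diffusion driven by the estimated score. A minimal numpy sketch under a variance-preserving forward process, where `score(x, t)` stands in for a pretrained score model (a hypothetical placeholder, not code from the paper):

```python
import numpy as np

def reverse_diffusion_sample(score, dim, n_steps=1000, beta=0.02, seed=0):
    """Discretized reverse SDE of a variance-preserving diffusion.
    score(x, t) should approximate grad_x log p_t(x)."""
    rng = np.random.default_rng(seed)
    x = rng.standard_normal(dim)                 # start from the N(0, I) prior
    dt = 1.0 / n_steps
    for i in reversed(range(n_steps)):
        t = (i + 1) / n_steps
        drift = beta * (0.5 * x + score(x, t))   # reverse-time drift
        x = x + drift * dt + np.sqrt(beta * dt) * rng.standard_normal(dim)
    return x

# Sanity check: if the data law is N(0, I), the exact score is -x and the
# sampler's stationary law is again N(0, I).
sample = reverse_diffusion_sample(score=lambda x, t: -x, dim=2)
```

    The guarantees above bound how the $L^2$ error of the score estimate propagates to the sampled distribution under exactly this kind of discretization.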
    Chance constrained conic-segmentation support vector machine with uncertain data. (arXiv:2107.13319v2 [cs.LG] UPDATED)
    Support vector machines (SVMs) are one of the well-known classes of supervised learning algorithms. The conic-segmentation SVM (CS-SVM) is a natural multiclass analogue of the standard binary SVM; CS-SVM models handle the situation where the exact values of the data points are known. This paper studies CS-SVM when the data points are uncertain or mislabelled. With some properties of the distributions known, a chance-constrained CS-SVM approach is used to ensure a small probability of misclassification for the uncertain data. A geometric interpretation is presented to show how CS-SVM works. Finally, we present experimental results investigating the chance-constrained CS-SVM's performance.
    Formulating Robustness Against Unforeseen Attacks. (arXiv:2204.13779v2 [cs.LG] UPDATED)
    Existing defenses against adversarial examples such as adversarial training typically assume that the adversary will conform to a specific or known threat model, such as $\ell_p$ perturbations within a fixed budget. In this paper, we focus on the scenario where there is a mismatch in the threat model assumed by the defense during training, and the actual capabilities of the adversary at test time. We ask the question: if the learner trains against a specific "source" threat model, when can we expect robustness to generalize to a stronger unknown "target" threat model during test-time? Our key contribution is to formally define the problem of learning and generalization with an unforeseen adversary, which helps us reason about the increase in adversarial risk from the conventional perspective of a known adversary. Applying our framework, we derive a generalization bound which relates the generalization gap between source and target threat models to variation of the feature extractor, which measures the expected maximum difference between extracted features across a given threat model. Based on our generalization bound, we propose adversarial training with variation regularization (AT-VR) which reduces variation of the feature extractor across the source threat model during training. We empirically demonstrate that AT-VR can lead to improved generalization to unforeseen attacks during test-time compared to standard adversarial training. Additionally, we combine variation regularization with perceptual adversarial training [Laidlaw et al. 2021] to achieve state-of-the-art robustness on unforeseen attacks. Our code is publicly available at https://github.com/inspire-group/variation-regularization.
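    To make the regularizer concrete, here is an illustrative numpy rendering of the variation term and the AT-VR objective, with the threat model approximated by a finite set of perturbations (our simplification; names and the sampling of the threat model are assumptions, not the authors' code):

```python
import numpy as np

def feature_variation(feats, x, perturb_set):
    """Estimate the feature-extractor variation at x: the maximum pairwise
    feature distance over perturbed copies of x drawn from the source
    threat model (approximated here by a finite perturbation set)."""
    zs = np.stack([feats(x + d) for d in perturb_set])
    diffs = zs[:, None, :] - zs[None, :, :]
    return np.linalg.norm(diffs, axis=-1).max()

def at_vr_loss(task_loss, feats, x, y, perturb_set, lam=0.1):
    """AT-VR-style objective: worst-case task loss over the source threat
    model plus lam times the feature variation."""
    worst = max(task_loss(x + d, y) for d in perturb_set)
    return worst + lam * feature_variation(feats, x, perturb_set)
```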
    Review of Time Series Forecasting Methods and Their Applications to Particle Accelerators. (arXiv:2209.10705v1 [physics.acc-ph])
    Particle accelerators are complex facilities that produce large amounts of structured data and have clear optimization goals as well as precisely defined control requirements. As such they are naturally amenable to data-driven research methodologies. The data from sensors and monitors inside the accelerator form multivariate time series. With fast pre-emptive approaches being highly preferred in accelerator control and diagnostics, the application of data-driven time series forecasting methods is particularly promising. This review formulates the time series forecasting problem and summarizes existing models with applications in various scientific areas. Several current and future attempts in the field of particle accelerators are introduced. The application of time series forecasting to particle accelerators has shown encouraging results and the promise for broader use, and existing problems such as data consistency and compatibility have started to be addressed.
    A Validation Approach to Over-parameterized Matrix and Image Recovery. (arXiv:2209.10675v1 [math.OC])
    In this paper, we study the problem of recovering a low-rank matrix from a number of noisy random linear measurements. We consider the setting where the rank of the ground-truth matrix is unknown a priori and use an overspecified factored representation of the matrix variable, where the global optimal solutions overfit and do not correspond to the underlying ground truth. We then solve the associated nonconvex problem using gradient descent with small random initialization. We show that as long as the measurement operators satisfy the restricted isometry property (RIP) with a rank parameter scaling with the rank of the ground-truth matrix rather than with the overspecified matrix variable, gradient descent iterations are on a particular trajectory towards the ground-truth matrix and achieve nearly information-theoretically optimal recovery when stopped appropriately. We then propose an efficient early stopping strategy based on the common hold-out method and show that it provably detects a nearly optimal estimator. Moreover, experiments show that the proposed validation approach can also be efficiently used for image restoration with the deep image prior, which over-parameterizes an image with a deep network.
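    A compact numpy sketch of the recipe for symmetric measurements: overspecify the factorization, start from a small random initialization, and keep the iterate that does best on a hold-out set (variable names and hyperparameters are ours, chosen for illustration):

```python
import numpy as np

def overparam_recovery(A, y, A_val, y_val, n, r_over,
                       lr=0.5, iters=2000, init_scale=1e-3, seed=0):
    """Recover a PSD low-rank matrix from y_k = <A_k, X*> by gradient
    descent on the overspecified factorization X = U U^T, with hold-out
    early stopping. A has shape (m, n, n); A_k are assumed symmetric."""
    rng = np.random.default_rng(seed)
    U = init_scale * rng.standard_normal((n, r_over))   # small random init
    best_err, best_U = np.inf, U.copy()
    for _ in range(iters):
        resid = np.einsum('kij,ij->k', A, U @ U.T) - y  # <A_k, UU^T> - y_k
        G = np.einsum('k,kij->ij', resid, A)
        U -= lr * 2.0 * ((G + G.T) @ U) / len(y)        # gradient step
        val_resid = np.einsum('kij,ij->k', A_val, U @ U.T) - y_val
        val_err = np.mean(val_resid ** 2)
        if val_err < best_err:                          # hold-out early stopping
            best_err, best_U = val_err, U.copy()
    return best_U @ best_U.T
```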
    XClusters: Explainability-first Clustering. (arXiv:2209.10956v1 [cs.LG])
    We study the problem of explainability-first clustering where explainability becomes a first-class citizen for clustering. Previous clustering approaches use decision trees for explanation, but only after the clustering is completed. In contrast, our approach is to perform clustering and decision tree training holistically where the decision tree's performance and size also influence the clustering results. We assume the attributes for clustering and explaining are distinct, although this is not necessary. We observe that our problem is a monotonic optimization where the objective function is a difference of monotonic functions. We then propose an efficient branch-and-bound algorithm for finding the best parameters that lead to a balance of cluster distortion and decision tree explainability. Our experiments show that our method can improve the explainability of any clustering that fits in our framework.
    Learning-Augmented Algorithms for Online Linear and Semidefinite Programming. (arXiv:2209.10614v1 [cs.DS])
    Semidefinite programming (SDP) is a unifying framework that generalizes both linear programming and quadratically-constrained quadratic programming, while also yielding efficient solvers, both in theory and in practice. However, there exist known impossibility results for approximating the optimal solution when constraints for covering SDPs arrive in an online fashion. In this paper, we study online covering linear and semidefinite programs in which the algorithm is augmented with advice from a possibly erroneous predictor. We show that if the predictor is accurate, we can efficiently bypass these impossibility results and achieve a constant-factor approximation to the optimal solution, i.e., consistency. On the other hand, if the predictor is inaccurate, under some technical conditions, we achieve results that match both the classical optimal upper bounds and the tight lower bounds up to constant factors, i.e., robustness. More broadly, we introduce a framework that extends both (1) the online set cover problem augmented with machine-learning predictors, studied by Bamas, Maggiori, and Svensson (NeurIPS 2020), and (2) the online covering SDP problem, initiated by Elad, Kale, and Naor (ICALP 2016). Specifically, we obtain general online learning-augmented algorithms for covering linear programs with fractional advice and constraints, and initiate the study of learning-augmented algorithms for covering SDP problems. Our techniques are based on the primal-dual framework of Buchbinder and Naor (Mathematics of Operations Research, 34, 2009) and can be further adjusted to handle constraints where the variables lie in a bounded region, i.e., box constraints.  ( 3 min )
    EEG-Based Epileptic Seizure Prediction Using Temporal Multi-Channel Transformers. (arXiv:2209.11172v1 [eess.SP])
    Epilepsy is one of the most common neurological diseases, characterized by transient and unprovoked events called epileptic seizures. Electroencephalography (EEG) is an auxiliary method used to perform both the diagnosis and the monitoring of epilepsy. Given the unexpected nature of an epileptic seizure, its prediction would improve patient care, optimizing the quality of life and the treatment of epilepsy. Predicting an epileptic seizure implies the identification of two distinct states of EEG in a patient with epilepsy: the preictal and the interictal. In this paper, we developed two deep learning models called Temporal Multi-Channel Transformer (TMC-T) and Vision Transformer (TMC-ViT), adaptations of Transformer-based architectures for multi-channel temporal signals. Moreover, we assessed the impact of choosing different preictal durations, since their length is not a consensus among experts, and also evaluated how the sample size benefits each model. Our models are compared with fully connected, convolutional, and recurrent networks. The algorithms were trained and evaluated in a patient-specific manner on raw EEG signals from the CHB-MIT database. Experimental results and statistical validation demonstrated that our TMC-ViT model surpassed the CNN architecture, the state of the art in seizure prediction.  ( 3 min )
    VToonify: Controllable High-Resolution Portrait Video Style Transfer. (arXiv:2209.11224v1 [cs.CV])
    Generating high-quality artistic portrait videos is an important and desirable task in computer graphics and vision. Although a series of successful portrait image toonification models built upon the powerful StyleGAN have been proposed, these image-oriented methods have obvious limitations when applied to videos, such as the fixed frame size, the requirement of face alignment, missing non-facial details and temporal inconsistency. In this work, we investigate the challenging controllable high-resolution portrait video style transfer by introducing a novel VToonify framework. Specifically, VToonify leverages the mid- and high-resolution layers of StyleGAN to render high-quality artistic portraits based on the multi-scale content features extracted by an encoder to better preserve the frame details. The resulting fully convolutional architecture accepts non-aligned faces in videos of variable size as input, contributing to complete face regions with natural motions in the output. Our framework is compatible with existing StyleGAN-based image toonification models to extend them to video toonification, and inherits appealing features of these models for flexible style control on color and intensity. This work presents two instantiations of VToonify built upon Toonify and DualStyleGAN for collection-based and exemplar-based portrait video style transfer, respectively. Extensive experimental results demonstrate the effectiveness of our proposed VToonify framework over existing methods in generating high-quality and temporally-coherent artistic portrait videos with flexible style controls.
    Benchmarking Apache Spark and Hadoop MapReduce on Big Data Classification. (arXiv:2209.10637v1 [cs.DC])
    Most of the popular Big Data analytics tools evolved to adapt their working environment to extract valuable information from a vast amount of unstructured data. The ability of data mining techniques to filter this helpful information from Big Data led to the term Big Data Mining. Shifting the scope of data from small-size, structured, and stable data to huge-volume, unstructured, and quickly changing data brings many data management challenges. Different tools cope with these challenges in their own way due to their architectural limitations. There are numerous parameters to take into consideration when choosing the right data management framework based on the task at hand. In this paper, we present a comprehensive benchmark for two widely used Big Data analytics tools, namely Apache Spark and Hadoop MapReduce, on a common data mining task, i.e., classification. We employ several evaluation metrics to compare the performance of the benchmarked frameworks, such as execution time, accuracy, and scalability. These metrics are specialized to measure performance on the classification task. To the best of our knowledge, there is no previous study in the literature that employs all these metrics while taking task-specific concerns into consideration. We show that Spark is 5 times faster than MapReduce at training the model. Nevertheless, the performance of Spark degrades as the input workload grows. Scaling the environment with additional clusters significantly improves the performance of Spark; however, a similar improvement is not observed in Hadoop. The machine learning utility of MapReduce tends to achieve better accuracy scores than that of Spark, by around 3%, even on small datasets.
    Learning Model Predictive Controllers with Real-Time Attention for Real-World Navigation. (arXiv:2209.10780v1 [cs.RO])
    Despite decades of research, existing navigation systems still face real-world challenges when deployed in the wild, e.g., in cluttered home environments or in human-occupied public spaces. To address this, we present a new class of implicit control policies combining the benefits of imitation learning with the robust handling of system constraints from Model Predictive Control (MPC). Our approach, called Performer-MPC, uses a learned cost function parameterized by vision context embeddings provided by Performers -- a low-rank implicit-attention Transformer. We jointly train the cost function and construct the controller relying on it, effectively solving end-to-end the corresponding bi-level optimization problem. We show that the resulting policy improves standard MPC performance by leveraging a few expert demonstrations of the desired navigation behavior in different challenging real-world scenarios. Compared with a standard MPC policy, Performer-MPC achieves a >40% better goal-reached rate in cluttered environments and >65% better performance on social metrics when navigating around humans.
    A Closer Look at Learned Optimization: Stability, Robustness, and Inductive Biases. (arXiv:2209.11208v1 [cs.LG])
    Learned optimizers -- neural networks that are trained to act as optimizers -- have the potential to dramatically accelerate training of machine learning models. However, even when meta-trained across thousands of tasks at huge computational expense, blackbox learned optimizers often struggle with stability and generalization when applied to tasks unlike those in their meta-training set. In this paper, we use tools from dynamical systems to investigate the inductive biases and stability properties of optimization algorithms, and apply the resulting insights to designing inductive biases for blackbox optimizers. Our investigation begins with a noisy quadratic model, where we characterize conditions in which optimization is stable, in terms of eigenvalues of the training dynamics. We then introduce simple modifications to a learned optimizer's architecture and meta-training procedure which lead to improved stability, and improve the optimizer's inductive bias. We apply the resulting learned optimizer to a variety of neural network training tasks, where it outperforms the current state of the art learned optimizer -- at matched optimizer computational overhead -- with regard to optimization performance and meta-training speed, and is capable of generalization to tasks far different from those it was meta-trained on.
    PREF: Predictability Regularized Neural Motion Fields. (arXiv:2209.10691v1 [cs.CV])
    Knowing the 3D motions in a dynamic scene is essential to many vision applications. Recent progress is mainly focused on estimating the activity of some specific elements like humans. In this paper, we leverage a neural motion field for estimating the motion of all points in a multiview setting. Modeling the motion from a dynamic scene with multiview data is challenging due to the ambiguities in points of similar color and points with time-varying color. We propose to regularize the estimated motion to be predictable. If the motion from previous frames is known, then the motion in the near future should be predictable. Therefore, we introduce a predictability regularization by first conditioning the estimated motion on latent embeddings, then by adopting a predictor network to enforce predictability on the embeddings. The proposed framework PREF (Predictability REgularized Fields) achieves on par or better results than state-of-the-art neural motion field-based dynamic scene representation methods, while requiring no prior knowledge of the scene.  ( 2 min )
    Regularized Gradient Descent Ascent for Two-Player Zero-Sum Markov Games. (arXiv:2205.13746v2 [math.OC] UPDATED)
    We study the problem of finding the Nash equilibrium in a two-player zero-sum Markov game. Due to its formulation as a minimax optimization program, a natural approach to solve the problem is to perform gradient descent/ascent with respect to each player in an alternating fashion. However, due to the non-convexity/non-concavity of the underlying objective function, theoretical understanding of this method is limited. In our paper, we consider solving an entropy-regularized variant of the Markov game. The regularization introduces structure into the optimization landscape that makes the solutions more identifiable and allows the problem to be solved more efficiently. Our main contribution is to show that under proper choices of the regularization parameter, the gradient descent ascent algorithm converges to the Nash equilibrium of the original unregularized problem. We explicitly characterize the finite-time performance of the last iterate of our algorithm, which vastly improves over the existing convergence bound of the gradient descent ascent algorithm without regularization. Finally, we complement the analysis with numerical simulations that illustrate the accelerated convergence of the algorithm.
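    As a one-shot illustration of the regularized dynamics (a matrix game rather than a full Markov game; step sizes and payoffs are ours), entropic mirror descent-ascent converges to the entropy-regularized equilibrium:

```python
import numpy as np

def softmax(v):
    e = np.exp(v - v.max())
    return e / e.sum()

def regularized_gda(R, tau=0.1, eta=0.2, iters=3000):
    """Mirror descent-ascent for the entropy-regularized zero-sum game
    min_x max_y  x^T R y + tau * sum(x log x) - tau * sum(y log y),
    with x, y on the probability simplex."""
    m, n = R.shape
    x, y = np.full(m, 1.0 / m), np.full(n, 1.0 / n)
    for _ in range(iters):
        x_new = softmax((1 - eta * tau) * np.log(x) - eta * (R @ y))
        y_new = softmax((1 - eta * tau) * np.log(y) + eta * (R.T @ x))
        x, y = x_new, y_new
    return x, y

# Matching pennies: the regularized equilibrium is close to uniform play.
x, y = regularized_gda(np.array([[1.0, -1.0], [-1.0, 1.0]]))
```

    The paper's result concerns the Markov-game analogue of such a scheme and shows that, with a suitable regularization parameter, the last iterate approaches the unregularized Nash equilibrium.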
    Parallel Bayesian Optimization of Agent-based Transportation Simulation. (arXiv:2207.05041v1 [cs.LG] CROSS LISTED)
    MATSim (Multi-Agent Transport Simulation Toolkit) is an open-source large-scale agent-based transportation planning project applied to various areas like road transport, public transport, freight transport, regional evacuation, etc. The BEAM (Behavior, Energy, Autonomy, and Mobility) framework extends MATSim to enable powerful and scalable analysis of urban transportation systems. The agents in the BEAM simulation exhibit 'mode choice' behavior based on a multinomial logit model. In our study, we consider eight mode choices, viz. bike, car, walk, ride hail, driving to transit, walking to transit, ride hail to transit, and ride hail pooling. The 'alternative specific constants' for each mode choice are critical hyperparameters in a configuration file related to a particular scenario under experimentation. We use the 'Urbansim-10k' BEAM scenario (with a population size of 10,000) for all our experiments. Since these hyperparameters affect the simulation in complex ways, manual calibration methods are time-consuming. We present a parallel Bayesian optimization method with an early stopping rule to achieve fast convergence of the given multi-in-multi-out problem to its optimal configurations. Our model is based on the open-source HpBandSter package. This approach combines a hierarchy of several 1D Kernel Density Estimators (KDEs) with a cheap evaluator (Hyperband, a single multidimensional KDE). Our model also incorporates an extrapolation-based early stopping rule. With our model, we could achieve a 25% L1 norm for a large-scale BEAM simulation in a fully autonomous manner. To the best of our knowledge, our work is the first of its kind applied to large-scale multi-agent transportation simulations. This work can be useful for surrogate modeling of scenarios with very large populations.
    Nonsmooth Composite Nonconvex-Concave Minimax Optimization. (arXiv:2209.10825v1 [math.OC])
    Nonconvex-concave minimax optimization has received intense interest in machine learning, including learning with robustness to data distribution, learning with non-decomposable loss, and adversarial learning, to name a few. Nevertheless, most existing works focus on gradient-descent-ascent (GDA) variants that can only be applied in smooth settings. In this paper, we consider a family of minimax problems whose objective function enjoys a nonsmooth composite structure in the variable of minimization and is concave in the variables of maximization. By fully exploiting the composite structure, we propose a smoothed proximal linear descent ascent (smoothed PLDA) algorithm and further establish its $\mathcal{O}(\epsilon^{-4})$ iteration complexity, which matches that of smoothed GDA (Zhang et al., 2020) under smooth settings. Moreover, under the mild assumption that the objective function satisfies the one-sided Kurdyka-Łojasiewicz condition with exponent $\theta \in (0,1)$, we can further improve the iteration complexity to $\mathcal{O}(\epsilon^{-2\max\{2\theta,1\}})$. To the best of our knowledge, this is the first provably efficient algorithm for nonsmooth nonconvex-concave problems that can achieve the optimal iteration complexity $\mathcal{O}(\epsilon^{-2})$ if $\theta \in (0,1/2]$. As a byproduct, we discuss different stationarity concepts and clarify their relationships quantitatively, which could be of independent interest. Empirically, we illustrate the effectiveness of the proposed smoothed PLDA on variation-regularized Wasserstein distributionally robust optimization problems.
    Automated Coronary Calcium Scoring using U-Net Models through Semi-supervised Learning on Non-Gated CT Scans. (arXiv:2206.10455v2 [eess.IV] UPDATED)
    Every year, thousands of people die from heart attacks. Undiagnosed heart attacks often hit people by surprise, since many current medical plans do not cover the cost of searching for calcification on CT scans. Only if someone is suspected of having a heart problem is a gated CT scan taken; otherwise, there is no way for the patient to become aware of a possible heart attack or disease. While non-gated CT scans are taken more routinely, it is harder to detect calcification on them, and they are usually taken for purposes other than locating calcification in arteries. In fact, in current practice, coronary artery calcification scores are only calculated on gated CT scans, not non-gated CT scans. After training a U-Net model on the gated scans of the Coronary Calcium and Chest CT dataset, it achieved a Dice coefficient of 0.95 on its untouched test set. This model was used to predict on non-gated CT scans, performing with a mean absolute error (MAE) of 674.19 and a bucket classification accuracy of 41% (5 classes). Through the analysis of the images and the information stored in them, mathematical equations were derived and used to automatically crop the images around the location of the heart. By performing semi-supervised learning, the newly cropped non-gated scans were able to closely resemble gated CT scans, improving performance by 91% in MAE (62.38) and 23% in accuracy.
    Enhancing the Inductive Biases of Graph Neural ODE for Modeling Dynamical Systems. (arXiv:2209.10740v1 [cs.LG])
    Neural networks with physics-based inductive biases, such as Lagrangian neural networks (LNNs) and Hamiltonian neural networks (HNNs), learn the dynamics of physical systems by encoding strong inductive biases. Alternatively, Neural ODEs with appropriate inductive biases have also been shown to give similar performance. However, these models, when applied to particle-based systems, are transductive in nature and hence do not generalize to large system sizes. In this paper, we present a graph-based neural ODE, GNODE, to learn the time evolution of dynamical systems. Further, we carefully analyse the role of different inductive biases on the performance of GNODE. We show that, similar to LNN and HNN, encoding the constraints explicitly can significantly improve the training efficiency and performance of GNODE. Our experiments also assess the value of additional inductive biases, such as Newton's third law, on the final performance of the model. We demonstrate that inducing these biases can enhance the performance of the model by orders of magnitude in terms of both energy violation and rollout error. Interestingly, we observe that the GNODE trained with the most effective inductive biases, namely MCGNODE, outperforms the graph versions of LNN and HNN, namely Lagrangian graph networks (LGN) and Hamiltonian graph networks (HGN), in terms of energy violation error by approximately 4 orders of magnitude for a pendulum system and approximately 2 orders of magnitude for spring systems. These results suggest that competitive performance with energy-conserving neural networks can be obtained for NODE-based systems by inducing appropriate inductive biases.
    Personalizing or Not: Dynamically Personalized Federated Learning with Incentives. (arXiv:2208.06192v2 [cs.LG] UPDATED)
    Personalized federated learning (FL) facilitates collaborations between multiple clients to learn personalized models without sharing private data. The mechanism mitigates the statistical heterogeneity commonly encountered in the system, i.e., non-IID data over different clients. Existing personalized algorithms generally assume all clients volunteer for personalization. However, potential participants might still be reluctant to personalize models since they might not work well. In this case, clients choose to use the global model instead. To avoid making unrealistic assumptions, we introduce the personalization rate, measured as the fraction of clients willing to train personalized models, into federated settings and propose DyPFL. This dynamically personalized FL technique incentivizes clients to participate in personalizing local models while allowing the adoption of the global model when it performs better. We show that the algorithmic pipeline in DyPFL guarantees good convergence performance, allowing it to outperform alternative personalized methods in a broad range of conditions, including variation in heterogeneity, number of clients, local epochs, and batch sizes.
    Human Treelike Tubular Structure Segmentation: A Comprehensive Review and Future Perspectives. (arXiv:2207.11203v2 [eess.IV] UPDATED)
    Various structures in human physiology follow a treelike morphology, which often expresses complexity at very fine scales. Examples of such structures are intrathoracic airways, retinal blood vessels, and hepatic blood vessels. Large collections of 2D and 3D images have been made available by medical imaging modalities such as magnetic resonance imaging (MRI), computed tomography (CT), Optical coherence tomography (OCT) and ultrasound in which the spatial arrangement can be observed. Segmentation of these structures in medical imaging is of great importance since the analysis of the structure provides insights into disease diagnosis, treatment planning, and prognosis. Manually labelling extensive data by radiologists is often time-consuming and error-prone. As a result, automated or semi-automated computational models have become a popular research field of medical imaging in the past two decades, and many have been developed to date. In this survey, we aim to provide a comprehensive review of currently publicly available datasets, segmentation algorithms, and evaluation metrics. In addition, current challenges and future research directions are discussed.
    Batch Bayesian optimisation via density-ratio estimation with guarantees. (arXiv:2209.10715v1 [cs.LG])
    Bayesian optimisation (BO) algorithms have shown remarkable success in applications involving expensive black-box functions. Traditionally BO has been set as a sequential decision-making process which estimates the utility of query points via an acquisition function and a prior over functions, such as a Gaussian process. Recently, however, a reformulation of BO via density-ratio estimation (BORE) allowed reinterpreting the acquisition function as a probabilistic binary classifier, removing the need for an explicit prior over functions and increasing scalability. In this paper, we present a theoretical analysis of BORE's regret and an extension of the algorithm with improved uncertainty estimates. We also show that BORE can be naturally extended to a batch optimisation setting by recasting the problem as approximate Bayesian inference. The resulting algorithm comes equipped with theoretical performance guarantees and is assessed against other batch BO baselines in a series of experiments.
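    The classifier reinterpretation makes the acquisition step easy to sketch. Below is a BORE-style batch step with a random-forest classifier (our simplified rendering; the paper's batch construction via approximate Bayesian inference is more principled than the naive top-k selection used here):

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

def bore_batch_step(f, X, y, bounds, gamma=0.25, n_cand=512, batch=4, seed=0):
    """One batch step of BO via density-ratio estimation: label the best
    gamma-fraction of observations as positives, fit a probabilistic
    classifier, and evaluate f at the candidates the classifier rates
    most likely to be 'good' (minimization convention)."""
    rng = np.random.default_rng(seed)
    z = (y <= np.quantile(y, gamma)).astype(int)       # top-gamma labels
    clf = RandomForestClassifier(n_estimators=100, random_state=seed).fit(X, z)
    cand = rng.uniform(bounds[:, 0], bounds[:, 1], size=(n_cand, X.shape[1]))
    probs = clf.predict_proba(cand)[:, 1]              # acquisition = P(good|x)
    picks = cand[np.argsort(-probs)[:batch]]
    y_new = np.array([f(x) for x in picks])
    return np.vstack([X, picks]), np.concatenate([y, y_new])
```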
    Turning Normalizing Flows into Monge Maps with Geodesic Gaussian Preserving Flows. (arXiv:2209.10873v1 [cs.LG])
    Normalizing Flows (NF) are powerful likelihood-based generative models that are able to trade off between expressivity and tractability to model complex densities. A now well established research avenue leverages optimal transport (OT) and looks for Monge maps, i.e. models with minimal effort between the source and target distributions. This paper introduces a method based on Brenier's polar factorization theorem to transform any trained NF into a more OT-efficient version without changing the final density. We do so by learning a rearrangement of the source (Gaussian) distribution that minimizes the OT cost between the source and the final density. We further constrain the path leading to the estimated Monge map to lie on a geodesic in the space of volume-preserving diffeomorphisms thanks to Euler's equations. The proposed method leads to smooth flows with reduced OT cost for several existing models without affecting the model performance.
    Equivariant Transporter Network. (arXiv:2202.09400v5 [cs.RO] CROSS LISTED)
    Transporter Net is a recently proposed framework for pick and place that is able to learn good manipulation policies from very few expert demonstrations. A key reason why Transporter Net is so sample efficient is that the model incorporates rotational equivariance into the pick module, i.e., the model immediately generalizes learned pick knowledge to objects presented in different orientations. This paper proposes a novel version of Transporter Net that is equivariant to both pick and place orientation. As a result, our model immediately generalizes place knowledge to different place orientations in addition to generalizing pick knowledge as before. Ultimately, our new model is more sample efficient and achieves better pick and place success rates than the baseline Transporter Net model.
    Interneurons accelerate learning dynamics in recurrent neural networks for statistical adaptation. (arXiv:2209.10634v1 [q-bio.NC])
    Early sensory systems in the brain rapidly adapt to fluctuating input statistics, which requires recurrent communication between neurons. Mechanistically, such recurrent communication is often indirect and mediated by local interneurons. In this work, we explore the computational benefits of mediating recurrent communication via interneurons compared with direct recurrent connections. To this end, we consider two mathematically tractable recurrent neural networks that statistically whiten their inputs -- one with direct recurrent connections and the other with interneurons that mediate recurrent communication. By analyzing the corresponding continuous synaptic dynamics and numerically simulating the networks, we show that the network with interneurons is more robust to initialization than the network with direct recurrent connections in the sense that the convergence time for the synaptic dynamics in the network with interneurons (resp. direct recurrent connections) scales logarithmically (resp. linearly) with the spectrum of their initialization. Our results suggest that interneurons are computationally useful for rapid adaptation to changing input statistics. Interestingly, the network with interneurons is an overparameterized solution of the whitening objective for the network with direct recurrent connections, so our results can be viewed as a recurrent neural network analogue of the implicit acceleration phenomenon observed in overparameterized feedforward linear networks.
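    For intuition, here is a minimal numpy rendering of the direct-recurrent variant of such a whitening circuit, in the spirit of anti-Hebbian lateral-connection models (our illustration, not the paper's exact synaptic dynamics):

```python
import numpy as np

def recurrent_whitening(X, eta=0.05, n_epochs=10, seed=0):
    """Online whitening with direct recurrent (lateral) weights M: outputs
    solve y = x - M y, i.e. y = (I + M)^{-1} x, and M follows the
    anti-Hebbian rule dM ~ y y^T - I, driving the output covariance
    toward the identity."""
    n = X.shape[1]
    M = np.zeros((n, n))
    rng = np.random.default_rng(seed)
    for _ in range(n_epochs):
        for x in X[rng.permutation(len(X))]:
            y = np.linalg.solve(np.eye(n) + M, x)    # recurrent fixed point
            M += eta * (np.outer(y, y) - np.eye(n))  # anti-Hebbian update
    return M
```

    The paper's comparison concerns how the convergence time of such dynamics scales with the initialization spectrum when this lateral pathway is replaced by interneurons.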
    ASK: Adversarial Soft k-Nearest Neighbor Attack and Defense. (arXiv:2106.14300v3 [cs.LG] UPDATED)
    K-Nearest Neighbor (kNN)-based deep learning methods have been applied to many applications due to their simplicity and geometric interpretability. However, the robustness of kNN-based classification models has not been thoroughly explored and kNN attack strategies are underdeveloped. In this paper, we propose an Adversarial Soft kNN (ASK) loss to both design more effective kNN attack strategies and to develop better defenses against them. Our ASK loss approach has two advantages. First, ASK loss can better approximate the kNN's probability of classification error than objectives proposed in previous works. Second, the ASK loss is interpretable: it preserves the mutual information between the perturbed input and the in-class-reference data. We use the ASK loss to generate a novel attack method called the ASK-Attack (ASK-Atk), which shows superior attack efficiency and accuracy degradation relative to previous kNN attacks. Based on the ASK-Atk, we then derive an ASK-Defense (ASK-Def) method that optimizes the worst-case training loss induced by ASK-Atk. Experiments on CIFAR-10 (ImageNet) show that (i) ASK-Atk achieves $\geq 13\%$ ($\geq 13\%$) improvement in attack success rate over previous kNN attacks, and (ii) ASK-Def outperforms the conventional adversarial training method by $\geq 6.9\%$ ($\geq 3.5\%$) in terms of robustness improvement.
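    The soft-kNN idea is easy to state in code. A simplified rendering of an ASK-style loss (the paper's exact weighting and mutual-information view differ; this is our illustrative proxy):

```python
import numpy as np

def soft_knn_loss(x, X_ref, y_ref, label, temp=1.0, k=5):
    """Differentiable proxy for kNN classification error: softmax-weight
    the k nearest reference points by negative distance, then penalize
    low soft probability mass on the true class. An attacker perturbs x
    to maximize this loss; a defender trains against its worst case."""
    d = np.linalg.norm(X_ref - x, axis=1)
    idx = np.argsort(d)[:k]                     # k nearest references
    w = np.exp(-d[idx] / temp)
    w /= w.sum()                                # soft neighbor weights
    p_true = w[y_ref[idx] == label].sum()       # soft prob. of the true class
    return -np.log(p_true + 1e-12)
```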
    Leveraging Joint-Diagonalization in Transform-Learning NMF. (arXiv:2112.05664v3 [cs.LG] UPDATED)
    Non-negative matrix factorization with transform learning (TL-NMF) is a recent idea that aims at learning data representations suited to NMF. In this work, we relate TL-NMF to the classical matrix joint-diagonalization (JD) problem. We show that, when the number of data realizations is sufficiently large, TL-NMF can be replaced by a two-step approach -- termed as JD+NMF -- that estimates the transform through JD, prior to NMF computation. In contrast, we found that when the number of data realizations is limited, not only is JD+NMF no longer equivalent to TL-NMF, but the inherent low-rank constraint of TL-NMF turns out to be an essential ingredient to learn meaningful transforms for NMF.
    Reversible Gromov-Monge Sampler for Simulation-Based Inference. (arXiv:2109.14090v3 [stat.ME] UPDATED)
    This paper introduces a new simulation-based inference procedure to model and sample from multi-dimensional probability distributions given access to i.i.d. samples, circumventing the usual approaches of explicitly modeling the density function or designing Markov chain Monte Carlo. Motivated by the seminal work on distance and isomorphism between metric measure spaces, we propose a new notion called the Reversible Gromov-Monge (RGM) distance and study how RGM can be used to design new transform samplers to perform simulation-based inference. Our RGM sampler can also estimate optimal alignments between two heterogeneous metric measure spaces $(\mathcal{X}, \mu, c_{\mathcal{X}})$ and $(\mathcal{Y}, \nu, c_{\mathcal{Y}})$ from empirical data sets, with estimated maps that approximately push forward one measure $\mu$ to the other $\nu$, and vice versa. We study the analytic properties of the RGM distance and derive that under mild conditions, RGM equals the classic Gromov-Wasserstein distance. Curiously, drawing a connection to Brenier's polar factorization, we show that the RGM sampler induces bias towards strong isomorphism with proper choices of $c_{\mathcal{X}}$ and $c_{\mathcal{Y}}$. Statistical rate of convergence, representation, and optimization questions regarding the induced sampler are studied. Synthetic and real-world examples showcasing the effectiveness of the RGM sampler are also demonstrated.  ( 3 min )
    CMGAN: Conformer-Based Metric-GAN for Monaural Speech Enhancement. (arXiv:2209.11112v1 [cs.SD])
    Convolution-augmented transformers (Conformers) have recently been proposed for various speech-domain applications, such as automatic speech recognition (ASR) and speech separation, as they can capture both local and global dependencies. In this paper, we propose a conformer-based metric generative adversarial network (CMGAN) for speech enhancement (SE) in the time-frequency (TF) domain. The generator encodes the magnitude and complex spectrogram information using two-stage conformer blocks to model both time and frequency dependencies. The decoder then decouples the estimation into a magnitude mask decoder branch to filter out unwanted distortions and a complex refinement branch to further improve the magnitude estimation and implicitly enhance the phase information. Additionally, we include a metric discriminator to alleviate metric mismatch by optimizing the generator with respect to a corresponding evaluation score. Objective and subjective evaluations illustrate that CMGAN shows superior performance compared to state-of-the-art methods in three speech enhancement tasks (denoising, dereverberation and super-resolution). For instance, quantitative denoising analysis on the Voice Bank+DEMAND dataset indicates that CMGAN outperforms various previous models by a margin, with a PESQ of 3.41 and an SSNR of 11.10 dB.  ( 2 min )
    Word-Level Fine-Grained Story Visualization. (arXiv:2208.02341v3 [cs.CV] UPDATED)
    Story visualization aims to generate a sequence of images to narrate each sentence in a multi-sentence story with a global consistency across dynamic scenes and characters. Current works still struggle with output images' quality and consistency, and rely on additional semantic information or auxiliary captioning networks. To address these challenges, we first introduce a new sentence representation, which incorporates word information from all story sentences to mitigate the inconsistency problem. Then, we propose a new discriminator with fusion features and further extend the spatial attention to improve image quality and story consistency. Extensive experiments on different datasets and human evaluation demonstrate the superior performance of our approach, compared to state-of-the-art methods, neither using segmentation masks nor auxiliary captioning networks.  ( 2 min )
    Explaining Anomalies using Denoising Autoencoders for Financial Tabular Data. (arXiv:2209.10658v1 [cs.LG])
    Recent advances in Explainable AI (XAI) increased the demand for deployment of safe and interpretable AI models in various industry sectors. Despite the latest success of deep neural networks in a variety of domains, understanding the decision-making process of such complex models still remains a challenging task for domain experts. Especially in the financial domain, merely pointing to an anomaly composed of often hundreds of mixed type columns, has limited value for experts. Hence, in this paper, we propose a framework for explaining anomalies using denoising autoencoders designed for mixed type tabular data. We specifically focus our technique on anomalies that are erroneous observations. This is achieved by localizing individual sample columns (cells) with potential errors and assigning corresponding confidence scores. In addition, the model provides the expected cell value estimates to fix the errors. We evaluate our approach based on three standard public tabular datasets (Credit Default, Adult, IEEE Fraud) and one proprietary dataset (Holdings). We find that denoising autoencoders applied to this task already outperform other approaches in the cell error detection rates as well as in the expected value rates. Additionally, we analyze how a specialized loss designed for cell error detection can further improve these metrics. Our framework is designed for a domain expert to understand abnormal characteristics of an anomaly, as well as to improve in-house data quality management processes.
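    The mechanism can be sketched for purely numeric columns with a plain MLP as the autoencoder (mixed-type data, the specialized loss, and the confidence calibration in the paper require more machinery than this illustration):

```python
import numpy as np
from sklearn.neural_network import MLPRegressor

def fit_denoising_ae(X, noise_std=0.1, seed=0):
    """Train a denoising autoencoder (here an MLP with a bottleneck) to map
    noise-corrupted rows back to the clean originals."""
    rng = np.random.default_rng(seed)
    X_noisy = X + noise_std * rng.standard_normal(X.shape)
    dae = MLPRegressor(hidden_layer_sizes=(32, 8, 32), max_iter=500,
                       random_state=seed)
    dae.fit(X_noisy, X)
    return dae

def cell_error_scores(dae, X):
    """Per-cell anomaly scores: large reconstruction residuals localize
    suspicious cells, and the reconstruction itself supplies the
    'expected value' used to suggest a fix."""
    X_hat = dae.predict(X)
    return np.abs(X - X_hat), X_hat
```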
    Entropic Descent Archetypal Analysis for Blind Hyperspectral Unmixing. (arXiv:2209.11002v1 [eess.IV])
    In this paper, we introduce a new algorithm based on archetypal analysis for blind hyperspectral unmixing, assuming linear mixing of endmembers. Archetypal analysis is a natural formulation for this task. This method does not require the presence of pure pixels (i.e., pixels containing a single material) but instead represents endmembers as convex combinations of a few pixels present in the original hyperspectral image. Our approach leverages an entropic gradient descent strategy, which (i) provides better solutions for hyperspectral unmixing than traditional archetypal analysis algorithms, and (ii) leads to efficient GPU implementations. Since running a single instance of our algorithm is fast, we also propose an ensembling mechanism along with an appropriate model selection procedure that make our method robust to hyper-parameter choices while keeping the computational complexity reasonable. By using six standard real datasets, we show that our approach outperforms state-of-the-art matrix factorization and recent deep learning methods. We also provide an open-source PyTorch implementation: https://github.com/inria-thoth/EDAA.
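    A small dense-numpy sketch of the entropic strategy (the paper's GPU implementation, step-size rules, and ensembling are considerably richer): archetypal analysis seeks $X \approx A(BX)$ with row-stochastic $A$ and $B$, and exponentiated-gradient updates keep the rows on the simplex.

```python
import numpy as np

def entropic_archetypal(X, k, eta=0.05, iters=500, seed=0):
    """Fit archetypal analysis X ~ A (B X) by entropic (multiplicative)
    mirror descent. A is (n, k): pixel loadings over archetypes;
    B is (k, n): archetypes as convex combinations of pixels."""
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    A = rng.dirichlet(np.ones(k), size=n)
    B = rng.dirichlet(np.ones(n), size=k)
    for _ in range(iters):
        Z = B @ X                            # current archetypes/endmembers
        R = A @ Z - X                        # residual
        A *= np.exp(-eta * (R @ Z.T))        # entropic step on A's rows
        A /= A.sum(axis=1, keepdims=True)
        B *= np.exp(-eta * (A.T @ R) @ X.T)  # entropic step on B's rows
        B /= B.sum(axis=1, keepdims=True)
    return A, B
```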
    Beyond Voxel Prediction Uncertainty: Identifying brain lesions you can trust. (arXiv:2209.10877v1 [eess.IV])
    Deep neural networks have become the gold-standard approach for the automated segmentation of 3D medical images. Their full acceptance by clinicians remains however hampered by the lack of intelligible uncertainty assessment of the provided results. Most approaches to quantify their uncertainty, such as the popular Monte Carlo dropout, are restricted to some measure of uncertainty in prediction at the voxel level. Besides not being clearly related to genuine medical uncertainty, this is not clinically satisfying, as most objects of interest (e.g., brain lesions) are made of groups of voxels whose overall relevance may not simply reduce to the sum or mean of their individual uncertainties. In this work, we propose to go beyond voxel-wise assessment using an innovative Graph Neural Network approach, trained from the outputs of a Monte Carlo dropout model. This network allows the fusion of three estimators of voxel uncertainty: entropy, variance, and model confidence; and can be applied to any lesion, regardless of its shape or size. We demonstrate the superiority of our approach for uncertainty estimation on a task of Multiple Sclerosis lesion segmentation.
    Linear Algorithms for Robust and Scalable Nonparametric Multiclass Probability Estimation. (arXiv:2205.12460v3 [stat.ME] UPDATED)
    Multiclass probability estimation is the problem of estimating conditional probabilities of a data point belonging to a class given its covariate information. It has broad applications in statistical analysis and data science. Recently a class of weighted Support Vector Machines (wSVMs) has been developed to estimate class probabilities through ensemble learning for $K$-class problems (Wu, Zhang and Liu, 2010; Wang, Zhang and Wu, 2019), where $K$ is the number of classes. The estimators are robust and achieve high accuracy for probability estimation, but their learning is implemented through pairwise coupling, which demands polynomial time in $K$. In this paper, we propose two new learning schemes, the baseline learning and the One-vs-All (OVA) learning, to further improve wSVMs in terms of computational efficiency and estimation accuracy. In particular, the baseline learning has optimal computational complexity in the sense that it is linear in $K$. Though not being most efficient in computation, the OVA offers the best estimation accuracy among all the procedures under comparison. The resulting estimators are distribution-free and shown to be consistent. We further conduct extensive numerical experiments to demonstrate finite sample performance.
    OLIVES Dataset: Ophthalmic Labels for Investigating Visual Eye Semantics. (arXiv:2209.11195v1 [eess.IV])
    Clinical diagnosis of the eye is performed over multifarious data modalities including scalar clinical labels, vectorized biomarkers, two-dimensional fundus images, and three-dimensional Optical Coherence Tomography (OCT) scans. Clinical practitioners use all available data modalities for diagnosing and treating eye diseases like Diabetic Retinopathy (DR) or Diabetic Macular Edema (DME). Enabling usage of machine learning algorithms within the ophthalmic medical domain requires research into the relationships and interactions between all relevant data over a treatment period. Existing datasets are limited in that they neither provide data nor consider the explicit relationship modeling between the data modalities. In this paper, we introduce the Ophthalmic Labels for Investigating Visual Eye Semantics (OLIVES) dataset that addresses the above limitation. This is the first OCT and near-IR fundus dataset that includes clinical labels, biomarker labels, disease labels, and time-series patient treatment information from associated clinical trials. The dataset consists of 1268 near-IR fundus images each with at least 49 OCT scans, and 16 biomarkers, along with 4 clinical labels and a disease diagnosis of DR or DME. In total, there are 96 eyes' data averaged over a period of at least two years with each eye treated for an average of 66 weeks and 7 injections. We benchmark the utility of OLIVES dataset for ophthalmic data as well as provide benchmarks and concrete research directions for core and emerging machine learning paradigms within medical image analysis.
    Fault Detection in Ball Bearings. (arXiv:2209.11041v1 [eess.SP])
    Ball bearing joints are a critical component in all rotating machinery, and detecting and locating faults in these joints is a significant problem in industry and research. Intelligent fault detection (IFD) is the process of applying machine learning and other statistical methods to monitor the health states of machines. This paper explores the construction of vibration images, a preprocessing technique that has been previously used to train convolutional neural networks for ball bearing joint IFD. The main results demonstrate the robustness of this technique by applying it to a larger dataset than previously used and exploring the hyperparameters used in constructing the vibration images.  ( 2 min )
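    The preprocessing step is simple enough to state in full. A typical construction of vibration images from a raw 1D vibration signal (window layout and normalization are one common choice, not necessarily the paper's exact settings):

```python
import numpy as np

def vibration_images(signal, size=64, overlap=0.0):
    """Cut a 1D vibration signal into windows of size*size samples,
    min-max normalize each window, and reshape it into a 2D grayscale
    image suitable for a CNN."""
    win = size * size
    step = max(int(win * (1.0 - overlap)), 1)
    images = []
    for start in range(0, len(signal) - win + 1, step):
        seg = signal[start:start + win]
        img = (seg - seg.min()) / (seg.max() - seg.min() + 1e-12)
        images.append(img.reshape(size, size))
    return np.stack(images)
```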
    Seen to Unseen: When Fuzzy Inference System Predicts IoT Device Positioning Labels That Had Not Appeared in Training Phase. (arXiv:2209.10627v1 [cs.LG])
    Sitting at the core of Artificial Intelligence (AI), Machine Learning (ML) and, more specifically, Deep Learning (DL) have enjoyed great success in the past two decades. However, unseen class label prediction is far less explored due to missing classes being invisible when training ML or DL models. In this work, we propose a fuzzy inference system to cope with this challenge by adopting a TSK+ fuzzy inference engine in conjunction with the Curvature-based Feature Selection (CFS) method. The practical feasibility of our system has been evaluated by predicting the positioning labels of networking devices within the realm of the Internet of Things (IoT). Competitive prediction performance confirms the efficiency and efficacy of our system, especially when a large number of continuous class labels are unseen during the model training stage.  ( 2 min )
    Optimization with Constraint Learning: A Framework and Survey. (arXiv:2110.02121v2 [cs.LG] UPDATED)
    Real-life optimization problems frequently contain one or more constraints or objectives for which there are no explicit formulas. If data are available, however, they can be used to learn the constraints. The benefits of this approach are clear; however, the process needs to be carried out in a structured manner. This paper therefore provides a framework for Optimization with Constraint Learning (OCL) which we believe will help to formalize and direct the process of learning constraints from data. This framework includes the following steps: (i) setup of the conceptual optimization model, (ii) data gathering and preprocessing, (iii) selection and training of predictive models, (iv) resolution of the optimization model, and (v) verification and improvement of the optimization model. We then review the recent OCL literature in light of this framework, and highlight current trends, as well as areas for future research.
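    Steps (ii)-(iv) can be illustrated end to end on a toy problem: learn an unknown constraint $g(x) \le 0$ from data with a smooth regressor, then embed the learned model in the optimization (the objective, data-generating constraint, and bounds below are all illustrative assumptions):

```python
import numpy as np
from sklearn.svm import SVR
from scipy.optimize import minimize

rng = np.random.default_rng(0)
X = rng.uniform(-2, 2, size=(500, 2))            # (ii) gather data
g = (X ** 2).sum(axis=1) - 1.0                   # 'unknown' constraint values

model = SVR(kernel='rbf').fit(X, g)              # (iii) train predictive model

cons = {'type': 'ineq',                          # SLSQP wants fun(x) >= 0
        'fun': lambda x: -model.predict(x.reshape(1, -1))[0]}
res = minimize(lambda x: -(x[0] + x[1]),         # (iv) maximize x1 + x2
               x0=np.zeros(2), bounds=[(-2, 2), (-2, 2)],
               constraints=[cons], method='SLSQP')
# (v) verify res.x against fresh measurements of the true constraint,
# and refine the model if violations are found.
```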
    Neural Generalized Ordinary Differential Equations with Layer-varying Parameters. (arXiv:2209.10633v1 [cs.LG])
    Deep residual networks (ResNets) have shown state-of-the-art performance in various real-world applications. Recently, the ResNets model was reparameterized and interpreted as solutions to a continuous ordinary differential equation or Neural-ODE model. In this study, we propose a neural generalized ordinary differential equation (Neural-GODE) model with layer-varying parameters to further extend the Neural-ODE to approximate the discrete ResNets. Specifically, we use nonparametric B-spline functions to parameterize the Neural-GODE so that the trade-off between the model complexity and computational efficiency can be easily balanced. It is demonstrated that ResNets and Neural-ODE models are special cases of the proposed Neural-GODE model. Based on two benchmark datasets, MNIST and CIFAR-10, we show that the layer-varying Neural-GODE is more flexible and general than the standard Neural-ODE. Furthermore, the Neural-GODE enjoys the computational and memory benefits while performing comparably to ResNets in prediction accuracy.  ( 2 min )
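    The layer-varying construction amounts to writing the weights as a B-spline curve in depth, $\theta(t) = \sum_i c_i B_i(t)$, so a handful of coefficients controls a continuum of layers. A sketch with scipy (shapes and the tanh vector field are illustrative assumptions):

```python
import numpy as np
from scipy.interpolate import BSpline

def layer_varying_weights(n_basis, width, degree=3, seed=0):
    """Parameterize W(t) on t in [0, 1] with B-spline bases; the trainable
    parameters are the n_basis coefficient matrices."""
    rng = np.random.default_rng(seed)
    knots = np.concatenate([np.zeros(degree),
                            np.linspace(0, 1, n_basis - degree + 1),
                            np.ones(degree)])
    coef = 0.1 * rng.standard_normal((n_basis, width, width))
    return BSpline(knots, coef, degree)          # W_of_t(t) -> (width, width)

def gode_forward(x, W_of_t, n_steps=20):
    """Euler-discretized Neural-GODE forward pass: dx/dt = tanh(W(t) x)."""
    dt = 1.0 / n_steps
    for i in range(n_steps):
        x = x + dt * np.tanh(W_of_t(i / n_steps) @ x)
    return x

W = layer_varying_weights(n_basis=6, width=4)
out = gode_forward(np.ones(4), W)
```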
    Uncertainty-aware Perception Models for Off-road Autonomous Unmanned Ground Vehicles. (arXiv:2209.11115v1 [cs.RO])
    Off-road autonomous unmanned ground vehicles (UGVs) are being developed for military and commercial use to deliver crucial supplies in remote locations, help with mapping and surveillance, and assist war-fighters in contested environments. Due to the complexity of off-road environments and variability in terrain, lighting conditions, and diurnal and seasonal changes, the models used to perceive the environment must handle a lot of input variability. Current datasets used to train perception models for off-road autonomous navigation lack diversity in seasons, locations, semantic classes, and time of day. We test the hypothesis that a model trained on a single dataset may not generalize to other off-road navigation datasets and new locations due to input distribution drift. Additionally, we investigate how to combine multiple datasets to train a semantic segmentation-based environment perception model, and we show that training the model to capture uncertainty can improve model performance by a significant margin. We extend the Masksembles approach for uncertainty quantification to the semantic segmentation task and compare it with Monte Carlo Dropout and standard baselines. Finally, we test the approach against data collected from a UGV platform in a new testing environment. We show that the developed perception model with uncertainty quantification can be feasibly deployed on a UGV to support online perception and navigation tasks.
    STING: Self-attention based Time-series Imputation Networks using GAN. (arXiv:2209.10801v1 [cs.LG])
    Time series data are ubiquitous in real-world applications. However, one of the most common problems is that the time series data may have missing values due to the inherent nature of the data collection process. Imputing missing values in multivariate (correlated) time series data is therefore imperative for improving prediction performance while making accurate data-driven decisions. Conventional works on imputation simply delete missing values or fill them in based on the mean or zero. Although recent works based on deep neural networks have shown remarkable results, they still have a limited capacity to capture the complex generation process of multivariate time series. In this paper, we propose a novel imputation method for multivariate time series data, called STING (Self-attention based Time-series Imputation Networks using GAN). We take advantage of generative adversarial networks and bidirectional recurrent neural networks to learn latent representations of the time series. In addition, we introduce a novel attention mechanism to capture the weighted correlations of the whole sequence and avoid potential bias brought by unrelated ones. Experimental results on three real-world datasets demonstrate that STING outperforms the existing state-of-the-art methods in terms of imputation accuracy as well as downstream tasks with the imputed values therein.
    In Differential Privacy, There is Truth: On Vote Leakage in Ensemble Private Learning. (arXiv:2209.10732v1 [cs.LG])
    When learning from sensitive data, care must be taken to ensure that training algorithms address privacy concerns. The canonical Private Aggregation of Teacher Ensembles, or PATE, computes output labels by aggregating the predictions of a (possibly distributed) collection of teacher models via a voting mechanism. The mechanism adds noise to attain a differential privacy guarantee with respect to the teachers' training data. In this work, we observe that this use of noise, which makes PATE predictions stochastic, enables new forms of leakage of sensitive information. For a given input, our adversary exploits this stochasticity to extract high-fidelity histograms of the votes submitted by the underlying teachers. From these histograms, the adversary can learn sensitive attributes of the input such as race, gender, or age. Although this attack does not directly violate the differential privacy guarantee, it clearly violates privacy norms and expectations, and would not be possible at all without the noise inserted to obtain differential privacy. In fact, counter-intuitively, the attack becomes easier as we add more noise to provide stronger differential privacy. We hope this encourages future work to consider privacy holistically rather than treat differential privacy as a panacea.
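    The attack itself is only a few lines: query the noisy aggregation repeatedly on the same input and read the vote histogram off the winner frequencies. A toy sketch (Laplace aggregation; all numbers illustrative):

```python
import numpy as np

def noisy_argmax(votes, scale, rng):
    """PATE-style aggregation: argmax of teacher votes plus Laplace noise."""
    return int(np.argmax(votes + rng.laplace(0.0, scale, size=votes.shape)))

def winner_frequencies(votes, scale=20.0, n_queries=20000, seed=0):
    """Repeat the same query and record how often each class wins; the
    frequencies are ordered like the hidden vote histogram, and larger
    noise scales make them more informative, as the paper observes."""
    rng = np.random.default_rng(seed)
    counts = np.zeros(len(votes))
    for _ in range(n_queries):
        counts[noisy_argmax(votes, scale, rng)] += 1
    return counts / n_queries

freqs = winner_frequencies(np.array([60.0, 25.0, 10.0, 5.0]))  # toy votes
```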
    DRAMA: Joint Risk Localization and Captioning in Driving. (arXiv:2209.10767v1 [cs.CV])
    Considering the functionality of situational awareness in safety-critical automation systems, the perception of risk in driving scenes and its explainability is of particular importance for autonomous and cooperative driving. Toward this goal, this paper proposes a new research direction of joint risk localization in driving scenes and its risk explanation as a natural language description. Due to the lack of standard benchmarks, we collected a large-scale dataset, DRAMA (Driving Risk Assessment Mechanism with A captioning module), which consists of 17,785 interactive driving scenarios collected in Tokyo, Japan. Our DRAMA dataset accommodates video- and object-level questions on driving risks with associated important objects to achieve the goal of visual captioning as a free-form language description utilizing closed and open-ended responses for multi-level questions, which can be used to evaluate a range of visual captioning capabilities in driving scenarios. We make this data available to the community for further research. Using DRAMA, we explore multiple facets of joint risk localization and captioning in interactive driving scenarios. In particular, we benchmark various multi-task prediction architectures and provide a detailed analysis of joint risk localization and risk captioning. The data set is available at https://usa.honda-ri.com/drama
    IntereStyle: Encoding an Interest Region for Robust StyleGAN Inversion. (arXiv:2209.10811v1 [cs.CV])
    Recently, manipulation of real-world images has advanced considerably with the development of Generative Adversarial Networks (GANs) and corresponding encoders, which embed real-world images into the latent space. However, designing GAN encoders remains challenging due to the trade-off between distortion and perception. In this paper, we point out that existing encoders try to lower the distortion not only on the interest region, e.g., the human facial region, but also on the uninterest region, e.g., background patterns and obstacles. However, most uninterest regions in real-world images are out-of-distribution (OOD) and cannot be ideally reconstructed by generative models. Moreover, we empirically find that an uninterest region overlapping the interest region can mangle the original features of the interest region; e.g., a microphone overlapping a facial region is inverted into a white beard. As a result, lowering the distortion of the whole image while maintaining perceptual quality is very challenging. To overcome this trade-off, we propose a simple yet effective encoder training scheme, coined IntereStyle, which facilitates encoding by focusing on the interest region. IntereStyle steers the encoder to disentangle the encodings of the interest and uninterest regions. To this end, we iteratively filter the information of the uninterest region to regulate its negative impact. We demonstrate that IntereStyle achieves both lower distortion and higher perceptual quality compared to existing state-of-the-art encoders. In particular, our model robustly preserves features of the original images, as shown by robust image editing and style mixing results. We will release our code with the pre-trained model after the review.
    Common human diseases prediction using machine learning based on survey data. (arXiv:2209.10750v1 [cs.LG])
    The moment has arrived to move away from disease as the primary emphasis of medical treatment. Impressive as they are, a multitude of techniques has been developed to detect diseases. Common diseases today include COVID-19, the common flu, migraine, lung disease, heart disease, kidney disease, diabetes, stomach disease, gastric disorders, bone disease, and autism. In this analysis, we study disease symptoms and make disease predictions based on them. We studied a range of symptoms and surveyed people in order to complete the task. Several classification algorithms were employed to train the model, and performance evaluation metrics were used to measure the model's performance. Finally, we found that the part classifier surpasses the others.
    Pixel VQ-VAEs for Improved Pixel Art Representation. (arXiv:2203.12130v2 [cs.CV] UPDATED)
    Machine learning has had a great deal of success in image processing. However, such work has largely focused on realistic images, ignoring more niche art styles such as pixel art. Additionally, many traditional machine learning models that operate on groups of pixels do not work well with pixel art, where individual pixels are important. We propose the Pixel VQ-VAE, a specialized VQ-VAE model that learns representations of pixel art. We show that it outperforms other models in both the quality of embeddings and performance on downstream tasks.
    Deep Learning on Home Drone: Searching for the Optimal Architecture. (arXiv:2209.11064v1 [cs.CV])
    We present the first system that runs real-time semantic segmentation via deep learning on a weak micro-computer such as the Raspberry Pi Zero v2 (priced at \$15) attached to a toy drone. In particular, since the Raspberry Pi weighs less than $16$ grams and is half the size of a credit card, we could easily attach it to the common commercial DJI Tello toy drone (<\$100, <90 grams, 98 $\times$ 92.5 $\times$ 41 mm). The result is an autonomous drone (no laptop nor human in the loop) that can detect and classify objects in real time from the video stream of an on-board monocular RGB camera (no GPS or LIDAR sensors). The companion videos demonstrate how this Tello drone scans the lab for people (e.g., for the use of firefighters or security forces) and for an empty parking slot outside the lab. Existing deep learning solutions are either much too slow for real-time computation on such IoT devices or produce results of impractical quality. Our main challenge was to design a system that takes the best of all worlds among numerous combinations of networks, deep learning platforms/frameworks, compression techniques, and compression ratios. To this end, we provide an efficient search algorithm that aims to find the optimal combination yielding the best tradeoff between the network's running time and its accuracy/performance.
    Amortized Variational Inference: Towards the Mathematical Foundation and Review. (arXiv:2209.10888v1 [cs.LG])
    The core principle of Variational Inference (VI) is to convert the statistical inference problem of computing complex posterior probability densities into a tractable optimization problem. This property enables VI to be faster than several sampling-based techniques. However, the traditional VI algorithm is not scalable to large data sets and is unable to readily infer out-of-sample data points without re-running the optimization process. Recent developments in the field, such as stochastic, black-box, and amortized VI, have helped address these issues. Generative modeling tasks nowadays widely make use of amortized VI for its efficiency and scalability, as it utilizes a parameterized function to learn the approximate posterior density parameters. With this paper, we review the mathematical foundations of various VI techniques to form the basis for understanding amortized VI. Additionally, we provide an overview of the recent trends that address several issues of amortized VI, such as the amortization gap, generalization issues, inconsistent representation learning, and posterior collapse. Finally, we analyze alternate divergence measures that improve VI optimization.
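    For reference, the optimization problem underlying all of these variants is maximization of the evidence lower bound (ELBO); a standard form, written here with an amortized variational posterior $q_\phi(z \mid x)$ produced by an inference network with parameters $\phi$, is
    \[ \log p_\theta(x) \;\ge\; \mathbb{E}_{q_\phi(z \mid x)}\big[\log p_\theta(x \mid z)\big] \;-\; \mathrm{KL}\big(q_\phi(z \mid x)\,\|\,p(z)\big). \]
    Classical VI optimizes separate variational parameters per data point; amortization replaces them with the shared network $q_\phi$, which is what enables inference on new data points without re-running the optimization.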
    Contrastive Learning for Time Series on Dynamic Graphs. (arXiv:2209.10662v1 [cs.LG])
    There have been several recent efforts towards developing representations for multivariate time-series in an unsupervised learning framework. Such representations can prove beneficial in tasks such as activity recognition, health monitoring, and anomaly detection. In this paper, we consider a setting where we observe time-series at each node in a dynamic graph. We propose a framework called GraphTNC for unsupervised learning of joint representations of the graph and the time-series. Our approach employs a contrastive learning strategy. Based on an assumption that the time-series and graph evolution dynamics are piecewise smooth, we identify local windows of time where the signals exhibit approximate stationarity. We then train an encoding that allows the distribution of signals within a neighborhood to be distinguished from the distribution of non-neighboring signals. We first demonstrate the performance of our proposed framework using synthetic data, and subsequently we show that it can prove beneficial for the classification task with real-world datasets.
    Enhanced Decentralized Federated Learning based on Consensus in Connected Vehicles. (arXiv:2209.10722v1 [cs.LG])
    Recent research on connected vehicles has targeted the integration of vehicle-to-everything (V2X) networks with Machine Learning (ML) tools and distributed decision making. Federated learning (FL) is emerging as a new paradigm to train ML models in distributed systems, including vehicles in V2X networks. Rather than sharing and uploading the training data to a server, the model parameters (e.g., neural networks' weights and biases) are updated by large populations of interconnected vehicles acting as local learners. Despite these benefits, a limitation of existing approaches is their centralized optimization, which relies on a server for the aggregation and fusion of local parameters, leading to a single point of failure and scaling issues as the V2X network size increases. Meanwhile, in intelligent transport scenarios, data collected from onboard sensors are redundant, which degrades the performance of aggregation. To tackle these problems, we explore the idea of decentralized data processing and introduce a federated learning framework for in-network vehicles, C-DFL (Consensus-based Decentralized Federated Learning), to tackle federated learning on connected vehicles and improve learning quality. Extensive simulations evaluate the performance of C-DFL and demonstrate that C-DFL outperforms conventional methods in all cases.
    U-Sleep: resilient to AASM guidelines. (arXiv:2209.11173v1 [eess.SP])
    The AASM guidelines are the result of decades of effort to standardize the sleep scoring procedure into a commonly used methodology. The guidelines cover several aspects, from technical/digital specifications, e.g., recommended EEG derivations, to sleep scoring rules, e.g., different rules for adults, children and infants. In the context of sleep scoring automation, deep learning has demonstrated better performance than many other approaches over the last decades. In most cases, clinical knowledge and guidelines have been exploited to support automated sleep scoring algorithms in solving the task. In this paper we show that a deep learning based sleep scoring algorithm may not need to fully exploit clinical knowledge or strictly follow the AASM guidelines. Specifically, we demonstrate that U-Sleep, a state-of-the-art sleep scoring algorithm, can be strong enough to solve the scoring task even when using clinically non-recommended or non-conventional derivations, and without exploiting information about the chronological age of the subjects. We finally strengthen a well-known finding that using data from multiple data centers always results in a better performing model compared with training on a single cohort. Indeed, we show that this latter statement remains valid even when increasing the size and heterogeneity of the single data cohort. In all our experiments we used 28,528 polysomnography studies from 13 different clinical studies.
    Pretraining the Vision Transformer using self-supervised methods for vision based Deep Reinforcement Learning. (arXiv:2209.10901v1 [cs.LG])
    The Vision Transformer architecture has proven competitive in the computer vision (CV) space, where it has dethroned convolution-based networks in several benchmarks. Nevertheless, Convolutional Neural Networks (CNN) remain the preferred architecture for the representation module in Reinforcement Learning. In this work, we study pretraining a Vision Transformer using several state-of-the-art self-supervised methods and assess the data-efficiency gains from this training framework. We propose a new self-supervised learning method called TOV-VICReg that extends VICReg to better capture temporal relations between observations by adding a temporal order verification task. Furthermore, we evaluate the resulting encoders on Atari games in a sample-efficiency regime. Our results show that the vision transformer, when pretrained with TOV-VICReg, outperforms the other self-supervised methods but still struggles to surpass a CNN. Nevertheless, we were able to outperform a CNN in two of the ten games in a 100k-step evaluation. Ultimately, we believe that such approaches in Deep Reinforcement Learning (DRL) might be the key to achieving new levels of performance as seen in natural language processing and computer vision. Source code will be available at: https://github.com/mgoulao/TOV-VICReg
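    As background, a hedged sketch of the standard VICReg objective that TOV-VICReg extends is given below: an invariance (MSE) term, a variance hinge on per-dimension standard deviations, and an off-diagonal covariance penalty between two embeddings. The temporal order verification head added by the paper is omitted, and the coefficients are the commonly used VICReg defaults rather than values from the paper.

        import torch

        def vicreg_loss(z1, z2, sim=25.0, var=25.0, cov=1.0):
            inv = ((z1 - z2) ** 2).mean()  # invariance between the two views
            def variance(z):
                std = torch.sqrt(z.var(dim=0) + 1e-4)
                return torch.relu(1.0 - std).mean()   # keep each dim spread out
            def covariance(z):
                zc = z - z.mean(dim=0)
                C = (zc.t() @ zc) / (len(z) - 1)
                off = C - torch.diag(torch.diag(C))   # decorrelate dimensions
                return (off ** 2).sum() / z.size(1)
            return sim * inv + var * (variance(z1) + variance(z2)) \
                 + cov * (covariance(z1) + covariance(z2))

        z1, z2 = torch.randn(64, 128), torch.randn(64, 128)
        print(vicreg_loss(z1, z2).item())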
    Reactive Exploration to Cope with Non-Stationarity in Lifelong Reinforcement Learning. (arXiv:2207.05742v2 [cs.LG] UPDATED)
    In lifelong learning, an agent learns throughout its entire life without resets, in a constantly changing environment, as we humans do. Consequently, lifelong learning comes with a plethora of research problems such as continual domain shifts, which result in non-stationary rewards and environment dynamics. These non-stationarities are difficult to detect and cope with due to their continuous nature. Therefore, exploration strategies and learning methods are required that are capable of tracking the steady domain shifts, and adapting to them. We propose Reactive Exploration to track and react to continual domain shifts in lifelong reinforcement learning, and to update the policy correspondingly. To this end, we conduct experiments in order to investigate different exploration strategies. We empirically show that representatives of the policy-gradient family are better suited for lifelong learning, as they adapt more quickly to distribution shifts than Q-learning. Thereby, policy-gradient methods profit the most from Reactive Exploration and show good results in lifelong learning with continual domain shifts. Our code is available at: https://github.com/ml-jku/reactive-exploration.
    Vanilla feedforward neural networks as a discretization of dynamic systems. (arXiv:2209.10909v1 [cs.LG])
    Deep learning has found significant applications in data science and the natural sciences. Some studies have linked deep neural networks to dynamic systems, but the network structure has been restricted to residual networks. It is known that residual networks can be regarded as a numerical discretization of dynamic systems. In this paper, we return to the classical network structure and prove that vanilla feedforward networks can also be a numerical discretization of dynamic systems, where the width of the network is equal to the dimension of the input and output. Our proof is based on the properties of the leaky-ReLU function and the splitting method for numerically solving differential equations. Our results could provide a new perspective for understanding the approximation properties of feedforward neural networks.
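    As a concrete reference point for the residual-network case mentioned above, the identification is the forward-Euler step: a residual block computes
    \[ x_{t+1} \;=\; x_t + h\, f(x_t), \]
    which is the explicit Euler discretization of the ODE $\dot{x} = f(x)$ with step size $h$. The paper's contribution, roughly, is to establish a comparable correspondence for width-preserving vanilla feedforward layers via leaky-ReLU properties and operator splitting; the equation above is the standard background fact, not the paper's construction.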
    Memory-Augmented Graph Neural Networks: A Neuroscience Perspective. (arXiv:2209.10818v1 [cs.LG])
    Graph neural networks (GNNs) have been extensively used for many domains where data are represented as graphs, including social networks, recommender systems, biology, chemistry, etc. Recently, the expressive power of GNNs has drawn much interest. It has been shown that, despite the promising empirical results achieved by GNNs for many applications, there are some limitations in GNNs that hinder their performance for some tasks. For example, since GNNs update node features mainly based on local information, they have limited expressive power in capturing long-range dependencies among nodes in graphs. To address some of the limitations of GNNs, several recent works started to explore augmenting GNNs with memory for improving their expressive power in the relevant tasks. In this paper, we provide a comprehensive review of the existing literature of memory-augmented GNNs. We review these works through the lens of psychology and neuroscience, which has established multiple memory systems and mechanisms in biological brains. We propose a taxonomy of the memory GNN works, as well as a set of criteria for comparing the memory mechanisms. We also provide critical discussions on the limitations of these works. Finally, we discuss the challenges and future directions for this area.
    Modeling Perceptual Loudness of Piano Tone: Theory and Applications. (arXiv:2209.10674v1 [cs.SD])
    The relationship between perceptual loudness and physical attributes of sound is an important subject in both computer music and psychoacoustics. Early studies of the "equal-loudness contour" trace back to the 1920s, and the measured loudness with respect to intensity and frequency has been revised many times since then. However, most studies focus on synthesized sound, and the resulting theories have rarely been validated on natural tones with complex timbre. To this end, we investigate both the theory and applications of natural-tone loudness perception in this paper by modeling piano tones. The theory part contains: 1) an accurate measurement of the piano-tone equal-loudness contour of pitches, and 2) a machine-learning model capable of inferring loudness purely from spectral features, trained on human subject measurements. As for the application, we apply our theory to piano control transfer, in which we adjust the MIDI velocities on two different player pianos (in different acoustic environments) to achieve the same perceptual effect. Experiments show that both our theoretical loudness modeling and the corresponding performance control transfer algorithm significantly outperform their baselines.
    Beyond Heisenberg Limit Quantum Metrology through Quantum Signal Processing. (arXiv:2209.11207v1 [quant-ph])
    Leveraging quantum effects in metrology such as entanglement and coherence allows one to measure parameters with enhanced sensitivity. However, time-dependent noise can disrupt such Heisenberg-limited amplification. We propose a quantum-metrology method based on the quantum-signal-processing framework to overcome these realistic noise-induced limitations in practical quantum metrology. Our algorithm separates the gate parameter $\varphi$ (single-qubit Z phase) that is susceptible to time-dependent error from the target gate parameter $\theta$ (swap-angle between |10> and |01> states) that is largely free of time-dependent error. Our method achieves an accuracy of $10^{-4}$ radians in standard deviation for learning $\theta$ in superconducting-qubit experiments, outperforming existing alternative schemes by two orders of magnitude. We also demonstrate the increased robustness in learning time-dependent gate parameters through fast Fourier transformation and sequential phase difference. We show both theoretically and numerically that there is an interesting transition of the optimal metrology variance scaling as a function of circuit depth $d$ from the pre-asymptotic regime $d \ll 1/\theta$ to the Heisenberg limit $d \to \infty$. Remarkably, in the pre-asymptotic regime our method's estimation variance on the time-sensitive parameter $\varphi$ scales faster than the asymptotic Heisenberg limit as a function of depth, $\text{Var}(\hat{\varphi})\approx 1/d^4$. Ours is the first quantum-signal-processing algorithm to demonstrate practical application on laboratory quantum computers.
    First-order Policy Optimization for Robust Markov Decision Process. (arXiv:2209.10579v1 [cs.LG])
    We consider the problem of solving robust Markov decision process (MDP), which involves a set of discounted, finite state, finite action space MDPs with uncertain transition kernels. The goal of planning is to find a robust policy that optimizes the worst-case values against the transition uncertainties, and thus encompasses the standard MDP planning as a special case. For $(\mathbf{s},\mathbf{a})$-rectangular uncertainty sets, we develop a policy-based first-order method, namely the robust policy mirror descent (RPMD), and establish an $\mathcal{O}(\log(1/\epsilon))$ and $\mathcal{O}(1/\epsilon)$ iteration complexity for finding an $\epsilon$-optimal policy, with two increasing-stepsize schemes. The aforementioned convergence of RPMD is applicable to any Bregman divergence, provided the policy space has bounded radius measured by the divergence when centering at the initial policy. Moreover, when the Bregman divergence corresponds to the squared Euclidean distance, we establish an $\mathcal{O}(\max \{1/\epsilon, 1/(\eta \epsilon^2)\})$ complexity of RPMD with any constant stepsize $\eta$. For a general class of Bregman divergences, a similar complexity is also established for RPMD with constant stepsizes, provided the uncertainty set satisfies the relative strong convexity. We further develop a stochastic variant, named SRPMD, when the first-order information is only available through online interactions with the nominal environment. For general Bregman divergences, we establish an $\mathcal{O}(1/\epsilon^2)$ and $\mathcal{O}(1/\epsilon^3)$ sample complexity with two increasing-stepsize schemes. For the Euclidean Bregman divergence, we establish an $\mathcal{O}(1/\epsilon^3)$ sample complexity with constant stepsizes. To the best of our knowledge, all the aforementioned results appear to be new for policy-based first-order methods applied to the robust MDP problem.
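    For intuition, a generic policy mirror descent step with the KL divergence as the Bregman distance takes the multiplicative form
    \[ \pi_{k+1}(\cdot \mid s) \;\propto\; \pi_k(\cdot \mid s)\, \exp\!\big(\eta_k\, Q_{\mathrm{rob}}^{\pi_k}(s, \cdot)\big), \]
    where $Q_{\mathrm{rob}}^{\pi_k}$ denotes the worst-case action-value over the uncertainty set. This is only the textbook template; RPMD's stepsize schemes, general Bregman divergences, and robust policy evaluation are as described above.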
    A Bibliographic View on Constrained Clustering. (arXiv:2209.11125v1 [cs.LG])
    A keyword search on constrained clustering on Web-of-Science returned just under 3,000 documents. We ran automatic analyses of those, and compiled our own bibliography of 183 papers which we analysed in more detail based on their topic and experimental study, if any. This paper presents general trends of the area and its sub-topics by Pareto analysis, using citation count and year of publication. We list available software and analyse the experimental sections of our reference collection. We found a notable lack of large comparison experiments. Among the topics we reviewed, application studies have been most abundant recently, alongside deep learning, active learning and ensemble learning.
    Predictive Multiplicity in Probabilistic Classification. (arXiv:2206.01131v2 [cs.LG] UPDATED)
    There may exist multiple models that perform almost equally well for any given prediction task. We examine how predictions change across these competing models. In particular, we study predictive multiplicity in probabilistic classification. We formally define measures for our setting and develop optimization-based methods to compute these measures for convex empirical risk minimization problems. We apply our methodology to gain insight into why predictive multiplicity arises. We demonstrate the incidence and prevalence of predictive multiplicity in real-world risk assessment tasks. Our results emphasize the need to report multiplicity more widely.
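    For a feel of the phenomenon (this is not the paper's optimization-based measures over an epsilon-level set, only a cheap proxy), one can refit a probabilistic classifier on bootstrap resamples and inspect how much the predicted risks disagree per example; the sketch below uses scikit-learn, with all settings illustrative.

        import numpy as np
        from sklearn.datasets import make_classification
        from sklearn.linear_model import LogisticRegression

        rng = np.random.default_rng(0)
        X, y = make_classification(n_samples=500, random_state=0)

        probs = []
        for _ in range(20):
            idx = rng.integers(0, len(X), len(X))           # bootstrap resample
            clf = LogisticRegression(max_iter=1000).fit(X[idx], y[idx])
            probs.append(clf.predict_proba(X)[:, 1])
        probs = np.stack(probs)

        # Spread of predicted risk across competing models, per example.
        spread = probs.max(axis=0) - probs.min(axis=0)
        print("mean spread:", spread.mean(), "max spread:", spread.max())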
    CAMRI Loss: Improving Recall of a Specific Class without Sacrificing Accuracy. (arXiv:2209.10920v1 [cs.LG])
    In real-world applications of multi-class classification models, misclassification in an important class (e.g., stop sign) can be significantly more harmful than in other classes (e.g., speed limit). In this paper, we propose a loss function that can improve the recall of an important class while maintaining the same level of accuracy as cross-entropy loss. For our purpose, we need to separate the important class better than the other classes. However, existing methods that add a class-sensitive penalty to cross-entropy loss do not improve this separation. On the other hand, methods that add a margin to the angle between the feature vectors and the weight vectors of the last fully connected layer can improve the separation. Therefore, we propose a loss function that improves the separation of the important class by setting the margin only for that class, called Class-sensitive Additive Angular Margin Loss (CAMRI Loss). By adding a penalty to the angle, creating a margin around the important class in the feature space, CAMRI loss is expected to reduce the variance of the angles between features and the weights of the important class relative to other classes. In addition, concentrating the penalty only on the important class hardly sacrifices the separation of the other classes. Experiments on CIFAR-10, GTSRB, and AwA2 showed that the proposed method can improve recall by up to 9% over cross-entropy loss without sacrificing accuracy.
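    A hedged sketch of a class-sensitive additive angular margin in this spirit is shown below; the margin, scale, and normalization follow common angular-margin (ArcFace-style) practice and are not necessarily the authors' exact formulation.

        import torch
        import torch.nn.functional as F

        def camri_like_loss(features, weights, labels, important_class,
                            margin=0.2, scale=16.0):
            # Cosine similarity between normalized features and class weights.
            cos = F.normalize(features) @ F.normalize(weights).t()
            theta = torch.acos(cos.clamp(-1 + 1e-7, 1 - 1e-7))
            # Apply the angular margin only where the target is the important class.
            onehot = F.one_hot(labels, weights.size(0)).bool()
            important = (labels == important_class).unsqueeze(1) & onehot
            logits = torch.where(important, torch.cos(theta + margin), cos)
            return F.cross_entropy(scale * logits, labels)

        features = torch.randn(8, 32, requires_grad=True)
        weights = torch.randn(10, 32, requires_grad=True)   # one row per class
        labels = torch.randint(0, 10, (8,))
        loss = camri_like_loss(features, weights, labels, important_class=3)
        loss.backward()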
    KGI: An Integrated Framework for Knowledge Intensive Language Tasks. (arXiv:2204.03985v2 [cs.CL] UPDATED)
    In this paper, we present a system to showcase the capabilities of the latest state-of-the-art retrieval augmented generation models trained on knowledge-intensive language tasks, such as slot filling, open domain question answering, dialogue, and fact-checking. Moreover, given a user query, we show how the output from these different models can be combined to cross-examine the outputs of each other. Particularly, we show how accuracy in dialogue can be improved using the question answering model. We are also releasing all models used in the demo as a contribution of this paper. A short video demonstrating the system is available at https://ibm.box.com/v/emnlp2022-demo.
    Robust Forecasting for Robotic Control: A Game-Theoretic Approach. (arXiv:2209.10802v1 [cs.RO])
    Modern robots require accurate forecasts to make optimal decisions in the real world. For example, self-driving cars need an accurate forecast of other agents' future actions to plan safe trajectories. Current methods rely heavily on historical time series to accurately predict the future. However, relying entirely on the observed history is problematic since it could be corrupted by noise, have outliers, or not completely represent all possible outcomes. To solve this problem, we propose a novel framework for generating robust forecasts for robotic control. In order to model real-world factors affecting future forecasts, we introduce the notion of an adversary, which perturbs observed historical time series to increase a robot's ultimate control cost. Specifically, we model this interaction as a zero-sum two-player game between a robot's forecaster and this hypothetical adversary. We show that our proposed game may be solved to a local Nash equilibrium using gradient-based optimization techniques. Furthermore, we show that a forecaster trained with our method performs 30.14% better on out-of-distribution real-world lane change data than baselines.
    Assessing ASR Model Quality on Disordered Speech using BERTScore. (arXiv:2209.10591v1 [eess.AS])
    Word Error Rate (WER) is the primary metric used to assess automatic speech recognition (ASR) model quality. It has been shown that ASR models tend to have much higher WER on speakers with speech impairments than on typical English speakers. It is hard to determine if models can be useful at such high error rates. This study investigates the use of BERTScore, an evaluation metric for text generation, to provide a more informative measure of ASR model quality and usefulness. Both BERTScore and WER were compared to prediction errors manually annotated by Speech Language Pathologists for error type and assessment. BERTScore was found to be more strongly correlated with the human annotations of both error type and assessment. BERTScore was specifically more robust to orthographic changes (contraction and normalization errors) where meaning was preserved. Furthermore, BERTScore was a better fit for error assessment than WER, as measured using an ordinal logistic regression and Akaike's Information Criterion (AIC). Overall, our findings suggest that BERTScore can complement WER when assessing ASR model performance from a practical perspective, especially for accessibility applications where models are useful even at lower accuracy than for typical speech.
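    A small sketch, assuming the third-party jiwer and bert-score packages, of the two metrics on a meaning-preserving contraction error of the kind discussed above:

        from jiwer import wer
        from bert_score import score  # downloads a model on first use

        reference = ["i am going to the store"]
        hypothesis = ["i'm going to the store"]

        # WER penalizes the contraction as substitutions/deletions.
        print("WER:", wer(reference[0], hypothesis[0]))
        # BERTScore compares contextual embeddings, so meaning-preserving
        # orthographic changes score close to 1.
        P, R, F1 = score(hypothesis, reference, lang="en")
        print("BERTScore F1:", F1.item())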
    Equivariant Transduction through Invariant Alignment. (arXiv:2209.10926v1 [cs.CL])
    The ability to generalize compositionally is key to understanding the potentially infinite number of sentences that can be constructed in a human language from only a finite number of words. Investigating whether NLP models possess this ability has been a topic of interest: SCAN (Lake and Baroni, 2018) is one task specifically proposed to test for this property. Previous work has achieved impressive empirical results using a group-equivariant neural network that naturally encodes a useful inductive bias for SCAN (Gordon et al., 2020). Inspired by this, we introduce a novel group-equivariant architecture that incorporates a group-invariant hard alignment mechanism. We find that our network's structure allows it to develop stronger equivariance properties than existing group-equivariant approaches. We additionally find that it outperforms previous group-equivariant networks empirically on the SCAN task. Our results suggest that integrating group-equivariance into a variety of neural architectures is a potentially fruitful avenue of research, and demonstrate the value of careful analysis of the theoretical properties of such architectures.
    Algorithm-Agnostic Interpretations for Clustering. (arXiv:2209.10578v1 [cs.LG])
    A clustering outcome for high-dimensional data is typically interpreted via post-processing, involving dimension reduction and subsequent visualization. This destroys the meaning of the data and obfuscates interpretations. We propose algorithm-agnostic interpretation methods to explain clustering outcomes in reduced dimensions while preserving the integrity of the data. The permutation feature importance for clustering represents a general framework based on shuffling feature values and measuring changes in cluster assignments through custom score functions. The individual conditional expectation for clustering indicates observation-wise changes in the cluster assignment due to changes in the data. The partial dependence for clustering evaluates average changes in cluster assignments for the entire feature space. All methods can be used with any clustering algorithm able to reassign instances through soft or hard labels. In contrast to common post-processing methods such as principal component analysis, the introduced methods maintain the original structure of the features.
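    A minimal sketch of the permutation feature importance idea for clustering, using scikit-learn and the adjusted Rand index as one concrete choice of score function (the framework described above allows custom scores):

        import numpy as np
        from sklearn.cluster import KMeans
        from sklearn.datasets import make_blobs
        from sklearn.metrics import adjusted_rand_score

        rng = np.random.default_rng(0)
        X, _ = make_blobs(n_samples=300, n_features=4, centers=3, random_state=0)

        km = KMeans(n_clusters=3, n_init=10, random_state=0).fit(X)
        base = km.labels_

        for j in range(X.shape[1]):
            Xp = X.copy()
            Xp[:, j] = rng.permutation(Xp[:, j])   # destroy feature j
            perm = km.predict(Xp)                  # reassign instances
            drop = 1 - adjusted_rand_score(base, perm)
            print(f"feature {j}: ARI drop = {drop:.3f}")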
    A Tent L\'evy Flying Sparrow Search Algorithm for Feature Selection: A COVID-19 Case Study. (arXiv:2209.10542v1 [cs.LG])
    The "Curse of Dimensionality" induced by the rapid development of information science, might have a negative impact when dealing with big datasets. In this paper, we propose a variant of the sparrow search algorithm (SSA), called Tent L\'evy flying sparrow search algorithm (TFSSA), and use it to select the best subset of features in the packing pattern for classification purposes. SSA is a recently proposed algorithm that has not been systematically applied to feature selection problems. After verification by the CEC2020 benchmark function, TFSSA is used to select the best feature combination to maximize classification accuracy and minimize the number of selected features. The proposed TFSSA is compared with nine algorithms in the literature. Nine evaluation metrics are used to properly evaluate and compare the performance of these algorithms on twenty-one datasets from the UCI repository. Furthermore, the approach is applied to the coronavirus disease (COVID-19) dataset, yielding the best average classification accuracy and the average number of feature selections, respectively, of 93.47% and 2.1. Experimental results confirm the advantages of the proposed algorithm in improving classification accuracy and reducing the number of selected features compared to other wrapper-based algorithms.
    Variational inference of fractional Brownian motion with linear computational complexity. (arXiv:2203.07961v3 [cs.LG] UPDATED)
    We introduce a simulation-based, amortised Bayesian inference scheme to infer the parameters of random walks. Our approach learns the posterior distribution of the walks' parameters with a likelihood-free method. In the first step a graph neural network is trained on simulated data to learn optimized low-dimensional summary statistics of the random walk. In the second step an invertible neural network generates the posterior distribution of the parameters from the learnt summary statistics using variational inference. We apply our method to infer the parameters of the fractional Brownian motion model from single trajectories. The computational complexity of the amortized inference procedure scales linearly with trajectory length, and its precision scales similarly to the Cram{\'e}r-Rao bound over a wide range of lengths. The approach is robust to positional noise, and generalizes well to trajectories longer than those seen during training. Finally, we adapt this scheme to show that a finite decorrelation time in the environment can furthermore be inferred from individual trajectories.
    PACT: Perception-Action Causal Transformer for Autoregressive Robotics Pre-Training. (arXiv:2209.11133v1 [cs.RO])
    Robotics has long been a field riddled with complex systems architectures whose modules and connections, whether traditional or learning-based, require significant human expertise and prior knowledge. Inspired by large pre-trained language models, this work introduces a paradigm for pre-training a general purpose representation that can serve as a starting point for multiple tasks on a given robot. We present the Perception-Action Causal Transformer (PACT), a generative transformer-based architecture that aims to build representations directly from robot data in a self-supervised fashion. Through autoregressive prediction of states and actions over time, our model implicitly encodes dynamics and behaviors for a particular robot. Our experimental evaluation focuses on the domain of mobile agents, where we show that this robot-specific representation can function as a single starting point to achieve distinct tasks such as safe navigation, localization and mapping. We evaluate two form factors: a wheeled robot that uses a LiDAR sensor as perception input (MuSHR), and a simulated agent that uses first-person RGB images (Habitat). We show that finetuning small task-specific networks on top of the larger pretrained model results in significantly better performance compared to training a single model from scratch for all tasks simultaneously, and comparable performance to training a separate large model for each task independently. By sharing a common good-quality representation across tasks we can lower overall model capacity and speed up the real-time deployment of such systems.
    Adaptive Bias Correction for Improved Subseasonal Forecasting. (arXiv:2209.10666v1 [cs.LG])
    Subseasonal forecasting, i.e., predicting temperature and precipitation 2 to 6 weeks ahead, is critical for effective water allocation, wildfire management, and drought and flood mitigation. Recent international research efforts have advanced the subseasonal capabilities of operational dynamical models, yet temperature and precipitation prediction skill remains poor, partly due to stubborn errors in representing atmospheric dynamics and physics inside dynamical models. To counter these errors, we introduce an adaptive bias correction (ABC) method that combines state-of-the-art dynamical forecasts with observations using machine learning. When applied to the leading subseasonal model from the European Centre for Medium-Range Weather Forecasts (ECMWF), ABC improves temperature forecasting skill by 60-90% and precipitation forecasting skill by 40-69% in the contiguous U.S. We couple these performance improvements with a practical workflow, based on Cohort Shapley, for explaining ABC skill gains and identifying higher-skill windows of opportunity based on specific climate conditions.
    Stochastic Future Prediction in Real World Driving Scenarios. (arXiv:2209.10693v1 [cs.CV])
    Uncertainty plays a key role in future prediction. The future is uncertain, which means there may be many possible futures. A future prediction method should cover the full range of possibilities to be robust. In autonomous driving, covering multiple modes in the prediction is crucially important for making safety-critical decisions. Although computer vision systems have advanced tremendously in recent years, future prediction remains difficult today, due to the uncertainty of the future, the requirement of full scene understanding, and the noisy output space. In this thesis, we propose solutions to these challenges by modeling the motion explicitly in a stochastic way and learning the temporal dynamics in a latent space.
    Theoretical Analysis of Primal-Dual Algorithm for Non-Convex Stochastic Decentralized Optimization. (arXiv:2205.11979v3 [math.OC] UPDATED)
    In recent years, decentralized learning has emerged as a powerful tool not only for large-scale machine learning, but also for preserving privacy. One of the key challenges in decentralized learning is that the data distribution held by each node is statistically heterogeneous. To address this challenge, the primal-dual algorithm called the Edge-Consensus Learning (ECL) was proposed and was experimentally shown to be robust to the heterogeneity of data distributions. However, the convergence rate of the ECL is provided only when the objective function is convex, and has not been shown in a standard machine learning setting where the objective function is non-convex. Furthermore, the intuitive reason why the ECL is robust to the heterogeneity of data distributions has not been investigated. In this work, we first investigate the relationship between the ECL and Gossip algorithm and show that the update formulas of the ECL can be regarded as correcting the local stochastic gradient in the Gossip algorithm. Then, we propose the Generalized ECL (G-ECL), which contains the ECL as a special case, and provide the convergence rates of the G-ECL in both (strongly) convex and non-convex settings, which do not depend on the heterogeneity of data distributions. Through synthetic experiments, we demonstrate that the numerical results of both the G-ECL and ECL coincide with the convergence rate of the G-ECL.
    Mega: Moving Average Equipped Gated Attention. (arXiv:2209.10655v1 [cs.LG])
    The design choices in the Transformer attention mechanism, including weak inductive bias and quadratic computational complexity, have limited its application for modeling long sequences. In this paper, we introduce Mega, a simple, theoretically grounded, single-head gated attention mechanism equipped with (exponential) moving average to incorporate inductive bias of position-aware local dependencies into the position-agnostic attention mechanism. We further propose a variant of Mega that offers linear time and space complexity yet yields only minimal quality loss, by efficiently splitting the whole sequence into multiple chunks with fixed length. Extensive experiments on a wide range of sequence modeling benchmarks, including the Long Range Arena, neural machine translation, auto-regressive language modeling, and image and speech classification, show that Mega achieves significant improvements over other sequence models, including variants of Transformers and recent state space models.
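    For intuition, a hedged single-decay sketch of the moving-average component is shown below; Mega's actual EMA is multi-dimensional and damped with learned parameters, which this plain numpy version does not reproduce. The attention module then operates on the smoothed, position-aware sequence.

        import numpy as np

        def ema(x, alpha=0.6):
            # x: (time, channels); y_t = alpha * x_t + (1 - alpha) * y_{t-1}
            y = np.zeros_like(x)
            y[0] = x[0]
            for t in range(1, len(x)):
                y[t] = alpha * x[t] + (1 - alpha) * y[t - 1]
            return y

        x = np.random.randn(16, 8)
        smoothed = ema(x)      # local context injected before gated attention
        print(smoothed.shape)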
    The Sample Complexity of One-Hidden-Layer Neural Networks. (arXiv:2202.06233v2 [cs.LG] UPDATED)
    We study norm-based uniform convergence bounds for neural networks, aiming at a tight understanding of how these are affected by the architecture and type of norm constraint, for the simple class of scalar-valued one-hidden-layer networks, and inputs bounded in Euclidean norm. We begin by proving that in general, controlling the spectral norm of the hidden layer weight matrix is insufficient to get uniform convergence guarantees (independent of the network width), while a stronger Frobenius norm control is sufficient, extending and improving on previous work. Motivated by the proof constructions, we identify and analyze two important settings where (perhaps surprisingly) a mere spectral norm control turns out to be sufficient: First, when the network's activation functions are sufficiently smooth (with the result extending to deeper networks); and second, for certain types of convolutional networks. In the latter setting, we study how the sample complexity is additionally affected by parameters such as the amount of overlap between patches and the overall number of patches.
    DIG: Draping Implicit Garment over the Human Body. (arXiv:2209.10845v1 [cs.CV])
    Existing data-driven methods for draping garments over posed human bodies, despite being effective, cannot handle garments of arbitrary topology and are typically not end-to-end differentiable. To address these limitations, we propose an end-to-end differentiable pipeline that represents garments using implicit surfaces and learns a skinning field conditioned on shape and pose parameters of an articulated body model. To limit body-garment interpenetrations and artifacts, we propose an interpenetration-aware pre-processing strategy of training data and a novel training loss that penalizes self-intersections while draping garments. We demonstrate that our method yields more accurate results for garment reconstruction and deformation with respect to state-of-the-art methods. Furthermore, we show that our method, thanks to its end-to-end differentiability, allows recovering body and garment parameters jointly from image observations, something previous work could not do.
    Toy Models of Superposition. (arXiv:2209.10652v1 [cs.LG])
    Neural networks often pack many unrelated concepts into a single neuron - a puzzling phenomenon known as 'polysemanticity' which makes interpretability much more challenging. This paper provides a toy model where polysemanticity can be fully understood, arising as a result of models storing additional sparse features in "superposition." We demonstrate the existence of a phase change, a surprising connection to the geometry of uniform polytopes, and evidence of a link to adversarial examples. We also discuss potential implications for mechanistic interpretability.
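    A hedged sketch of a toy model in this spirit: n sparse features are compressed to m < n dimensions by a matrix W and reconstructed as ReLU(W^T W x + b); with sufficiently sparse features, W packs several features into shared directions. Sizes, sparsity, and training details here are illustrative, not the paper's exact configuration.

        import torch

        n, m = 20, 5
        W = (torch.randn(m, n) * 0.1).requires_grad_()
        b = torch.zeros(n, requires_grad=True)
        opt = torch.optim.Adam([W, b], lr=1e-2)

        for step in range(1000):
            mask = (torch.rand(256, n) < 0.05).float()   # sparse feature activity
            x = torch.rand(256, n) * mask
            x_hat = torch.relu(x @ W.t() @ W + b)        # compress then reconstruct
            loss = ((x - x_hat) ** 2).mean()
            opt.zero_grad()
            loss.backward()
            opt.step()

        # Off-diagonal mass of W^T W indicates interference between features
        # packed into shared directions ("superposition").
        print((W.t() @ W).abs().mean().item())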
    One Positive Label is Sufficient: Single-Positive Multi-Label Learning with Label Enhancement. (arXiv:2206.00517v2 [cs.LG] UPDATED)
    Multi-label learning (MLL) learns from examples each associated with multiple labels simultaneously, where the high cost of annotating all relevant labels for each training example is challenging for real-world applications. To cope with this challenge, we investigate single-positive multi-label learning (SPMLL), where each example is annotated with only one relevant label, and show that one can successfully learn a theoretically grounded multi-label classifier for the problem. In this paper, a novel SPMLL method named SMILE, i.e., Single-positive MultI-label learning with Label Enhancement, is proposed. Specifically, an unbiased risk estimator is derived, which is guaranteed to approximately converge to the optimal risk minimizer of fully supervised learning, showing that one positive label per instance is sufficient to train the predictive model. Then, the corresponding empirical risk estimator is established by recovering the latent soft label as a label enhancement process, where the posterior density of the latent soft labels is approximated by a variational Beta density parameterized by an inference model. Experiments on benchmark datasets validate the effectiveness of the proposed method.
    SCALES: From Fairness Principles to Constrained Decision-Making. (arXiv:2209.10860v1 [cs.LG])
    This paper proposes SCALES, a general framework that translates well-established fairness principles into a common representation based on the Constraint Markov Decision Process (CMDP). With the help of causal language, our framework can place constraints on both the procedure of decision making (procedural fairness) as well as the outcomes resulting from decisions (outcome fairness). Specifically, we show that well-known fairness principles can be encoded either as a utility component, a non-causal component, or a causal component in a SCALES-CMDP. We illustrate SCALES using a set of case studies involving a simulated healthcare scenario and the real-world COMPAS dataset. Experiments demonstrate that our framework produces fair policies that embody alternative fairness principles in single-step and sequential decision-making scenarios.
    Boosting Simple Learners. (arXiv:2001.11704v6 [cs.LG] UPDATED)
    Boosting is a celebrated machine learning approach which is based on the idea of combining weak and moderately inaccurate hypotheses into a strong and accurate one. We study boosting under the assumption that the weak hypotheses belong to a class of bounded capacity. This assumption is inspired by the common convention that weak hypotheses are "rules of thumb" from an "easy-to-learn class". (Schapire and Freund '12, Shalev-Shwartz and Ben-David '14.) Formally, we assume the class of weak hypotheses has a bounded VC dimension. We focus on two main questions: (i) Oracle Complexity: How many weak hypotheses are needed to produce an accurate hypothesis? We design a novel boosting algorithm and demonstrate that it circumvents a classical lower bound by Freund and Schapire ('95, '12). Whereas the lower bound shows that $\Omega({1}/{\gamma^2})$ weak hypotheses with $\gamma$-margin are sometimes necessary, our new method requires only $\tilde{O}({1}/{\gamma})$ weak hypotheses, provided that they belong to a class of bounded VC dimension. Unlike previous boosting algorithms which aggregate the weak hypotheses by majority votes, the new boosting algorithm uses more complex ("deeper") aggregation rules. We complement this result by showing that complex aggregation rules are in fact necessary to circumvent the aforementioned lower bound. (ii) Expressivity: Which tasks can be learned by boosting weak hypotheses from a bounded VC class? Can complex concepts that are "far away" from the class be learned? Towards answering the first question we introduce combinatorial-geometric parameters which capture expressivity in boosting. As a corollary we provide an affirmative answer to the second question for well-studied classes, including half-spaces and decision stumps. Along the way, we establish and exploit connections with Discrepancy Theory.
    Modeling cognitive load as a self-supervised brain rate with electroencephalography and deep learning. (arXiv:2209.10992v1 [eess.SP])
    The principal reason for measuring mental workload is to quantify the cognitive cost of performing tasks to predict human performance. Unfortunately, a method for assessing mental workload that has general applicability does not exist yet. This research presents a novel self-supervised method for mental workload modelling from EEG data employing Deep Learning and a continuous brain rate, an index of cognitive activation, without requiring human declarative knowledge. This method is a convolutional recurrent neural network trainable with spatially preserving spectral topographic head-maps from EEG data to fit the brain rate variable. Findings demonstrate the capacity of the convolutional layers to learn meaningful high-level representations from EEG data since within-subject models had a test Mean Absolute Percentage Error average of 11%. The addition of a Long-Short Term Memory layer for handling sequences of high-level representations was not significant, although it did improve their accuracy. Findings point to the existence of quasi-stable blocks of learnt high-level representations of cognitive activation because they can be induced through convolution and seem not to be dependent on each other over time, intuitively matching the non-stationary nature of brain responses. Across-subject models, induced with data from an increasing number of participants, thus containing more variability, obtained a similar accuracy to the within-subject models. This highlights the potential generalisability of the induced high-level representations across people, suggesting the existence of subject-independent cognitive activation patterns. This research contributes to the body of knowledge by providing scholars with a novel computational method for mental workload modelling that aims to be generally applicable, does not rely on ad-hoc human-crafted models supporting replicability and falsifiability.
    Inverted Landing in a Small Aerial Robot via Deep Reinforcement Learning for Triggering and Control of Rotational Maneuvers. (arXiv:2209.11043v1 [cs.RO])
    Inverted landing in a rapid and robust manner is a challenging feat for aerial robots, especially while depending entirely on onboard sensing and computation. In spite of this, this feat is routinely performed by biological fliers such as bats, flies, and bees. Our previous work has identified a direct causal connection between a series of onboard visual cues and kinematic actions that allow for reliable execution of this challenging aerobatic maneuver in small aerial robots. In this work, we first utilized Deep Reinforcement Learning and a physics-based simulation to obtain a general, optimal control policy for robust inverted landing starting from any arbitrary approach condition. This optimized control policy provides a computationally-efficient mapping from the system's observational space to its motor command action space, including both triggering and control of rotational maneuvers. This was done by training the system over a large range of approach flight velocities that varied with magnitude and direction. Next, we performed a sim-to-real transfer and experimental validation of the learned policy via domain randomization, by varying the robot's inertial parameters in the simulation. Through experimental trials, we identified several dominant factors which greatly improved landing robustness and the primary mechanisms that determined inverted landing success. We expect the learning framework developed in this study can be generalized to solve more challenging tasks, such as utilizing noisy onboard sensory data, landing on surfaces of various orientations, or landing on dynamically-moving surfaces.
    Explaining Deep Tractable Probabilistic Models: The sum-product network case. (arXiv:2110.09778v2 [cs.LG] UPDATED)
    We consider the problem of explaining a class of tractable deep probabilistic models, the Sum-Product Networks (SPNs), and present an algorithm, ExSPN, to generate explanations. To this effect, we define the notion of a context-specific independence tree (CSI-tree) and present an iterative algorithm that converts an SPN to a CSI-tree. The resulting CSI-tree is both interpretable and explainable to the domain expert. We achieve this by extracting the conditional independencies encoded by the SPN and approximating the local context specified by the structure of the SPN. Our extensive empirical evaluations on synthetic, standard, and real-world clinical data sets demonstrate that the CSI-tree exhibits superior explainability.
    SPICE, A Dataset of Drug-like Molecules and Peptides for Training Machine Learning Potentials. (arXiv:2209.10702v1 [physics.chem-ph])
    Machine learning potentials are an important tool for molecular simulation, but their development is held back by a shortage of high quality datasets to train them on. We describe the SPICE dataset, a new quantum chemistry dataset for training potentials relevant to simulating drug-like small molecules interacting with proteins. It contains over 1.1 million conformations for a diverse set of small molecules, dimers, dipeptides, and solvated amino acids. It includes 15 elements, charged and uncharged molecules, and a wide range of covalent and non-covalent interactions. It provides both forces and energies calculated at the {\omega}B97M-D3(BJ)/def2-TZVPPD level of theory, along with other useful quantities such as multipole moments and bond orders. We train a set of machine learning potentials on it and demonstrate that they can achieve chemical accuracy across a broad region of chemical space. It can serve as a valuable resource for the creation of transferable, ready to use potential functions for use in molecular simulations.
    Estimating individual treatment effects under unobserved confounding using binary instruments. (arXiv:2208.08544v2 [stat.ME] UPDATED)
    Estimating individual treatment effects (ITEs) from observational data is relevant in many fields such as personalized medicine. However, in practice, the treatment assignment is usually confounded by unobserved variables and thus introduces bias. A remedy to remove the bias is the use of instrumental variables (IVs). Such settings are widespread in medicine (e.g., trials where compliance is used as a binary IV). In this paper, we propose a novel, multiply robust machine learning framework, called MRIV, for estimating ITEs using binary IVs, and thus yields an unbiased ITE estimator. Different from previous work for binary IVs, our framework estimates the ITE directly via a pseudo outcome regression. (1) We provide a theoretical analysis where we show that our framework yields multiply robust convergence rates: our ITE estimator achieves fast convergence even if several nuisance estimators converge slowly. (2) We further show that our framework asymptotically outperforms state-of-the-art plug-in IV methods for ITE estimation. (3) We build upon our theoretical results and propose a tailored deep neural network architecture called MRIV-Net for ITE estimation using binary IVs. Across various computational experiments, we demonstrate empirically that our MRIV-Net achieves state-of-the-art performance. To the best of our knowledge, our MRIV is the first machine learning framework for estimating ITEs in the binary IV setting shown to be multiply robust.
    mini-ELSA: using Machine Learning to improve space efficiency in Edge Lightweight Searchable Attribute-based encryption for Industry 4.0. (arXiv:2209.10896v1 [cs.LG])
    In previous work, a novel Edge Lightweight Searchable Attribute-based encryption (ELSA) method was proposed to support Industry 4.0, and specifically Industrial Internet of Things, applications. In this paper, we aim to improve ELSA by minimising the lookup table size and summarising the data records by integrating Machine Learning (ML) methods suitable for execution at the edge. This integration eliminates records of unnecessary data by evaluating their added value for further processing, resulting in the minimisation of the lookup table size, cloud storage, and network traffic, taking full advantage of the edge architecture's benefits. We demonstrate our mini-ELSA expanded method on a well-known power plant dataset. Our results demonstrate a reduction of storage requirements by 21% while improving execution time by 1.27x.
    How Does It Feel? Self-Supervised Costmap Learning for Off-Road Vehicle Traversability. (arXiv:2209.10788v1 [cs.RO])
    Estimating terrain traversability in off-road environments requires reasoning about complex interaction dynamics between the robot and these terrains. However, it is challenging to build an accurate physics model, or create informative labels to learn a model in a supervised manner, for these interactions. We propose a method that learns to predict traversability costmaps by combining exteroceptive environmental information with proprioceptive terrain interaction feedback in a self-supervised manner. Additionally, we propose a novel way of incorporating robot velocity in the costmap prediction pipeline. We validate our method in multiple short and large-scale navigation tasks on a large, autonomous all-terrain vehicle (ATV) on challenging off-road terrains, and demonstrate ease of integration on a separate large ground robot. Our short-scale navigation results show that using our learned costmaps leads to overall smoother navigation, and provides the robot with a more fine-grained understanding of the interactions between the robot and different terrain types, such as grass and gravel. Our large-scale navigation trials show that we can reduce the number of interventions by up to 57% compared to an occupancy-based navigation baseline in challenging off-road courses ranging from 400 m to 3150 m.
    EPIC TTS Models: Empirical Pruning Investigations Characterizing Text-To-Speech Models. (arXiv:2209.10890v1 [eess.AS])
    Neural models are known to be over-parameterized, and recent work has shown that sparse text-to-speech (TTS) models can outperform dense models. Although a plethora of sparse methods has been proposed for other domains, such methods have rarely been applied in TTS. In this work, we seek to answer the question: how do selected sparsity techniques affect performance and model complexity? We compare a Tacotron2 baseline with the results of applying five sparsity techniques. We then evaluate performance in terms of naturalness, intelligibility and prosody, while reporting model size and training time. Complementary to prior research, we find that pruning before or during training can achieve similar performance to pruning after training and can be trained much faster, while removing entire neurons degrades performance much more than removing parameters. To the best of our knowledge, this is the first work that compares sparsity paradigms in text-to-speech synthesis.
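    One of the simpler paradigms in this space, post-training magnitude pruning, can be sketched with PyTorch's pruning utilities (applied here to a stand-in linear layer rather than Tacotron2):

        import torch
        import torch.nn.utils.prune as prune

        layer = torch.nn.Linear(256, 256)
        # Zero out the 50% of weights with the smallest L1 magnitude.
        prune.l1_unstructured(layer, name="weight", amount=0.5)
        print("sparsity:", (layer.weight == 0).float().mean().item())
        prune.remove(layer, "weight")   # make the pruned weights permanent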
    Matrix factorisation and the interpretation of geodesic distance. (arXiv:2106.01260v3 [stat.ML] UPDATED)
    Given a graph or similarity matrix, we consider the problem of recovering a notion of true distance between the nodes, and so their true positions. We show that this can be accomplished in two steps: matrix factorisation, followed by nonlinear dimension reduction. This combination is effective because the point cloud obtained in the first step lives close to a manifold in which latent distance is encoded as geodesic distance. Hence, a nonlinear dimension reduction tool, approximating geodesic distance, can recover the latent positions, up to a simple transformation. We give a detailed account of the case where spectral embedding is used, followed by Isomap, and provide encouraging experimental evidence for other combinations of techniques.
    Identifiability and generalizability from multiple experts in Inverse Reinforcement Learning. (arXiv:2209.10974v1 [cs.LG])
    While Reinforcement Learning (RL) aims to train an agent from a reward function in a given environment, Inverse Reinforcement Learning (IRL) seeks to recover the reward function from observing an expert's behavior. It is well known that, in general, various reward functions can lead to the same optimal policy, and hence, IRL is ill-defined. However, (Cao et al., 2021) showed that, if we observe two or more experts with different discount factors or acting in different environments, the reward function can under certain conditions be identified up to a constant. This work starts by showing an equivalent identifiability statement from multiple experts in tabular MDPs based on a rank condition, which is easily verifiable and is shown to be also necessary. We then extend our result to various different scenarios, i.e., we characterize reward identifiability in the case where the reward function can be represented as a linear combination of given features, making it more interpretable, or when we have access to approximate transition matrices. Even when the reward is not identifiable, we provide conditions characterizing when data on multiple experts in a given environment allows us to generalize and train an optimal agent in a new environment. Our theoretical results on reward identifiability and generalizability are validated in various numerical experiments.
    High-order Multi-view Clustering for Generic Data. (arXiv:2209.10838v1 [cs.LG])
    Graph-based multi-view clustering has achieved better performance than most non-graph approaches. However, in many real-world scenarios, the graph structure of data is not given or the quality of initial graph is poor. Additionally, existing methods largely neglect the high-order neighborhood information that characterizes complex intrinsic interactions. To tackle these problems, we introduce an approach called high-order multi-view clustering (HMvC) to explore the topology structure information of generic data. Firstly, graph filtering is applied to encode structure information, which unifies the processing of attributed graph data and non-graph data in a single framework. Secondly, up to infinity-order intrinsic relationships are exploited to enrich the learned graph. Thirdly, to explore the consistent and complementary information of various views, an adaptive graph fusion mechanism is proposed to achieve a consensus graph. Comprehensive experimental results on both non-graph and attributed graph data show the superior performance of our method with respect to various state-of-the-art techniques, including some deep learning methods.
    Improving Attention-Based Interpretability of Text Classification Transformers. (arXiv:2209.10876v1 [cs.CL])
    Transformers are widely used in NLP, where they consistently achieve state-of-the-art performance. This is due to their attention-based architecture, which allows them to model rich linguistic relations between words. However, transformers are difficult to interpret. Being able to provide reasoning for its decisions is an important property for a model in domains where human lives are affected, such as hate speech detection and biomedicine. With transformers finding wide use in these fields, the need for interpretability techniques tailored to them arises. The effectiveness of attention-based interpretability techniques for transformers in text classification is studied in this work. Despite concerns about attention-based interpretations in the literature, we show that, with proper setup, attention may be used in such tasks with results comparable to state-of-the-art techniques, while also being faster and friendlier to the environment. We validate our claims with a series of experiments that employ a new feature importance metric.
    Continuous Mixtures of Tractable Probabilistic Models. (arXiv:2209.10584v1 [cs.LG])
    Probabilistic models based on continuous latent spaces, such as variational autoencoders, can be understood as uncountable mixture models where components depend continuously on the latent code. They have proven to be expressive tools for generative and probabilistic modelling, but are at odds with tractable probabilistic inference, that is, computing marginals and conditionals of the represented probability distribution. Meanwhile, tractable probabilistic models such as probabilistic circuits (PCs) can be understood as hierarchical discrete mixture models, which allows them to perform exact inference, but they often show subpar performance in comparison to continuous latent-space models. In this paper, we investigate a hybrid approach, namely continuous mixtures of tractable models with a small latent dimension. While these models are analytically intractable, they are amenable to numerical integration schemes based on a finite set of integration points. With a large enough number of integration points, the approximation becomes de facto exact. Moreover, using a finite set of integration points, the approximation method can be compiled into a PC performing `exact inference in an approximate model'. In experiments, we show that this simple scheme proves remarkably effective, as PCs learned this way set a new state of the art for tractable models on many standard density estimation benchmarks.
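    For intuition, a minimal sketch (ours, not the paper's code) of the numerical-integration view: the intractable $p(x) = \int p(x \mid z)\, p(z)\, dz$ is replaced by a finite mixture over integration points, which is exactly the kind of discrete mixture a probabilistic circuit can represent.

        import numpy as np

        def finite_mixture_logpdf(x, component_logpdf, z_points, log_weights):
            # log p(x) ~= logsumexp_i [ log w_i + log p(x | z_i) ]
            logs = np.array([lw + component_logpdf(x, z)
                             for z, lw in zip(z_points, log_weights)])
            m = logs.max()
            return m + np.log(np.exp(logs - m).sum())

        # Toy decoder p(x|z) = N(x; z, 1) with a standard normal prior,
        # approximated with 64 Monte Carlo integration points.
        rng = np.random.default_rng(0)
        z_points = rng.standard_normal((64, 1))
        log_w = np.full(64, -np.log(64))
        logpdf = lambda x, z: -0.5 * ((x - z[0]) ** 2 + np.log(2 * np.pi))
        print(finite_mixture_logpdf(0.3, logpdf, z_points, log_w))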
    FedKL: Tackling Data Heterogeneity in Federated Reinforcement Learning by Penalizing KL Divergence. (arXiv:2204.08125v3 [cs.LG] UPDATED)
    As a distributed learning paradigm, Federated Learning (FL) faces the communication bottleneck issue due to many rounds of model synchronization and aggregation. Heterogeneous data further deteriorates the situation by causing slow convergence. Although the impact of data heterogeneity on supervised FL has been widely studied, the related investigation for Federated Reinforcement Learning (FRL) is still in its infancy. In this paper, we first define the type and level of data heterogeneity for policy gradient based FRL systems. By inspecting the connection between the global and local objective functions, we prove that local training can benefit the global objective, if the local update is properly penalized by the total variation (TV) distance between the local and global policies. A necessary condition for the global policy to be learnable from the local policy is also derived, which is directly related to the heterogeneity level. Based on the theoretical result, a Kullback-Leibler (KL) divergence based penalty is proposed, which, different from the conventional method that penalizes the model divergence in the parameter space, directly constrains the model outputs in the distribution space. A convergence proof of the proposed algorithm is also provided. By jointly penalizing the divergence of the local policy from the global policy with a global penalty and constraining each iteration of the local training with a local penalty, the proposed method achieves a better trade-off between training speed (step size) and convergence. Experiment results on two popular Reinforcement Learning (RL) experiment platforms demonstrate the advantage of the proposed algorithm over existing methods in accelerating and stabilizing the training process with heterogeneous data.
    Non-Negative Matrix Factorization with Scale Data Structure Preservation. (arXiv:2209.10881v1 [cs.LG])
    The model described in this paper belongs to the family of non-negative matrix factorization methods designed for data representation and dimension reduction. In addition to preserving the data positivity property, it aims also to preserve the structure of data during matrix factorization. The idea is to add, to the NMF cost function, a penalty term to impose a scale relationship between the pairwise similarity matrices of the original and transformed data points. The solution of the new model involves deriving a new parametrized update scheme for the coefficient matrix, which makes it possible to improve the quality of reduced data when used for clustering and classification. The proposed clustering algorithm is compared to some existing NMF-based algorithms and to some manifold learning-based algorithms when applied to some real-life datasets. The obtained results show the effectiveness of the proposed algorithm.
    Simulator-based explanation and debugging of hazard-triggering events in DNN-based safety-critical systems. (arXiv:2204.00480v2 [cs.SE] UPDATED)
    When Deep Neural Networks (DNNs) are used in safety-critical systems, engineers should determine the safety risks associated with failures (i.e., erroneous outputs) observed during testing. For DNNs processing images, engineers visually inspect all failure-inducing images to determine common characteristics among them. Such characteristics correspond to hazard-triggering events (e.g., low illumination) that are essential inputs for safety analysis. Though informative, such activity is expensive and error-prone. To support such safety analysis practices, we propose SEDE, a technique that generates readable descriptions for commonalities in failure-inducing, real-world images and improves the DNN through effective retraining. SEDE leverages the availability of simulators, which are commonly used for cyber-physical systems. It relies on genetic algorithms to drive simulators towards the generation of images that are similar to failure-inducing, real-world images in the test set; it then employs rule learning algorithms to derive expressions that capture commonalities in terms of simulator parameter values. The derived expressions are then used to generate additional images to retrain and improve the DNN. With DNNs performing in-car sensing tasks, SEDE successfully characterized hazard-triggering events leading to a DNN accuracy drop. Also, SEDE enabled retraining leading to significant improvements in DNN accuracy, up to 18 percentage points.
    Detecting Rotated Objects as Gaussian Distributions and Its 3-D Generalization. (arXiv:2209.10839v1 [cs.CV])
    Existing detection methods commonly use a parameterized bounding box (BBox) to model and detect (horizontal) objects, with an additional rotation angle parameter used for rotated objects. We argue that such a mechanism has fundamental limitations in building an effective regression loss for rotation detection, especially for high-precision detection with high IoU (e.g. 0.75). Instead, we propose to model rotated objects as Gaussian distributions. A direct advantage is that our new regression loss regarding the distance between two Gaussians, e.g. the Kullback-Leibler Divergence (KLD), can well align with the actual detection performance metric, which is not well addressed in existing methods. Moreover, the two bottlenecks, i.e. boundary discontinuity and the square-like problem, also disappear. We also propose an efficient Gaussian metric-based label assignment strategy to further boost the performance. Interestingly, by analyzing the BBox parameters' gradients under our Gaussian-based KLD loss, we show that these parameters are dynamically updated with interpretable physical meaning, which helps explain the effectiveness of our approach, especially for high-precision detection. We extend our approach from 2-D to 3-D with a tailored algorithm design to handle the heading estimation, and experimental results on twelve public datasets (2-D/3-D, aerial/text/face images) with various base detectors show its superiority.  ( 3 min )
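    To make the Gaussian-modelling idea concrete, a hedged sketch that converts rotated boxes $(c_x, c_y, w, h, \theta)$ into 2-D Gaussians and scores a prediction with the closed-form KL divergence; the $w^2/4$, $h^2/4$ axis scaling is one common convention, not necessarily the paper's exact parameterisation.

        import numpy as np

        def box_to_gaussian(cx, cy, w, h, theta):
            R = np.array([[np.cos(theta), -np.sin(theta)],
                          [np.sin(theta),  np.cos(theta)]])
            S = np.diag([w ** 2 / 4.0, h ** 2 / 4.0])  # axis-aligned covariance
            return np.array([cx, cy]), R @ S @ R.T     # mean, rotated covariance

        def gaussian_kld(mu0, S0, mu1, S1):
            # KL( N(mu0, S0) || N(mu1, S1) ) in closed form for 2-D Gaussians.
            S1_inv = np.linalg.inv(S1)
            d = mu1 - mu0
            return 0.5 * (np.trace(S1_inv @ S0) + d @ S1_inv @ d - 2.0
                          + np.log(np.linalg.det(S1) / np.linalg.det(S0)))

        mu_p, S_p = box_to_gaussian(10, 10, 8, 4, np.deg2rad(30))  # prediction
        mu_g, S_g = box_to_gaussian(10, 11, 8, 4, np.deg2rad(45))  # ground truth
        print(gaussian_kld(mu_p, S_p, mu_g, S_g))  # smaller = better aligned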
    SGC: A semi-supervised pipeline for gene clustering using self-training approach in gene co-expression networks. (arXiv:2209.10545v1 [q-bio.GN])
    A widely used approach for extracting information from gene expression data employs the construction of a gene co-expression network and the subsequent application of algorithms that discover network structure. In particular, a common goal is the computational discovery of gene clusters, commonly called modules. When applied to a novel gene expression dataset, the quality of the computed modules can be evaluated automatically using Gene Ontology enrichment, a method that measures the frequencies of Gene Ontology terms in the computed modules and evaluates their statistical likelihood. In this work we propose SGC, a novel pipeline for gene clustering based on relatively recent seminal work in the mathematics of spectral network theory. SGC consists of multiple novel steps that enable the computation of highly enriched modules in an unsupervised manner. Unlike all existing frameworks, it further incorporates a novel step that leverages Gene Ontology information in a semi-supervised clustering method that further improves the quality of the computed modules. Compared with well-known existing frameworks, we show that SGC results in higher enrichment on real data: in 12 real gene expression datasets, SGC outperforms them in all except one.  ( 2 min )
    Modelling the Frequency of Home Deliveries: An Induced Travel Demand Contribution of Aggrandized E-shopping in Toronto during COVID-19 Pandemics. (arXiv:2209.10664v1 [econ.EM])
    The COVID-19 pandemic dramatically catalyzed the proliferation of e-shopping. The dramatic growth of e-shopping will undoubtedly cause significant impacts on travel demand, so transportation modellers' ability to model e-shopping demand is becoming increasingly important. This study developed models to predict households' weekly home delivery frequencies. We used both classical econometric and machine learning techniques to obtain the best model. We find that socioeconomic factors such as having an online grocery membership, household members' average age, the percentage of male household members, the number of workers in the household, and various land use factors influence home delivery demand. This study also compared the interpretations and performances of the machine learning models and the classical econometric model. Agreement is found in the variables' effects identified through the machine learning and econometric models. However, with similar recall accuracy, the ordered probit model, a classical econometric model, can accurately predict the aggregate distribution of household delivery demand, whereas both machine learning models failed to match the observed distribution.  ( 3 min )
  • Open

    Challenges in Visual Anomaly Detection for Mobile Robots. (arXiv:2209.10995v1 [cs.CV])
    We consider the task of detecting anomalies for autonomous mobile robots based on vision. We categorize relevant types of visual anomalies and discuss how they can be detected by unsupervised deep learning methods. We propose a novel dataset built specifically for this task, on which we test a state-of-the-art approach; we finally discuss deployment in a real scenario.  ( 2 min )
    A Validation Approach to Over-parameterized Matrix and Image Recovery. (arXiv:2209.10675v1 [math.OC])
    In this paper, we study the problem of recovering a low-rank matrix from a number of noisy random linear measurements. We consider the setting where the rank of the ground-truth matrix is unknown a priori and use an overspecified factored representation of the matrix variable, where the global optimal solutions overfit and do not correspond to the underlying ground truth. We then solve the associated nonconvex problem using gradient descent with small random initialization. We show that, as long as the measurement operators satisfy the restricted isometry property (RIP) with a rank parameter scaling with the rank of the ground-truth matrix rather than with the overspecified matrix variable, gradient descent iterations follow a particular trajectory towards the ground-truth matrix and achieve nearly information-theoretically optimal recovery when stopped appropriately. We then propose an efficient early stopping strategy based on the common hold-out method and show that it provably detects a nearly optimal estimator. Moreover, experiments show that the proposed validation approach can also be used efficiently for image restoration with deep image prior, which over-parameterizes an image with a deep network.  ( 2 min )
    Interneurons accelerate learning dynamics in recurrent neural networks for statistical adaptation. (arXiv:2209.10634v1 [q-bio.NC])
    Early sensory systems in the brain rapidly adapt to fluctuating input statistics, which requires recurrent communication between neurons. Mechanistically, such recurrent communication is often indirect and mediated by local interneurons. In this work, we explore the computational benefits of mediating recurrent communication via interneurons compared with direct recurrent connections. To this end, we consider two mathematically tractable recurrent neural networks that statistically whiten their inputs -- one with direct recurrent connections and the other with interneurons that mediate recurrent communication. By analyzing the corresponding continuous synaptic dynamics and numerically simulating the networks, we show that the network with interneurons is more robust to initialization than the network with direct recurrent connections in the sense that the convergence time for the synaptic dynamics in the network with interneurons (resp. direct recurrent connections) scales logarithmically (resp. linearly) with the spectrum of their initialization. Our results suggest that interneurons are computationally useful for rapid adaptation to changing input statistics. Interestingly, the network with interneurons is an overparameterized solution of the whitening objective for the network with direct recurrent connections, so our results can be viewed as a recurrent neural network analogue of the implicit acceleration phenomenon observed in overparameterized feedforward linear networks.  ( 3 min )
    Linear Algorithms for Robust and Scalable Nonparametric Multiclass Probability Estimation. (arXiv:2205.12460v3 [stat.ME] UPDATED)
    Multiclass probability estimation is the problem of estimating conditional probabilities of a data point belonging to a class given its covariate information. It has broad applications in statistical analysis and data science. Recently a class of weighted Support Vector Machines (wSVMs) has been developed to estimate class probabilities through ensemble learning for $K$-class problems (Wu, Zhang and Liu, 2010; Wang, Zhang and Wu, 2019), where $K$ is the number of classes. The estimators are robust and achieve high accuracy for probability estimation, but their learning is implemented through pairwise coupling, which demands polynomial time in $K$. In this paper, we propose two new learning schemes, the baseline learning and the One-vs-All (OVA) learning, to further improve wSVMs in terms of computational efficiency and estimation accuracy. In particular, the baseline learning has optimal computational complexity in the sense that it is linear in $K$. Though not the most efficient in computation, the OVA offers the best estimation accuracy among all the procedures under comparison. The resulting estimators are distribution-free and shown to be consistent. We further conduct extensive numerical experiments to demonstrate finite sample performance.  ( 2 min )
    Batch Bayesian optimisation via density-ratio estimation with guarantees. (arXiv:2209.10715v1 [cs.LG])
    Bayesian optimisation (BO) algorithms have shown remarkable success in applications involving expensive black-box functions. Traditionally, BO has been framed as a sequential decision-making process which estimates the utility of query points via an acquisition function and a prior over functions, such as a Gaussian process. Recently, however, a reformulation of BO via density-ratio estimation (BORE) allowed reinterpreting the acquisition function as a probabilistic binary classifier, removing the need for an explicit prior over functions and increasing scalability. In this paper, we present a theoretical analysis of BORE's regret and an extension of the algorithm with improved uncertainty estimates. We also show that BORE can be naturally extended to a batch optimisation setting by recasting the problem as approximate Bayesian inference. The resulting algorithm comes equipped with theoretical performance guarantees and is assessed against other batch BO baselines in a series of experiments.  ( 2 min )
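    A minimal sketch of the core BORE reinterpretation (the sequential version, not the paper's batch extension): label the best $\gamma$-quantile of observations as positive, fit any probabilistic classifier, and use its class-1 probability as the acquisition function. The random-forest classifier and toy objective below are illustrative assumptions.

        import numpy as np
        from sklearn.ensemble import RandomForestClassifier

        def bore_suggest(X, y, candidates, gamma=0.25):
            tau = np.quantile(y, gamma)        # minimisation: "good" = low y
            z = (y <= tau).astype(int)         # 1 marks the good region
            clf = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, z)
            # The acquisition function is the predicted probability of "good".
            return candidates[np.argmax(clf.predict_proba(candidates)[:, 1])]

        rng = np.random.default_rng(0)
        f = lambda x: np.sin(3 * x[:, 0]) + x[:, 0] ** 2   # toy black-box objective
        X = rng.uniform(-2, 2, size=(30, 1))
        cands = rng.uniform(-2, 2, size=(500, 1))
        print(bore_suggest(X, f(X), cands))                # next point to query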
    Invariant Policy Learning: A Causal Perspective. (arXiv:2106.00808v4 [cs.LG] UPDATED)
    Contextual bandit and reinforcement learning algorithms have been successfully used in various interactive learning systems such as online advertising, recommender systems, and dynamic pricing. However, they have yet to be widely adopted in high-stakes application domains, such as healthcare. One reason may be that existing approaches assume that the underlying mechanisms are static in the sense that they do not change over different environments. In many real-world systems, however, the mechanisms are subject to shifts across environments which may invalidate the static environment assumption. In this paper, we take a step toward tackling the problem of environmental shifts considering the framework of offline contextual bandits. We view the environmental shift problem through the lens of causality and propose multi-environment contextual bandits that allow for changes in the underlying mechanisms. We adopt the concept of invariance from the causality literature and introduce the notion of policy invariance. We argue that policy invariance is only relevant if unobserved variables are present and show that, in that case, an optimal invariant policy is guaranteed to generalize across environments under suitable assumptions. Our results establish concrete connections among causality, invariance, and contextual bandits.  ( 3 min )
    Adjusted chi-square test for degree-corrected block models. (arXiv:2012.15047v2 [math.ST] UPDATED)
    We propose a goodness-of-fit test for degree-corrected stochastic block models (DCSBM). The test is based on an adjusted chi-square statistic for measuring equality of means among groups of $n$ multinomial distributions with $d_1,\dots,d_n$ observations. In the context of network models, the number of multinomials, $n$, grows much faster than the number of observations, $d_i$, corresponding to the degree of node $i$, hence the setting deviates from classical asymptotics. We show that a simple adjustment allows the statistic to converge in distribution, under null, as long as the harmonic mean of $\{d_i\}$ grows to infinity. When applied sequentially, the test can also be used to determine the number of communities. The test operates on a compressed version of the adjacency matrix, conditional on the degrees, and as a result is highly scalable to large sparse networks. We incorporate a novel idea of compressing the rows based on a $(K+1)$-community assignment when testing for $K$ communities. This approach increases the power in sequential applications without sacrificing computational efficiency, and we prove its consistency in recovering the number of communities. Since the test statistic does not rely on a specific alternative, its utility goes beyond sequential testing and can be used to simultaneously test against a wide range of alternatives outside the DCSBM family. In particular, we prove that the test is consistent against a general family of latent-variable network models with community structure.  ( 3 min )
    Estimating individual treatment effects under unobserved confounding using binary instruments. (arXiv:2208.08544v2 [stat.ME] UPDATED)
    Estimating individual treatment effects (ITEs) from observational data is relevant in many fields such as personalized medicine. However, in practice, the treatment assignment is usually confounded by unobserved variables and thus introduces bias. A remedy to remove the bias is the use of instrumental variables (IVs). Such settings are widespread in medicine (e.g., trials where compliance is used as binary IV). In this paper, we propose a novel, multiply robust machine learning framework, called MRIV, for estimating ITEs using binary IVs, and thus yields an unbiased ITE estimator. Different from previous work for binary IVs, our framework estimates the ITE directly via a pseudo-outcome regression. (1) We provide a theoretical analysis where we show that our framework yields multiply robust convergence rates: our ITE estimator achieves fast convergence even if several nuisance estimators converge slowly. (2) We further show that our framework asymptotically outperforms state-of-the-art plug-in IV methods for ITE estimation. (3) We build upon our theoretical results and propose a tailored deep neural network architecture called MRIV-Net for ITE estimation using binary IVs. Across various computational experiments, we demonstrate empirically that our MRIV-Net achieves state-of-the-art performance. To the best of our knowledge, our MRIV is the first machine learning framework for estimating ITEs in the binary IV setting shown to be multiply robust.  ( 3 min )
    A Generalist Neural Algorithmic Learner. (arXiv:2209.11142v1 [cs.LG])
    The cornerstone of neural algorithmic reasoning is the ability to solve algorithmic tasks, especially in a way that generalises out of distribution. While recent years have seen a surge in methodological improvements in this area, they mostly focused on building specialist models. Specialist models are capable of learning to neurally execute either only one algorithm or a collection of algorithms with identical control-flow backbone. Here, instead, we focus on constructing a generalist neural algorithmic learner -- a single graph neural network processor capable of learning to execute a wide range of algorithms, such as sorting, searching, dynamic programming, path-finding and geometry. We leverage the CLRS benchmark to empirically show that, much like recent successes in the domain of perception, generalist algorithmic learners can be built by "incorporating" knowledge. That is, it is possible to effectively learn algorithms in a multi-task manner, so long as we can learn to execute them well in a single-task regime. Motivated by this, we present a series of improvements to the input representation, training regime and processor architecture over CLRS, improving average single-task performance by over 20% from prior art. We then conduct a thorough ablation of multi-task learners leveraging these improvements. Our results demonstrate a generalist learner that effectively incorporates knowledge captured by specialist models.  ( 3 min )
    Adaptive Bias Correction for Improved Subseasonal Forecasting. (arXiv:2209.10666v1 [cs.LG])
    Subseasonal forecasting, predicting temperature and precipitation 2 to 6 weeks ahead, is critical for effective water allocation, wildfire management, and drought and flood mitigation. Recent international research efforts have advanced the subseasonal capabilities of operational dynamical models, yet temperature and precipitation prediction skills remain poor, partly due to stubborn errors in representing atmospheric dynamics and physics inside dynamical models. To counter these errors, we introduce an adaptive bias correction (ABC) method that combines state-of-the-art dynamical forecasts with observations using machine learning. When applied to the leading subseasonal model from the European Centre for Medium-Range Weather Forecasts (ECMWF), ABC improves temperature forecasting skill by 60-90% and precipitation forecasting skill by 40-69% in the contiguous U.S. We couple these performance improvements with a practical workflow, based on Cohort Shapley, for explaining ABC skill gains and identifying higher-skill windows of opportunity based on specific climate conditions.  ( 2 min )
    Boosting Simple Learners. (arXiv:2001.11704v6 [cs.LG] UPDATED)
    Boosting is a celebrated machine learning approach which is based on the idea of combining weak and moderately inaccurate hypotheses to a strong and accurate one. We study boosting under the assumption that the weak hypotheses belong to a class of bounded capacity. This assumption is inspired by the common convention that weak hypotheses are "rules-of-thumbs" from an "easy-to-learn class". (Schapire and Freund~'12, Shalev-Shwartz and Ben-David '14.) Formally, we assume the class of weak hypotheses has a bounded VC dimension. We focus on two main questions: (i) Oracle Complexity: How many weak hypotheses are needed to produce an accurate hypothesis? We design a novel boosting algorithm and demonstrate that it circumvents a classical lower bound by Freund and Schapire ('95, '12). Whereas the lower bound shows that $\Omega({1}/{\gamma^2})$ weak hypotheses with $\gamma$-margin are sometimes necessary, our new method requires only $\tilde{O}({1}/{\gamma})$ weak hypotheses, provided that they belong to a class of bounded VC dimension. Unlike previous boosting algorithms which aggregate the weak hypotheses by majority votes, the new boosting algorithm uses more complex ("deeper") aggregation rules. We complement this result by showing that complex aggregation rules are in fact necessary to circumvent the aforementioned lower bound. (ii) Expressivity: Which tasks can be learned by boosting weak hypotheses from a bounded VC class? Can complex concepts that are "far away" from the class be learned? Towards answering the first question we introduce combinatorial-geometric parameters which capture expressivity in boosting. As a corollary we provide an affirmative answer to the second question for well-studied classes, including half-spaces and decision stumps. Along the way, we establish and exploit connections with Discrepancy Theory.  ( 3 min )
    EEG-Based Epileptic Seizure Prediction Using Temporal Multi-Channel Transformers. (arXiv:2209.11172v1 [eess.SP])
    Epilepsy is one of the most common neurological diseases, characterized by transient and unprovoked events called epileptic seizures. Electroencephalogram (EEG) is an auxiliary method used to perform both the diagnosis and the monitoring of epilepsy. Given the unexpected nature of an epileptic seizure, its prediction would improve patient care, optimizing the quality of life and the treatment of epilepsy. Predicting an epileptic seizure implies the identification of two distinct states of EEG in a patient with epilepsy: the preictal and the interictal. In this paper, we developed two deep learning models called Temporal Multi-Channel Transformer (TMC-T) and Vision Transformer (TMC-ViT), adaptations of Transformer-based architectures for multi-channel temporal signals. Moreover, we assessed the impact of choosing different preictal durations, since their length is not a consensus among experts, and also evaluated how the sample size benefits each model. Our models are compared with fully connected, convolutional, and recurrent networks. The algorithms were patient-specific trained and evaluated on raw EEG signals from the CHB-MIT database. Experimental results and statistical validation demonstrated that our TMC-ViT model surpassed the CNN architecture, the state of the art in seizure prediction.  ( 3 min )
    Amortized Variational Inference: Towards the Mathematical Foundation and Review. (arXiv:2209.10888v1 [cs.LG])
    The core principle of Variational Inference (VI) is to convert the statistical inference problem of computing complex posterior probability densities into a tractable optimization problem. This property enables VI to be faster than several sampling-based techniques. However, the traditional VI algorithm is not scalable to large data sets and is unable to readily infer out-of-bounds data points without re-running the optimization process. Recent developments in the field, like stochastic-, black box- and amortized-VI, have helped address these issues. Generative modeling tasks nowadays widely make use of amortized VI for its efficiency and scalability, as it utilizes a parameterized function to learn the approximate posterior density parameters. With this paper, we review the mathematical foundations of various VI techniques to form the basis for understanding amortized VI. Additionally, we provide an overview of the recent trends that address several issues of amortized VI, such as the amortization gap, generalization issues, inconsistent representation learning, and posterior collapse. Finally, we analyze alternate divergence measures that improve VI optimization.  ( 2 min )
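    For reference, the objective that all of these VI variants optimise is the evidence lower bound (ELBO); a standard statement (our addition, not quoted from the paper) is
    $$\log p_\theta(x) \;\ge\; \mathrm{ELBO}(x) \;=\; \mathbb{E}_{q_\phi(z \mid x)}\big[\log p_\theta(x \mid z)\big] \;-\; \mathrm{KL}\big(q_\phi(z \mid x) \,\|\, p(z)\big).$$
    Classical VI fits separate variational parameters for every data point; amortized VI instead learns a single inference network $\phi$ that maps each $x$ to the parameters of $q_\phi(z \mid x)$, and the amortization gap mentioned above is the ELBO shortfall of this shared network relative to the per-point optimum.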
    Interpretable Meta-Measure for Model Performance. (arXiv:2006.02293v2 [cs.LG] UPDATED)
    Benchmarks for the evaluation of model performance play an important role in machine learning. However, there is no established way to describe and create new benchmarks. What is more, the most common benchmarks use performance measures that share several limitations. For example, the difference in performance for two models has no probabilistic interpretation, there is no reference point to indicate whether they represent a significant improvement, and it makes no sense to compare such differences between data sets. We introduce a new meta-score assessment named Elo-based Predictive Power (EPP) that is built on top of other performance measures and allows for interpretable comparisons of models. The differences in EPP scores have a probabilistic interpretation and can be directly compared between data sets, furthermore, the logistic regression-based design allows for an assessment of ranking fitness based on a deviance statistic. We prove the mathematical properties of EPP and support them with empirical results of a large scale benchmark on 30 classification data sets and a real-world benchmark for visual data. Additionally, we propose a Unified Benchmark Ontology that is used to give a uniform description of benchmarks.  ( 3 min )
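    For orientation, a hedged sketch of the logistic-regression view behind Elo-style model scores: every benchmark round where model $i$ beats model $j$ is a binary outcome, and a logistic fit with one coefficient per model yields scores whose differences have a probabilistic interpretation. This is a generic Bradley-Terry fit with made-up duel counts, not necessarily the exact EPP estimator.

        import numpy as np
        from sklearn.linear_model import LogisticRegression

        models = ["rf", "xgb", "svm"]
        duels = ([("rf", "svm")] * 30 + [("svm", "rf")] * 10 +   # (winner, loser)
                 [("xgb", "svm")] * 25 + [("svm", "xgb")] * 5 +
                 [("xgb", "rf")] * 15 + [("rf", "xgb")] * 10)

        idx = {m: k for k, m in enumerate(models)}
        X = np.zeros((len(duels), len(models)))
        for r, (w, l) in enumerate(duels):
            X[r, idx[w]], X[r, idx[l]] = 1.0, -1.0   # winner minus loser
        y = np.ones(len(duels))
        X, y = np.vstack([X, -X]), np.concatenate([y, np.zeros_like(y)])  # mirror losses

        fit = LogisticRegression(fit_intercept=False, C=10.0).fit(X, y)
        print(dict(zip(models, fit.coef_[0])))  # P(i beats j) = sigmoid(s_i - s_j)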
    The Sample Complexity of One-Hidden-Layer Neural Networks. (arXiv:2202.06233v2 [cs.LG] UPDATED)
    We study norm-based uniform convergence bounds for neural networks, aiming at a tight understanding of how these are affected by the architecture and type of norm constraint, for the simple class of scalar-valued one-hidden-layer networks, and inputs bounded in Euclidean norm. We begin by proving that in general, controlling the spectral norm of the hidden layer weight matrix is insufficient to get uniform convergence guarantees (independent of the network width), while a stronger Frobenius norm control is sufficient, extending and improving on previous work. Motivated by the proof constructions, we identify and analyze two important settings where (perhaps surprisingly) a mere spectral norm control turns out to be sufficient: First, when the network's activation functions are sufficiently smooth (with the result extending to deeper networks); and second, for certain types of convolutional networks. In the latter setting, we study how the sample complexity is additionally affected by parameters such as the amount of overlap between patches and the overall number of patches.  ( 2 min )
    A Closer Look at Learned Optimization: Stability, Robustness, and Inductive Biases. (arXiv:2209.11208v1 [cs.LG])
    Learned optimizers -- neural networks that are trained to act as optimizers -- have the potential to dramatically accelerate training of machine learning models. However, even when meta-trained across thousands of tasks at huge computational expense, blackbox learned optimizers often struggle with stability and generalization when applied to tasks unlike those in their meta-training set. In this paper, we use tools from dynamical systems to investigate the inductive biases and stability properties of optimization algorithms, and apply the resulting insights to designing inductive biases for blackbox optimizers. Our investigation begins with a noisy quadratic model, where we characterize conditions in which optimization is stable, in terms of eigenvalues of the training dynamics. We then introduce simple modifications to a learned optimizer's architecture and meta-training procedure which lead to improved stability, and improve the optimizer's inductive bias. We apply the resulting learned optimizer to a variety of neural network training tasks, where it outperforms the current state of the art learned optimizer -- at matched optimizer computational overhead -- with regard to optimization performance and meta-training speed, and is capable of generalization to tasks far different from those it was meta-trained on.  ( 2 min )
    Algorithm-Agnostic Interpretations for Clustering. (arXiv:2209.10578v1 [cs.LG])
    A clustering outcome for high-dimensional data is typically interpreted via post-processing, involving dimension reduction and subsequent visualization. This destroys the meaning of the data and obfuscates interpretations. We propose algorithm-agnostic interpretation methods to explain clustering outcomes in reduced dimensions while preserving the integrity of the data. The permutation feature importance for clustering represents a general framework based on shuffling feature values and measuring changes in cluster assignments through custom score functions. The individual conditional expectation for clustering indicates observation-wise changes in the cluster assignment due to changes in the data. The partial dependence for clustering evaluates average changes in cluster assignments for the entire feature space. All methods can be used with any clustering algorithm able to reassign instances through soft or hard labels. In contrast to common post-processing methods such as principal component analysis, the introduced methods maintain the original structure of the features.  ( 2 min )
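    A minimal sketch of the first of these methods, permutation feature importance for clustering: shuffle one feature, reassign the fixed clusters through hard labels, and score the change in assignments. The adjusted Rand index is one possible choice of custom score function; nothing here is specific to k-means.

        import numpy as np
        from sklearn.cluster import KMeans
        from sklearn.datasets import load_iris
        from sklearn.metrics import adjusted_rand_score

        X = load_iris().data
        km = KMeans(n_clusters=3, n_init=10, random_state=0).fit(X)
        base = km.predict(X)                      # reference assignments

        rng = np.random.default_rng(0)
        for j in range(X.shape[1]):
            Xp = X.copy()
            Xp[:, j] = rng.permutation(Xp[:, j])  # destroy feature j's information
            drop = 1.0 - adjusted_rand_score(base, km.predict(Xp))
            print(f"feature {j}: importance ~ {drop:.3f}")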
    Gaussian Process Hydrodynamics. (arXiv:2209.10707v1 [physics.flu-dyn])
    We present a Gaussian Process (GP) approach (Gaussian Process Hydrodynamics, GPH) for solving the Euler and Navier-Stokes equations. As in Smoothed Particle Hydrodynamics (SPH), GPH is a Lagrangian particle-based approach involving the tracking of a finite number of particles transported by the flow. However, these particles do not represent mollified particles of matter but carry discrete/partial information about the continuous flow. Closure is achieved by placing a divergence-free GP prior $\xi$ on the velocity field and conditioning on vorticity at particle locations. Known physics (e.g., the Richardson cascade and velocity-increment power laws) is incorporated into the GP prior through physics-informed additive kernels. This is equivalent to expressing $\xi$ as a sum of independent GPs $\xi^l$, which we call modes, acting at different scales. This approach leads to a quantitative analysis of the Richardson cascade through the analysis of the activation of these modes and allows us to coarse-grain turbulence in a statistical manner rather than a deterministic one. Since GPH is formulated on the vorticity equations, it does not require solving a pressure equation. By enforcing incompressibility and fluid/structure boundary conditions through the selection of the kernel, GPH requires far fewer particles than SPH. Since GPH has a natural probabilistic interpretation, numerical results come with uncertainty estimates, enabling their incorporation into a UQ pipeline and the addition/removal of particles in an adaptive manner. The proposed approach is amenable to analysis; it inherits the complexity of state-of-the-art solvers for dense kernel matrices, and it leads to a natural definition of turbulence as information loss. Numerical experiments support the importance of selecting physics-informed kernels and illustrate the major impact of such kernels on accuracy and stability.  ( 3 min )
    Reversible Gromov-Monge Sampler for Simulation-Based Inference. (arXiv:2109.14090v3 [stat.ME] UPDATED)
    This paper introduces a new simulation-based inference procedure to model and sample from multi-dimensional probability distributions given access to i.i.d. samples, circumventing the usual approaches of explicitly modeling the density function or designing Markov chain Monte Carlo. Motivated by the seminal work on distance and isomorphism between metric measure spaces, we propose a new notion called the Reversible Gromov-Monge (RGM) distance and study how RGM can be used to design new transform samplers to perform simulation-based inference. Our RGM sampler can also estimate optimal alignments between two heterogeneous metric measure spaces $(\mathcal{X}, \mu, c_{\mathcal{X}})$ and $(\mathcal{Y}, \nu, c_{\mathcal{Y}})$ from empirical data sets, with estimated maps that approximately push forward one measure $\mu$ to the other $\nu$, and vice versa. We study the analytic properties of the RGM distance and derive that under mild conditions, RGM equals the classic Gromov-Wasserstein distance. Curiously, drawing a connection to Brenier's polar factorization, we show that the RGM sampler induces bias towards strong isomorphism with proper choices of $c_{\mathcal{X}}$ and $c_{\mathcal{Y}}$. Statistical rate of convergence, representation, and optimization questions regarding the induced sampler are studied. Synthetic and real-world examples showcasing the effectiveness of the RGM sampler are also demonstrated.  ( 3 min )
    LIMIS: Locally Interpretable Modeling using Instance-wise Subsampling. (arXiv:1909.12367v2 [cs.LG] UPDATED)
    Understanding black-box machine learning models is crucial for their widespread adoption. Learning globally interpretable models is one approach, but achieving high performance with them is challenging. An alternative approach is to explain individual predictions using locally interpretable models. For locally interpretable modeling, various methods have been proposed and indeed commonly used, but they suffer from low fidelity, i.e. their explanations do not approximate the predictions well. In this paper, our goal is to push the state-of-the-art in high-fidelity locally interpretable modeling. We propose a novel framework, Locally Interpretable Modeling using Instance-wise Subsampling (LIMIS). LIMIS utilizes a policy gradient to select a small number of instances and distills the black-box model into a low-capacity locally interpretable model using those selected instances. Training is guided with a reward obtained directly by measuring the fidelity of the locally interpretable models. We show on multiple tabular datasets that LIMIS near-matches the prediction accuracy of black-box models, significantly outperforming state-of-the-art locally interpretable models in terms of fidelity and prediction accuracy.  ( 2 min )
    A data-driven interpretation of the stability of molecular crystals. (arXiv:2209.10709v1 [physics.chem-ph])
    Due to the subtle balance of intermolecular interactions that govern structure-property relations, predicting the stability of crystal structures formed from molecular building blocks is a highly non-trivial scientific problem. A particularly active and fruitful approach involves classifying the different combinations of interacting chemical moieties, as understanding the relative energetics of different interactions enables the design of molecular crystals and fine-tuning their stabilities. While this is usually performed based on the empirical observation of the most commonly encountered motifs in known crystal structures, we propose to apply a combination of supervised and unsupervised machine-learning techniques to automate the construction of an extensive library of molecular building blocks. We introduce a structural descriptor tailored to the prediction of the binding energy for a curated dataset of organic crystals and exploit its atom-centered nature to obtain a data-driven assessment of the contribution of different chemical groups to the lattice energy of the crystal. We then interpret this library using a low-dimensional representation of the structure-energy landscape and discuss selected examples of the insights that can be extracted from this analysis, providing a complete database to guide the design of molecular materials.  ( 2 min )
    Exploiting Independent Instruments: Identification and Distribution Generalization. (arXiv:2202.01864v2 [stat.ML] UPDATED)
    Instrumental variable models allow us to identify a causal function between covariates $X$ and a response $Y$, even in the presence of unobserved confounding. Most of the existing estimators assume that the error term in the response $Y$ and the hidden confounders are uncorrelated with the instruments $Z$. This is often motivated by a graphical separation, an argument that also justifies independence. Positing an independence restriction, however, leads to strictly stronger identifiability results. We connect to the existing literature in econometrics and provide a practical method called HSIC-X for exploiting independence that can be combined with any gradient-based learning procedure. We see that even in identifiable settings, taking into account higher moments may yield better finite sample results. Furthermore, we exploit the independence for distribution generalization. We prove that the proposed estimator is invariant to distributional shifts on the instruments and worst-case optimal whenever these shifts are sufficiently strong. These results hold even in the under-identified case where the instruments are not sufficiently rich to identify the causal function.  ( 2 min )
    Simulation-based inference of Bayesian hierarchical models while checking for model misspecification. (arXiv:2209.11057v1 [stat.ME])
    This paper presents recent methodological advances to perform simulation-based inference (SBI) of a general class of Bayesian hierarchical models (BHMs), while checking for model misspecification. Our approach is based on a two-step framework. First, the latent function that appears as second layer of the BHM is inferred and used to diagnose possible model misspecification. Second, target parameters of the trusted model are inferred via SBI. Simulations used in the first step are recycled for score compression, which is necessary to the second step. As a proof of concept, we apply our framework to a prey-predator model built upon the Lotka-Volterra equations and involving complex observational processes.  ( 2 min )

  • Open

    [P] New search engine that uses LLMs to find answers in scientific research
    Would love for this community to check it out and give us your feedback. It's 100% free to use and to create an account: https://consensus.app/search/ You can ask any plain English research question and we will use language models to try to find relevant findings in research papers. Here's an example: Does Magnesium help with sleep? submitted by /u/EOlson76 [link] [comments]  ( 103 min )
    [P] What model should I use/learn to create a post success prediction algorithm
    I'm trying to build a post success predictor, which basically tells you if your post will succeed (get a lot of upvotes) or not. I was thinking of BERT but I don't want to be mistaken: is my choice correct? For the dataset I will provide reddit post title + upvotes submitted by /u/Yuuki__konno [link] [comments]  ( 88 min )
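    One plausible way to set this up, sketched with Hugging Face transformers (the model choice, threshold, and toy data are assumptions, not a tested recipe): binarise upvotes against a threshold and fine-tune BERT as a sequence classifier on the titles. Framing it instead as regression on log-upvotes is an equally reasonable alternative.

        import torch
        from transformers import AutoModelForSequenceClassification, AutoTokenizer

        tok = AutoTokenizer.from_pretrained("bert-base-uncased")
        model = AutoModelForSequenceClassification.from_pretrained(
            "bert-base-uncased", num_labels=2)

        titles = ["My cat learned to fetch", "Boring post"]  # hypothetical posts
        labels = torch.tensor([1, 0])          # 1 = upvotes above your threshold

        batch = tok(titles, padding=True, truncation=True, return_tensors="pt")
        out = model(**batch, labels=labels)    # returns loss and logits
        out.loss.backward()                    # one illustrative training step
        print(out.logits.softmax(-1))          # predicted success probabilities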
    Convert Pegasus model to ONNX [Discussion]
    Hi all, I am working on a project where I fine-tuned a Pegasus model on the Reddit dataset. Now, I need to convert the fine-tuned model to ONNX for the deployment stage. I have followed this guide from Huggingface to convert to the ONNX model for unsupported architectures. I got it done, but the ONNX model can't generate text. It turned out that Pegasus is an encoder-decoder model and most guides are for either an encoder-only model (e.g. BERT) or a decoder-only model (e.g. GPT2). I found the only example of converting an encoder-decoder model to ONNX here: https://github.com/Ki6an/fastT5. My question is if someone has experienced or seen the Pegasus model being converted to ONNX in the wild. Or do you have any tips/hints to do so? Thanks so much in advance submitted by /u/Lost-Letterhead2105 [link] [comments]  ( 89 min )
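    A hedged sketch of the fastT5-style approach, untested for Pegasus: export the encoder and decoder as separate ONNX graphs, then drive generation manually over the two sessions. The model name and export arguments below are assumptions.

        import torch
        from transformers import PegasusForConditionalGeneration, PegasusTokenizer

        name = "google/pegasus-xsum"      # stand-in for your fine-tuned checkpoint
        tok = PegasusTokenizer.from_pretrained(name)
        model = PegasusForConditionalGeneration.from_pretrained(name).eval()
        model.config.return_dict = False  # tuple outputs tend to export more cleanly

        ids = tok("a document to summarise", return_tensors="pt").input_ids
        torch.onnx.export(model.get_encoder(), (ids,), "pegasus_encoder.onnx",
                          input_names=["input_ids"],
                          output_names=["encoder_hidden_states"],
                          dynamic_axes={"input_ids": {0: "batch", 1: "seq"}},
                          opset_version=13)
        # The decoder (with lm_head) needs a second, analogous export that takes
        # decoder_input_ids plus the encoder hidden states; generation is then a
        # Python loop (greedy or beam) over the two onnxruntime sessions, since
        # model.generate() is not available on the ONNX side.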
    [D] Some OpenAI Whisper benchmarks for runtime and cost
    Hey guys! I ran a few benchmarks on Whisper's runtime and cost-to-run on GCP, so just dropping it here in case it's valuable to anyone! submitted by /u/SleekEagle [link] [comments]  ( 89 min )
    [D] NLP for long document understanding ?
    I'm looking for papers dealing with document understanding for long documents; by long I mean an article (~10 pages) or even a full book. I would like to perform summarization or question-answering for such documents, but it seems that most literature never gets to such document lengths. Do you know of articles, or do you have advice on how to use typical NLP models for book-size inputs? The easy solution would be to chunk the large document into smaller parts, but it seems to me that such an approach would lose some semantics of the original text (i.e. if you split the text at the wrong place, it could change its meaning); moreover, since each chunk is processed independently, you lose context which may be important to interpret a specific chunk submitted by /u/Even_Information4853 [link] [comments]  ( 91 min )
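    A minimal sketch of the overlap-chunking mitigation mentioned in the post: consecutive windows share a stride of tokens, so a sentence cut by one boundary survives intact in the neighbouring chunk (the sizes below are illustrative).

        def chunk_tokens(tokens, size=512, overlap=128):
            # Windows of `size` tokens whose starts advance by size - overlap.
            step = size - overlap
            return [tokens[i:i + size]
                    for i in range(0, max(len(tokens) - overlap, 1), step)]

        doc = list(range(1300))                   # stand-in for a tokenised book
        chunks = chunk_tokens(doc)
        print([(c[0], c[-1]) for c in chunks])    # windows overlap by 128 tokens

    Long-input architectures such as Longformer/LED or BigBird are another route for summarisation and QA at these lengths, since they replace full self-attention with sparse patterns that scale to thousands of tokens.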
    [D] Best papers/resources to start digging into NLP?
    Hi everyone, my background is mainly focused on Computer Vision/Image processing. I want to start delving more into NLP, as I've never really gone further than the basic concepts, and I feel like very interesting CV ideas recently stem from the intersection of the two. What NLP papers are in your opinion must-read? submitted by /u/ats678 [link] [comments]  ( 88 min )
    [D] Text generation for HR purposes
    Hi, I am interested in the task of text generation for HR purposes. Right now, using GPT3 I can get decent results by prompting it with "3 sentences with negative feedback about Mary with the words "bad-tempered", "tardiness" and "needs improvement"." It does generate a decent negative "feedback" about Mary. Question : Is there anything out there that is open-sourced that can match this capability? submitted by /u/lppier2 [link] [comments]  ( 89 min )
    [P] The Data Science Interview book
    The Data Science Interview book is a completely online and free resource which has been making steady progress over the months. In the last year it has been used by readers from more than 90 countries. Be sure to check it out. Recently we launched a 📖 PDF version of the book at a launch price of $5 🥳, with a commitment that all future releases of the book will be mailed to purchasers. The proceeds will be used to MAINTAIN the project and keep the online version FREE. Don't forget to show this project your ❤️ and support submitted by /u/dipranjanchatterjee [link] [comments]  ( 90 min )
    [P] Graph path traversal with semantic graphs
    Semantic graphs, also known as knowledge graphs or semantic networks, build a graph network with semantic relationships connecting the nodes. They can be used to explore topics and data connectivity and to perform network analysis. One interesting application is path traversal to analyze the connectivity of a dataset. These semantic relationships can be between text, audio and/or images. [Image: a path tracing how two sentences are connected in the ag_news dataset.] The illustration above demonstrates how a graph's relationships are used to walk a path between two totally unrelated snippets of text. This idea isn't exclusive to text; the same can be done for images. [Image: a traversal through the imagenette dataset.] This path starts with a person parachuting and uses the semantic graph's relationships to walk a path to a picture of a person holding a french horn. In addition to being highly interesting, this is also useful in helping with Exploratory Data Analysis (EDA). Full article and code can be found at the links below. Article: https://neuml.hashnode.dev/introducing-the-semantic-graph GitHub: https://github.com/neuml/txtai submitted by /u/davidmezzetti [link] [comments]  ( 89 min )
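    A hedged, library-agnostic sketch of the underlying idea (txtai packages this up; the embedding model and texts below are stand-ins): embed the texts, connect each node to its nearest neighbours to form a semantic graph, then walk the shortest path between two nodes.

        import networkx as nx
        import numpy as np
        from sentence_transformers import SentenceTransformer

        texts = ["wins the championship game", "the team played well",
                 "stock markets fell today", "investors lost money"]
        emb = SentenceTransformer("all-MiniLM-L6-v2").encode(
            texts, normalize_embeddings=True)
        sim = emb @ emb.T                              # cosine similarities

        G = nx.Graph()
        k = 2
        for i in range(len(texts)):
            for j in np.argsort(-sim[i])[1:k + 1]:     # top-k neighbours, skip self
                G.add_edge(i, int(j), weight=float(1 - sim[i, j]))

        path = nx.shortest_path(G, 0, 3, weight="weight")
        print([texts[p] for p in path])                # a semantic walk, 0 -> 3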
    [D] Can I rent out my GPU to ML researchers remotely?
    Hey r/MachineLearning, just had a thought come to mind as I stare at my powerful gaming PC currently going unused. With its powerful 3080ti graphics card, intel i9, and tons of RAM, I was wondering if there's a way I could put this fella to work, earn a few $ a day, and maybe help some machine learning researchers train their models. Are there any turn-key solutions/providers that help facilitate this? Or any other advice for doing something like this? Thanks for any guidance, cheers submitted by /u/RealSonZoo [link] [comments]  ( 89 min )
  • Open

    Fist Of Confusion - By RawChaa (App used: Wonder - A.I Generator None Dialogue Short Manga) Part Two
    submitted by /u/Rawchaa [link] [comments]  ( 87 min )
    Researchers at Tencent Propose GFP-GAN that Leverages Rich and Diverse Priors Encapsulated in a Pretrained Face GAN for Blind Face Restoration
    The goal of blind face restoration is to recover high-quality images of human faces from their low-quality counterparts that have been degraded for an unknown reason. Some degradation causes could be noise, blur, low resolution, and compression artifacts. In this work, researchers from Tencent's Applied Research Center propose GFP-GAN, a Generative Facial Prior GAN for real-world blind face restoration. As can be seen in Figure 1, the images restored through GFP-GAN reach higher realness and fidelity with fewer artifacts. Continue reading | Check out the paper and github link. submitted by /u/ai-lover [link] [comments]  ( 87 min )
    Google Colab notebook to transcribe and translate audio with OpenAI's Whisper
    I've learned a lot about AI applications by using other people's Google Colab notebooks. When OpenAI's Whisper arrived, I created a Google Colab notebook so you can run both the transcription and translation functions of this automatic speech recognition system. submitted by /u/ZackaryBlue [link] [comments]  ( 87 min )
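    For anyone who wants the two calls outside of Colab, here is a minimal sketch (assumes Whisper is installed per the repo's instructions, ffmpeg is available, and an audio.mp3 sits on disk; "base" is just one of the available model sizes):
        import whisper

        model = whisper.load_model("base")
        print(model.transcribe("audio.mp3")["text"])                    # transcription
        print(model.transcribe("audio.mp3", task="translate")["text"])  # English translation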
    The Road to Realistic Full-Body Deepfakes
    submitted by /u/magenta_placenta [link] [comments]  ( 87 min )
    Stable Diffusion AUTOMATIC1111 Full Installation Guide
    submitted by /u/PuppetHere [link] [comments]  ( 86 min )
    Best Artificial Intelligence courses online
    Hello, I am currently enrolled in a BSc in Artificial Intelligence. I'd like to know if there are some online courses where I can earn a certificate to boost my CV when I am applying for jobs. Note: preferably on Coursera, but I don't mind better options if available. submitted by /u/Lebanese-dude [link] [comments]  ( 87 min )
    My experience with Anima AI so far 👎 (Have yet to find anything that compares to GPT 3 Davinci)
    submitted by /u/DeadUncle [link] [comments]  ( 92 min )
    AI Art Shorts Series | 7 Deadly Sins - Envy
    submitted by /u/Swisheater [link] [comments]  ( 87 min )
    If I want to be a part of the AI revolution in particular, is there a better undergrad major than computer science? Or is it best to get a computer science bachelor's and then get into an AI master's program?
    submitted by /u/Overall-Importance54 [link] [comments]  ( 87 min )
    This AI is getting too realistic
    submitted by /u/TheScrantonStranglr [link] [comments]  ( 88 min )
    How to use OpenAI's Whisper (and some accuracy, runtime, and cost benchmarks)
    Hey everyone! I'm sure many of you know that OpenAI released Whisper yesterday - an open-source speech recognition model with weights available. Not sure if this is allowed, but I wrote a guide on how to run Whisper that also provides some benchmarks on accuracy, inference time, and cost. Let me know what you think :) submitted by /u/SleekEagle [link] [comments]  ( 87 min )
    Nvidia-Deloitte partnership aims to accelerate AI adoption
    submitted by /u/TallAssociation0 [link] [comments]  ( 92 min )
    Talk today: Data Labeling and Versioning for Production Retraining using Label Studio and Modzy
    Data-centric AI doesn't just stop with cleaning and preparing data for model training - there are rich insights to be gleaned from production data. By analyzing, segmenting, and selectively relabeling your production inference data, you can generate datasets for future model retraining. This talk will show you how you can use human-in-the-loop oversight to generate high-quality, labeled datasets using Label Studio from your prediction data for future model retraining. Tune in to the Modzy Discord Server today at 12:30 EDT! submitted by /u/modzykirsten [link] [comments]  ( 87 min )
    Introducing Whisper
    submitted by /u/Black_RL [link] [comments]  ( 93 min )
  • Open

    Why is this keras DQN cartpole so incredibly slow?
    For some reason, this program is super slow. I have spent quite a while trying to debug it. However, it doesn't seem different from other keras programs that are much faster. ***Update - I think I know where the problem is. After analyzing other programs, it seems that this code cannot perform batch operations quickly. Most batch operations are performed in the replay method.
        # -*- coding: utf-8 -*-
        import random
        import gym
        import numpy as np
        from collections import deque
        from keras.models import Sequential
        from keras.layers import Dense
        from tensorflow.keras.optimizers import Adam
        from keras import backend as K
        import time
        import tensorflow as tf

        EPISODES = 5000

        class DQNAgent:
            def __init__(self, state_size, action_size):
                self.state_size = state_size
                self.action_size = action_size
                …  ( 90 min )
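    One likely fix, as a hedged sketch that would slot into the DQNAgent above (names like target_model, memory, and gamma are assumptions about the truncated rest of the script): the usual cause of slow keras DQNs is calling predict()/fit() once per transition inside replay instead of once per minibatch.
        import random
        import numpy as np

        def replay(self, batch_size):
            minibatch = random.sample(self.memory, batch_size)
            states = np.vstack([t[0] for t in minibatch])
            actions = np.array([t[1] for t in minibatch])
            rewards = np.array([t[2] for t in minibatch])
            next_states = np.vstack([t[3] for t in minibatch])
            dones = np.array([t[4] for t in minibatch], dtype=bool)
            # one batched forward pass instead of batch_size separate predict() calls
            q_next = self.target_model.predict(next_states, verbose=0)
            targets = self.model.predict(states, verbose=0)
            targets[np.arange(batch_size), actions] = (
                rewards + (~dones) * self.gamma * q_next.max(axis=1))
            # one batched gradient step instead of batch_size fit() calls
            self.model.fit(states, targets, epochs=1, verbose=0)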
    Graduation project ideas
    I have completed several courses on data science, machine learning and deep learning. The thing that interested me the most was reinforcement learning. The idea of building an AI that learns to play an arcade game sounds appealing. But since this will be my final project, I thought something that could be used in real life would be better. For example, what real-life problem does an AI that learns to play Mario solve? Do you have any such project suggestions? Fun, impressive and simple. :) submitted by /u/Long_Elk_4215 [link] [comments]  ( 100 min )
    Prerequisites and guidance for the UCL x DeepMind RL playlist
    What are the prerequisites for David Silver's RL playlist? And should I use that one, given that there are two other DeepMind playlists from Hado van Hasselt? One was released in 2018 and the other in 2021. Which one should I follow at the beginning of the journey, and what are the prerequisites for all of them? submitted by /u/Cosmic_Ishan [link] [comments]  ( 88 min )
    A question about BRL learns from expert buffers
    Assume a Batch-RL (BRL) agent learns from an expert buffer generated AFTER an RL agent trained online for a large number of iterations (e.g., 1M). Thus, the buffer should contain only transitions with high returns. Essentially, this BRL agent is not the same as the expert agent, since it has no knowledge of which state-action pairs would lead to low returns. So the differences between this BRL agent and the expert agent are the regime in which state-action visitations are learned and, of course, the parameters of the neural networks. I assume there is also a difference in credit/blame assignment, since there might be slim to no blame assignment in the BRL agent. My question is that, intuitively, a BRL model would be more robust if it trained on both expert and replay buffers (similar to what COMBO did). But how do I prove it? submitted by /u/Blasphemer666 [link] [comments]  ( 88 min )
    Why does my Deep Q Learning reach a limit?
    I am using Deep Q Learning to try to create a simple 2D self-driving car simulation in Python. The state is the distance to the edge of the road at a few locations, and the actions are left, right, accelerate, brake. When only controlling steering, it can navigate any map, but once speed is introduced, it can't learn to brake around corners, causing it to crash. I have tried a lot of different combinations of hyperparameters, and the graph below is the best I can get. https://preview.redd.it/36xatmkwifp91.png?width=564&format=png&auto=webp&s=0786ecc010ee7913513cad35fb4042902011f4a6 Here are the settings I used:
        "LEARNING_RATE": 1e-10,
        "GD_MOMENTUM": 0.9,
        "DISCOUNT_RATE": 0.999,
        "EPSILON_DECAY": 0.00002,
        "EPSILON_MIN": 0.1,
        "TARGET_NET_COPY_STEPS": 17000,
        "TRAIN_AMOUNT": 0.8,
    My guess is that it can't take into account rewards that far in the future, so I increased the movement per frame, but it didn't help. For the neural networks, I am using my own library (which I have verified works), with 12 layers, increasing up to a max of 256 nodes, using ReLU. I have tried different configurations, which were either worse or the same. You can find the code here, but there is a lot of code for other features, so it may be confusing. I can confirm it works, at least for steering: Github Thanks for any advice! submitted by /u/Si1veRonReddit [link] [comments]  ( 92 min )
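    A quick numeric check on the horizon guess above (plain arithmetic, not a diagnosis of this particular setup): with DISCOUNT_RATE = 0.999, a reward 50-100 steps ahead still carries ~90% of its value, so the discount horizon alone is unlikely to be the ceiling.
        gamma = 0.999
        for k in (50, 100, 500):
            print(k, round(gamma ** k, 3))  # 50 -> 0.951, 100 -> 0.905, 500 -> 0.606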
    Late rewards in reinforcement learning
    Hello. I'm working on a master's thesis in engineering where I'm deploying a deep RL agent on a simulation I made. I seem to have hit a brick wall in formulating my reward signal. Some actions the agent takes may not have any consequences until many states later, even 50-100 steps, so I fear that might cause divergence in the learning process; but if I formulate the reward differently, the agent might not learn the desired mechanics of the simulation. Am I overthinking this, or is this a legitimate concern for deep RL in general? Thanks a lot in advance! P.s. Sorry for not explaining a whole lot; I thought I'd present the problem broadly, but if you're interested to know what the simulation is about, please dm me! submitted by /u/arachnarus96 [link] [comments]  ( 112 min )
    Beginning the journey in RL
    What is the roadmap I need to follow to learn deep reinforcement learning? What is the best course for learning deep reinforcement learning? I saw DeepMind has some playlists for it, but they don't teach the implementation. Anywhere I could learn the implementation? submitted by /u/Cosmic_Ishan [link] [comments]  ( 90 min )
    Help me choose hardwares for RL!
    Our research group recently added a new direction: using RL to solve job shop problems (one of the NP-hard problems). After spending some days learning and practicing, we decided to buy a server to train models remotely. However, we do not have much experience with hardware, so we are confused about which (CPU / GPU / RAM / mainboard / memory / NIC) will be good for us. In our project, images are rarely involved, so I guess we may not need a GPU with high performance. In conclusion, we need a server that can: - support 2-3 people training RL models at the same time, - focus on training without image processing. I will be very grateful if you can give me some advice. And by the way, I have little knowledge about configuring a remote server, like installing a Linux system, establishing SSH connections, managing user groups... So I also need some tutorials about this. submitted by /u/JoPrimer [link] [comments]  ( 90 min )
  • Open

    TensorStore for High-Performance, Scalable Array Storage
    Posted by Jeremy Maitin-Shepard and Laramie Leavitt, Software Engineers, Connectomics at Google Many exciting contemporary applications of computer science and machine learning (ML) manipulate multidimensional datasets that span a single large coordinate system, for example, weather modeling from atmospheric measurements over a spatial grid or medical imaging predictions from multi-channel image intensity values in a 2d or 3d scan. In these settings, even a single dataset may require terabytes or petabytes of data storage. Such datasets are also challenging to work with as users may read and write data at irregular intervals and varying scales, and are often interested in performing analyses using numerous machines working in parallel. Today we are introducing TensorStore, an open-source…  ( 26 min )
  • Open

    Detect population variance of endangered species using Amazon Rekognition
    Our planet faces a global extinction crisis. A UN report shows a staggering number of more than a million species feared to be on the path to extinction. The most common reasons for extinction include loss of habitat, poaching, and invasive species. Several wildlife conservation foundations, research scientists, volunteers, and anti-poaching rangers have been working tirelessly […]  ( 8 min )
    How Amazon Search reduced ML inference costs by 85% with AWS Inferentia
    Amazon’s product search engine indexes billions of products, serves hundreds of millions of customers worldwide, and is one of the most heavily used services in the world. The Amazon Search team develops machine learning (ML) technology that powers the Amazon.com search engine and helps customers search effortlessly. To deliver a great customer experience and operate […]  ( 7 min )
  • Open

    Arcane Music Video Created using Stable Diffusion
    submitted by /u/Ziinxx [link] [comments]  ( 86 min )
    Neural network with variable number of inputs and related inputs
    Hi, I've been trying to learn about neural networks recently and I successfully coded a neural network from scratch to recognize digits from the MNIST database. I'm now trying to create a more advanced neural network and I would be grateful if someone would give me some advice about the architectural structure of the network. Let's assume that I am trying to build a neural network that predicts a plant's height based on a variety of inputs. Some of the inputs would be things such as days alive, current height, etc. However, another type of input would be the width, length, and thickness of each leaf that the plant has. Because different plants will have different numbers of leaves, and the width, length, and thickness of each leaf are correlated rather than standalone inputs, I was wondering how this network could be designed. So far, I have done initial research on RNNs and LSTMs, but I am not sure how these would solve the problem of keeping the width, length, and thickness of each leaf correlated with each other. Could somebody please point me in the right direction or tell me what architectures I should look into? submitted by /u/Ceraphen [link] [comments]  ( 104 min )
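    One common pattern worth naming here is a Deep Sets-style set encoder; below is a minimal sketch (MAX_LEAVES, the layer sizes, and the four plant-level features are assumptions): embed each leaf's (width, length, thickness) triple with a shared sub-network, mask out padding, and pool, so the model handles any number of leaves while keeping the three per-leaf measurements together.
        import tensorflow as tf
        from tensorflow.keras import layers

        MAX_LEAVES = 30  # pad/truncate every plant to this many leaves (assumed)

        leaves = layers.Input(shape=(MAX_LEAVES, 3))  # (width, length, thickness) per leaf
        mask = layers.Input(shape=(MAX_LEAVES, 1))    # 1 for a real leaf, 0 for padding
        plant = layers.Input(shape=(4,))              # plant-level features, e.g. days alive

        h = layers.Dense(32, activation="relu")(leaves)  # shared encoder, applied per leaf
        h = layers.Dense(32, activation="relu")(h)
        masked = layers.Multiply()([h, mask])            # zero out padded leaves
        pooled = layers.Lambda(lambda t: tf.reduce_sum(t, axis=1))(masked)

        x = layers.Concatenate()([pooled, plant])
        x = layers.Dense(64, activation="relu")(x)
        out = layers.Dense(1)(x)                         # predicted height

        model = tf.keras.Model([leaves, mask, plant], out)
        model.compile(optimizer="adam", loss="mse")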
  • Open

    Go Hands On: Logitech G CLOUD Launches With Support for GeForce NOW
    When it rains, it pours. And this GFN Thursday brings a downpour of news for GeForce NOW members. The Logitech G CLOUD is the latest gaming handheld device to support GeForce NOW, giving members a brand new way to keep the gaming going. But that’s not all: Portal with RTX joins GeForce NOW in November, Read article > The post Go Hands On: Logitech G CLOUD Launches With Support for GeForce NOW appeared first on NVIDIA Blog.  ( 6 min )
    Continental and AEye Join NVIDIA DRIVE Sim Sensor Ecosystem, Providing Rich Capabilities for AV Development
    Autonomous vehicle sensors require the same rigorous testing and validation as the car itself, and one simulation platform is up to the task. Global tier-1 supplier Continental and software-defined lidar maker AEye announced this week at NVIDIA GTC that they will migrate their intelligent lidar sensor model into NVIDIA DRIVE Sim. The companies are the Read article > The post Continental and AEye Join NVIDIA DRIVE Sim Sensor Ecosystem, Providing Rich Capabilities for AV Development appeared first on NVIDIA Blog.  ( 5 min )
  • Open

    Dominoes in Unicode
    I was spelunking around in Unicode and found that there are assigned characters for representing domino tiles and that the characters are enumerated in a convenient order. Here is the code chart. There are codes for representing tiles horizontally or vertically. And even though, for example, the 5-3 is the same domino as the 3-5, […] Dominoes in Unicode first appeared on John D. Cook.  ( 5 min )
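    Since the post notes the enumeration is convenient, here is a tiny sketch that exploits it (assuming the usual block layout in which U+1F031 is the horizontal 0-0 tile and codes advance in rows of seven):
        def horizontal_tile(a, b):
            # Unicode character for the horizontal a-b domino, with a, b in 0..6
            return chr(0x1F031 + 7 * a + b)

        print(horizontal_tile(5, 3), horizontal_tile(3, 5))  # the 5-3 and 3-5 orientations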
  • Open

    TransformX by Scale AI is Oct 19-21: Register for free!
    Sponsored Post     📣 The AI event of the year is quickly approaching… We’re talking about TransformX, a FREE virtual conference where you’ll hear from 120+ technology leaders from companies like Google, Meta, OpenAI, DeepMind, Amazon, and more. Explore how AI will power ecommerce, AI applications for healthcare, NFT marketplaces and more. 🎙 Speakers […] The post TransformX by Scale AI is Oct 19-21: Register for free! appeared first on Machine Learning Mastery.
  • Open

    How Much Does It Cost to Hire a Software Developer?
    How much will it cost to hire a custom software development company or an app developer in 2022? If this question strikes your mind, you’re at the right place.  With the increasing trend of digitization, many businesses are paving the way to bring digital transformation to their operations. However, they remain confused about evaluating the… Read More »How Much Does It Cost to Hire a Software Developer? The post How Much Does It Cost to Hire a Software Developer? appeared first on Data Science Central.  ( 20 min )
  • Open

    Federated Learning from Pre-Trained Models: A Contrastive Learning Approach. (arXiv:2209.10083v1 [cs.CR])
    Federated Learning (FL) is a machine learning paradigm that allows decentralized clients to learn collaboratively without sharing their private data. However, excessive computation and communication demands pose challenges to current FL frameworks, especially when training large-scale models. To prevent these issues from hindering the deployment of FL systems, we propose a lightweight framework where clients jointly learn to fuse the representations generated by multiple fixed pre-trained models rather than training a large-scale model from scratch. This leads us to a more practical FL problem by considering how to capture more client-specific and class-relevant information from the pre-trained models and jointly improve each client's ability to exploit those off-the-shelf models. In this work, we design a Federated Prototype-wise Contrastive Learning (FedPCL) approach which shares knowledge across clients through their class prototypes and builds client-specific representations in a prototype-wise contrastive manner. Sharing prototypes rather than learnable model parameters allows each client to fuse the representations in a personalized way while keeping the shared knowledge in a compact form for efficient communication. We perform a thorough evaluation of the proposed FedPCL in the lightweight framework, measuring and visualizing its ability to fuse various pre-trained models on popular FL datasets.
    Heterogeneous Treatment Effect Estimation using machine learning for Healthcare application: tutorial and benchmark. (arXiv:2109.12769v3 [cs.LG] UPDATED)
    Developing new drugs for target diseases is a time-consuming and expensive task, so drug repurposing has become a popular topic in the drug development field. As more health claim data becomes available, many studies have been conducted on such data. Real-world data is noisy, sparse, and has many confounding factors. In addition, many studies have shown that drug effects are heterogeneous among the population. Many advanced machine learning models for estimating heterogeneous treatment effects (HTE) have emerged in recent years and have been applied in the econometrics and machine learning communities. These studies acknowledge medicine and drug development as the main application area, but there has been limited translational research from HTE methodology to drug development. We aim to introduce the HTE methodology to the healthcare area and provide feasibility considerations when translating the methodology, with benchmark experiments on healthcare administrative claim data. We also use the benchmark experiments to show how to interpret and evaluate the model when it is applied to healthcare research. By introducing recent HTE techniques to a broad readership in the biomedical informatics communities, we expect to promote the wide adoption of causal inference using machine learning and demonstrate the feasibility of HTE for personalized drug effectiveness.
    Periodic Extrapolative Generalisation in Neural Networks. (arXiv:2209.10280v1 [cs.LG])
    The learning of the simplest possible computational pattern -- periodicity -- is an open problem in the research of strong generalisation in neural networks. We formalise the problem of extrapolative generalisation for periodic signals and systematically investigate the generalisation abilities of classical, population-based, and recently proposed periodic architectures on a set of benchmarking tasks. We find that periodic and "snake" activation functions consistently fail at periodic extrapolation, regardless of the trainability of their periodicity parameters. Further, our results show that traditional sequential models still outperform the novel architectures designed specifically for extrapolation, and that these are in turn trumped by population-based training. We make our benchmarking and evaluation toolkit, PerKit, available and easily accessible to facilitate future work in the area.
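    For context, the "snake" activation the abstract refers to is, as proposed in prior work (Ziyin et al., 2020), the function $x + \sin^2(ax)/a$; a one-line sketch:
        import numpy as np

        def snake(x, a=1.0):
            # x plus a bounded periodic term; the frequency a may be a trainable parameter
            return x + np.sin(a * x) ** 2 / a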
    Scheduling Jobs with Stochastic Holding Costs. (arXiv:2105.13655v3 [cs.LG] UPDATED)
    We study a single-server scheduling problem for the objective of minimizing the expected cumulative holding cost incurred by jobs, where parameters defining stochastic job holding costs are unknown to the scheduler. We consider a general setting allowing for different job classes, where jobs of the same class have statistically identical holding costs and service times, with an arbitrary number of jobs across classes. In each time step, the server can process a job and observes random holding costs of the jobs that are yet to be completed. We consider a learning-based $c\mu$ rule scheduling which starts with a preemption period of fixed duration, serving as a learning phase, and having gathered data about jobs, it switches to nonpreemptive scheduling. Our algorithms are designed to handle instances with large and small gaps in mean job holding costs and achieve near-optimal performance guarantees. The performance of algorithms is evaluated by regret, where the benchmark is the minimum possible total holding cost attained by the $c\mu$ rule scheduling policy when the parameters of jobs are known. We show regret lower bounds and algorithms that achieve nearly matching regret upper bounds. Our numerical results demonstrate the efficacy of our algorithms and show that our regret analysis is nearly tight.
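    The $c\mu$ index the algorithm learns toward is simple to state: serve the waiting job with the largest product of holding cost and service rate. A toy sketch (job parameters invented for illustration):
        jobs = [  # (job id, holding cost c, service rate mu)
            ("A", 2.0, 0.5),
            ("B", 1.0, 1.5),
            ("C", 3.0, 0.4),
        ]
        next_job = max(jobs, key=lambda j: j[1] * j[2])
        print("serve:", next_job[0])  # B: 1.0 * 1.5 = 1.5 beats A (1.0) and C (1.2)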
    Bias-Scalable Near-Memory CMOS Analog Processor for Machine Learning. (arXiv:2202.05022v2 [cs.ET] UPDATED)
    Bias-scalable analog computing is attractive for implementing machine learning (ML) processors with distinct power-performance specifications. For example, ML implementations for server workloads are focused on computational throughput and faster training, whereas ML implementations for edge devices are focused on energy-efficient inference. In this paper, we demonstrate the implementation of bias-scalable analog computing circuits using a generalization of the Margin Propagation (MP) principle called shape-based analog computing (S-AC). The resulting S-AC core integrates several near-memory compute elements, which include: (a) non-linear activation functions; (b) inner-product compute circuits; and (c) a mixed-signal compressive memory. Using measured results from prototypes fabricated in a 180nm CMOS process, we demonstrate that the performance of computing modules remains robust to transistor biasing and variations in temperature. In this paper, we also demonstrate bias-scalability for a simple ML regression task.
    TECM: Transfer Learning-based Evidential C-Means Clustering. (arXiv:2112.10152v2 [cs.LG] UPDATED)
    As a representative evidential clustering algorithm, evidential c-means (ECM) provides a deeper insight into the data by allowing an object to belong not only to a single class, but also to any subset of a collection of classes, which generalizes the hard, fuzzy, possibilistic, and rough partitions. However, compared with other partition-based algorithms, ECM must estimate numerous additional parameters, and thus insufficient or contaminated data will have a greater influence on its clustering performance. To solve this problem, in this study, a transfer learning-based ECM (TECM) algorithm is proposed by introducing the strategy of transfer learning into the process of evidential clustering. The TECM objective function is constructed by integrating the knowledge learned from the source domain with the data in the target domain to cluster the target data. Subsequently, an alternate optimization scheme is developed to solve the constraint objective function of the TECM algorithm. The proposed TECM algorithm is applicable to cases where the source and target domains have the same or different numbers of clusters. A series of experiments were conducted on both synthetic and real datasets, and the experimental results demonstrated the effectiveness of the proposed TECM algorithm compared to ECM and other representative multitask or transfer-clustering algorithms.
    In progress. (arXiv:2209.08860v2 [stat.ML] UPDATED)
    The concept of causality plays an important role in human cognition. In the past few decades, causal inference has been well developed in many fields, such as computer science, medicine, economics, and education. With the advancement of deep learning techniques, it has been increasingly used in causal inference with counterfactual data. Typically, deep causal models map the characteristics of covariates to a representation space and then design various objective functions to estimate counterfactual outcomes unbiasedly based on different optimization methods. This paper focuses on a survey of deep causal models, and its core contributions are as follows: 1) we provide relevant metrics under multiple treatments and continuous-dose treatment; 2) we give a comprehensive overview of deep causal models from both temporal-development and method-classification perspectives; 3) we provide a detailed and comprehensive classification and analysis of relevant datasets and source code.
    Chaotic Hedging with Iterated Integrals and Neural Networks. (arXiv:2209.10166v1 [q-fin.MF])
    In this paper, we extend the Wiener-Ito chaos decomposition to the class of diffusion processes, whose drift and diffusion coefficient are of linear growth. By omitting the orthogonality in the chaos expansion, we are able to show that every $p$-integrable functional, for $p \in [1,\infty)$, can be represented as sum of iterated integrals of the underlying process. Using a truncated sum of this expansion and (possibly random) neural networks for the integrands, whose parameters are learned in a machine learning setting, we show that every financial derivative can be approximated arbitrarily well in the $L^p$-sense. Moreover, the hedging strategy of the approximating financial derivative can be computed in closed form.
    Causal Effect Variational Autoencoder with Uniform Treatment. (arXiv:2111.08656v2 [cs.LG] UPDATED)
    Domain adaptation and covariate shift are big issues in deep learning and they ultimately affect any causal inference algorithms that rely on deep neural networks. Causal effect variational autoencoder (CEVAE) is trained to predict the outcome given observational treatment data and it suffers from the distribution shift at test time. In this paper, we introduce uniform treatment variational autoencoders (UTVAE) that are trained with uniform treatment distribution using importance sampling and show that using uniform treatment over observational treatment distribution leads to better causal inference by mitigating the distribution shift that occurs from training to test time. We also explore the combination of uniform and observational treatment distributions with inference and generative network training objectives to find a better training procedure for inferring treatment effects. Experimentally, we find that the proposed UTVAE yields better absolute average treatment effect error and precision in the estimation of heterogeneous effect error than the CEVAE on synthetic and IHDP datasets.
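    In standard importance-sampling terms (notation mine, not necessarily the paper's), training under a uniform treatment distribution from observational data amounts to reweighting each sample by $w(t, x) = \pi_{\mathrm{unif}}(t) / p(t \mid x)$, where $p(t \mid x)$ is the observational propensity of treatment $t$ given covariates $x$.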
    Optimizing Crop Management with Reinforcement Learning and Imitation Learning. (arXiv:2209.09991v1 [cs.AI])
    Crop management, including nitrogen (N) fertilization and irrigation management, has a significant impact on the crop yield, economic profit, and the environment. Although management guidelines exist, it is challenging to find the optimal management practices given a specific planting environment and a crop. Previous work used reinforcement learning (RL) and crop simulators to solve the problem, but the trained policies either have limited performance or are not deployable in the real world. In this paper, we present an intelligent crop management system which optimizes the N fertilization and irrigation simultaneously via RL, imitation learning (IL), and crop simulations using the Decision Support System for Agrotechnology Transfer (DSSAT). We first use deep RL, in particular, deep Q-network, to train management policies that require all state information from the simulator as observations (denoted as full observation). We then invoke IL to train management policies that only need a limited amount of state information that can be readily obtained in the real world (denoted as partial observation) by mimicking the actions of the previously RL-trained policies under full observation. We conduct experiments on a case study using maize in Florida and compare trained policies with a maize management guideline in simulations. Our trained policies under both full and partial observations achieve better outcomes, resulting in a higher profit or a similar profit with a smaller environmental impact. Moreover, the partial-observation management policies are directly deployable in the real world as they use readily available information.
    Can You Still See Me?: Reconstructing Robot Operations Over End-to-End Encrypted Channels. (arXiv:2205.08426v2 [cs.CR] UPDATED)
    Connected robots play a key role in Industry 4.0, providing automation and higher efficiency for many industrial workflows. Unfortunately, these robots can leak sensitive information regarding these operational workflows to remote adversaries. While there exist mandates for the use of end-to-end encryption for data transmission in such settings, it is entirely possible for passive adversaries to fingerprint and reconstruct entire workflows being carried out -- establishing an understanding of how facilities operate. In this paper, we investigate whether a remote attacker can accurately fingerprint robot movements and ultimately reconstruct operational workflows. Using a neural network approach to traffic analysis, we find that one can predict TLS-encrypted movements with roughly 60% accuracy, increasing to near-perfect accuracy under realistic network conditions. Further, we also find that attackers can reconstruct warehousing workflows with similar success. Ultimately, simply adopting best cybersecurity practices is clearly not enough to stop even weak (passive) adversaries.
    Model-Free Reinforcement Learning for Asset Allocation. (arXiv:2209.10458v1 [q-fin.PM])
    Asset allocation (or portfolio management) is the task of determining how to optimally allocate funds of a finite budget into a range of financial instruments/assets such as stocks. This study investigated the performance of reinforcement learning (RL) when applied to portfolio management using model-free deep RL agents. We trained several RL agents on real-world stock prices to learn how to perform asset allocation. We compared the performance of these RL agents against some baseline agents. We also compared the RL agents among themselves to understand which classes of agents performed better. From our analysis, RL agents can perform the task of portfolio management since they significantly outperformed two of the baseline agents (random allocation and uniform allocation). Four RL agents (A2C, SAC, PPO, and TRPO) outperformed the best baseline, MPT, overall. This shows the abilities of RL agents to uncover more profitable trading strategies. Furthermore, there were no significant performance differences between value-based and policy-based RL agents. Actor-critic agents performed better than other types of agents. Also, on-policy agents performed better than off-policy agents because they are better at policy evaluation and sample efficiency is not a significant problem in portfolio management. This study shows that RL agents can substantially improve asset allocation since they outperform strong baselines. On-policy, actor-critic RL agents showed the most promise based on our analysis.
    Hierarchical Decision Transformer. (arXiv:2209.10447v1 [cs.LG])
    Sequence models in reinforcement learning require task knowledge to estimate the task policy. This paper presents a hierarchical algorithm for learning a sequence model from demonstrations. The high-level mechanism guides the low-level controller through the task by selecting sub-goals for the latter to reach. This sequence replaces the returns-to-go of previous methods, improving its performance overall, especially in tasks with longer episodes and scarcer rewards. We validate our method in multiple tasks of OpenAIGym, D4RL and RoboMimic benchmarks. Our method outperforms the baselines in eight out of ten tasks of varied horizons and reward frequencies without prior task knowledge, showing the advantages of the hierarchical model approach for learning from demonstrations using a sequence model.
    Data Augmentation as Feature Manipulation. (arXiv:2203.01572v2 [cs.LG] UPDATED)
    Data augmentation is a cornerstone of the machine learning pipeline, yet its theoretical underpinnings remain unclear. Is it merely a way to artificially augment the data set size? Or is it about encouraging the model to satisfy certain invariance? In this work we consider another angle, and we study the effect of data augmentation on the dynamic of the learning process. We find that data augmentation can alter the relative importance of various features, effectively making certain informative but hard to learn features more likely to be captured in the learning process. Importantly, we show that this effect is more pronounced for non-linear models, such as neural networks. Our main contribution is a detailed analysis of data augmentation on the learning dynamic for a two layer convolutional neural network in the recently proposed multi-view data model by Allen-Zhu and Li [2020]. We complement this analysis with further experimental evidence that data augmentation can be viewed as feature manipulation.
    SoLar: Sinkhorn Label Refinery for Imbalanced Partial-Label Learning. (arXiv:2209.10365v1 [cs.LG])
    Partial-label learning (PLL) is a peculiar weakly-supervised learning task where the training samples are generally associated with a set of candidate labels instead of a single ground truth. While a variety of label disambiguation methods have been proposed in this domain, they normally assume a class-balanced scenario that may not hold in many real-world applications. Empirically, we observe degenerated performance of the prior methods when facing the combinatorial challenge from the long-tailed distribution and partial-labeling. In this work, we first identify the major reasons that the prior work failed. We subsequently propose SoLar, a novel Optimal Transport-based framework that refines the disambiguated labels towards matching the marginal class prior distribution. SoLar additionally incorporates a new and systematic mechanism for estimating the long-tailed class prior distribution under the PLL setup. Through extensive experiments, SoLar exhibits substantially superior results on standardized benchmarks compared to the previous state-of-the-art PLL methods. Code and data are available at: https://github.com/hbzju/SoLar .
    Are Attention Networks More Robust? Towards Exact Robustness Verification for Attention Networks. (arXiv:2202.03932v2 [cs.LG] UPDATED)
    As an emerging type of Neural Networks (NNs), Attention Networks (ATNs) such as Transformers have been shown effective, in terms of accuracy, in many applications. This paper further considers their robustness. More specifically, we are curious about their maximum resilience against local input perturbations compared to the more conventional Multi-Layer Perceptrons (MLPs). Thus, we formulate the verification task into an optimization problem, from which exact robustness values can be obtained. One major challenge, however, is the non-convexity and non-linearity of NNs. While the existing literature has handled the challenge to some extent with methods such as Branch-and-Bound, the additional level of difficulty introduced by the quadratic and exponential functions in the ATNs has not been tackled. Our work reduces this gap by focusing on sparsemax-based ATNs, encoding them into Mixed Integer Quadratically Constrained Programming problems, and proposing two powerful heuristics for a speedup of one order of magnitude. Finally, we train and evaluate several sparsemax-based ATNs and similar-sized ReLU-based MLPs for a lane departure warning task and show that the former is surprisingly less robust despite generally higher accuracy.
    Robust Information Bottleneck for Task-Oriented Communication with Digital Modulation. (arXiv:2209.10382v1 [cs.IT])
    Task-oriented communications, mostly using learning-based joint source-channel coding (JSCC), aim to design a communication-efficient edge inference system by transmitting task-relevant information to the receiver. However, only transmitting task-relevant information without introducing any redundancy may cause robustness issues in learning due to the channel variations, and the JSCC which directly maps the source data into continuous channel input symbols poses compatibility issues on existing digital communication systems. In this paper, we address these two issues by first investigating the inherent tradeoff between the informativeness of the encoded representations and the robustness to information distortion in the received representations, and then propose a task-oriented communication scheme with digital modulation, named discrete task-oriented JSCC (DT-JSCC), where the transmitter encodes the features into a discrete representation and transmits it to the receiver with the digital modulation scheme. In the DT-JSCC scheme, we develop a robust encoding framework, named robust information bottleneck (RIB), to improve the communication robustness to the channel variations, and derive a tractable variational upper bound of the RIB objective function using the variational approximation to overcome the computational intractability of mutual information. The experimental results demonstrate that the proposed DT-JSCC achieves better inference performance than the baseline methods with low communication latency, and exhibits robustness to channel variations due to the applied RIB framework.
    NeurOLight: A Physics-Agnostic Neural Operator Enabling Parametric Photonic Device Simulation. (arXiv:2209.10098v1 [cs.ET])
    Optical computing is an emerging technology for next-generation efficient artificial intelligence (AI) due to its ultra-high speed and efficiency. Electromagnetic field simulation is critical to the design, optimization, and validation of photonic devices and circuits. However, costly numerical simulation significantly hinders the scalability and turn-around time in the photonic circuit design loop. Recently, physics-informed neural networks have been proposed to predict the optical field solution of a single instance of a partial differential equation (PDE) with predefined parameters. Their complicated PDE formulation and lack of efficient parametrization mechanisms limit their flexibility and generalization in practical simulation scenarios. In this work, for the first time, a physics-agnostic neural operator-based framework, dubbed NeurOLight, is proposed to learn a family of frequency-domain Maxwell PDEs for ultra-fast parametric photonic device simulation. We balance the efficiency and generalization of NeurOLight via several novel techniques. Specifically, we discretize different devices into a unified domain, represent parametric PDEs with a compact wave prior, and encode the incident light via masked source modeling. We design our model with parameter-efficient cross-shaped NeurOLight blocks and adopt superposition-based augmentation for data-efficient learning. With these synergistic approaches, NeurOLight generalizes to a large space of unseen simulation settings, demonstrates 2-orders-of-magnitude faster simulation speed than numerical solvers, and outperforms prior neural network models by ~54% lower prediction error with ~44% fewer parameters. Our code is available at https://github.com/JeremieMelo/NeurOLight.
    Asynchronous Actor-Critic for Multi-Agent Reinforcement Learning. (arXiv:2209.10113v1 [cs.LG])
    Synchronizing decisions across multiple agents in realistic settings is problematic since it requires agents to wait for other agents to terminate and communicate about termination reliably. Ideally, agents should learn and execute asynchronously instead. Such asynchronous methods also allow temporally extended actions that can take different amounts of time based on the situation and action executed. Unfortunately, current policy gradient methods are not applicable in asynchronous settings, as they assume that agents synchronously reason about action selection at every time step. To allow asynchronous learning and decision-making, we formulate a set of asynchronous multi-agent actor-critic methods that allow agents to directly optimize asynchronous policies in three standard training paradigms: decentralized learning, centralized learning, and centralized training for decentralized execution. Empirical results (in simulation and hardware) in a variety of realistic domains demonstrate the superiority of our approaches in large multi-agent problems and validate the effectiveness of our algorithms for learning high-quality and asynchronous solutions.
    Mutual Information Learned Classifiers: an Information-theoretic Viewpoint of Training Deep Learning Classification Systems. (arXiv:2209.10058v1 [cs.LG])
    Deep learning systems have been reported to achieve state-of-the-art performances in many applications, and a key is the existence of well trained classifiers on benchmark datasets. As a main-stream loss function, the cross entropy can easily lead us to find models which demonstrate severe overfitting behavior. In this paper, we show that the existing cross entropy loss minimization problem essentially learns the label conditional entropy (CE) of the underlying data distribution of the dataset. However, the CE learned in this way does not characterize well the information shared by the label and the input. In this paper, we propose a mutual information learning framework where we train deep neural network classifiers via learning the mutual information between the label and the input. Theoretically, we give the population classification error lower bound in terms of the mutual information. In addition, we derive the mutual information lower and upper bounds for a concrete binary classification data model in $\mathbb{R}^n$, and also the error probability lower bound in this scenario. Empirically, we conduct extensive experiments on several benchmark datasets to support our theory. The mutual information learned classifiers (MILCs) achieve far better generalization performances than the conditional entropy learned classifiers (CELCs) with an improvement which can exceed more than 10\% in testing accuracy.
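    For reference, the classical template for such error bounds (the paper's bound may differ) is Fano's inequality: for a label $Y$ on a finite set $\mathcal{Y}$ and any estimator $\hat{Y}(X)$ with error probability $P_e$, $H(Y \mid X) \le h_b(P_e) + P_e \log(|\mathcal{Y}| - 1)$; since $H(Y \mid X) = H(Y) - I(X;Y)$, a small mutual information forces $P_e$ to be large.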
    Generative Modelling With Inverse Heat Dissipation. (arXiv:2206.13397v3 [cs.CV] UPDATED)
    While diffusion models have shown great success in image generation, their noise-inverting generative process does not explicitly consider the structure of images, such as their inherent multi-scale nature. Inspired by diffusion models and the desirability of coarse-to-fine modelling, we propose a new model that generates images through iteratively inverting the heat equation, a PDE that locally erases fine-scale information when run over the 2D plane of the image. We interpret the solution of the forward heat equation as a variational approximation in a diffusion-like latent variable model. We point out emergent qualitative properties not seen in diffusion models, such as disentanglement of overall colour and shape in images and aspects of neural network interpretability. Spectral analysis on natural images elucidates connections to diffusion models and reveals implicit inductive biases in them.
    Monotonic Neural Additive Models: Pursuing Regulated Machine Learning Models for Credit Scoring. (arXiv:2209.10070v1 [cs.LG])
    The forecasting of credit default risk has been an active research field for several decades. Historically, logistic regression has been used as a major tool due to its compliance with regulatory requirements: transparency, explainability, and fairness. In recent years, researchers have increasingly used complex and advanced machine learning methods to improve prediction accuracy. Even though a machine learning method could potentially improve the model accuracy, it complicates simple logistic regression, deteriorates explainability, and often violates fairness. In the absence of compliance with regulatory requirements, even highly accurate machine learning methods are unlikely to be accepted by companies for credit scoring. In this paper, we introduce a novel class of monotonic neural additive models, which meet regulatory requirements by simplifying neural network architecture and enforcing monotonicity. By utilizing the special architectural features of the neural additive model, the monotonic neural additive model penalizes monotonicity violations effectively. Consequently, the computational cost of training a monotonic neural additive model is similar to that of training a neural additive model, as a free lunch. We demonstrate through empirical results that our new model is as accurate as black-box fully-connected neural networks, providing a highly accurate and regulated machine learning method.
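    A hedged sketch of one generic way to penalize monotonicity violations (a gradient penalty; the paper's exact regularizer may differ): push the model's partial derivative with respect to a feature that should be monotone to be non-negative.
        import tensorflow as tf

        def monotonicity_penalty(model, x, feature_idx):
            # x: a batch of inputs as a tf.Tensor, shape (batch, num_features)
            with tf.GradientTape() as tape:
                tape.watch(x)
                y = model(x)                                    # shape (batch, 1)
            grads = tape.gradient(y, x)                         # d y / d x, same shape as x
            violation = tf.nn.relu(-grads[:, feature_idx])      # positive where slope < 0
            return tf.reduce_mean(violation)                    # add lambda * penalty to the loss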
    Partial Information Decomposition Reveals the Structure of Neural Representations. (arXiv:2209.10438v1 [cs.IT])
    In neural networks, task-relevant information is represented jointly by groups of neurons. However, the specific way in which the information is distributed among the individual neurons is not well understood: While parts of it may only be obtainable from specific single neurons, other parts are carried redundantly or synergistically by multiple neurons. We show how Partial Information Decomposition (PID), a recent extension of information theory, can disentangle these contributions. From this, we introduce the measure of "Representational Complexity", which quantifies the difficulty of accessing information spread across multiple neurons. We show how this complexity is directly computable for smaller layers. For larger layers, we propose subsampling and coarse-graining procedures and prove corresponding bounds on the latter. Empirically, for quantized deep neural networks solving the MNIST task, we observe that representational complexity decreases both through successive hidden layers and over training. Overall, we propose representational complexity as a principled and interpretable summary statistic for analyzing the structure of neural representations.
    Measuring and Controlling Split Layer Privacy Leakage Using Fisher Information. (arXiv:2209.10119v1 [cs.CR])
    Split learning and inference propose to run training/inference of a large model that is split across client devices and the cloud. However, such a model splitting imposes privacy concerns, because the activation flowing through the split layer may leak information about the clients' private input data. There is currently no good way to quantify how much private information is being leaked through the split layer, nor a good way to improve privacy up to the desired level. In this work, we propose to use Fisher information as a privacy metric to measure and control the information leakage. We show that Fisher information can provide an intuitive understanding of how much private information is leaking through the split layer, in the form of an error bound for an unbiased reconstruction attacker. We then propose a privacy-enhancing technique, ReFIL, that can enforce a user-desired level of Fisher information leakage at the split layer to achieve high privacy, while maintaining reasonable utility.
    Learning the Propagation of Worms in Wireless Sensor Networks. (arXiv:2209.09984v1 [cs.LG])
    Wireless sensor networks (WSNs) are composed of spatially distributed sensors and are considered vulnerable to attacks by worms and their variants. Due to the distinct strategies of worms propagation, the dynamic behavior varies depending on the different features of the sensors. Modeling the spread of worms can help us understand the worm attack behaviors and analyze the propagation procedure. In this paper, we design a communication model under various worms. We aim to learn our proposed model to analytically derive the dynamics of competitive worms propagation. We develop a new searching space combined with complex neural network models. Furthermore, the experiment results verified our analysis and demonstrated the performance of our proposed learning algorithms.
    Safety Metrics and Losses for Object Detection in Autonomous Driving. (arXiv:2209.10368v1 [cs.CV])
    State-of-the-art object detectors have been shown effective in many applications. Usually, their performance is evaluated based on accuracy metrics such as mean Average Precision. In this paper, we consider a safety property of 3D object detectors in the context of Autonomous Driving (AD). In particular, we propose an essential safety requirement for object detectors in AD and formulate it into a specification. During the formulation, we find that abstracting 3D objects with projected 2D bounding boxes on the image and bird's-eye-view planes allows for a necessary and sufficient condition to the proposed safety requirement. We then leverage the analysis and derive qualitative and quantitative safety metrics based on the Intersection-over-Ground-Truth measure and a distance ratio between predictions and ground truths. Finally, for continual improvement, we formulate safety losses that can be used to optimize object detectors towards higher safety scores. Our experiments with public models on the MMDetection3D library and the nuScenes datasets demonstrate the validity of our consideration and proposals.
    Metadata Archaeology: Unearthing Data Subsets by Leveraging Training Dynamics. (arXiv:2209.10015v1 [cs.LG])
    Modern machine learning research relies on relatively few carefully curated datasets. Even in these datasets, and typically in `untidy' or raw data, practitioners are faced with significant issues of data quality and diversity which can be prohibitively labor intensive to address. Existing methods for dealing with these challenges tend to make strong assumptions about the particular issues at play, and often require a priori knowledge or metadata such as domain labels. Our work is orthogonal to these methods: we instead focus on providing a unified and efficient framework for Metadata Archaeology -- uncovering and inferring metadata of examples in a dataset. We curate different subsets of data that might exist in a dataset (e.g. mislabeled, atypical, or out-of-distribution examples) using simple transformations, and leverage differences in learning dynamics between these probe suites to infer metadata of interest. Our method is on par with far more sophisticated mitigation methods across different tasks: identifying and correcting mislabeled examples, classifying minority-group samples, prioritizing points relevant for training and enabling scalable human auditing of relevant examples.
    Improving the Performance of Robust Control through Event-Triggered Learning. (arXiv:2207.14252v2 [eess.SY] UPDATED)
    Robust controllers ensure stability in feedback loops designed under uncertainty but at the cost of performance. Model uncertainty in time-invariant systems can be reduced by recently proposed learning-based methods, which improve the performance of robust controllers using data. However, in practice, many systems also exhibit uncertainty in the form of changes over time, e.g., due to weight shifts or wear and tear, leading to decreased performance or instability of the learning-based controller. We propose an event-triggered learning algorithm that decides when to learn in the face of uncertainty in the LQR problem with rare or slow changes. Our key idea is to switch between robust and learned controllers. For learning, we first approximate the optimal length of the learning phase via Monte-Carlo estimations using a probabilistic model. We then design a statistical test for uncertain systems based on the moment-generating function of the LQR cost. The test detects changes in the system under control and triggers re-learning when control performance deteriorates due to system changes. We demonstrate improved performance over a robust controller baseline in a numerical example.
    Deep Learning for Multi-User MIMO Systems: Joint Design of Pilot, Limited Feedback, and Precoding. (arXiv:2209.10332v1 [cs.IT])
    In conventional multi-user multiple-input multiple-output (MU-MIMO) systems with frequency division duplexing (FDD), channel acquisition and precoder optimization processes have been designed separately although they are highly coupled. This paper studies an end-to-end design of downlink MU-MIMO systems which include pilot sequences, limited feedback, and precoding. To address this problem, we propose a novel deep learning (DL) framework which jointly optimizes the feedback information generation at users and the precoder design at a base station (BS). Each procedure in the MU-MIMO systems is replaced by intelligently designed multiple deep neural networks (DNN) units. At the BS, a neural network generates pilot sequences and helps the users obtain accurate channel state information. At each user, the channel feedback operation is carried out in a distributed manner by an individual user DNN. Then, another BS DNN collects feedback information from the users and determines the MIMO precoding matrices. A joint training algorithm is proposed to optimize all DNN units in an end-to-end manner. In addition, a training strategy which can avoid retraining for different network sizes for a scalable design is proposed. Numerical results demonstrate the effectiveness of the proposed DL framework compared to classical optimization techniques and other conventional DNN schemes.
    Learning from Mixed Datasets: A Monotonic Image Quality Assessment Model. (arXiv:2209.10451v1 [cs.CV])
    Deep learning based image quality assessment (IQA) models usually learn to predict image quality from a single dataset, leading the model to overfit specific scenes. To account for this, mixed datasets training can be an effective way to enhance the generalization capability of the model. However, it is nontrivial to combine different IQA datasets, as their quality evaluation criteria, score ranges, view conditions, as well as subjects are usually not shared during the image quality annotation. In this paper, instead of aligning the annotations, we propose a monotonic neural network for IQA model learning with different datasets combined. In particular, our model consists of a dataset-shared quality regressor and several dataset-specific quality transformers. The quality regressor aims to obtain the perceptual qualities of each dataset while each quality transformer maps the perceptual qualities to the corresponding dataset annotations with their monotonicity maintained. The experimental results verify the effectiveness of the proposed learning strategy and our code is available at https://github.com/fzp0424/MonotonicIQA.
    MAREO: Memory- and Attention- based visual REasOning. (arXiv:2206.04928v3 [cs.AI] UPDATED)
    Humans continue to outperform modern AI systems in their ability to parse and understand complex visual scenes flexibly. Attention and memory are two systems known to play a critical role in our ability to selectively maintain and manipulate behaviorally-relevant visual information to solve some of the most challenging visual reasoning tasks. Here, we present a novel architecture for visual reasoning inspired by the cognitive-science literature on visual reasoning, the Memory- and Attention-based (visual) REasOning (MAREO) architecture. MAREO instantiates an active-vision theory, which posits that the brain solves complex visual reasoning problems compositionally by learning to combine previously-learned elementary visual operations to form more complex visual routines. MAREO learns to solve visual reasoning tasks via sequences of attention shifts to route and maintain task-relevant visual information into a memory bank via a multi-head transformer module. Visual routines are then deployed by a dedicated reasoning module trained to judge various relations between objects in the scenes. Experiments on tasks containing complex visual relations (SVRT challenge) and same-different differentiation, relation match to sample, Raven's and Identity rules from ART challenge demonstrate MAREO's ability to learn visual routines in a robust and sample-efficient manner. We also show the zero-shot generalization on unseen tasks and the compositionality nature of the architecture.
    A Max-relevance-min-divergence Criterion for Data Discretization with Applications on Naive Bayes. (arXiv:2209.10095v1 [cs.LG])
    In many classification models, data is discretized to better estimate its distribution. Existing discretization methods often target at maximizing the discriminant power of discretized data, while overlooking the fact that the primary target of data discretization in classification is to improve the generalization performance. As a result, the data tend to be over-split into many small bins since the data without discretization retain the maximal discriminant information. Thus, we propose a Max-Dependency-Min-Divergence (MDmD) criterion that maximizes both the discriminant information and generalization ability of the discretized data. More specifically, the Max-Dependency criterion maximizes the statistical dependency between the discretized data and the classification variable while the Min-Divergence criterion explicitly minimizes the JS-divergence between the training data and the validation data for a given discretization scheme. The proposed MDmD criterion is technically appealing, but it is difficult to reliably estimate the high-order joint distributions of attributes and the classification variable. We hence further propose a more practical solution, Max-Relevance-Min-Divergence (MRmD) discretization scheme, where each attribute is discretized separately, by simultaneously maximizing the discriminant information and the generalization ability of the discretized data. The proposed MRmD is compared with the state-of-the-art discretization algorithms under the naive Bayes classification framework on 45 machine-learning benchmark datasets. It significantly outperforms all the compared methods on most of the datasets.
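    The Min-Divergence term is built on the standard Jensen-Shannon divergence, which is easy to state concretely (the binned counts and epsilon smoothing below are illustrative choices, not the paper's):
        import numpy as np

        def js_divergence(p, q, eps=1e-12):
            p, q = p / p.sum(), q / q.sum()
            m = 0.5 * (p + q)
            kl = lambda a, b: np.sum(a * np.log((a + eps) / (b + eps)))
            return 0.5 * kl(p, m) + 0.5 * kl(q, m)

        train_hist = np.array([10.0, 30.0, 40.0, 20.0])  # bin counts on training data
        valid_hist = np.array([12.0, 25.0, 45.0, 18.0])  # bin counts on validation data
        print(js_divergence(train_hist, valid_hist))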
    On the convex formulations of robust Markov decision processes. (arXiv:2209.10187v1 [math.OC])
    Robust Markov decision processes (RMDPs) are used for applications of dynamic optimization in uncertain environments and have been studied extensively. Many of the main properties and algorithms of MDPs, such as value iteration and policy iteration, extend directly to RMDPs. Surprisingly, there is no known analog of the MDP convex optimization formulation for solving RMDPs. This work describes the first convex optimization formulation of RMDPs under the classical sa-rectangularity and s-rectangularity assumptions. By using entropic regularization and an exponential change of variables, we derive a convex formulation with a linear number of variables and constraints but large coefficients in the constraints. Our formulation can be combined with efficient methods from convex optimization to obtain new algorithms for solving RMDPs with uncertain probabilities. We further simplify the formulation for RMDPs with polyhedral uncertainty sets. Our work opens a new research direction for RMDPs and can serve as a first step toward obtaining a tractable convex formulation of RMDPs.
    Efficient Calibration of Multi-Agent Simulation Models from Output Series with Bayesian Optimization. (arXiv:2112.03874v2 [q-fin.ST] UPDATED)
    Multi-agent simulation is commonly used across multiple disciplines, and specifically in artificial intelligence in recent years, where it creates an environment for downstream machine learning or reinforcement learning tasks. In many practical scenarios, however, only the output series that result from the interactions of simulation agents are observable. Therefore, simulators need to be calibrated so that the simulated output series resemble the historical ones -- which amounts to solving a complex simulation optimization problem. In this paper, we propose a simple and efficient framework for calibrating simulator parameters from historical output series observations. First, we consider a novel concept of an eligibility set to bypass the potential non-identifiability issue. Second, we generalize the two-sample Kolmogorov-Smirnov (K-S) test with Bonferroni correction to test the similarity between two high-dimensional distributions, which gives a simple yet effective distance metric between the output series sample sets. Third, we suggest using Bayesian optimization (BO) and trust-region BO (TuRBO) to minimize the aforementioned distance metric. Finally, we demonstrate the efficiency of our framework with numerical experiments on a multi-agent financial market simulator.
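    The distance metric in the second step is easy to reproduce: run a two-sample K-S test per output dimension and apply a Bonferroni correction across dimensions. A minimal sketch, assuming the output series have already been summarized as feature matrices of equal width:

```python
import numpy as np
from scipy.stats import ks_2samp

def ks_distance(real: np.ndarray, sim: np.ndarray, alpha: float = 0.05):
    """Dimension-wise two-sample K-S tests with Bonferroni correction.

    real, sim: arrays of shape (n_samples, d) of output-series features.
    Returns the largest K-S statistic (a distance a BO loop can minimize)
    and whether similarity is rejected at the corrected level."""
    d = real.shape[1]
    results = [ks_2samp(real[:, j], sim[:, j]) for j in range(d)]
    max_stat = max(r.statistic for r in results)
    reject = any(r.pvalue < alpha / d for r in results)  # Bonferroni
    return max_stat, reject
```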
    Tab2vox: CNN-Based Multivariate Multilevel Demand Forecasting Framework by Tabular-To-Voxel Image Conversion. (arXiv:2209.10516v1 [stat.ML])
    Since demand is influenced by a wide variety of causes, it is necessary to decompose the explanatory variables into different levels, extract their relationships effectively, and reflect them in the forecast. In particular, this contextual information can be very useful in demand forecasting with large demand volatility or intermittent demand patterns. Convolutional neural networks (CNNs) have been successfully used in many fields where important information in data is represented by images. CNNs are powerful because they accept samples as images and use adjacent voxel sets to integrate multi-dimensional important information and learn important features. On the other hand, although demand-forecasting models have been improved, the input data is still limited to its tabular form and is not suitable for CNN modeling. In this study, we propose a Tab2vox neural architecture search (NAS) model as a method to convert a high-dimensional tabular sample into a well-formed 3D voxel image and use it in a 3D CNN network. For each image representation, the 3D CNN forecasting model proposed by the Tab2vox framework showed superior performance compared to the existing time series and machine learning techniques using tabular data, and to the latest image transformation studies.
    FedFOR: Stateless Heterogeneous Federated Learning with First-Order Regularization. (arXiv:2209.10537v1 [cs.LG])
    Federated Learning (FL) seeks to distribute model training across local clients without collecting data in a centralized data-center, hence removing data-privacy concerns. A major challenge for FL is data heterogeneity (where each client's data distribution can differ), as it can lead to weight divergence among local clients and slow global convergence. The current SOTA FL methods designed for data heterogeneity typically impose regularization to limit the impact of non-IID data and are stateful algorithms, i.e., they maintain local statistics over time. While effective, these approaches can only be used for a special case of FL involving only a small number of reliable clients. For the more typical applications of FL, where the number of clients is large (e.g., edge-device and mobile applications), these methods cannot be applied, motivating the need for a stateless approach to heterogeneous FL which can be used with any number of clients. We derive a first-order gradient regularization to penalize inconsistent local updates due to local data heterogeneity. Specifically, to mitigate weight divergence, we introduce a first-order approximation of the global data distribution into local objectives, which intuitively penalizes updates in the opposite direction of the global update. The end result is a stateless FL algorithm that achieves 1) significantly faster convergence (i.e., fewer communication rounds) and 2) higher overall converged performance than SOTA methods under non-IID data distributions. Importantly, our approach does not impose unrealistic limits on the number of clients, enabling learning from a large number of clients as is typical in most FL applications.
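    A minimal sketch of the first-order idea follows: a linear penalty involving the last global update shifts each local gradient step toward the global direction, which penalizes updates that oppose it. This is one simple variant for illustration, not the paper's exact regularizer, and the flat global_update tensor is a hypothetical interface.

```python
import torch

def local_step(model, loss_fn, batch, global_update, opt, lam=0.1):
    """One local SGD step with a first-order consistency penalty
    (an illustrative variant). `global_update` is a flat tensor holding
    the last global model delta, broadcast by the server."""
    opt.zero_grad()
    loss = loss_fn(model(batch["x"]), batch["y"])
    loss.backward()
    # A linear penalty -lam * <theta, global_update> contributes
    # -lam * global_update to the gradient, biasing each local step
    # toward the global update and penalizing steps that oppose it.
    with torch.no_grad():
        offset = 0
        for p in model.parameters():
            n = p.numel()
            if p.grad is not None:
                p.grad -= lam * global_update[offset:offset + n].view_as(p)
            offset += n
    opt.step()
```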
    Can Shadows Reveal Biometric Information?. (arXiv:2209.10077v1 [cs.CV])
    We study the problem of extracting biometric information of individuals by looking at shadows of objects cast on diffuse surfaces. We show that the biometric information leakage from shadows can be sufficient for reliable identity inference under representative scenarios via a maximum likelihood analysis. We then develop a learning-based method that demonstrates this phenomenon in real settings, exploiting the subtle cues in the shadows that are the source of the leakage without requiring any labeled real data. In particular, our approach relies on building synthetic scenes composed of 3D face models obtained from a single photograph of each identity. We transfer what we learn from the synthetic data to the real data using domain adaptation in a completely unsupervised way. Our model is able to generalize well to the real domain and is robust to several variations in the scenes. We report high classification accuracies in an identity classification task that takes place in a scene with unknown geometry and occluding objects.
    A Systematic Literature Review on Process-Aware Recommender Systems. (arXiv:2103.16654v2 [cs.IR] CROSS LISTED)
    Considering the processes of a business in a recommender system is highly advantageous. Although most studies in the business process analysis domain are of a descriptive and predictive nature, only a few works assess the feasibility of constructing a process-aware recommender system. One reason can be the lack of knowledge about the potential of process mining for recommendation problems. Therefore, this paper aims to identify and analyze published studies on process-aware recommender system techniques in the business process management and process mining domains. A systematic review was conducted on 33 academic articles published between 2008 and 2020, according to several aspects. In this regard, we provide a state-of-the-art review with critical details, giving researchers a better sense of which paths to pursue in this field. Moreover, drawing on the assembled knowledge base and a holistic perspective, we discuss research gaps and open challenges in this field.
    tntorch: Tensor Network Learning with PyTorch. (arXiv:2206.11128v2 [cs.LG] UPDATED)
    We present tntorch, a tensor learning framework that supports multiple decompositions (including Candecomp/Parafac, Tucker, and Tensor Train) under a unified interface. With our library, the user can learn and handle low-rank tensors with automatic differentiation, seamless GPU support, and the convenience of PyTorch's API. Besides decomposition algorithms, tntorch implements differentiable tensor algebra, rank truncation, cross-approximation, batch processing, comprehensive tensor arithmetic, and more.
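    For readers unfamiliar with the tensor-train format that such a library manipulates, the following generic PyTorch sketch (deliberately not using the tntorch API) reconstructs a full tensor from its TT cores, which is the basic contraction underlying TT arithmetic:

```python
import torch

def tt_full(cores):
    """Reconstruct a full tensor from tensor-train (TT) cores, one of
    the decomposition families (alongside CP and Tucker) mentioned
    above. Each core has shape (r_{k-1}, n_k, r_k) with boundary
    ranks r_0 = r_K = 1."""
    out = cores[0]                                # (1, n_1, r_1)
    for core in cores[1:]:
        out = torch.tensordot(out, core, dims=1)  # contract shared rank
    return out.squeeze(0).squeeze(-1)             # drop boundary ranks

# A rank-(3, 2) TT representation of a 4 x 5 x 6 tensor:
cores = [torch.randn(1, 4, 3), torch.randn(3, 5, 2), torch.randn(2, 6, 1)]
print(tt_full(cores).shape)                       # torch.Size([4, 5, 6])
```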
    Robust, High-Rate Trajectory Tracking on Insect-Scale Soft-Actuated Aerial Robots with Deep-Learned Tube MPC. (arXiv:2209.10007v1 [cs.RO])
    Accurate and agile trajectory tracking in sub-gram Micro Aerial Vehicles (MAVs) is challenging, as the small scale of the robot induces large model uncertainties, demanding robust feedback controllers, while the fast dynamics and computational constraints prevent the deployment of computationally expensive strategies. In this work, we present an approach for agile and computationally efficient trajectory tracking on the MIT SoftFly, a sub-gram MAV (0.7 grams). Our strategy employs a cascaded control scheme, where an adaptive attitude controller is combined with a neural network policy trained to imitate a trajectory-tracking robust tube model predictive controller (RTMPC). The neural network policy is obtained using our recent work, which enables the policy to preserve the robustness of RTMPC, but at a fraction of its computational cost. We experimentally evaluate our approach, achieving position Root Mean Square Errors lower than 1.8 cm even in the more challenging maneuvers, obtaining a 60% reduction in maximum position error compared to our previous work, and demonstrating robustness to large external disturbances.
    Protein language models trained on multiple sequence alignments learn phylogenetic relationships. (arXiv:2203.15465v2 [q-bio.BM] UPDATED)
    Self-supervised neural language models with attention have recently been applied to biological sequence data, advancing structure, function and mutational effect prediction. Some protein language models, including MSA Transformer and AlphaFold's EvoFormer, take multiple sequence alignments (MSAs) of evolutionarily related proteins as inputs. Simple combinations of MSA Transformer's row attentions have led to state-of-the-art unsupervised structural contact prediction. We demonstrate that similarly simple, and universal, combinations of MSA Transformer's column attentions strongly correlate with Hamming distances between sequences in MSAs. Therefore, MSA-based language models encode detailed phylogenetic relationships. We further show that these models can separate coevolutionary signals encoding functional and structural constraints from phylogenetic correlations reflecting historical contingency. To assess this, we generate synthetic MSAs, either without or with phylogeny, from Potts models trained on natural MSAs. We find that unsupervised contact prediction is substantially more resilient to phylogenetic noise when using MSA Transformer versus inferred Potts models.
    Cliff Diving: Exploring Reward Surfaces in Reinforcement Learning Environments. (arXiv:2205.07015v3 [cs.LG] UPDATED)
    Visualizing optimization landscapes has led to many fundamental insights in numeric optimization, and novel improvements to optimization techniques. However, visualizations of the objective that reinforcement learning optimizes (the "reward surface") have only ever been generated for a small number of narrow contexts. This work presents reward surfaces and related visualizations of 27 of the most widely used reinforcement learning environments in Gym for the first time. We also explore reward surfaces in the policy gradient direction and show for the first time that many popular reinforcement learning environments have frequent "cliffs" (sudden large drops in expected return). We demonstrate that A2C often "dives off" these cliffs into low reward regions of the parameter space while PPO avoids them, confirming a popular intuition for PPO's improved performance over previous methods. We additionally introduce a highly extensible library that allows researchers to easily generate these visualizations in the future. Our findings provide new intuition to explain the successes and failures of modern RL methods, and our visualizations concretely characterize several failure modes of reinforcement learning agents in novel ways.
    Data Augmentation for Deep Graph Learning: A Survey. (arXiv:2202.08235v2 [cs.LG] UPDATED)
    Graph neural networks, a powerful deep learning tool for modeling graph-structured data, have demonstrated remarkable performance on numerous graph learning tasks. To address the data noise and data scarcity issues in deep graph learning, research on graph data augmentation has intensified lately. However, conventional data augmentation methods can hardly handle graph-structured data, which is defined in a non-Euclidean space with multi-modality. In this survey, we formally formulate the problem of graph data augmentation and review the representative techniques and their applications in different deep graph learning problems. Specifically, we first propose a taxonomy for graph data augmentation techniques and then provide a structured review by categorizing the related work based on the augmented information modalities. Moreover, we summarize the applications of graph data augmentation in two representative problems in data-centric deep graph learning: (1) reliable graph learning, which focuses on enhancing the utility of the input graph as well as the model capacity via graph data augmentation; and (2) low-resource graph learning, which aims to enlarge the labeled training data scale through graph data augmentation. For each problem, we also provide a hierarchical problem taxonomy and review the existing literature related to graph data augmentation. Finally, we point out promising research directions and the challenges for future research.
    EXIT: Extrapolation and Interpolation-based Neural Controlled Differential Equations for Time-series Classification and Forecasting. (arXiv:2204.08771v2 [cs.LG] UPDATED)
    Deep learning inspired by differential equations is a recent research trend and has achieved state-of-the-art performance on many machine learning tasks. Among them, time-series modeling with neural controlled differential equations (NCDEs) is considered a breakthrough. In many cases, NCDE-based models not only provide better accuracy than recurrent neural networks (RNNs) but also make it possible to process irregular time-series. In this work, we enhance NCDEs by redesigning their core part, i.e., generating a continuous path from a discrete time-series input. NCDEs typically use interpolation algorithms to convert discrete time-series samples to continuous paths. However, we propose to i) generate another latent continuous path using an encoder-decoder architecture, which corresponds to the interpolation process of NCDEs, i.e., our neural network-based interpolation vs. the existing explicit interpolation, and ii) exploit the generative characteristic of the decoder, i.e., extrapolation beyond the time domain of the original data if needed. Therefore, our NCDE design can use both the interpolated and the extrapolated information for downstream machine learning tasks. In our experiments with 5 real-world datasets and 12 baselines, our extrapolation- and interpolation-based NCDEs outperform existing baselines by non-trivial margins.
    Escaping the Impossibility of Fairness: From Formal to Substantive Algorithmic Fairness. (arXiv:2107.04642v7 [cs.CY] UPDATED)
    Efforts to promote equitable public policy with algorithms appear to be fundamentally constrained by the "impossibility of fairness" (an incompatibility between mathematical definitions of fairness). This technical limitation raises a central question about algorithmic fairness: How can computer scientists and policymakers support equitable policy reforms with algorithms? In this article, I argue that promoting justice with algorithms requires reforming the methodology of algorithmic fairness. First, I diagnose why the current methodology for algorithmic fairness--which I call "formal algorithmic fairness"--leads to the impossibility of fairness and to models that exacerbate oppression despite appearing "fair." I demonstrate that the problems of algorithmic fairness result from the field's methodology, which restricts analysis to isolated decision-making procedures. Second, I draw on theories of substantive equality from law and philosophy to propose an alternative methodology: "substantive algorithmic fairness." Because substantive algorithmic fairness takes a more expansive scope to fairness, it enables an escape from the impossibility of fairness and provides a rigorous guide for alleviating injustice with algorithms. In sum, substantive algorithmic fairness presents a new direction for algorithmic fairness: away from formal mathematical models of "fair" decision-making and toward substantive evaluations of how algorithms can (and cannot) promote justice.
    Approximating the full-field temperature evolution in 3D electronic systems from randomized "Minecraft" systems. (arXiv:2209.10369v1 [physics.comp-ph])
    Neural networks as fast physics simulators have large potential for many engineering design tasks. Prerequisites for widespread application are an easy-to-use workflow for generating training datasets in a reasonable time, and the capability of the network to generalize to unseen systems. In contrast to most previous works, where training systems are similar to the evaluation dataset, we propose to adapt the type of training system to the network architecture. Specifically, we apply a fully convolutional network and, thus, design 3D systems of randomly located voxels with randomly assigned physical properties. The idea is tested on transient heat diffusion in electronic systems. Training only on random "Minecraft" systems, we obtain good generalization to electronic systems four times as large as the training systems (one-step prediction error of 0.07% vs. 0.8%).
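    Generating such random training systems is cheap. A minimal sketch, with illustrative property choices (per-voxel conductivity and sparse heat sources), might look like this:

```python
import numpy as np

rng = np.random.default_rng(0)

def random_voxel_system(shape=(32, 32, 32), n_materials=4):
    """Build one random "Minecraft" training system: voxels at random
    locations with randomly assigned physical properties. Returns a
    conductivity field and a heat-source mask (illustrative choices;
    the actual property set is problem-specific)."""
    material = rng.integers(0, n_materials, size=shape)
    conductivities = rng.uniform(0.1, 10.0, size=n_materials)
    kappa = conductivities[material]      # per-voxel conductivity
    sources = rng.random(shape) < 0.02    # sparse random heat sources
    return kappa, sources
```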
    Transfer Learning with Jukebox for Music Source Separation. (arXiv:2111.14200v3 [eess.AS] UPDATED)
    In this work, we demonstrate how a publicly available, pre-trained Jukebox model can be adapted to the problem of audio source separation from a single mixed audio channel. Our neural network architecture, which uses transfer learning, is quick to train, and the results demonstrate performance comparable to other state-of-the-art approaches that require far more compute resources, training data, and time. We provide an open-source implementation of our architecture (https://github.com/wzaielamri/unmix).
    ConvFormer: Closing the Gap Between CNN and Vision Transformers. (arXiv:2209.07738v1 [cs.CV] CROSS LISTED)
    Vision transformers have shown excellent performance in computer vision tasks. However, the computation cost of their (local) self-attention mechanism is expensive. Comparatively, CNNs are more efficient, with built-in inductive biases. Recent works show that CNNs are promising competitors to vision transformers when they learn from their architecture designs and training protocols. Nevertheless, existing methods either ignore multi-level features or lack dynamic properties, leading to sub-optimal performance. In this paper, we propose a novel attention mechanism named MCA, which captures different patterns of input images with multiple kernel sizes and enables input-adaptive weights via a gating mechanism. Based on MCA, we present a neural network named ConvFormer. ConvFormer adopts the general architecture of vision transformers, while replacing the (local) self-attention mechanism with our proposed MCA. Extensive experimental results demonstrate that ConvFormer outperforms similarly sized vision transformers (ViTs) and convolutional neural networks (CNNs) on various tasks. For example, ConvFormer-S and ConvFormer-L achieve state-of-the-art top-1 accuracies of 82.8% and 83.6%, respectively, on the ImageNet dataset. Moreover, ConvFormer-S outperforms Swin-T by 1.5 mIoU on ADE20K and 0.9 bounding-box AP on COCO with a smaller model size. Code and models will be made available.
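    To make the idea concrete, here is an illustrative multi-kernel gated convolution in the spirit of MCA: parallel depthwise convolutions with different kernel sizes, fused by an input-adaptive gate. This is a sketch of the general mechanism, not the authors' exact MCA design:

```python
import torch
import torch.nn as nn

class MultiKernelGatedConv(nn.Module):
    """Parallel depthwise convolutions with different kernel sizes,
    fused by an input-adaptive softmax gate over branches."""

    def __init__(self, channels: int, kernel_sizes=(3, 5, 7)):
        super().__init__()
        self.branches = nn.ModuleList(
            nn.Conv2d(channels, channels, k, padding=k // 2, groups=channels)
            for k in kernel_sizes
        )
        self.gate = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(channels, len(kernel_sizes), 1),
            nn.Softmax(dim=1),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        w = self.gate(x)                                   # (B, K, 1, 1)
        feats = torch.stack([b(x) for b in self.branches], dim=1)
        return (w.unsqueeze(2) * feats).sum(dim=1)         # (B, C, H, W)
```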
    Provable Stochastic Optimization for Global Contrastive Learning: Small Batch Does Not Harm Performance. (arXiv:2202.12387v4 [cs.LG] UPDATED)
    In this paper, we study contrastive learning from an optimization perspective, aiming to analyze and address a fundamental issue of existing contrastive learning methods that either rely on a large batch size or a large dictionary of feature vectors. We consider a global objective for contrastive learning, which contrasts each positive pair with all negative pairs for an anchor point. From the optimization perspective, we explain why existing methods such as SimCLR require a large batch size in order to achieve a satisfactory result. In order to remove such a requirement, we propose a memory-efficient Stochastic Optimization algorithm for solving the Global objective of Contrastive Learning of Representations, named SogCLR. We show that its optimization error is negligible under a reasonable condition after a sufficient number of iterations, or is diminishing for a slightly different global contrastive objective. Empirically, we demonstrate that SogCLR with a small batch size (e.g., 256) can achieve similar performance to SimCLR with a large batch size (e.g., 8192) on the self-supervised learning task on ImageNet-1K. We also attempt to show that the proposed optimization technique is generic and can be applied to solving other contrastive losses, e.g., two-way contrastive losses for bimodal contrastive learning. The proposed method is implemented in our open-sourced library LibAUC (www.libauc.org).
    An NWDAF Approach to 5G Core Network Signaling Traffic: Analysis and Characterization. (arXiv:2209.10428v1 [cs.NI])
    Data-driven approaches and paradigms have become promising solutions for efficient network performance through optimization. These approaches focus on state-of-the-art machine learning techniques that can address the needs of 5G networks and the networks of tomorrow, such as proactive load balancing. In contrast to model-based approaches, data-driven approaches do not need accurate models to tackle the target problem, and their associated architectures provide a flexibility of available system parameters that improves the feasibility of learning-based algorithms in mobile wireless networks. The work presented in this paper focuses on demonstrating a working system prototype of the 5G Core (5GC) network and the Network Data Analytics Function (NWDAF) used to bring the benefits of data-driven techniques to fruition. Analyses of the network-generated data explore core intra-network interactions through unsupervised learning and clustering, and evaluate the results as insights for future opportunities and work.
    Reconstructing spectral functions via automatic differentiation. (arXiv:2111.14760v3 [hep-ph] UPDATED)
    Reconstructing spectral functions from Euclidean Green's functions is an important inverse problem in many-body physics. However, the inversion is ill-posed in realistic systems with noisy Green's functions. In this Letter, we propose an automatic differentiation (AD) framework as a generic tool for spectral reconstruction from propagator observables. Exploiting the regularization effect of neural networks as a non-local smoothness regulator of the spectral function, we represent spectral functions by neural networks and use the propagator's reconstruction error to optimize the network parameters in an unsupervised manner. In the training process, apart from the positive-definite form of the spectral function, no other explicit physical priors are embedded into the neural networks. The reconstruction performance is assessed through relative entropy and mean square error for two different network representations. Compared to the maximum entropy method, the AD framework achieves better performance in the large-noise situation. It is noted that the freedom to introduce non-local regularization is an inherent advantage of the present framework and may lead to substantial improvements in solving inverse problems.
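    The optimization loop is simple to sketch. Assuming, for concreteness, a zero-temperature kernel G(tau) = \int d omega exp(-omega * tau) rho(omega) (the paper's kernels and architectures may differ), a network with a softplus output enforces the positive-definite form of rho, and the parameters are fitted to the propagator data by automatic differentiation:

```python
import torch
import torch.nn as nn

omegas = torch.linspace(0.0, 10.0, 500)
taus = torch.linspace(0.1, 5.0, 64)
kernel = torch.exp(-taus[:, None] * omegas[None, :])   # (n_tau, n_omega)
d_omega = omegas[1] - omegas[0]

# Softplus output keeps the spectral function positive-definite.
net = nn.Sequential(nn.Linear(1, 64), nn.Tanh(), nn.Linear(64, 1), nn.Softplus())

def propagator():
    rho = net(omegas[:, None]).squeeze(-1)   # rho(omega) >= 0
    return kernel @ rho * d_omega            # reconstructed G(tau)

g_obs = torch.randn(64).abs()                # placeholder for noisy data
opt = torch.optim.Adam(net.parameters(), lr=1e-3)
for _ in range(1000):
    opt.zero_grad()
    loss = torch.mean((propagator() - g_obs) ** 2)  # reconstruction error
    loss.backward()
    opt.step()
```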
    A Human-Centric Take on Model Monitoring. (arXiv:2206.02868v2 [cs.LG] UPDATED)
    Predictive models are increasingly used to make various consequential decisions in high-stakes domains such as healthcare, finance, and policy. It becomes critical to ensure that these models make accurate predictions, are robust to shifts in the data, do not rely on spurious features, and do not unduly discriminate against minority groups. To this end, several approaches spanning various areas such as explainability, fairness, and robustness have been proposed in recent literature. Such approaches need to be human-centered, as they aim to make models understandable to their users. However, there is a research gap in understanding the human-centric needs and challenges of monitoring machine learning (ML) models once they are deployed. To fill this gap, we conducted an interview study with 13 practitioners who have experience at the intersection of deploying ML models and engaging with customers spanning domains such as financial services, healthcare, hiring, online retail, computational advertising, and conversational assistants. We identified various human-centric challenges and requirements for model monitoring in real-world applications. Specifically, we found both the need and the challenge for model monitoring systems to clarify the impact of the monitoring observations on outcomes. Further, such insights must be actionable, robust, customizable for domain-specific use cases, and cognitively considerate to avoid information overload.
    Universum GANs: Improving GANs through contradictions. (arXiv:2106.09946v2 [cs.LG] UPDATED)
    Limited availability of labeled data makes any supervised learning problem challenging. Alternative learning settings like semi-supervised and universum learning alleviate the dependency on labeled data, but still require a large amount of unlabeled data, which may be unavailable or expensive to acquire. GAN-based data generation methods have recently shown promise by generating synthetic samples to improve learning. However, most existing GAN-based approaches either provide poor discriminator performance under limited labeled-data settings or result in low-quality generated data. In this paper, we propose a Universum GAN game which provides improved discriminator accuracy under limited data settings while generating high-quality realistic data. We further propose an evolving discriminator loss which improves its convergence and generalization performance. We derive theoretical guarantees and provide empirical results in support of our approach.
    Sample, Crop, Track: Self-Supervised Mobile 3D Object Detection for Urban Driving LiDAR. (arXiv:2209.10471v1 [cs.CV])
    Deep learning has led to great progress in the detection of mobile (i.e. movement-capable) objects in urban driving scenes in recent years. Supervised approaches typically require the annotation of large training sets; there has thus been great interest in leveraging weakly, semi- or self-supervised methods to avoid this, with much success. Whilst weakly and semi-supervised methods require some annotation, self-supervised methods have used cues such as motion to relieve the need for annotation altogether. However, a complete absence of annotation typically degrades their performance, and ambiguities that arise during motion grouping can inhibit their ability to find accurate object boundaries. In this paper, we propose a new self-supervised mobile object detection approach called SCT. This uses both motion cues and expected object sizes to improve detection performance, and predicts a dense grid of 3D oriented bounding boxes to improve object discovery. We significantly outperform the state-of-the-art self-supervised mobile object detection method TCR on the KITTI tracking benchmark, and achieve performance that is within 30% of the fully supervised PV-RCNN++ method for IoUs <= 0.5.
    Multi-trial Neural Architecture Search with Lottery Tickets. (arXiv:2203.04300v2 [cs.LG] UPDATED)
    In this paper, we propose MENAS, an efficient multi-trial evolution-based NAS method with less human intervention. Specifically, we propose an enlarged search space (MobileNet3-MT) for ImageNet-1K and improve the search efficiency in two respects. First, MENAS jointly explores architectures and optimal pruned candidates (lottery tickets), gradually slimming the average model in the population. Each model is trained with early stopping and replaced by its lottery ticket, instead of first searching for a cumbersome network and then pruning it. Second, we introduce individual weight sharing, dedicated to multi-trial NAS, which aims to amortize training costs by sharing weights between parent and child networks. Compared with weight sharing in a supernet, individual weight sharing attains more reliable rank consistency and is easy to implement, since it avoids sophisticated supernet training. Moreover, to keep the evolutionary process from becoming trapped in small models, we preserve a small ratio of the largest models when forming parent populations, which proves beneficial for model performance. Extensive experimental results demonstrate the superiority of MENAS. On the ImageNet-1K database, MENAS achieves 80.5% top-1 accuracy without involving knowledge distillation or larger image resolutions. Code and models will be made available.
    Amortized Projection Optimization for Sliced Wasserstein Generative Models. (arXiv:2203.13417v2 [stat.ML] UPDATED)
    Seeking informative projecting directions has been an important task in utilizing sliced Wasserstein distance in applications. However, finding these directions usually requires an iterative optimization procedure over the space of projecting directions, which is computationally expensive. Moreover, the computational issue is even more severe in deep learning applications, where computing the distance between two mini-batch probability measures is repeated several times. This nested loop has been one of the main challenges preventing the use of sliced Wasserstein distances based on good projections in practice. To address this challenge, we propose to utilize the learning-to-optimize technique, or amortized optimization, to predict the informative direction for any given pair of mini-batch probability measures. To the best of our knowledge, this is the first work that bridges amortized optimization and sliced Wasserstein generative models. In particular, we derive linear amortized models, generalized linear amortized models, and non-linear amortized models, which correspond to three types of novel mini-batch losses, named amortized sliced Wasserstein losses. We demonstrate the favorable performance of the proposed sliced losses in deep generative modeling on standard benchmark datasets.
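    For reference, the non-amortized baseline that the amortized models accelerate is the Monte Carlo sliced Wasserstein distance with uniformly random projections, sketched below; the amortized network would replace the random directions with predicted informative ones. The two mini-batches are assumed to have equal sample counts:

```python
import torch

def sliced_wasserstein(x: torch.Tensor, y: torch.Tensor, n_proj: int = 128):
    """Monte Carlo sliced Wasserstein-2 distance between two mini-batches
    x, y of shape (n, d), using random unit projecting directions."""
    d = x.shape[1]
    theta = torch.randn(d, n_proj)
    theta = theta / theta.norm(dim=0, keepdim=True)   # unit directions
    px = torch.sort(x @ theta, dim=0).values          # sorted 1D projections
    py = torch.sort(y @ theta, dim=0).values
    return ((px - py) ** 2).mean().sqrt()
```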
    FLAME: Federated Learning Across Multi-device Environments. (arXiv:2202.08922v2 [cs.LG] UPDATED)
    Federated Learning (FL) enables distributed training of machine learning models while keeping personal data on user devices private. While we witness increasing applications of FL in the area of mobile sensing, such as human activity recognition (HAR), FL has not been studied in the context of a multi-device environment (MDE), wherein each user owns multiple data-producing devices. With the proliferation of mobile and wearable devices, MDEs are increasingly popular in ubicomp settings, therefore necessitating the study of FL in them. FL in MDEs is characterized by data that are not independent and identically distributed (non-IID) across clients, complicated by the presence of both user and device heterogeneities. Further, ensuring efficient utilization of system resources on FL clients in an MDE remains an important challenge. In this paper, we propose FLAME, a user-centered FL training approach to counter statistical and system heterogeneity in MDEs and bring consistency in inference performance across devices. FLAME features (i) user-centered FL training utilizing the time alignment across devices from the same user; (ii) accuracy- and efficiency-aware device selection; and (iii) model personalization to devices. We also present an FL evaluation testbed with realistic energy drain and network bandwidth profiles, and a novel class-based data partitioning scheme to extend existing HAR datasets to a federated setup. Our experimental results on three multi-device HAR datasets show that FLAME outperforms various baselines by a 4.3-25.8% higher F1 score, 1.02-2.86x greater energy efficiency, and up to a 2.06x speedup in convergence to target accuracy through fair distribution of the FL workload.
    Efficient Distribution Similarity Identification in Clustered Federated Learning via Principal Angles Between Client Data Subspaces. (arXiv:2209.10526v1 [cs.LG])
    Clustered federated learning (FL) has been shown to produce promising results by grouping clients into clusters. This is especially effective in scenarios where separate groups of clients have significant differences in the distributions of their local data. Existing clustered FL algorithms are essentially trying to group together clients with similar distributions so that clients in the same cluster can leverage each other's data to better perform federated learning. However, prior clustered FL algorithms attempt to learn these distribution similarities indirectly during training, which can be quite time consuming as many rounds of federated learning may be required until the formation of clusters is stabilized. In this paper, we propose a new approach to federated learning that directly aims to efficiently identify distribution similarities among clients by analyzing the principal angles between the client data subspaces. Each client applies a truncated singular value decomposition (SVD) step on its local data in a single-shot manner to derive a small set of principal vectors, which provides a signature that succinctly captures the main characteristics of the underlying distribution. This small set of principal vectors is provided to the server so that the server can directly identify distribution similarities among the clients to form clusters. This is achieved by comparing the similarities of the principal angles between the client data subspaces spanned by those principal vectors. The approach provides a simple, yet effective clustered FL framework that addresses a broad range of data heterogeneity issues beyond simpler forms of Non-IIDness like label skews. Our clustered FL approach also enables convergence guarantees for non-convex objectives. Our code is available at https://github.com/MMorafah/PACFL.
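    The signature-and-angle computation is a few lines. A minimal sketch, with an illustrative number of principal vectors p, using SciPy's subspace_angles:

```python
import numpy as np
from scipy.linalg import subspace_angles

def client_signature(X: np.ndarray, p: int = 5) -> np.ndarray:
    """Single-shot truncated SVD on a client's local data matrix X
    (n_samples x n_features); the top-p right singular vectors form a
    compact signature that the client sends to the server."""
    _, _, vt = np.linalg.svd(X, full_matrices=False)
    return vt[:p].T                                   # (n_features, p)

def subspace_distance(u: np.ndarray, v: np.ndarray) -> float:
    """Smallest principal angle between two client subspaces; the
    server can cluster clients by thresholding such pairwise angles."""
    return float(subspace_angles(u, v).min())
```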
    Revisiting Sliced Wasserstein on Images: From Vectorization to Convolution. (arXiv:2204.01188v2 [cs.CV] UPDATED)
    The conventional sliced Wasserstein is defined between two probability measures that have realizations as vectors. When comparing two probability measures over images, practitioners first need to vectorize the images and then project them to one-dimensional space by using matrix multiplication between the sample matrix and the projection matrix. After that, the sliced Wasserstein is evaluated by averaging the Wasserstein distances between the corresponding pairs of one-dimensional projected probability measures. However, this approach has two limitations. The first limitation is that the spatial structure of images is not captured efficiently by the vectorization step, which makes it harder for the subsequent slicing process to gather discrepancy information. The second limitation is memory inefficiency, since each slicing direction is a vector of the same dimension as the images. To address these limitations, we propose novel slicing methods for sliced Wasserstein between probability measures over images that are based on convolution operators. We derive convolution sliced Wasserstein (CSW) and its variants via incorporating stride, dilation, and non-linear activation functions into the convolution operators. We investigate the metricity of CSW as well as its sample complexity, its computational complexity, and its connection to conventional sliced Wasserstein distances. Finally, we demonstrate the favorable performance of CSW over the conventional sliced Wasserstein in comparing probability measures over images and in training deep generative models on images.
    DTR Bandit: Learning to Make Response-Adaptive Decisions With Low Regret. (arXiv:2005.02791v3 [stat.ML] UPDATED)
    Dynamic treatment regimes (DTRs) are personalized, adaptive, multi-stage treatment plans that adapt treatment decisions both to an individual's initial features and to intermediate outcomes and features at each subsequent stage, which are affected by decisions in prior stages. Examples include personalized first- and second-line treatments of chronic conditions like diabetes, cancer, and depression, which adapt to patient response to first-line treatment, disease progression, and individual characteristics. While the existing literature mostly focuses on estimating the optimal DTR from offline data such as from sequentially randomized trials, we study the problem of developing the optimal DTR in an online manner, where the interaction with each individual affects both our cumulative reward and our data collection for future learning. We term this the DTR bandit problem. We propose a novel algorithm that, by carefully balancing exploration and exploitation, is guaranteed to achieve rate-optimal regret when the transition and reward models are linear. We demonstrate our algorithm and its benefits both in synthetic experiments and in a case study of adaptive treatment of major depressive disorder using real-world data.
    A Simple Self-Supervised ECG Representation Learning Method via Manipulated Temporal-Spatial Reverse Detection. (arXiv:2202.12458v2 [cs.LG] UPDATED)
    Learning representations from electrocardiogram (ECG) signals can serve as a fundamental step for different machine learning-based ECG tasks. In order to extract general ECG representations that can be adapted to various downstream tasks, the learning process needs to be based on a general ECG-related task which can be achieved through self-supervised learning (SSL). However, existing SSL approaches either fail to provide satisfactory ECG representations or require too much effort to construct the learning data. In this paper, we propose the T-S reverse detection, a simple yet effective self-supervised approach to learn ECG representations. Inspired by the temporal and spatial characteristics of ECG signals, we flip the original signals horizontally (temporal reverse), vertically (spatial reverse), and both horizontally and vertically (temporal-spatial reverse). Learning is then done by classifying four types of signals including the original one. To verify the effectiveness of the proposed method, we perform a downstream task to detect atrial fibrillation (AF) which is one of the most common ECG tasks. The results show that the ECG representations learned with our method achieve remarkable performance. Furthermore, after exploring the representation feature space and investigating salient ECG locations, we conclude that the temporal reverse is more effective for learning ECG representations than the spatial reverse.
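    The pretext task itself is trivial to implement for a single-lead, zero-centered segment; a spatial reverse of a signal with a nonzero baseline would additionally need an offset:

```python
import numpy as np

def ts_reverse_views(sig: np.ndarray):
    """Generate the four pretext classes from one 1D ECG segment:
    original, temporal reverse (horizontal flip), spatial reverse
    (vertical flip), and temporal-spatial reverse (both)."""
    return [
        (sig, 0),            # original
        (sig[::-1], 1),      # temporal reverse
        (-sig, 2),           # spatial reverse (assumes zero baseline)
        (-sig[::-1], 3),     # temporal-spatial reverse
    ]
```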
    EXACT: How to Train Your Accuracy. (arXiv:2205.09615v3 [cs.LG] UPDATED)
    Classification tasks are usually evaluated in terms of accuracy. However, accuracy is discontinuous and cannot be directly optimized using gradient ascent. Popular methods minimize cross-entropy, hinge loss, or other surrogate losses, which can lead to suboptimal results. In this paper, we propose a new optimization framework by introducing stochasticity to a model's output and optimizing expected accuracy, i.e. accuracy of the stochastic model. Extensive experiments on linear models and deep image classification show that the proposed optimization method is a powerful alternative to widely used classification losses.
    Cross Project Software Vulnerability Detection via Domain Adaptation and Max-Margin Principle. (arXiv:2209.10406v1 [cs.CR])
    Software vulnerabilities (SVs) have become a common, serious, and crucial concern due to the ubiquity of computer software. Many machine learning-based approaches have been proposed to solve the software vulnerability detection (SVD) problem. However, there are still two open and significant issues for SVD, in terms of i) learning automatic representations to improve the predictive performance of SVD, and ii) tackling the scarcity of labeled vulnerability datasets, which conventionally require laborious labeling effort by experts. In this paper, we propose a novel end-to-end approach to tackle these two crucial issues. We first exploit automatic representation learning with deep domain adaptation for software vulnerability detection. We then propose a novel cross-domain kernel classifier leveraging the max-margin principle to significantly improve the transfer learning process of software vulnerabilities from labeled projects into unlabeled ones. The experimental results on real-world software datasets show the superiority of our proposed method over state-of-the-art baselines. In short, our method improves F1-measure, the most important measure in SVD, by 1.83% to 6.25% over the second-best method on the datasets used. Our released source code samples are publicly available at https://github.com/vannguyennd/dam2p
    Approximate sampling and estimation of partition functions using neural networks. (arXiv:2209.10423v1 [cs.LG])
    We consider the closely related problems of sampling from a distribution known up to a normalizing constant, and estimating said normalizing constant. We show how variational autoencoders (VAEs) can be applied to this task. In their standard applications, VAEs are trained to fit data drawn from an intractable distribution. We invert the logic and train the VAE to fit a simple and tractable distribution, on the assumption of a complex and intractable latent distribution, specified up to normalization. This procedure constructs approximations without the use of training data or Markov chain Monte Carlo sampling. We illustrate our method on three examples: the Ising model, graph clustering, and ranking.
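    One way to see why a fitted tractable distribution yields a normalizing-constant estimate is the importance-sampling identity Z = E_q[p~(x)/q(x)], sketched below with a hypothetical sampler interface; the paper's VAE-based procedure is more elaborate, but rests on the same idea:

```python
import numpy as np

def log_partition_estimate(log_p_tilde, sample_q, log_q, n: int = 10_000):
    """Importance-sampling estimate of log Z, where Z = E_q[p~(x)/q(x)]
    for any tractable proposal q. Here `sample_q`, `log_q`, and
    `log_p_tilde` are user-supplied callables (hypothetical interface);
    in the paper's setting the proposal is VAE-learned."""
    x = sample_q(n)
    log_w = log_p_tilde(x) - log_q(x)
    m = log_w.max()                       # log-mean-exp for stability
    return m + np.log(np.mean(np.exp(log_w - m)))
```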
    Fairness Reprogramming. (arXiv:2209.10222v1 [cs.LG])
    Despite a surge of recent advances in promoting machine learning (ML) fairness, the existing mainstream approaches mostly require training or finetuning the entire weights of the neural network to meet the fairness criteria. However, this is often infeasible in practice for large-scale trained models due to large computational and storage costs, low data efficiency, and model privacy issues. In this paper, we propose a new generic fairness learning paradigm, called FairReprogram, which incorporates the model reprogramming technique. Specifically, FairReprogram considers the neural model fixed and instead appends to the input a set of perturbations, called the fairness trigger, which is tuned towards the fairness criteria under a min-max formulation. We further introduce an information-theoretic framework that explains why and under what conditions fairness goals can be achieved using the fairness trigger. We show both theoretically and empirically that the fairness trigger can effectively obscure demographic biases in the output prediction of fixed ML models by providing false demographic information that hinders the model from utilizing the correct demographic information to make the prediction. Extensive experiments on both NLP and CV datasets demonstrate that our method can achieve better fairness improvements than retraining-based methods with far less training cost and data dependency under two widely-used fairness criteria.
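    The reprogramming step can be sketched in a few lines: freeze the model and optimize only an additive input trigger against a fairness surrogate. The fairness_loss and data interfaces below are hypothetical placeholders for the paper's min-max objective:

```python
import torch

def tune_fairness_trigger(model, loader, fairness_loss, epochs=5, lr=1e-2,
                          trigger_shape=(3, 32, 32)):
    """Minimal sketch of reprogramming: the model stays frozen and only
    an additive input perturbation (the trigger) is optimized against a
    fairness criterion. `fairness_loss(out, y, group)` is a user-supplied
    surrogate standing in for the paper's min-max formulation."""
    for p in model.parameters():
        p.requires_grad_(False)
    trigger = torch.zeros(trigger_shape, requires_grad=True)
    opt = torch.optim.Adam([trigger], lr=lr)
    for _ in range(epochs):
        for x, y, group in loader:
            opt.zero_grad()
            out = model(x + trigger)      # fixed model, perturbed input
            loss = fairness_loss(out, y, group)
            loss.backward()
            opt.step()
    return trigger.detach()
```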
    Fast Few shot Self-attentive Semi-supervised Political Inclination Prediction. (arXiv:2209.10292v1 [cs.CY])
    With the rising participation of the common mass in social media, it is increasingly common for policymakers and journalists to create online polls on social media to understand the political leanings of people in specific locations. The caveat here is that only influential people can run such online polls and reach out at a mass scale. Further, in such cases, the distribution of voters is not controllable and may, in fact, be biased. On the other hand, if we can interpret publicly available social media data to probe the political inclination of users, we will be able to have controllable insights about the survey population, keep survey costs low, and collect publicly available data without involving the concerned persons. Hence, we introduce a self-attentive semi-supervised framework for political inclination detection to further that objective. The advantage of our model is that it neither needs huge training data nor does it need to store social network parameters. Nevertheless, it achieves an accuracy of 93.7% with no annotated data; further, with only a few annotated examples per class, it achieves competitive performance. We find that the model is highly efficient even in resource-constrained settings, and insights drawn from its predictions match manual survey outcomes when applied to diverse real-life scenarios.
    LCRL: Certified Policy Synthesis via Logically-Constrained Reinforcement Learning. (arXiv:2209.10341v1 [cs.LG])
    LCRL is a software tool that implements model-free Reinforcement Learning (RL) algorithms over unknown Markov Decision Processes (MDPs), synthesising policies that satisfy a given linear temporal specification with maximal probability. LCRL leverages partially deterministic finite-state machines known as Limit Deterministic Büchi Automata (LDBA) to express a given linear temporal specification. A reward function for the RL algorithm is shaped on-the-fly, based on the structure of the LDBA. Theoretical guarantees under proper assumptions ensure the convergence of the RL algorithm to an optimal policy that maximises the satisfaction probability. We present case studies to demonstrate the applicability, ease of use, scalability, and performance of LCRL. Owing to the LDBA-guided exploration and the model-free architecture of LCRL, we observe robust performance, which also scales well when compared to standard RL approaches (whenever applicable to LTL specifications). Full instructions on how to execute all the case studies in this paper are provided on a GitHub page that accompanies the LCRL distribution: www.github.com/grockious/lcrl.
    Recurrent Super-Resolution Method for Enhancing Low Quality Thermal Facial Data. (arXiv:2209.10489v1 [cs.CV])
    The process of obtaining high-resolution images from single or multiple low-resolution images of the same scene is of great interest for real-world image and signal processing applications. This study explores the potential of deep learning based image super-resolution algorithms on thermal data for producing high-quality thermal imaging results for in-cabin vehicular driver monitoring systems. In this work, we propose and develop a novel multi-image super-resolution recurrent neural network to enhance the resolution and improve the quality of low-resolution thermal imaging data captured from uncooled thermal cameras. The end-to-end fully convolutional neural network is trained from scratch on newly acquired thermal data of 30 different subjects in indoor environmental conditions. The effectiveness of the thermally tuned super-resolution network is validated quantitatively as well as qualitatively on test data from 6 distinct subjects. The network achieves a mean peak signal-to-noise ratio of 39.24 on the validation dataset for 4x super-resolution, outperforming bicubic interpolation both quantitatively and qualitatively.
    Exploring Modulated Detection Transformer as a Tool for Action Recognition in Videos. (arXiv:2209.10126v1 [cs.CV])
    In recent years, transformer architectures have grown in popularity. The Modulated Detection Transformer (MDETR) is an end-to-end multi-modal understanding model that performs tasks such as phrase grounding, referring expression comprehension, referring expression segmentation, and visual question answering. One remarkable aspect of the model is its capacity to infer over classes that it was not previously trained on. In this work we explore the use of MDETR on a new task, action detection, without any previous training. We obtain quantitative results using the Atomic Visual Actions dataset. Although the model does not achieve the best performance on the task, we believe it is an interesting finding: we show that it is possible to use a multi-modal model to tackle a task that it was not designed for. Finally, we believe that this line of research may lead to the generalization of MDETR to additional downstream tasks.
    Summarization Programs: Interpretable Abstractive Summarization with Neural Modular Trees. (arXiv:2209.10492v1 [cs.CL])
    Current abstractive summarization models either suffer from a lack of clear interpretability or provide incomplete rationales by only highlighting parts of the source document. To this end, we propose the Summarization Program (SP), an interpretable modular framework consisting of an (ordered) list of binary trees, each encoding the step-by-step generative process of an abstractive summary sentence from the source document. A Summarization Program contains one root node per summary sentence, and a distinct tree connects each summary sentence (root node) to the document sentences (leaf nodes) from which it is derived, with the connecting nodes containing intermediate generated sentences. Edges represent different modular operations involved in summarization such as sentence fusion, compression, and paraphrasing. We first propose an efficient best-first search method over neural modules, SP-Search that identifies SPs for human summaries by directly optimizing for ROUGE scores. Next, using these programs as automatic supervision, we propose seq2seq models that generate Summarization Programs, which are then executed to obtain final summaries. We demonstrate that SP-Search effectively represents the generative process behind human summaries using modules that are typically faithful to their intended behavior. We also conduct a simulation study to show that Summarization Programs improve the interpretability of summarization models by allowing humans to better simulate model reasoning. Summarization Programs constitute a promising step toward interpretable and modular abstractive summarization, a complex task previously addressed primarily through blackbox end-to-end neural systems. Our code is available at https://github.com/swarnaHub/SummarizationPrograms
    Estimating Potential Outcome Distributions with Collaborating Causal Networks. (arXiv:2110.01664v3 [stat.ML] UPDATED)
    Traditional causal inference approaches leverage observational study data to estimate the difference in observed and unobserved outcomes for a potential treatment, known as the Conditional Average Treatment Effect (CATE). However, CATE corresponds to the comparison on the first moment alone, and as such may be insufficient in reflecting the full picture of treatment effects. As an alternative, estimating the full potential outcome distributions could provide greater insights. However, existing methods for estimating treatment effect potential outcome distributions often impose restrictive or simplistic assumptions about these distributions. Here, we propose Collaborating Causal Networks (CCN), a novel methodology which goes beyond the estimation of CATE alone by learning the full potential outcome distributions. Estimation of outcome distributions via the CCN framework does not require restrictive assumptions about the underlying data-generating process. Additionally, CCN facilitates estimation of the utility of each possible treatment and permits individual-specific variation through utility functions. CCN not only extends outcome estimation beyond the traditional risk difference, but also enables a more comprehensive decision-making process through the definition of flexible comparisons. Under assumptions commonly made in the causal literature, we show that CCN learns distributions that asymptotically capture the true potential outcome distributions. Furthermore, we propose an adjustment approach that is empirically effective in alleviating sample imbalance between treatment groups in observational data. Finally, we evaluate the performance of CCN in multiple synthetic and semi-synthetic experiments. We demonstrate that CCN learns improved distribution estimates compared to existing Bayesian and deep generative methods, as well as improved decisions with respect to a variety of utility functions.
    A Comprehensive Survey on Trustworthy Recommender Systems. (arXiv:2209.10117v1 [cs.IR])
    As one of the most successful AI-powered applications, recommender systems aim to help people make appropriate decisions in an effective and efficient way by providing personalized suggestions in many aspects of our lives, especially for various human-oriented online services such as e-commerce platforms and social media sites. In the past few decades, the rapid development of recommender systems has significantly benefited humans by creating economic value, saving time and effort, and promoting social good. However, recent studies have found that data-driven recommender systems can pose serious threats to users and society, such as spreading fake news to manipulate public opinion in social media sites, amplifying unfairness toward under-represented groups or individuals in job matching services, or inferring private information from recommendation results. Therefore, the trustworthiness of these systems has been attracting increasing attention from various perspectives for mitigating the negative impacts caused by recommender systems, so as to enhance the public's trust in recommender system techniques. In this survey, we provide a comprehensive overview of Trustworthy Recommender systems (TRec), with a specific focus on six of the most important aspects: namely, Safety & Robustness, Nondiscrimination & Fairness, Explainability, Privacy, Environmental Well-being, and Accountability & Auditability. For each aspect, we summarize the recent related technologies and discuss potential research directions to help achieve trustworthy recommender systems in the future.
    Learning Hierarchical Metrical Structure Beyond Measures. (arXiv:2209.10259v1 [cs.SD])
    Music contains hierarchical structures beyond beats and measures. While hierarchical structure annotations are helpful for music information retrieval and computer musicology, such annotations are scarce in current digital music databases. In this paper, we explore a data-driven approach to automatically extract hierarchical metrical structures from scores. We propose a new model with a Temporal Convolutional Network-Conditional Random Field (TCN-CRF) architecture. Given a symbolic music score, our model takes in an arbitrary number of voices in a beat-quantized form and predicts a 4-level hierarchical metrical structure from the downbeat level to the section level. We also annotate a dataset using RWC-POP MIDI files to facilitate training and evaluation. Experiments show that the proposed method performs better than the rule-based approach under different orchestration settings. We also perform a simple musicological analysis of the model predictions. All demos, datasets, and pre-trained models are publicly available on GitHub.
    GP-net: Grasp Proposal for Mobile Manipulators. (arXiv:2209.10404v1 [cs.RO])
    We present the Grasp Proposal Network (GP-net), a Convolutional Neural Network model which can generate 6-DOF grasps for mobile manipulators. To train GP-net, we synthetically generate a dataset containing depth-images and ground-truth grasp information for more than 1400 objects. In real-world experiments we use the EGAD! grasping benchmark to evaluate GP-net against two commonly used algorithms, the Volumetric Grasping Network (VGN) and the Grasp Pose Detection package (GPD), on a PAL TIAGo mobile manipulator. GP-net achieves grasp success rates of 82.2% compared to 57.8% for VGN and 63.3% with GPD. In contrast to the state-of-the-art methods in robotic grasping, GP-net can be used out-of-the-box for grasping objects with mobile manipulators without limiting the workspace, requiring table segmentation or needing a high-end GPU. To encourage the usage of GP-net, we provide a ROS package along with our code and pre-trained models at https://aucoroboticsmu.github.io/GP-net/.
    Reconstructing Robot Operations via Radio-Frequency Side-Channel. (arXiv:2209.10179v1 [cs.CR])
    Connected teleoperated robotic systems play a key role in ensuring operational workflows are carried out with high levels of accuracy and low margins of error. In recent years, a variety of attacks have been proposed that actively target the robot itself from the cyber domain. However, little attention has been paid to the capabilities of a passive attacker. In this work, we investigate whether an insider adversary can accurately fingerprint robot movements and operational warehousing workflows via the radio frequency side channel in a stealthy manner. Using an SVM for classification, we found that an adversary can fingerprint individual robot movements with at least 96% accuracy, increasing to near perfect accuracy when reconstructing entire warehousing workflows.
    Power of Explanations: Towards automatic debiasing in hate speech detection. (arXiv:2209.09975v1 [cs.CL])
    Hate speech detection is a common downstream application of natural language processing (NLP) in the real world. Despite increasing accuracy, current data-driven approaches can easily learn biases from the imbalanced data distributions originating from humans, and deploying biased models can further reinforce existing social biases. Unlike with tabular data, defining and mitigating biases in text classifiers, which deal with unstructured data, is more challenging. A popular solution for improving machine learning fairness in NLP is to conduct the debiasing process with a list of potentially discriminatory words given by human annotators. Beyond the risk of overlooking biased terms, exhaustively identifying bias with human annotators is unsustainable, since discrimination varies among datasets and may evolve over time. To this end, we propose an automatic misuse detector (MiD) that relies on an explanation method to detect potential bias. Built upon it, we design an end-to-end debiasing framework with a staged correction for text classifiers that requires no external resources.
    Performance Optimization for Variable Bitwidth Federated Learning in Wireless Networks. (arXiv:2209.10200v1 [cs.LG])
    This paper considers improving wireless communication and computation efficiency in federated learning (FL) via model quantization. In the proposed bitwidth FL scheme, edge devices train and transmit quantized versions of their local FL model parameters to a coordinating server, which, in turn, aggregates them into a quantized global model and synchronizes the devices. The goal is to jointly determine the bitwidths employed for local FL model quantization and the set of devices participating in FL training at each iteration. This problem is posed as an optimization problem whose goal is to minimize the training loss of quantized FL under a per-iteration device sampling budget and delay requirement. To derive the solution, an analytical characterization is performed in order to show how the limited wireless resources and induced quantization errors affect the performance of the proposed FL method. The analytical results show that the improvement of FL training loss between two consecutive iterations depends on the device selection and quantization scheme as well as on several parameters inherent to the model being learned. Given linear regression-based estimates of these model properties, it is shown that the FL training process can be described as a Markov decision process (MDP), and, then, a model-based reinforcement learning (RL) method is proposed to optimize action selection over iterations. Compared to model-free RL, this model-based RL approach leverages the derived mathematical characterization of the FL training process to discover an effective device selection and quantization scheme without imposing additional device communication overhead. Simulation results show that the proposed FL algorithm can reduce convergence time by 29% and 63% compared to a model-free RL method and the standard FL method, respectively.
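    To make the quantize-then-aggregate step concrete, here is a minimal sketch (our own illustration, not the paper's exact scheme) that uniformly quantizes each device's parameters at a server-chosen bitwidth and aggregates the results into a quantized global model; the clipping range, weights, and all names are assumptions.

        import numpy as np

        def quantize(params, bitwidth, lo=-1.0, hi=1.0):
            # Uniform quantizer with 2**bitwidth levels on [lo, hi].
            levels = 2 ** bitwidth - 1
            step = (hi - lo) / levels
            return lo + np.round((np.clip(params, lo, hi) - lo) / step) * step

        # One hypothetical round: each sampled device sends a quantized local model.
        rng = np.random.default_rng(0)
        local_models = [rng.normal(0, 0.1, size=10) for _ in range(4)]
        bitwidths = [2, 4, 4, 8]               # per-device bitwidths chosen by the server
        weights = [0.25, 0.25, 0.25, 0.25]     # aggregation weights (e.g., data fractions)

        aggregate = sum(w * quantize(m, b) for w, m, b in zip(weights, local_models, bitwidths))
        global_model = quantize(aggregate, bitwidth=8)  # server synchronizes a quantized global model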
    Deep Double Descent via Smooth Interpolation. (arXiv:2209.10080v1 [cs.LG])
    Overparameterized deep networks are known to be able to perfectly fit the training data while at the same time showing good generalization performance. A common paradigm drawn from intuition on linear regression suggests that large networks are able to interpolate even noisy data, without considerably deviating from the ground-truth signal. At present, a precise characterization of this phenomenon is missing. In this work, we present an empirical study of sharpness of the loss landscape of deep networks as we systematically control the number of model parameters and training epochs. We extend our study to neighbourhoods of the training data, as well as around cleanly- and noisily-labelled samples. Our findings show that the loss sharpness in the input space follows both model- and epoch-wise double descent, with worse peaks observed around noisy labels. While small interpolating models sharply fit both clean and noisy data, large models express a smooth and flat loss landscape, in contrast with existing intuition.
    Variational Transformer: A Framework Beyond the Trade-off between Accuracy and Diversity for Image Captioning. (arXiv:2205.14458v2 [cs.CV] UPDATED)
    Accuracy and diversity are two essential measurable properties of natural and semantically correct captions. Many efforts have been made to enhance one of them, with the other degraded due to the trade-off gap. In this work, we show that the inferior accuracy standard drawn from human annotations (leave-one-out) is not appropriate for machine-generated captions. To improve diversity while maintaining solid accuracy, we propose a novel Variational Transformer framework. By introducing the "Invisible Information Prior" and the "Auto-selectable GMM", we instruct the encoder to learn the precise language information and object relations in different scenes to assure accuracy. By introducing the "Range-Median Reward" baseline, we retain more diverse candidates with higher rewards during the RL-based training process to assure diversity. Experiments show that our method achieves simultaneous improvements in accuracy (CIDEr) and diversity (self-CIDEr) of up to 1.1 and 4.8 percent, respectively. Our method also achieves the semantic retrieval performance closest to human annotations, with 50.3 (vs. 50.6 for humans) for R@1(i2t).
    A data-centric approach to anomaly detection in layer-based additive manufacturing. (arXiv:2209.10178v1 [cs.LG])
    Anomaly detection describes methods of finding abnormal states, instances or data points that differ from a normal value space. Industrial processes are a domain where predictive models are needed to find anomalous data instances for quality enhancement. A main challenge, however, is the absence of labels in this environment. This paper contributes to a data-centric way of approaching artificial intelligence in industrial production. With a use case from additive manufacturing for automotive components, we present a deep-learning-based image processing pipeline. We integrate the concepts of domain randomisation and synthetic data in the loop, which shows promising results for bridging advances in deep learning and their application to real-world, industrial production processes.
    Distributed Online Non-convex Optimization with Composite Regret. (arXiv:2209.10105v1 [cs.LG])
    Regret has been widely adopted as the metric of choice for evaluating the performance of online optimization algorithms for distributed, multi-agent systems. However, data/model variations associated with agents can significantly impact decisions and require consensus among agents. Moreover, most existing works have focused on developing approaches for (either strongly or non-strongly) convex losses, and very few results have been obtained regarding regret bounds in distributed online optimization for general non-convex losses. To address these two issues, we propose a novel composite regret with a new network-regret-based metric to evaluate distributed online optimization algorithms. We concretely define static and dynamic forms of the composite regret. By leveraging the dynamic form of our composite regret, we develop a consensus-based online normalized gradient (CONGD) approach for pseudo-convex losses, and prove that it exhibits sublinear behavior relating to a regularity term for the path variation of the optimizer. For general non-convex losses, we first shed light on regret in distributed online non-convex learning, building on recent advances showing that no deterministic algorithm can achieve sublinear regret in this setting. We then develop distributed online non-convex optimization with composite regret (DINOCO) without access to gradients, relying instead on an offline optimization oracle. DINOCO is shown to achieve sublinear regret; to our knowledge, this is the first regret bound for general distributed online non-convex learning.
    Learning Acceptance Regions for Many Classes with Anomaly Detection. (arXiv:2209.09963v1 [stat.ML])
    Set-valued classification, a new classification paradigm that aims to identify all the plausible classes that an observation belongs to, can be obtained by learning the acceptance regions for all classes. Many existing set-valued classification methods do not consider the possibility that a new class that never appeared in the training data appears in the test data. Moreover, they are computationally expensive when the number of classes is large. We propose a Generalized Prediction Set (GPS) approach to estimate the acceptance regions while considering the possibility of a new class in the test data. The proposed classifier minimizes the expected size of the prediction set while guaranteeing that the class-specific accuracy is at least a pre-specified value. Unlike previous methods, the proposed method achieves a good balance between accuracy, efficiency, and anomaly detection rate. Moreover, our method can be applied in parallel to all the classes to alleviate the computational burden. Both theoretical analysis and numerical experiments are conducted to illustrate the effectiveness of the proposed method.
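    As a rough illustration of learning acceptance regions (our simplification, not the GPS estimator itself), the sketch below thresholds per-class scores at an empirical quantile so that each class's accuracy is at least a pre-specified value; a test point accepted by no class is flagged as a potential new class. The score matrix and all names are assumptions.

        import numpy as np

        def fit_thresholds(scores, labels, n_classes, target_acc=0.95):
            # Threshold for class k: the (1 - target_acc) quantile of class k's own
            # scores, so at least target_acc of its training points are accepted.
            return np.array([np.quantile(scores[labels == k, k], 1.0 - target_acc)
                             for k in range(n_classes)])

        def predict_sets(scores_new, thresholds):
            # Acceptance region for class k: {x : score_k(x) >= threshold_k}.
            # An empty set flags a potential class never seen in training.
            return [np.flatnonzero(s >= thresholds) for s in scores_new]

        rng = np.random.default_rng(0)
        scores = rng.random((200, 3))          # stand-in class-conditional scores
        labels = rng.integers(0, 3, size=200)
        sets = predict_sets(rng.random((5, 3)), fit_thresholds(scores, labels, 3))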
    An Information-Theoretic and Contrastive Learning-based Approach for Identifying Code Statements Causing Software Vulnerability. (arXiv:2209.10414v1 [cs.CR])
    Software vulnerabilities in programs and functions of computer systems are a serious and crucial concern. Typically, in a program or function consisting of hundreds or thousands of source code statements, only a few statements cause the corresponding vulnerabilities. Vulnerability labeling is currently done at the function or program level by experts with the assistance of machine learning tools. Extending this approach to the code statement level is much more costly and time-consuming and remains an open problem. In this paper, we propose a novel end-to-end deep learning-based approach to identify the vulnerability-relevant code statements of a specific function. Inspired by the specific structures observed in real-world vulnerable code, we first leverage mutual information to learn a set of latent variables representing the relevance of the source code statements to the corresponding function's vulnerability. We then propose novel clustered spatial contrastive learning to further improve the representation learning and the robust selection of vulnerability-relevant code statements. Experimental results on real-world datasets of 200k+ C/C++ functions show the superiority of our method over state-of-the-art baselines: in an unsupervised setting, it improves the VCP, VCA, and Top-10 ACC measures by between 3% and 14%. Our source code samples are publicly available at https://github.com/vannguyennd/livuitcl.
    Dataset: Impact Events for Structural Health Monitoring of a Plastic Thin Plate. (arXiv:2209.10018v1 [cs.LG])
    Nowadays, more and more datasets are published for the research and development of systems and models, enabling direct comparisons, continuous improvement of solutions, and researchers' engagement with experimental, real-life data. However, especially in the Structural Health Monitoring (SHM) domain, there are plenty of cases where new research projects have a unique combination of structure design and implementation, sensor selection and technological enablers that does not fit the configuration of relevant individual studies in the literature. Thus, we share the data from our case study with the research community, as we did not find any relevant repository available. More specifically, in this paper we present a novel time-series dataset for impact detection and localization on a plastic thin plate, towards Structural Health Monitoring applications, using ceramic piezoelectric transducers (PZTs) connected to an Internet of Things (IoT) device. The dataset was collected from an experimental procedure of low-velocity, low-energy impact events that includes at least 3 repetitions for each unique experiment, with input measurements coming from 4 PZT sensors placed at the corners of the plate. For each repetition and sensor, 5000 values are stored at a 100 kHz sampling rate. The system is excited with a steel ball whose release height varies from 10 cm to 20 cm. The dataset is available on GitHub (https://github.com/Smart-Objects/Impact-Events-Dataset).
    DARTSRepair: Core-failure-set Guided DARTS for Network Robustness to Common Corruptions. (arXiv:2209.10381v1 [cs.CV])
    Network architecture search (NAS), in particular the differentiable architecture search (DARTS) method, has shown great power to learn excellent model architectures on a specific dataset of interest. In contrast to using a fixed dataset, in this work we focus on a different but important scenario for NAS: how to refine a deployed network's model architecture to enhance its robustness, guided by a few collected and misclassified examples that are degraded by some real-world unknown corruptions having a specific pattern (e.g., noise, blur, etc.). To this end, we first conduct an empirical study to validate that model architectures are indeed related to corruption patterns. Surprisingly, by just adding a few corrupted and misclassified examples (e.g., $10^3$ examples) to the clean training dataset (e.g., $5.0 \times 10^4$ examples), we can refine the model architecture and enhance robustness significantly. To make this practical, the key problem of how to select the proper failure examples for effective NAS guidance must be carefully investigated. We therefore propose a novel core-failure-set guided DARTS that embeds a K-center-greedy algorithm for DARTS to select suitable corrupted failure examples to refine the model architecture. We evaluate our method with DARTS-refined DNNs on the clean dataset as well as 15 corruption types, under the guidance of four specific real-world corruptions. Compared with the state-of-the-art NAS as well as data-augmentation-based enhancement methods, our final method achieves higher accuracy on both the corrupted datasets and the original clean dataset. On some of the corruption patterns, we achieve absolute accuracy improvements of over 45%.
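    The selection step relies on the textbook K-center-greedy rule, which can be sketched as follows (the feature space and seeding choice here are assumptions, not the paper's exact setup):

        import numpy as np

        def k_center_greedy(X, k, seed=0):
            # Classic K-center greedy: repeatedly add the point farthest (in feature
            # space) from the already-selected set, which spreads coverage.
            selected = [seed]
            dists = np.linalg.norm(X - X[seed], axis=1)
            while len(selected) < k:
                idx = int(np.argmax(dists))
                selected.append(idx)
                dists = np.minimum(dists, np.linalg.norm(X - X[idx], axis=1))
            return selected

        X = np.random.default_rng(0).normal(size=(1000, 32))  # stand-in failure-example features
        core_failure_set = k_center_greedy(X, k=100)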
    Off-Policy Evaluation for Episodic Partially Observable Markov Decision Processes under Non-Parametric Models. (arXiv:2209.10064v1 [stat.ML])
    We study the problem of off-policy evaluation (OPE) for episodic Partially Observable Markov Decision Processes (POMDPs) with continuous states. Motivated by the recently proposed proximal causal inference framework, we develop a non-parametric identification result for estimating the policy value via a sequence of so-called V-bridge functions with the help of time-dependent proxy variables. We then develop a fitted-Q-evaluation-type algorithm to estimate V-bridge functions recursively, where a non-parametric instrumental variable (NPIV) problem is solved at each step. By analyzing this challenging sequential NPIV problem, we establish the finite-sample error bounds for estimating the V-bridge functions and accordingly that for evaluating the policy value, in terms of the sample size, length of horizon and so-called (local) measure of ill-posedness at each step. To the best of our knowledge, this is the first finite-sample error bound for OPE in POMDPs under non-parametric models.
    Towards a Standardised Performance Evaluation Protocol for Cooperative MARL. (arXiv:2209.10485v1 [cs.LG])
    Multi-agent reinforcement learning (MARL) has emerged as a useful approach to solving decentralised decision-making problems at scale. Research in the field has been growing steadily with many breakthrough algorithms proposed in recent years. In this work, we take a closer look at this rapid development with a focus on evaluation methodologies employed across a large body of research in cooperative MARL. By conducting a detailed meta-analysis of prior work, spanning 75 papers accepted for publication from 2016 to 2022, we bring to light worrying trends that put into question the true rate of progress. We further consider these trends in a wider context and take inspiration from single-agent RL literature on similar issues with recommendations that remain applicable to MARL. Combining these recommendations, with novel insights from our analysis, we propose a standardised performance evaluation protocol for cooperative MARL. We argue that such a standard protocol, if widely adopted, would greatly improve the validity and credibility of future research, make replication and reproducibility easier, as well as improve the ability of the field to accurately gauge the rate of progress over time by being able to make sound comparisons across different works. Finally, we release our meta-analysis data publicly on our project website for future research on evaluation: https://sites.google.com/view/marl-standard-protocol
    Learning to Relight Portrait Images via a Virtual Light Stage and Synthetic-to-Real Adaptation. (arXiv:2209.10510v1 [cs.CV])
    Given a portrait image of a person and an environment map of the target lighting, portrait relighting aims to re-illuminate the person in the image as if the person appeared in an environment with the target lighting. To achieve high-quality results, recent methods rely on deep learning. An effective approach is to supervise the training of deep neural networks with a high-fidelity dataset of desired input-output pairs, captured with a light stage. However, acquiring such data requires an expensive special capture rig and time-consuming efforts, limiting access to only a few resourceful laboratories. To address the limitation, we propose a new approach that can perform on par with the state-of-the-art (SOTA) relighting methods without requiring a light stage. Our approach is based on the realization that a successful relighting of a portrait image depends on two conditions. First, the method needs to mimic the behaviors of physically-based relighting. Second, the output has to be photorealistic. To meet the first condition, we propose to train the relighting network with training data generated by a virtual light stage that performs physically-based rendering on various 3D synthetic humans under different environment maps. To meet the second condition, we develop a novel synthetic-to-real approach to bring photorealism to the relighting network output. In addition to achieving SOTA results, our approach offers several advantages over the prior methods, including controllable glares on glasses and more temporally-consistent results for relighting videos.
    Learning Bilinear Models of Actuated Koopman Generators from Partially-Observed Trajectories. (arXiv:2209.09977v1 [math.DS])
    Data-driven models for nonlinear dynamical systems based on approximating the underlying Koopman operator or generator have proven to be successful tools for forecasting, feature learning, state estimation, and control. It has become well known that the Koopman generators for control-affine systems also have affine dependence on the input, leading to convenient finite-dimensional bilinear approximations of the dynamics. Yet there are still two main obstacles that limit the scope of current approaches for approximating the Koopman generators of systems with actuation. First, the performance of existing methods depends heavily on the choice of basis functions over which the Koopman generator is to be approximated; and there is currently no universal way to choose them for systems that are not measure preserving. Secondly, if we do not observe the full state, we may not gain access to a sufficiently rich collection of such functions to describe the dynamics. This is because the commonly used method of forming time-delayed observables fails when there is actuation. To remedy these issues, we write the dynamics of observables governed by the Koopman generator as a bilinear hidden Markov model, and determine the model parameters using the expectation-maximization (EM) algorithm. The E-step involves a standard Kalman filter and smoother, while the M-step resembles control-affine dynamic mode decomposition for the generator. We demonstrate the performance of this method on three examples, including recovery of a finite-dimensional Koopman-invariant subspace for an actuated system with a slow manifold; estimation of Koopman eigenfunctions for the unforced Duffing equation; and model-predictive control of a fluidic pinball system based only on noisy observations of lift and drag.
    Extreme Multi-Domain, Multi-Task Learning With Unified Text-to-Text Transfer Transformers. (arXiv:2209.10106v1 [cs.CL])
    Text-to-text transformers have shown remarkable success in the task of multi-task transfer learning, especially in natural language processing (NLP). However, while there have been several attempts to train transformers on different domains, there is usually a clear relationship between these domains, e.g., code summarization, where the natural language summary describes the code. There have been very few attempts to study how multi-task transfer learning works on tasks in significantly different domains. In this project, we investigated the behavior of multi-domain, multi-task learning using multi-domain text-to-text transfer transformers (MD-T5) on four tasks across two domains - Python Code and Chess. We carried out extensive experiments using three popular training strategies: BERT-style joint pretraining + successive finetuning, GPT-style joint pretraining + successive finetuning, and GPT-style joint pretraining + joint finetuning. Also, we evaluate the model on four metrics - Play Score, Eval Score, BLEU Score, and Multi-Domain Learning Score (MDLS). These metrics measure performance across the various tasks and multi-domain learning. We show that while negative knowledge transfer and catastrophic forgetting are still considerable challenges for all the models, the GPT-style joint pretraining + joint finetuning strategy showed the most promise in multi-domain, multi-task learning as it performs well across all four tasks while still keeping its multi-domain knowledge.
    Detecting Crop Burning in India using Satellite Data. (arXiv:2209.10148v1 [cs.CV])
    Crop residue burning is a major source of air pollution in many parts of the world, notably South Asia. Policymakers, practitioners and researchers have invested in both measuring impacts and developing interventions to reduce burning. However, measuring the impacts of burning or the effectiveness of interventions to reduce burning requires data on where burning occurred. These data are challenging to collect in the field, both in terms of cost and feasibility. We take advantage of data from ground-based monitoring of crop residue burning in Punjab, India to explore whether burning can be detected more effectively using accessible satellite imagery. Specifically, we used 3m PlanetScope data with high temporal resolution (up to daily) as well as publicly-available Sentinel-2 data with weekly temporal resolution but greater depth of spectral information. Following an analysis of the ability of different spectral bands and burn indices to separate burned and unburned plots individually, we built a Random Forest model with those determined to provide the greatest separability and evaluated model performance with ground-verified data. Our overall model accuracy of 82 percent is favorable given the challenges presented by the measurement. Based on insights from this process, we discuss technical challenges of detecting crop residue burning from satellite imagery as well as challenges to measuring impacts, both of burning and of policy interventions.
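    As an illustration of an index-plus-classifier pipeline of this kind (with synthetic stand-in reflectances, since the ground-verified plot data are not reproduced here), one can derive a burn index such as the Normalized Burn Ratio from NIR and SWIR bands and feed it, together with the raw bands, to a Random Forest:

        import numpy as np
        from sklearn.ensemble import RandomForestClassifier
        from sklearn.model_selection import train_test_split

        rng = np.random.default_rng(0)
        bands = rng.uniform(0.05, 0.5, size=(500, 3))   # columns assumed [red, nir, swir]
        burned = rng.integers(0, 2, size=500)           # stand-in ground-verified labels

        nbr = (bands[:, 1] - bands[:, 2]) / (bands[:, 1] + bands[:, 2])  # Normalized Burn Ratio
        X = np.column_stack([bands, nbr])

        X_tr, X_te, y_tr, y_te = train_test_split(X, burned, random_state=0)
        model = RandomForestClassifier(n_estimators=200, random_state=0).fit(X_tr, y_tr)
        print("held-out accuracy:", model.score(X_te, y_te))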
    Social-Inverse: Inverse Decision-making of Social Contagion Management with Task Migrations. (arXiv:2209.10493v1 [cs.LG])
    Considering two decision-making tasks $A$ and $B$, each of which wishes to compute an effective decision $Y$ for a given query $X$, can we solve task $B$ by using query-decision pairs $(X, Y)$ of $A$ without knowing the latent decision-making model? Such problems, called inverse decision-making with task migrations, are of interest in that the complex and stochastic nature of real-world applications often prevents the agent from completely knowing the underlying system. In this paper, we introduce this new problem with formal formulations and present a generic framework for addressing decision-making tasks in social contagion management. On the theory side, we present a generalization analysis justifying the learning performance of our framework. In empirical studies, we perform a sanity check and compare the presented method with other possible learning-based and graph-based methods. We have acquired promising experimental results, confirming for the first time that it is possible to solve one decision-making task by using the solutions associated with another one.
    Audit and Improve Robustness of Private Neural Networks on Encrypted Data. (arXiv:2209.09996v1 [cs.LG])
    Performing neural network inference on encrypted data without decryption is one popular method to enable privacy-preserving neural networks (PNet) as a service. Compared with regular neural networks deployed for machine-learning-as-a-service, PNet requires additional encoding, e.g., quantized-precision numbers and polynomial activation. Encrypted input also introduces novel challenges such as adversarial robustness and security. To the best of our knowledge, we are the first to study the questions of (i) whether PNet is more robust against adversarial inputs than regular neural networks, and (ii) how to design a robust PNet given encrypted input without decryption. We propose PNet-Attack to generate black-box adversarial examples that can successfully attack PNet in both targeted and untargeted manners. The attack results show that PNet's robustness against adversarial inputs needs to be improved. This is not a trivial task because the PNet model owner does not have access to the plaintext of the input values, which prevents the application of existing detection and defense methods such as input tuning, model normalization, and adversarial training. To tackle this challenge, we propose a new fast and accurate noise insertion method, called RPNet, to design Robust and Private Neural Networks. Our comprehensive experiments show that PNet-Attack requires at least $2.5\times$ fewer queries than prior works. We theoretically analyze our RPNet method and demonstrate that RPNet can decrease the attack success rate by $\sim 91.88\%$.
    Generalized Gloves of Neural Additive Models: Pursuing transparent and accurate machine learning models in finance. (arXiv:2209.10082v1 [cs.LG])
    For many years, machine learning methods have been used in a wide range of fields, including computer vision and natural language processing. While machine learning methods have significantly improved model performance over traditional methods, their black-box structure makes it difficult for researchers to interpret results. For highly regulated financial industries, transparency, explainability, and fairness are equally, if not more, important than accuracy. Without meeting regulated requirements, even highly accurate machine learning methods are unlikely to be accepted. We address this issue by introducing a novel class of transparent and interpretable machine learning algorithms known as generalized gloves of neural additive models. The generalized gloves of neural additive models separate features into three categories: linear features, individual nonlinear features, and interacted nonlinear features. Additionally, interactions in the last category are only local. The linear and nonlinear components are distinguished by a stepwise selection algorithm, and interacted groups are carefully verified by applying additive separation criteria. Empirical results demonstrate that generalized gloves of neural additive models provide optimal accuracy with the simplest architecture, allowing for a highly accurate, transparent, and explainable approach to machine learning.
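    The additive core of such models can be sketched in a few lines of PyTorch; this minimal version (our own sketch) learns one small subnet per feature and sums the contributions, omitting the paper's stepwise linear/nonlinear selection and local interaction groups.

        import torch
        import torch.nn as nn

        class AdditiveCore(nn.Module):
            # One small subnet per feature; the prediction is the sum of per-feature
            # contributions plus a bias, which keeps every effect inspectable.
            def __init__(self, n_features, hidden=16):
                super().__init__()
                self.subnets = nn.ModuleList([
                    nn.Sequential(nn.Linear(1, hidden), nn.ReLU(), nn.Linear(hidden, 1))
                    for _ in range(n_features)])
                self.bias = nn.Parameter(torch.zeros(1))

            def forward(self, x):                          # x: (batch, n_features)
                contribs = [net(x[:, i:i + 1]) for i, net in enumerate(self.subnets)]
                return self.bias + torch.cat(contribs, dim=1).sum(dim=1)

        model = AdditiveCore(n_features=5)
        print(model(torch.randn(8, 5)).shape)              # torch.Size([8])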
    The ReturnZero System for VoxCeleb Speaker Recognition Challenge 2022. (arXiv:2209.10147v1 [eess.AS])
    In this paper, we describe team RTZR's top-scoring submission to the VoxCeleb Speaker Recognition Challenge 2022 (VoxSRC-22), closed dataset, speaker verification Track 1. The top-performing system is a fusion of 7 models spanning 3 different types of model architectures. We focus on training models to learn extra-temporal information; therefore, all models were trained with 4-6 second frames for each utterance. We also apply the Large Margin Fine-tuning strategy, which showed good performance in previous challenges, to some of our fusion models. During evaluation, we apply scoring with adaptive symmetric normalization (AS-Norm) and matrix score average (MSA). Finally, we fuse all the trained models with logistic regression. The final submission achieves 0.165 DCF and 2.912% EER on the VoxSRC-22 test set.
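    For reference, AS-Norm is a standard technique and can be sketched as follows: each trial score is z-normalized against the statistics of its top-k most competitive cohort scores on the enrollment and test sides, and the two normalized scores are averaged (the cohort size here is an assumption):

        import numpy as np

        def as_norm(score, enroll_cohort_scores, test_cohort_scores, top_k=300):
            # Z-normalize the trial score against the top-k most competitive cohort
            # scores on each side, then average the two normalized scores.
            e = np.sort(enroll_cohort_scores)[-top_k:]
            t = np.sort(test_cohort_scores)[-top_k:]
            return 0.5 * ((score - e.mean()) / e.std() + (score - t.mean()) / t.std())

        rng = np.random.default_rng(0)
        print(as_norm(0.7, rng.normal(0.1, 0.2, 2000), rng.normal(0.1, 0.2, 2000)))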
    Finite-Sum Coupled Compositional Stochastic Optimization: Theory and Applications. (arXiv:2202.12396v6 [math.OC] UPDATED)
    This paper studies stochastic optimization for a sum of compositional functions, where the inner-level function of each summand is coupled with the corresponding summation index. We refer to this family of problems as finite-sum coupled compositional optimization (FCCO). It has broad applications in machine learning for optimizing non-convex or convex compositional measures/objectives such as average precision (AP), p-norm push, listwise ranking losses, neighborhood component analysis (NCA), deep survival analysis, deep latent variable models, etc., which deserve finer analysis. Yet, existing algorithms and analyses are restricted in one aspect or another. The contribution of this paper is to provide a comprehensive convergence analysis of a simple stochastic algorithm for both non-convex and convex objectives. Our key result is the improved oracle complexity with the parallel speed-up by using the moving-average based estimator with mini-batching. Our theoretical analysis also exhibits new insights for improving the practical implementation by sampling the batches of equal size for the outer and inner levels. Numerical experiments on AP maximization, NCA, and p-norm push corroborate some aspects of the theory.
    SC2EGSet: StarCraft II Esport Replay and Game-state Dataset. (arXiv:2207.03428v2 [cs.LG] UPDATED)
    As a relatively new form of sport, esports offers unparalleled data availability. Despite the vast amounts of data that are generated by game engines, it can be challenging to extract them and verify their integrity for the purposes of practical and scientific use. Our work aims to open esports to a broader scientific community by supplying raw and pre-processed files from StarCraft II esports tournaments. These files can be used in statistical and machine learning modeling tasks and related to various laboratory-based measurements (e.g., behavioral tests, brain imaging). We have gathered publicly available game-engine generated "replays" of tournament matches and performed data extraction and cleanup using a low-level application programming interface (API) parser library. Additionally, we open-sourced and published all the custom tools that were developed in the process of creating our dataset. These tools include PyTorch and PyTorch Lightning API abstractions to load and model the data. Our dataset contains replays from major and premiere StarCraft II tournaments since 2016. To prepare the dataset, we processed 55 tournament "replaypacks" that contained 17930 files with game-state information. Based on an initial investigation of available StarCraft II datasets, we observed that our dataset is the largest publicly available source of StarCraft II esports data upon its publication. Analysis of the extracted data holds promise for further Artificial Intelligence (AI), Machine Learning (ML), psychological, Human-Computer Interaction (HCI), and sports-related studies in a variety of supervised and self-supervised tasks.
    Intentional Choreography with Semi-Supervised Recurrent VAEs. (arXiv:2209.10010v1 [cs.LG])
    We summarize the model and results of PirouNet, a semi-supervised recurrent variational autoencoder. Given a small amount of dance sequences labeled with qualitative choreographic annotations, PirouNet conditionally generates dance sequences in the style of the choreographer.
    Leak Detection in Natural Gas Pipeline Using Machine Learning Models. (arXiv:2209.10121v1 [cs.LG])
    Leak detection in gas pipelines is an important and persistent problem in the oil and gas industry, particularly because pipelines are the most common way of transporting natural gas. This research studies the ability of data-driven intelligent models to detect small leaks in a natural gas pipeline using basic operational parameters, and compares the intelligent models among themselves using existing performance metrics. The project applies the observer design technique to detect leaks in natural gas pipelines using a regression-classification hierarchical model, where an intelligent model acts as a regressor and a modified logistic regression model acts as a classifier. Five intelligent models (gradient boosting, decision trees, random forest, support vector machine and artificial neural network) are studied using a pipeline data stream of four weeks. The results show that while the support vector machine and artificial neural network are better regressors than the others, they do not provide the best leak detection results due to their internal complexity and the volume of data used. The random forest and decision tree models are the most sensitive, detecting a leak of 0.1% of nominal flow in about 2 hours. All the intelligent models had high reliability, with a zero false alarm rate in the testing phase. The average time to leak detection for all the intelligent models was compared to a real-time transient model from the literature. The results show that intelligent models perform relatively well on the leak detection problem, suggesting that they could be used alongside a real-time transient model to significantly improve leak detection results.
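    A minimal sketch of the regression-classification hierarchy on synthetic stand-in data: a regressor trained on leak-free operation predicts the expected flow, and a logistic model classifies leaks from the residual, in the spirit of observer design. The model choices and the exaggerated leak signature are illustrative assumptions.

        import numpy as np
        from sklearn.ensemble import RandomForestRegressor
        from sklearn.linear_model import LogisticRegression

        rng = np.random.default_rng(0)
        X = rng.normal(size=(2000, 4))                   # stand-in operational parameters
        leak = rng.integers(0, 2, size=2000)             # 1 = small leak present
        flow = X @ np.array([1.0, -0.5, 0.3, 0.2]) + rng.normal(0, 0.02, 2000)
        flow_obs = flow - 0.3 * leak                     # exaggerated leak signature for illustration

        normal = leak == 0                               # observer trained on leak-free operation
        regressor = RandomForestRegressor(n_estimators=100, random_state=0).fit(X[normal], flow_obs[normal])
        residual = (flow_obs - regressor.predict(X)).reshape(-1, 1)
        classifier = LogisticRegression().fit(residual, leak)
        print("leak-detection accuracy:", classifier.score(residual, leak))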
    Sanity Check for External Clustering Validation Benchmarks using Internal Validation Measures. (arXiv:2209.10042v1 [cs.LG])
    We address the lack of reliability in benchmarking clustering techniques based on labeled datasets. A standard scheme in external clustering validation is to use class labels as ground-truth clusters, based on the assumption that each class forms a single, clearly separated cluster. However, as this cluster-label matching (CLM) assumption often breaks, the lack of a sanity check for the CLM of benchmark datasets casts doubt on the validity of external validations. Still, evaluating the degree of CLM is challenging. For example, internal clustering validation measures can be used to quantify CLM within the same dataset to evaluate its different clusterings, but they are not designed to compare clusterings of different datasets. In this work, we propose a principled way to generate between-dataset internal measures that enable the comparison of CLM across datasets. We first determine four axioms for between-dataset internal measures, complementing Ackerman and Ben-David's within-dataset axioms. We then propose processes to generalize internal measures to fulfill these new axioms, and use them to extend the widely used Calinski-Harabasz index for between-dataset CLM evaluation. Through quantitative experiments, we (1) verify the validity and necessity of the generalization processes and (2) show that the proposed between-dataset Calinski-Harabasz index accurately evaluates CLM across datasets. Finally, we demonstrate the importance of evaluating the CLM of benchmark datasets before conducting external validation.
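    For context, the classical within-dataset Calinski-Harabasz index that the paper generalizes can be computed as follows:

        import numpy as np

        def calinski_harabasz(X, labels):
            # Ratio of between-cluster to within-cluster dispersion, each scaled by
            # its degrees of freedom; higher means better cluster-label matching.
            classes = np.unique(labels)
            n, k = len(X), len(classes)
            mean = X.mean(axis=0)
            between = sum(np.sum(labels == c) * np.sum((X[labels == c].mean(axis=0) - mean) ** 2)
                          for c in classes)
            within = sum(np.sum((X[labels == c] - X[labels == c].mean(axis=0)) ** 2)
                         for c in classes)
            return (between / (k - 1)) / (within / (n - k))

        rng = np.random.default_rng(0)
        X = np.vstack([rng.normal(0, 1, (50, 2)), rng.normal(5, 1, (50, 2))])
        print(calinski_harabasz(X, np.repeat([0, 1], 50)))  # large: classes form clean clusters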
    On the Complexity of Finding Small Subgradients in Nonsmooth Optimization. (arXiv:2209.10346v1 [math.OC])
    We study the oracle complexity of producing $(\delta,\epsilon)$-stationary points of Lipschitz functions, in the sense proposed by Zhang et al. [2020]. While there exist dimension-free randomized algorithms for producing such points within $\widetilde{O}(1/\delta\epsilon^3)$ first-order oracle calls, we show that no dimension-free rate can be achieved by a deterministic algorithm. On the other hand, we point out that this rate can be derandomized for smooth functions with merely a logarithmic dependence on the smoothness parameter. Moreover, we establish several lower bounds for this task which hold for any randomized algorithm, with or without convexity. Finally, we show how the convergence rate of finding $(\delta,\epsilon)$-stationary points can be improved in case the function is convex, a setting which we motivate by proving that in general no finite time algorithm can produce points with small subgradients even for convex functions.
    Machine Learning on generalized Complete Intersection Calabi-Yau Manifolds. (arXiv:2209.10157v1 [hep-th])
    The Generalized Complete Intersection Calabi-Yau Manifold (gCICY) is a recently established construction of Calabi-Yau manifolds. However, the generation of new gCICYs using standard algebraic methods is very laborious. Due to this complexity, the number of gCICYs and their classification remain unknown. In this paper, we try to make some progress in this direction using neural networks. The results show that our trained models achieve high precision on the existing type $(1,1)$ and type $(2,1)$ gCICYs in the literature. Moreover, they achieve $97\%$ precision in predicting new gCICYs generated differently from those used for training and testing. This shows that machine learning could be an effective method to classify and generate new gCICYs.
    Fingerprinting Robot Movements via Acoustic Side Channel. (arXiv:2209.10240v1 [cs.CR])
    In this paper, we present an acoustic side channel attack which makes use of smartphone microphones recording a robot in operation to exploit acoustic properties of the sound to fingerprint a robot's movements. In this work we consider the possibility of an insider adversary who is within physical proximity of a robotic system (such as a technician or robot operator), equipped with only their smartphone microphone. Through the acoustic side-channel, we demonstrate that it is indeed possible to fingerprint not only individual robot movements within 3D space, but also patterns of movements which could lead to inferring the purpose of the movements (i.e. surgical procedures which a surgical robot is undertaking) and hence, resulting in potential privacy violations. Upon evaluation, we find that individual robot movements can be fingerprinted with around 75% accuracy, decreasing slightly with more fine-grained movement meta-data such as distance and speed. Furthermore, workflows could be reconstructed with around 62% accuracy as a whole, with more complex movements such as pick-and-place or packing reconstructed with near perfect accuracy. As well as this, in some environments such as surgical settings, audio may be recorded and transmitted over VoIP, such as for education/teaching purposes or in remote telemedicine. The question here is, can the same attack be successful even when VoIP communication is employed, and how does packet loss impact the captured audio and the success of the attack? Using the same characteristics of acoustic sound for plain audio captured by the smartphone, the attack was 90% accurate in fingerprinting VoIP samples on average, 15% higher than the baseline without the VoIP codec employed. This opens up new research questions regarding anonymous communications to protect robotic systems from acoustic side channel attacks via VoIP communication networks.
    On the Convergence Theory of Meta Reinforcement Learning with Personalized Policies. (arXiv:2209.10072v1 [cs.AI])
    Modern meta-reinforcement learning (Meta-RL) methods are mainly developed based on model-agnostic meta-learning, which performs policy gradient steps across tasks to maximize policy performance. However, the gradient conflict problem is still poorly understood in Meta-RL, which may lead to performance degradation when encountering distinct tasks. To tackle this challenge, this paper proposes a novel personalized Meta-RL (pMeta-RL) algorithm, which aggregates task-specific personalized policies to update a meta-policy used for all tasks, while maintaining personalized policies to maximize the average return of each task under the constraint of the meta-policy. We also provide a theoretical analysis in the tabular setting, which demonstrates the convergence of our pMeta-RL algorithm. Moreover, we extend the proposed pMeta-RL algorithm to a deep network version based on soft actor-critic, making it suitable for continuous control tasks. Experimental results show that the proposed algorithms outperform other previous Meta-RL algorithms on the Gym and MuJoCo suites.
    Revisiting Discrete Soft Actor-Critic. (arXiv:2209.10081v1 [cs.LG])
    We study the adaptation of soft actor-critic (SAC) from continuous action spaces to discrete action spaces. We revisit vanilla SAC and provide an in-depth understanding of its Q-value underestimation and performance instability issues when applied to discrete settings. We thereby propose entropy-penalty and double average Q-learning with Q-clip to address these issues. Extensive experiments on typical benchmarks with discrete action spaces, including Atari games and a large-scale MOBA game, show the efficacy of our proposed method. Our code is at: https://github.com/coldsummerday/Revisiting-Discrete-SAC.
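    Under our own reading of the abstract (the paper's exact formulation may differ), the modified target could look like the sketch below: the two target critics are averaged rather than minimized over, and the soft state value is clipped.

        import torch

        def soft_target(q1_t, q2_t, logits, reward, done, gamma=0.99, alpha=0.2, clip=10.0):
            # Soft state value under the current policy, built from the *average* of
            # the two target critics (rather than the usual minimum) and clipped;
            # the exact penalty/clipping used in the paper may differ from this.
            probs = torch.softmax(logits, dim=-1)
            log_probs = torch.log_softmax(logits, dim=-1)
            q_avg = 0.5 * (q1_t + q2_t)
            v_soft = (probs * (q_avg - alpha * log_probs)).sum(dim=-1)
            v_soft = v_soft.clamp(-clip, clip)
            return reward + gamma * (1.0 - done) * v_soft

        batch, n_actions = 32, 6
        y = soft_target(torch.randn(batch, n_actions), torch.randn(batch, n_actions),
                        torch.randn(batch, n_actions), torch.randn(batch), torch.zeros(batch))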
    Tree Methods for Hierarchical Classification in Parallel. (arXiv:2209.10288v1 [cs.LG])
    We propose methods that enable efficient hierarchical classification in parallel. Our methods transform a batch of classification scores and labels, corresponding to given nodes in a semantic tree, to scores and labels corresponding to all nodes in the ancestral paths going down the tree to every given node, relying only on tensor operations that execute efficiently on hardware accelerators. We implement our methods and test them on current hardware accelerators with a tree incorporating all English-language synsets in WordNet 3.0, spanning 117,659 classes in 20 levels of depth. We transform batches of scores and labels to their respective ancestral paths, incurring negligible computation and consuming only a fixed 0.04GB of memory over the footprint of data.
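    The core transformation can be sketched with a precomputed ancestor matrix and a single gather, both of which map directly onto accelerator-friendly tensor ops; the tiny tree and the root-padding convention below are illustrative assumptions.

        import numpy as np

        def build_ancestor_matrix(parent, depth):
            # parent[i] = parent node of i (root points to itself). Returns an
            # (n_nodes, depth) matrix whose row i lists the path root -> ... -> i;
            # nodes shallower than depth are left-padded with the root.
            n = len(parent)
            paths = np.empty((n, depth), dtype=np.int64)
            node = np.arange(n)
            for d in range(depth - 1, -1, -1):   # one vectorized walk up the tree
                paths[:, d] = node
                node = parent[node]
            return paths

        # One gather turns a batch of leaf labels into their full ancestral paths.
        parent = np.array([0, 0, 0, 1, 1, 2])    # tiny 3-level tree, node 0 is root
        paths = build_ancestor_matrix(parent, depth=3)
        labels = np.array([3, 5, 4])
        print(paths[labels])                      # (batch, depth) label paths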
    Investigating and Mitigating Failure Modes in Physics-informed Neural Networks (PINNs). (arXiv:2209.09988v1 [cs.LG])
    In this paper, we demonstrate and investigate several challenges that stand in the way of tackling complex problems using physics-informed neural networks. In particular, we visualize the loss landscapes of trained models and perform sensitivity analysis of backpropagated gradients in the presence of physics. Our findings suggest that existing methods produce highly non-convex loss landscapes that are difficult to navigate. Furthermore, high-order PDEs contaminate the backpropagated gradients that may impede or prevent convergence. We then propose a novel method that bypasses the calculation of high-order PDE operators and mitigates the contamination of backpropagating gradients. In doing so, we reduce the dimension of the search space of our solution and facilitate learning problems with non-smooth solutions. Our formulation also provides a feedback mechanism that helps our model adaptively focus on complex regions of the domain that are difficult to learn. We then formulate an unconstrained dual problem by adapting the Lagrange multiplier method. We apply our method to solve several challenging benchmark problems governed by linear and non-linear PDEs.
    Projected Gradient Descent Algorithms for Solving Nonlinear Inverse Problems with Generative Priors. (arXiv:2209.10093v1 [stat.ML])
    In this paper, we propose projected gradient descent (PGD) algorithms for signal estimation from noisy nonlinear measurements. We assume that the unknown $p$-dimensional signal lies near the range of an $L$-Lipschitz continuous generative model with bounded $k$-dimensional inputs. In particular, we consider two cases when the nonlinear link function is either unknown or known. For unknown nonlinearity, similarly to Liu et al. (2020), we make the assumption of sub-Gaussian observations and propose a linear least-squares estimator. We show that when there is no representation error and the sensing vectors are Gaussian, roughly $O(k \log L)$ samples suffice to ensure that a PGD algorithm converges linearly to a point achieving the optimal statistical rate using arbitrary initialization. For known nonlinearity, we assume monotonicity as in Yang et al. (2016), and make much weaker assumptions on the sensing vectors and allow for representation error. We propose a nonlinear least-squares estimator that is guaranteed to enjoy an optimal statistical rate. A corresponding PGD algorithm is provided and is shown to also converge linearly to the estimator using arbitrary initialization. In addition, we present experimental results on image datasets to demonstrate the performance of our PGD algorithms.
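    A generic sketch of such a PGD loop (our illustration, not the paper's estimators or rates): a gradient step on the least-squares objective in signal space, followed by an approximate projection onto the range of the generator by optimizing its latent code. The toy generator, step sizes, and iteration counts are assumptions.

        import torch

        def pgd_generative(G, y, A, z_dim, steps=50, proj_steps=20, lr=0.05):
            # PGD for y ~ A x with the prior that x lies near range(G).
            z = torch.zeros(z_dim, requires_grad=True)
            x = G(z).detach()
            for _ in range(steps):
                x = x - lr * A.T @ (A @ x - y)      # gradient step in signal space
                for _ in range(proj_steps):         # approx. projection onto range(G)
                    loss = ((G(z) - x) ** 2).sum()
                    (g,) = torch.autograd.grad(loss, z)
                    z = (z - 0.1 * g).detach().requires_grad_(True)
                x = G(z).detach()
            return x

        torch.manual_seed(0)
        W = torch.randn(32, 4)                      # toy nonlinear "generator"
        G = lambda z: torch.tanh(W @ z)
        x_true = G(torch.randn(4))
        A = torch.randn(16, 32) / 4.0
        y = A @ x_true + 0.01 * torch.randn(16)
        print(((pgd_generative(G, y, A, 4) - x_true) ** 2).mean())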
    MulBot: Unsupervised Bot Detection Based on Multivariate Time Series. (arXiv:2209.10361v1 [cs.SI])
    Online social networks are actively involved in the removal of malicious social bots due to their role in the spread of low quality information. However, most of the existing bot detectors are supervised classifiers incapable of capturing the evolving behavior of sophisticated bots. Here we propose MulBot, an unsupervised bot detector based on multivariate time series (MTS). For the first time, we exploit multidimensional temporal features extracted from user timelines. We manage the multidimensionality with an LSTM autoencoder, which projects the MTS in a suitable latent space. Then, we perform a clustering step on this encoded representation to identify dense groups of very similar users -- a known sign of automation. Finally, we perform a binary classification task achieving f1-score $= 0.99$, outperforming state-of-the-art methods (f1-score $\le 0.97$). Not only does MulBot achieve excellent results in the binary classification task, but we also demonstrate its strengths in a novel and practically-relevant task: detecting and separating different botnets. In this multi-class classification task we achieve f1-score $= 0.96$. We conclude by estimating the importance of the different features used in our model and by evaluating MulBot's capability to generalize to new unseen bots, thus proposing a solution to the generalization deficiencies of supervised bot detectors.
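    The encode-then-cluster idea can be sketched as follows: a minimal LSTM autoencoder compresses each user's multivariate timeline into a latent vector, which would then be clustered to find dense groups. The architecture sizes are assumptions, not MulBot's configuration.

        import torch
        import torch.nn as nn

        class LSTMAutoencoder(nn.Module):
            # Encoder compresses a (time, features) series to a latent vector;
            # the decoder reconstructs the series from the repeated latent.
            def __init__(self, n_features, latent=16):
                super().__init__()
                self.encoder = nn.LSTM(n_features, latent, batch_first=True)
                self.decoder = nn.LSTM(latent, n_features, batch_first=True)

            def forward(self, x):                   # x: (batch, time, features)
                _, (h, _) = self.encoder(x)
                z = h[-1]                           # latent representation
                rep = z.unsqueeze(1).expand(-1, x.size(1), -1)
                recon, _ = self.decoder(rep)
                return recon, z

        model = LSTMAutoencoder(n_features=6)
        recon, z = model(torch.randn(32, 50, 6))
        # Cluster z (e.g., with scikit-learn KMeans) to surface dense user groups.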
    Improving Generalizability of Graph Anomaly Detection Models via Data Augmentation. (arXiv:2209.10168v1 [cs.LG])
    Graph anomaly detection (GAD) is a vital task since even a few anomalies can pose huge threats to benign users. Recent semi-supervised GAD methods, which can effectively leverage the available labels as prior knowledge, have achieved superior performance over unsupervised methods. In practice, people usually need to identify anomalies on new (sub)graphs to secure their business, but they may lack labels to train an effective detection model. One natural idea is to directly adopt a trained GAD model to the new (sub)graph for testing. However, we find that existing semi-supervised GAD methods suffer from a poor generalization issue, i.e., well-trained models cannot perform well on an unseen area (i.e., one not accessible in training) of the same graph, which may cause great trouble. In this paper, building on this phenomenon, we propose the general and novel research problem of generalized graph anomaly detection, which aims to effectively identify anomalies on both the training-domain graph and unseen testing graphs to eliminate potential dangers. Nevertheless, it is a challenging task, since only limited labels are available and the normal background may differ between training and testing data. Accordingly, we propose a data augmentation method named AugAN (Augmentation for Anomaly and Normal distributions) to enrich training data and boost the generalizability of GAD models. Experiments verify the effectiveness of our method in improving model generalizability.
    Identification of Adaptive Driving Style Preference through Implicit Inputs in SAE L2 Vehicles. (arXiv:2209.10536v1 [cs.HC])
    A key factor to optimal acceptance and comfort of automated vehicle features is the driving style. Mismatches between the automated and the driver-preferred driving styles can make users take over more frequently or even disable the automation features. This work proposes identification of user driving style preference with multimodal signals, so the vehicle could match user preference in a continuous and automatic way. We conducted a driving simulator study with 36 participants and collected extensive multimodal data including behavioral, physiological, and situational data. This includes eye gaze, steering grip force, driving maneuvers, brake and throttle pedal inputs as well as foot distance from pedals, pupil diameter, galvanic skin response, heart rate, and situational drive context. Then, we built machine learning models to identify preferred driving styles, and confirmed that all modalities are important for the identification of user preference. This work paves the way for implicit adaptive driving styles in automated vehicles.
    T5QL: Taming language models for SQL generation. (arXiv:2209.10254v1 [cs.LG])
    Automatic SQL generation has been an active research area, aiming to streamline access to databases by letting users express the given intent in natural language instead of writing SQL. Current SOTA methods for semantic parsing depend on LLMs to achieve high predictive accuracy on benchmark datasets. This reduces their applicability, since LLMs require expensive GPUs. Furthermore, SOTA methods are ungrounded and thus not guaranteed to always generate valid SQL. Here we propose T5QL, a new SQL generation method that improves performance on benchmark datasets by 13pp over SOTA methods when using smaller LMs, namely T5-Base. Additionally, T5QL is guaranteed to always output valid SQL by using a context-free grammar to constrain SQL generation. Finally, we show that dividing semantic parsing into two tasks, candidate SQL generation and candidate re-ranking, is a promising research avenue that can reduce the need for large LMs.
    Off-Policy Risk Assessment in Markov Decision Processes. (arXiv:2209.10444v1 [cs.LG])
    Addressing such diverse ends as safety, alignment with human preferences, and the efficiency of learning, a growing line of reinforcement learning research focuses on risk functionals that depend on the entire distribution of returns. Recent work on off-policy risk assessment (OPRA) for contextual bandits introduced consistent estimators for the target policy's CDF of returns, along with finite-sample guarantees that extend to (and hold simultaneously over) all risk functionals. In this paper, we lift OPRA to Markov decision processes (MDPs), where importance sampling (IS) CDF estimators suffer high variance on longer trajectories due to small effective sample sizes. To mitigate these problems, we incorporate model-based estimation to develop the first doubly robust (DR) estimator for the CDF of returns in MDPs. This estimator enjoys significantly lower variance and, when the model is well specified, achieves the Cramer-Rao variance lower bound. Moreover, for many risk functionals, the downstream estimates enjoy both lower bias and lower variance. Additionally, we derive the first minimax lower bounds for off-policy CDF and risk estimation, which match our error bounds up to a constant factor. Finally, we demonstrate the precision of our DR CDF estimates experimentally on several different environments.
    Distributed Dynamic Map Fusion via Federated Learning for Intelligent Networked Vehicles. (arXiv:2103.03786v2 [cs.LG] UPDATED)
    The technology of dynamic map fusion among networked vehicles has been developed to enlarge sensing ranges and improve sensing accuracies for individual vehicles. This paper proposes a federated learning (FL) based dynamic map fusion framework to achieve high map quality despite unknown numbers of objects in fields of view (FoVs), various sensing and model uncertainties, and missing data labels for online learning. The novelty of this work is threefold: (1) developing a three-stage fusion scheme to predict the number of objects effectively and to fuse multiple local maps with fidelity scores; (2) developing an FL algorithm which fine-tunes feature models (i.e., representation learning networks for feature extraction) distributively by aggregating model parameters; (3) developing a knowledge distillation method to generate FL training labels when data labels are unavailable. The proposed framework is implemented in the Car Learning to Act (CARLA) simulation platform. Extensive experimental results are provided to verify the superior performance and robustness of the developed map fusion and FL schemes.
    Benchmarking energy consumption and latency for neuromorphic computing in condensed matter and particle physics. (arXiv:2209.10481v1 [cs.ET])
    The massive use of artificial neural networks (ANNs), increasingly popular in many areas of scientific computing, rapidly increases the energy consumption of modern high-performance computing systems. An appealing and possibly more sustainable alternative is provided by novel neuromorphic paradigms, which directly implement ANNs in hardware. However, little is known about the actual benefits of running ANNs on neuromorphic hardware for use cases in scientific computing. Here we present a methodology for measuring the energy cost and compute time for inference tasks with ANNs on conventional hardware. In addition, we have designed an architecture for these tasks and estimate the same metrics based on a state-of-the-art analog in-memory computing (AIMC) platform, one of the key paradigms in neuromorphic computing. Both methodologies are compared for a use case in quantum many-body physics in two dimensional condensed matter systems and for anomaly detection at 40 MHz rates at the Large Hadron Collider in particle physics. We find that AIMC can achieve up to one order of magnitude shorter computation times than conventional hardware, at an energy cost that is up to three orders of magnitude smaller. This suggests great potential for faster and more sustainable scientific computing with neuromorphic hardware.
    Benchmarking Online Sequence-to-Sequence and Character-based Handwriting Recognition from IMU-Enhanced Pens. (arXiv:2202.07036v3 [cs.LG] UPDATED)
    Purpose. Handwriting is one of the most frequently occurring patterns in everyday life and with it come challenging applications such as handwriting recognition (HWR), writer identification, and signature verification. In contrast to offline HWR that only uses spatial information (i.e., images), online HWR (OnHWR) uses richer spatio-temporal information (i.e., trajectory data or inertial data). While there exist many offline HWR datasets, there is little data available for the development of OnHWR methods on paper, as it requires hardware-integrated pens. Methods. This paper presents data and benchmark models for real-time sequence-to-sequence (seq2seq) learning and single character-based recognition. Our data is recorded by a sensor-enhanced ballpoint pen, yielding sensor data streams from triaxial accelerometers, a gyroscope, a magnetometer and a force sensor at 100 Hz. We propose a variety of datasets including equations and words for both the writer-dependent and writer-independent tasks. Our datasets allow a comparison between classical OnHWR on tablets and on paper with sensor-enhanced pens. We provide an evaluation benchmark for seq2seq and single character-based HWR using recurrent and temporal convolutional networks and Transformers combined with a connectionist temporal classification (CTC) loss and cross-entropy (CE) losses. Results. Our convolutional network combined with BiLSTMs outperforms Transformer-based architectures, is on par with InceptionTime for sequence-based classification tasks, and yields better results compared to 28 state-of-the-art techniques. Time-series augmentation methods improve the sequence-based task, and we show that CE variants can improve the single classification task.
    SPViT: Enabling Faster Vision Transformers via Soft Token Pruning. (arXiv:2112.13890v2 [cs.CV] UPDATED)
    Recently, Vision Transformer (ViT) has continuously established new milestones in the computer vision field, while its high computation and memory cost makes deployment in industrial production difficult. Pruning, a traditional model compression paradigm for hardware efficiency, has been widely applied in various DNN structures. Nevertheless, it remains unclear how to perform pruning tailored to the ViT structure. Considering three key points: the structural characteristics, the internal data pattern of ViTs, and the related edge device deployment, we leverage the input token sparsity and propose a computation-aware soft pruning framework, which can be set up on vanilla Transformers of both flatten and CNN-type structures, such as Pooling-based ViT (PiT). More concretely, we design a dynamic attention-based multi-head token selector, which is a lightweight module for adaptive instance-wise token selection. We further introduce a soft pruning technique, which integrates the less informative tokens generated by the selector module into a package token that participates in subsequent calculations rather than being completely discarded. Our framework ties the accuracy-computation trade-off to the constraints of specific edge devices through our proposed computation-aware training strategy. Experimental results show that our framework significantly reduces the computation cost of ViTs while maintaining comparable performance on image classification. Moreover, our framework can guarantee that the identified model meets the resource specifications of mobile devices and FPGA, and even achieve real-time execution of DeiT-T on mobile platforms. For example, our method reduces the latency of DeiT-T to 26 ms (26%$\sim $41% superior to existing works) on the mobile device with 0.25%$\sim $4% higher top-1 accuracy on ImageNet.
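    The soft-pruning step itself can be illustrated in a few lines. Below is a minimal PyTorch sketch (not the authors' code): tokens with low selector scores are fused into a single "package" token instead of being discarded. The scoring head, shapes, and function names are illustrative assumptions.

```python
import torch

def soft_prune(tokens: torch.Tensor, scores: torch.Tensor, keep: int):
    """tokens: (B, N, D); scores: (B, N), higher = more informative; keep < N."""
    B, N, D = tokens.shape
    idx = scores.argsort(dim=1, descending=True)          # rank tokens per sample
    keep_idx, drop_idx = idx[:, :keep], idx[:, keep:]
    gather = lambda i: tokens.gather(1, i.unsqueeze(-1).expand(-1, -1, D))
    kept = gather(keep_idx)                               # informative tokens
    dropped = gather(drop_idx)                            # less informative tokens
    w = scores.gather(1, drop_idx).softmax(dim=1).unsqueeze(-1)
    package = (w * dropped).sum(dim=1, keepdim=True)      # (B, 1, D) fused token
    return torch.cat([kept, package], dim=1)              # (B, keep + 1, D)
```

    Because the dropped tokens are aggregated rather than deleted, their information can still influence later attention layers, which is the stated motivation for "soft" rather than hard pruning.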
    Improved Marginal Unbiased Score Expansion (MUSE) via Implicit Differentiation. (arXiv:2209.10512v1 [stat.ML])
    We apply the technique of implicit differentiation to boost performance, reduce numerical error, and remove required user-tuning in the Marginal Unbiased Score Expansion (MUSE) algorithm for hierarchical Bayesian inference. We demonstrate these improvements on three representative inference problems: 1) an extended Neal's funnel, 2) Bayesian neural networks, and 3) probabilistic principal component analysis. On our particular test cases, MUSE with implicit differentiation is faster than Hamiltonian Monte Carlo by factors of 155, 397, and 5, respectively, or factors of 65, 278, and 1 without implicit differentiation, and yields good approximate marginal posteriors. The Julia and Python MUSE packages have been updated to use implicit differentiation, and can solve problems defined by hand or with any of a number of popular probabilistic programming languages and automatic differentiation backends.
    Probabilistic Robust Linear Quadratic Regulators with Gaussian Processes. (arXiv:2105.07668v2 [eess.SY] UPDATED)
    Probabilistic models such as Gaussian processes (GPs) are powerful tools to learn unknown dynamical systems from data for subsequent use in control design. While learning-based control has the potential to yield superior performance in demanding applications, robustness to uncertainty remains an important challenge. Since Bayesian methods quantify uncertainty of the learning results, it is natural to incorporate these uncertainties into a robust design. In contrast to most state-of-the-art approaches that consider worst-case estimates, we leverage the learning method's posterior distribution in the controller synthesis. The result is a more informed and, thus, more efficient trade-off between performance and robustness. We present a novel controller synthesis for linearized GP dynamics that yields robust controllers with respect to a probabilistic stability margin. The formulation is based on a recently proposed algorithm for linear quadratic control synthesis, which we extend by giving probabilistic robustness guarantees in the form of credibility bounds for the system's stability. Comparisons to existing methods based on worst-case and certainty-equivalence designs reveal superior performance and robustness properties of the proposed method.
    Reconfigurable Intelligent Surface Enabled Spatial Multiplexing with Fully Convolutional Network. (arXiv:2201.02834v2 [eess.SP] UPDATED)
    Reconfigurable intelligent surface (RIS) is an emerging technology for future wireless communication systems. In this work, we consider downlink spatial multiplexing enabled by the RIS for weighted sum-rate (WSR) maximization. In the literature, most solutions use alternating gradient-based optimization, which has moderate performance, high complexity, and limited scalability. We propose to apply a fully convolutional network (FCN), an architecture originally designed for semantic segmentation of images, to solve this problem. The rectangular shape of the RIS, together with the spatial correlation of channels at adjacent RIS antennas due to their short spacing, motivates applying the FCN to the RIS configuration. We design a set of channel features that includes both cascaded channels via the RIS and the direct channel. At the base station (BS), the differentiable minimum mean squared error (MMSE) precoder is used for pretraining, and the weighted minimum mean squared error (WMMSE) precoder is then applied for fine-tuning; the latter is nondifferentiable and more complex, but achieves better performance. Evaluation results show that the proposed solution has higher performance and allows for faster evaluation than the baselines. Hence it scales better to a large number of antennas, advancing the RIS one step closer to practical deployment.
    Minimax Optimal Fixed-Budget Best Arm Identification in Linear Bandits. (arXiv:2105.13017v2 [cs.LG] UPDATED)
    We study the problem of best arm identification in linear bandits in the fixed-budget setting. By leveraging properties of the G-optimal design and incorporating it into the arm allocation rule, we design a parameter-free algorithm, Optimal Design-based Linear Best Arm Identification (OD-LinBAI). We provide a theoretical analysis of the failure probability of OD-LinBAI. Instead of all the optimality gaps, the performance of OD-LinBAI depends only on the gaps of the top $d$ arms, where $d$ is the effective dimension of the linear bandit instance. Complementarily, we present a minimax lower bound for this problem. The upper and lower bounds show that OD-LinBAI is minimax optimal up to constant multiplicative factors in the exponent, which is a significant theoretical improvement over existing methods (e.g., BayesGap, Peace, LinearExploration and GSE), and settles the question of ascertaining the difficulty of learning the best arm in the fixed-budget setting. Finally, numerical experiments demonstrate considerable empirical improvements over existing algorithms on a variety of real and synthetic datasets.
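    The G-optimal design at the heart of the allocation rule can be computed with a classical Frank-Wolfe (Fedorov-Wynn) iteration; by the Kiefer-Wolfowitz equivalence theorem, the D-optimal design computed below is also G-optimal. This is a generic sketch of that standard subroutine under the stated assumptions, not the paper's exact allocation rule.

```python
import numpy as np

def g_optimal_design(X: np.ndarray, iters: int = 500) -> np.ndarray:
    """Frank-Wolfe (Fedorov-Wynn) iteration for the D-/G-optimal design over
    arm vectors X of shape (K, d). Assumes X spans R^d and d >= 2."""
    K, d = X.shape
    lam = np.full(K, 1.0 / K)                      # start from the uniform design
    for _ in range(iters):
        A_inv = np.linalg.inv(X.T @ (lam[:, None] * X))
        g = np.einsum('kd,de,ke->k', X, A_inv, X)  # leverage scores x^T A^{-1} x
        j = g.argmax()                             # most under-explored arm
        step = (g[j] / d - 1.0) / (g[j] - 1.0)     # exact line search for log det
        lam = (1.0 - step) * lam + step * np.eye(K)[j]
    return lam                                     # design weights; max leverage -> d
```

    At the optimum the maximum leverage score equals $d$, which is what bounds the prediction variance uniformly over arms.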
    Multi-time Predictions of Wildfire Grid Map using Remote Sensing Local Data. (arXiv:2209.10102v1 [cs.LG])
    Due to recent climate changes, we have seen more frequent and severe wildfires in the United States. Predicting wildfires is critical for natural disaster prevention and mitigation. Advances in data processing and communication technologies have enabled access to remote sensing data. With remote sensing data, valuable spatiotemporal statistical models can be created and used for resource management practices. This paper proposes a distributed learning framework that shares local data collected at ten locations in the western USA among the local agents. The local agents aim to predict wildfire grid maps one, two, three, and four weeks in advance while processing the remote sensing data stream online. The proposed model has distinct features that address the characteristic needs of this prediction task, including dynamic online estimation and time-series modeling. Local fire event triggers are not isolated between locations, and there are confounding factors when local data is analyzed due to incomplete state observations. Compared to existing approaches that do not account for incomplete state observation within wildfire time-series data, we achieve higher prediction performance on average.
    Variational Inference for Infinitely Deep Neural Networks. (arXiv:2209.10091v1 [cs.LG])
    We introduce the unbounded depth neural network (UDN), an infinitely deep probabilistic model that adapts its complexity to the training data. The UDN contains an infinite sequence of hidden layers and places an unbounded prior on a truncation L, the layer from which it produces its data. Given a dataset of observations, the posterior UDN provides a conditional distribution of both the parameters of the infinite neural network and its truncation. We develop a novel variational inference algorithm to approximate this posterior, optimizing a distribution of the neural network weights and of the truncation depth L, and without any upper limit on L. To this end, the variational family has a special structure: it models neural network weights of arbitrary depth, and it dynamically creates or removes free variational parameters as its distribution of the truncation is optimized. (Unlike heuristic approaches to model search, it is solely through gradient-based optimization that this algorithm explores the space of truncations.) We study the UDN on real and synthetic data. We find that the UDN adapts its posterior depth to the dataset complexity; it outperforms standard neural networks of similar computational complexity; and it outperforms other approaches to infinite-depth neural networks.
    A Reinforcement Learning Framework with Description Language for Critical Driving Scenario Generation. (arXiv:2209.10078v1 [cs.AI])
    Critical scenario generation requires the ability to find critical parameter combinations in the infinite parameter space of a logical scenario. Existing solutions aim to explore the correlation of parameters in the initial scenario without considering the connection between the parameters in the action sequence. How to model action sequences and account for the effects of different action parameters in the scenario remains a key challenge. In this paper, we propose a framework to generate critical scenarios to speed up the evaluation of specific tasks. Specifically, we first propose a description language, BTScenario, to model the scenario, covering the map, actors, interactions between actors, and oracles. We then use reinforcement learning to search for combinations of critical parameters. By adopting an action mask, the effects of non-fixed lengths and sequences in the parameter space can be prevented. We demonstrate that the proposed framework is more efficient than random testing and combinatorial testing methods in various scenarios.
    FoVolNet: Fast Volume Rendering using Foveated Deep Neural Networks. (arXiv:2209.09965v1 [cs.GR])
    Volume data is found in many important scientific and engineering applications. Rendering this data for visualization at high quality and interactive rates for demanding applications such as virtual reality is still not easily achievable even using professional-grade hardware. We introduce FoVolNet -- a method to significantly increase the performance of volume data visualization. We develop a cost-effective foveated rendering pipeline that sparsely samples a volume around a focal point and reconstructs the full-frame using a deep neural network. Foveated rendering is a technique that prioritizes rendering computations around the user's focal point. This approach leverages properties of the human visual system, thereby saving computational resources when rendering data in the periphery of the user's field of vision. Our reconstruction network combines direct and kernel prediction methods to produce fast, stable, and perceptually convincing output. With a slim design and the use of quantization, our method outperforms state-of-the-art neural reconstruction techniques in both end-to-end frame times and visual quality. We conduct extensive evaluations of the system's rendering performance, inference speed, and perceptual properties, and we provide comparisons to competing neural image reconstruction techniques. Our test results show that FoVolNet consistently achieves significant time saving over conventional rendering while preserving perceptual quality.
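    To illustrate the foveated-sampling idea (the paper's actual sampling pattern may differ), a sampling mask can be made dense near the focal point and sparse in the periphery. The Gaussian falloff and parameter values below are our own assumptions for the sketch.

```python
import numpy as np

def foveated_mask(h, w, focus, base=0.05, peak=1.0, sigma=0.15):
    """Boolean sampling mask: sample densely near `focus` (given in relative
    [0, 1] coordinates) and sparsely in the periphery (illustrative falloff)."""
    ys, xs = np.mgrid[0:h, 0:w]
    d2 = (ys / h - focus[0]) ** 2 + (xs / w - focus[1]) ** 2
    p = base + (peak - base) * np.exp(-d2 / (2 * sigma ** 2))  # sampling probability
    return np.random.rand(h, w) < p                            # True = render this pixel
```

    The reconstruction network then fills in the unsampled pixels, trading a cheap sparse render plus inference for a full-resolution render.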
    Collaborative Anomaly Detection. (arXiv:2209.09923v1 [cs.LG])
    In recommendation systems, items are likely to be exposed to various users and we would like to learn about the familiarity of a new user with an existing item. This can be formulated as an anomaly detection (AD) problem distinguishing between "common users" (nominal) and "fresh users" (anomalous). Considering the sheer volume of items and the sparsity of user-item paired data, independently applying conventional single-task detection methods on each item quickly becomes difficult, while correlations between items are ignored. To address this multi-task anomaly detection problem, we propose collaborative anomaly detection (CAD) to jointly learn all tasks with an embedding encoding correlations among tasks. We explore CAD with conditional density estimation and conditional likelihood ratio estimation. We found that: $i$) estimating a likelihood ratio enjoys more efficient learning and yields better results than density estimation. $ii$) It is beneficial to select a small number of tasks in advance to learn a task embedding model, and then use it to warm-start all task embeddings. Consequently, these embeddings can capture correlations between tasks and generalize to new correlated tasks.
    Predicting Drug-Drug Interactions using Deep Generative Models on Graphs. (arXiv:2209.09941v1 [q-bio.BM])
    Latent representations of drugs and their targets produced by contemporary graph autoencoder-based models have proved useful in predicting many types of node-pair interactions on large networks, including drug-drug, drug-target, and target-target interactions. However, most existing approaches model the node's latent spaces in which node distributions are rigid and disjoint; these limitations hinder the methods from generating new links among pairs of nodes. In this paper, we demonstrate the effectiveness of variational graph autoencoders (VGAE) in modeling latent node representations on multimodal networks. Our approach can produce flexible latent spaces for each node type of the multimodal graph; the embeddings are later used for predicting links among node pairs under different edge types. To further enhance the models' performance, we propose a new method that concatenates Morgan fingerprints, which capture the molecular structure of each drug, with their latent embeddings before passing them to the decoding stage for link prediction. Our proposed model shows competitive results on two multimodal networks: (1) a multi-graph consisting of drug and protein nodes, and (2) a multi-graph consisting of drug and cell line nodes. Our source code is publicly available at https://github.com/HySonLab/drug-interactions.
    Boosting Star-GANs for Voice Conversion with Contrastive Discriminator. (arXiv:2209.10088v1 [eess.AS])
    Nonparallel multi-domain voice conversion methods such as the StarGAN-VCs have been widely applied in many scenarios. However, the training of these models usually poses a challenge due to their complicated adversarial network architectures. To address this, in this work we leverage the state-of-the-art contrastive learning techniques and incorporate an efficient Siamese network structure into the StarGAN discriminator. Our method is called SimSiam-StarGAN-VC and it boosts the training stability and effectively prevents the discriminator overfitting issue in the training process. We conduct experiments on the Voice Conversion Challenge (VCC 2018) dataset, plus a user study to validate the performance of our framework. Our experimental results show that SimSiam-StarGAN-VC significantly outperforms existing StarGAN-VC methods in terms of both the objective and subjective metrics.
    Flashlight: Scalable Link Prediction with Effective Decoders. (arXiv:2209.10100v1 [cs.SI])
    Link prediction (LP) has been recognized as an important task in graph learning with broad practical applications. A typical application of LP is to retrieve the top scoring neighbors for a given source node, such as friend recommendation. These services require high inference scalability to find the top scoring neighbors from many candidate nodes at low latencies. There are two popular decoders that recent LP models mainly use to compute the edge scores from node embeddings: the \textbf{HadamardMLP} and \textbf{Dot Product} decoders. After theoretical and empirical analysis, we find that the HadamardMLP decoders are generally more effective for LP. However, HadamardMLP lacks the scalability for retrieving top scoring neighbors on large graphs, since to the best of our knowledge, there does not exist an algorithm to retrieve the top scoring neighbors for HadamardMLP decoders in sublinear complexity. To make HadamardMLP scalable, we propose the \textit{Flashlight} algorithm to accelerate the top scoring neighbor retrievals for HadamardMLP: a sublinear algorithm that progressively applies approximate maximum inner product search (MIPS) techniques with adaptively adjusted query embeddings. Empirical results show that Flashlight improves the inference speed of LP by more than 100 times on the large OGBL-CITATION2 dataset without sacrificing effectiveness. Our work paves the way for large-scale LP applications with the effective HadamardMLP decoders by greatly accelerating their inference.
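    For concreteness, a HadamardMLP decoder scores a candidate edge by feeding the element-wise product of the two node embeddings through an MLP, which is why plain maximum inner product search does not apply to it directly; a Dot Product decoder would simply return `(z_u * z_v).sum(-1)`. The sketch below uses illustrative layer sizes, not the paper's configuration.

```python
import torch
import torch.nn as nn

class HadamardMLP(nn.Module):
    """Edge scorer: MLP over the Hadamard (element-wise) product of embeddings."""
    def __init__(self, dim: int, hidden: int = 64):
        super().__init__()
        self.mlp = nn.Sequential(nn.Linear(dim, hidden), nn.ReLU(),
                                 nn.Linear(hidden, 1))

    def forward(self, z_u: torch.Tensor, z_v: torch.Tensor) -> torch.Tensor:
        return self.mlp(z_u * z_v).squeeze(-1)  # higher score = more likely edge
```

    The nonlinearity over the product is what makes the decoder expressive, and also what Flashlight must work around to regain sublinear retrieval.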
    Learning-Based Radiomic Prediction of Type 2 Diabetes Mellitus Using Image-Derived Phenotypes. (arXiv:2209.10043v1 [cs.LG])
    Early diagnosis of Type 2 Diabetes Mellitus (T2DM) is crucial to enable timely therapeutic interventions and lifestyle modifications. As medical imaging data become more widely available for many patient populations, we sought to investigate whether image-derived phenotypic data could be leveraged in tabular learning classifier models to predict T2DM incidence without the use of invasive blood lab measurements. We show that both neural network and decision tree models that use image-derived phenotypes can predict patient T2DM status with recall scores as high as 87.6%. We also propose the novel use of these same architectures as 'SynthA1c encoders' that are able to output interpretable values mimicking blood hemoglobin A1C empirical lab measurements. Finally, we demonstrate that T2DM risk prediction model sensitivity to small perturbations in input vector components can be used to predict performance on covariates sampled from previously unseen patient populations.  ( 2 min )
    Deep-Steiner: Learning to Solve the Euclidean Steiner Tree Problem. (arXiv:2209.09983v1 [cs.LG])
    The Euclidean Steiner tree problem seeks the minimum-cost network connecting a collection of target locations, and it underlies many applications of wireless networks. In this paper, we present a study on solving the Euclidean Steiner tree problem using reinforcement learning enhanced by graph representation learning. Unlike commonly studied connectivity problems such as the travelling salesman problem or the vehicle routing problem, where the search space is finite, the Euclidean Steiner tree problem requires searching over the entire Euclidean space, rendering existing methods inapplicable. In this paper, we design discretization methods that leverage the unique characteristics of the Steiner tree, and propose new training schemes for handling the dynamic Steiner points emerging during the incremental construction. Our design is examined through a sanity check using experiments on a collection of datasets, with encouraging results demonstrating the utility of our method as an alternative to classic combinatorial methods.  ( 2 min )
    Differentiable Safe Controller Design through Control Barrier Functions. (arXiv:2209.10034v1 [eess.SY])
    Learning-based controllers, such as neural network (NN) controllers, can show high empirical performance but lack formal safety guarantees. To address this issue, control barrier functions (CBFs) have been applied as a safety filter to monitor and modify the outputs of learning-based controllers in order to guarantee the safety of the closed-loop system. However, such modification can be myopic with unpredictable long-term effects. In this work, we propose a safe-by-construction NN controller which employs differentiable CBF-based safety layers, and investigate the performance of safe-by-construction NN controllers in learning-based control. Specifically, two formulations of controllers are compared: one is projection-based and the other relies on our proposed set-theoretic parameterization. Both methods demonstrate improved closed-loop performance over using CBF as a separate safety filter in numerical experiments.  ( 2 min )
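    For a single affine CBF constraint, the projection-based formulation admits a closed form, which is part of what makes it attractive as a differentiable layer. A minimal sketch under that single-constraint assumption (general CBF-QPs with multiple constraints need a QP solver, e.g. a differentiable QP layer):

```python
import numpy as np

def cbf_project(u_nn: np.ndarray, a: np.ndarray, b: float) -> np.ndarray:
    """Project the NN action onto the half-space {u : a @ u >= b} induced by
    one affine CBF constraint; closed-form minimal-norm correction."""
    gap = b - a @ u_nn
    if gap <= 0:
        return u_nn                      # already safe, pass through unchanged
    return u_nn + (gap / (a @ a)) * a    # smallest change that restores safety
```

    Because this map is piecewise-linear in `u_nn`, gradients flow through it during training, which is the "safe-by-construction" idea rather than filtering actions only at deployment time.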
    Lamarckian Platform: Pushing the Boundaries of Evolutionary Reinforcement Learning towards Asynchronous Commercial Games. (arXiv:2209.10055v1 [cs.LG])
    Despite the emerging progress of integrating evolutionary computation into reinforcement learning, the absence of a high-performance platform endowing composability and massive parallelism causes non-trivial difficulties for research and applications related to asynchronous commercial games. Here we introduce Lamarckian - an open-source platform featuring support for evolutionary reinforcement learning scalable to distributed computing resources. To improve the training speed and data efficiency, Lamarckian adopts optimized communication methods and an asynchronous evolutionary reinforcement learning workflow. To meet the demand for an asynchronous interface by commercial games and various methods, Lamarckian tailors an asynchronous Markov Decision Process interface and designs an object-oriented software architecture with decoupled modules. In comparison with the state-of-the-art RLlib, we empirically demonstrate the unique advantages of Lamarckian on benchmark tests with up to 6000 CPU cores: i) both the sampling efficiency and training speed are doubled when running PPO on the Google football game; ii) the training speed is 13 times faster when running PBT+PPO on the Pong game. Moreover, we present two use cases: i) how Lamarckian is applied to generating behavior-diverse game AI; ii) how Lamarckian is applied to game balancing tests for an asynchronous commercial game.  ( 2 min )
  • Open

    Partial Information Decomposition Reveals the Structure of Neural Representations. (arXiv:2209.10438v1 [cs.IT])
    In neural networks, task-relevant information is represented jointly by groups of neurons. However, the specific way in which the information is distributed among the individual neurons is not well understood: While parts of it may only be obtainable from specific single neurons, other parts are carried redundantly or synergistically by multiple neurons. We show how Partial Information Decomposition (PID), a recent extension of information theory, can disentangle these contributions. From this, we introduce the measure of "Representational Complexity", which quantifies the difficulty of accessing information spread across multiple neurons. We show how this complexity is directly computable for smaller layers. For larger layers, we propose subsampling and coarse-graining procedures and prove corresponding bounds on the latter. Empirically, for quantized deep neural networks solving the MNIST task, we observe that representational complexity decreases both through successive hidden layers and over training. Overall, we propose representational complexity as a principled and interpretable summary statistic for analyzing the structure of neural representations.
    Tab2vox: CNN-Based Multivariate Multilevel Demand Forecasting Framework by Tabular-To-Voxel Image Conversion. (arXiv:2209.10516v1 [stat.ML])
    Since demand is influenced by a wide variety of causes, it is necessary to decompose the explanatory variables into different levels, extract their relationships effectively, and reflect them in the forecast. In particular, this contextual information can be very useful in demand forecasting with large demand volatility or intermittent demand patterns. Convolutional neural networks (CNNs) have been successfully used in many fields where important information in data is represented by images. CNNs are powerful because they accept samples as images and use adjacent voxel sets to integrate multi-dimensional important information and learn important features. On the other hand, although the demand-forecasting model has been improved, the input data is still limited to its tabular form and is not suitable for CNN modeling. In this study, we propose a Tab2vox neural architecture search (NAS) model as a method to convert a high-dimensional tabular sample into a well-formed 3D voxel image and use it in a 3D CNN network. For each image representation, the 3D CNN forecasting model proposed from the Tab2vox framework showed superior performance compared to the existing time series and machine learning techniques using tabular data, and the latest image transformation studies.
    Estimating Potential Outcome Distributions with Collaborating Causal Networks. (arXiv:2110.01664v3 [stat.ML] UPDATED)
    Traditional causal inference approaches leverage observational study data to estimate the difference in observed and unobserved outcomes for a potential treatment, known as the Conditional Average Treatment Effect (CATE). However, CATE corresponds to the comparison on the first moment alone, and as such may be insufficient in reflecting the full picture of treatment effects. As an alternative, estimating the full potential outcome distributions could provide greater insights. However, existing methods for estimating treatment effect potential outcome distributions often impose restrictive or simplistic assumptions about these distributions. Here, we propose Collaborating Causal Networks (CCN), a novel methodology which goes beyond the estimation of CATE alone by learning the full potential outcome distributions. Estimation of outcome distributions via the CCN framework does not require restrictive assumptions of the underlying data generating process. Additionally, CCN facilitates estimation of the utility of each possible treatment and permits individual-specific variation through utility functions. CCN not only extends outcome estimation beyond traditional risk difference, but also enables a more comprehensive decision-making process through definition of flexible comparisons. Under assumptions commonly made in the causal literature, we show that CCN learns distributions that asymptotically capture the true potential outcome distributions. Furthermore, we propose an adjustment approach that is empirically effective in alleviating sample imbalance between treatment groups in observational data. Finally, we evaluate the performance of CCN in multiple synthetic and semi-synthetic experiments. We demonstrate that CCN learns improved distribution estimates compared to existing Bayesian and deep generative methods as well as improved decisions with respect to a variety of utility functions.
    Universum GANs: Improving GANs through contradictions. (arXiv:2106.09946v2 [cs.LG] UPDATED)
    Limited availability of labeled data makes any supervised learning problem challenging. Alternative learning settings like semi-supervised and universum learning alleviate the dependency on labeled data, but still require a large amount of unlabeled data, which may be unavailable or expensive to acquire. GAN-based data generation methods have recently shown promise by generating synthetic samples to improve learning. However, most existing GAN-based approaches either provide poor discriminator performance under limited labeled-data settings or result in low-quality generated data. In this paper, we propose a Universum GAN game which provides improved discriminator accuracy under limited data settings while generating high-quality realistic data. We further propose an evolving discriminator loss which improves its convergence and generalization performance. We derive theoretical guarantees and provide empirical results in support of our approach.
    Off-Policy Risk Assessment in Markov Decision Processes. (arXiv:2209.10444v1 [cs.LG])
    Addressing such diverse ends as safety, alignment with human preferences, and the efficiency of learning, a growing line of reinforcement learning research focuses on risk functionals that depend on the entire distribution of returns. Recent work on \emph{off-policy risk assessment} (OPRA) for contextual bandits introduced consistent estimators for the target policy's CDF of returns along with finite sample guarantees that extend to (and hold simultaneously over) all risk functionals. In this paper, we lift OPRA to Markov decision processes (MDPs), where importance sampling (IS) CDF estimators suffer high variance on longer trajectories due to small effective sample size. To mitigate these problems, we incorporate model-based estimation to develop the first doubly robust (DR) estimator for the CDF of returns in MDPs. This estimator enjoys significantly less variance and, when the model is well specified, achieves the Cramer-Rao variance lower bound. Moreover, for many risk functionals, the downstream estimates enjoy both lower bias and lower variance. Additionally, we derive the first minimax lower bounds for off-policy CDF and risk estimation, which match our error bounds up to a constant factor. Finally, we demonstrate the precision of our DR CDF estimates experimentally on several different environments.
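    To see where the variance problem comes from, a standard trajectory-wise importance sampling estimator of the return CDF (written here in generic notation, which may differ from the paper's exact estimator) multiplies $H$ per-step ratios:

```latex
\hat{F}^{\mathrm{IS}}_{\pi}(t)
  \;=\; \frac{1}{n}\sum_{i=1}^{n}
  \Bigl(\prod_{h=1}^{H}\frac{\pi(a_h^i \mid s_h^i)}{\mu(a_h^i \mid s_h^i)}\Bigr)\,
  \mathbf{1}\{G^i \le t\},
```

    where $G^i$ is the return of trajectory $i$ collected under the behavior policy $\mu$. The product of $H$ ratios is what inflates the variance on long horizons; the doubly robust construction adds a model-based control variate at each step to tame it.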
    "Calibeating": Beating Forecasters at Their Own Game. (arXiv:2209.04892v1 [econ.TH] CROSS LISTED)
    In order to identify expertise, forecasters should not be tested by their calibration score, which can always be made arbitrarily small, but rather by their Brier score. The Brier score is the sum of the calibration score and the refinement score; the latter measures how good the sorting into bins with the same forecast is, and thus attests to "expertise." This raises the question of whether one can gain calibration without losing expertise, which we refer to as "calibeating." We provide an easy way to calibeat any forecast, by a deterministic online procedure. We moreover show that calibeating can be achieved by a stochastic procedure that is itself calibrated, and then extend the results to simultaneously calibeating multiple procedures, and to deterministic procedures that are continuously calibrated.
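    The decomposition referred to here is straightforward to compute empirically. A small sketch (binned forecasts, binary outcomes; the bin count is our own assumption) returning the Brier score and its calibration component, with the refinement component obtainable as their difference:

```python
import numpy as np

def brier_decomposition(p, y, bins: int = 10):
    """Brier score and its calibration part for forecasts p in [0, 1] of binary
    outcomes y, using forecast bins (standard empirical decomposition; sketch)."""
    p, y = np.asarray(p, float), np.asarray(y, float)
    brier = np.mean((p - y) ** 2)
    edges = np.linspace(0.0, 1.0, bins + 1)
    which = np.clip(np.digitize(p, edges) - 1, 0, bins - 1)  # bin index per forecast
    calib = 0.0
    for b in range(bins):
        m = which == b
        if m.any():
            calib += m.mean() * (p[m].mean() - y[m].mean()) ** 2
    return brier, calib        # refinement (up to decomposition) = brier - calib
```

    "Calibeating" then amounts to driving the calibration term toward zero without degrading the refinement term that encodes expertise.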
    Scheduling Jobs with Stochastic Holding Costs. (arXiv:2105.13655v3 [cs.LG] UPDATED)
    We study a single-server scheduling problem for the objective of minimizing the expected cumulative holding cost incurred by jobs, where parameters defining stochastic job holding costs are unknown to the scheduler. We consider a general setting allowing for different job classes, where jobs of the same class have statistically identical holding costs and service times, with an arbitrary number of jobs across classes. In each time step, the server can process a job and observes random holding costs of the jobs that are yet to be completed. We consider a learning-based $c\mu$-rule scheduling policy that starts with a preemptive period of fixed duration, which serves as a learning phase, and then, having gathered data about the jobs, switches to nonpreemptive scheduling. Our algorithms are designed to handle instances with large and small gaps in mean job holding costs and achieve near-optimal performance guarantees. The performance of algorithms is evaluated by regret, where the benchmark is the minimum possible total holding cost attained by the $c\mu$ rule scheduling policy when the parameters of jobs are known. We show regret lower bounds and algorithms that achieve nearly matching regret upper bounds. Our numerical results demonstrate the efficacy of our algorithms and show that our regret analysis is nearly tight.
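    After the preemptive learning phase, the nonpreemptive phase simply serves the remaining jobs in decreasing order of their estimated index. A two-line sketch (names and data structures are illustrative):

```python
def cmu_priority(est_cost: dict, est_rate: dict) -> list:
    """Nonpreemptive phase of a learned c*mu rule: order remaining job ids by
    estimated holding cost times estimated service rate, highest first."""
    return sorted(est_cost, key=lambda j: est_cost[j] * est_rate[j], reverse=True)
```

    With known parameters this recovers the classical $c\mu$ rule; the regret analysis quantifies the cost of using estimates in its place.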
    Amortized Projection Optimization for Sliced Wasserstein Generative Models. (arXiv:2203.13417v2 [stat.ML] UPDATED)
    Seeking informative projecting directions has been an important task in utilizing sliced Wasserstein distance in applications. However, finding these directions usually requires an iterative optimization procedure over the space of projecting directions, which is computationally expensive. Moreover, the computational issue is even more severe in deep learning applications, where computing the distance between two mini-batch probability measures is repeated several times. This nested loop has been one of the main challenges preventing the use of sliced Wasserstein distances based on good projections in practice. To address this challenge, we propose to utilize the learning-to-optimize technique or amortized optimization to predict the informative direction of any given two mini-batch probability measures. To the best of our knowledge, this is the first work that bridges amortized optimization and sliced Wasserstein generative models. In particular, we derive linear amortized models, generalized linear amortized models, and non-linear amortized models which correspond to three types of novel mini-batch losses, named amortized sliced Wasserstein. We demonstrate the favorable performance of the proposed sliced losses in deep generative modeling on standard benchmark datasets.
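    For reference, the quantity being amortized: the sliced Wasserstein distance averages one-dimensional Wasserstein distances over projecting directions, and each 1D distance reduces to sorting. A Monte Carlo sketch with random directions (the amortized model would instead predict an informative direction from the two mini-batches):

```python
import torch

def sliced_wasserstein(x: torch.Tensor, y: torch.Tensor,
                       n_proj: int = 128, p: int = 2) -> torch.Tensor:
    """Monte Carlo sliced p-Wasserstein between equally sized point clouds
    x, y of shape (n, d)."""
    d = x.shape[1]
    theta = torch.randn(n_proj, d)
    theta = theta / theta.norm(dim=1, keepdim=True)   # directions on the sphere
    xp = (x @ theta.T).sort(dim=0).values             # sorted 1D projections
    yp = (y @ theta.T).sort(dim=0).values
    return ((xp - yp).abs() ** p).mean() ** (1.0 / p)
```

    Replacing the inner optimization over `theta` with a single forward pass of an amortized predictor is precisely what removes the nested loop the abstract describes.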
    Generative Modelling With Inverse Heat Dissipation. (arXiv:2206.13397v3 [cs.CV] UPDATED)
    While diffusion models have shown great success in image generation, their noise-inverting generative process does not explicitly consider the structure of images, such as their inherent multi-scale nature. Inspired by diffusion models and the desirability of coarse-to-fine modelling, we propose a new model that generates images through iteratively inverting the heat equation, a PDE that locally erases fine-scale information when run over the 2D plane of the image. We interpret the solution of the forward heat equation as a variational approximation in a diffusion-like latent variable model. We point out emergent qualitative properties not seen in diffusion models, such as disentanglement of overall colour and shape in images and aspects of neural network interpretability. Spectral analysis on natural images elucidates connections to diffusion models and reveals implicit inductive biases in them.
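    The forward process here is classical: running the heat equation for time $t$ on the image plane is equivalent to Gaussian blurring with $\sigma = \sqrt{2t}$. A sketch of the forward (information-destroying) direction, which the generative model learns to invert:

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def heat_forward(img: np.ndarray, t: float) -> np.ndarray:
    """Solve the heat equation u_t = laplacian(u) for time t on an image:
    equivalent to convolving with a Gaussian of variance 2t per axis."""
    return gaussian_filter(img, sigma=np.sqrt(2.0 * t))
```

    Unlike noise injection, this forward process erases fine scales first and coarse scales last, which is what gives the model its coarse-to-fine generative ordering.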
    SC2EGSet: StarCraft II Esport Replay and Game-state Dataset. (arXiv:2207.03428v2 [cs.LG] UPDATED)
    As a relatively new form of sport, esports offers unparalleled data availability. Despite the vast amounts of data that are generated by game engines, it can be challenging to extract them and verify their integrity for the purposes of practical and scientific use. Our work aims to open esports to a broader scientific community by supplying raw and pre-processed files from StarCraft II esports tournaments. These files can be used in statistical and machine learning modeling tasks and can be related to various laboratory-based measurements (e.g., behavioral tests, brain imaging). We have gathered publicly available game-engine generated "replays" of tournament matches and performed data extraction and cleanup using a low-level application programming interface (API) parser library. Additionally, we open-sourced and published all the custom tools that were developed in the process of creating our dataset. These tools include PyTorch and PyTorch Lightning API abstractions to load and model the data. Our dataset contains replays from major and premiere StarCraft II tournaments since 2016. To prepare the dataset, we processed 55 tournament "replaypacks" that contained 17930 files with game-state information. Based on an initial investigation of available StarCraft II datasets, we observed that our dataset is the largest publicly available source of StarCraft II esports data upon its publication. Analysis of the extracted data holds promise for further Artificial Intelligence (AI), Machine Learning (ML), psychological, Human-Computer Interaction (HCI), and sports-related studies in a variety of supervised and self-supervised tasks.
    Calibrated Optimal Decision Making with Multiple Data Sources and Limited Outcome. (arXiv:2104.10554v4 [stat.ME] UPDATED)
    We consider the optimal decision-making problem in a primary sample of interest with multiple auxiliary sources available. The outcome of interest is limited in the sense that it is only observed in the primary sample. In reality, such multiple data sources may belong to heterogeneous studies and thus cannot be combined directly. This paper proposes a new framework to handle heterogeneous samples and address the limited outcome simultaneously through a novel calibrated optimal decision-making method, by leveraging the common intermediate outcomes in multiple data sources. Specifically, our method allows the baseline covariates across different samples to have either homogeneous or heterogeneous distributions. Under the equal conditional means of intermediate outcomes in different samples given baseline covariates and the treatment information, we show that the proposed estimator of the conditional mean outcome is asymptotically normal and more efficient than using the primary sample solely. Extensive experiments on simulated datasets demonstrate empirical validity and improved efficiency using our approach, followed by a real application to electronic health records.
    Deep Double Descent via Smooth Interpolation. (arXiv:2209.10080v1 [cs.LG])
    Overparameterized deep networks are known to be able to perfectly fit the training data while at the same time showing good generalization performance. A common paradigm drawn from intuition on linear regression suggests that large networks are able to interpolate even noisy data, without considerably deviating from the ground-truth signal. At present, a precise characterization of this phenomenon is missing. In this work, we present an empirical study of sharpness of the loss landscape of deep networks as we systematically control the number of model parameters and training epochs. We extend our study to neighbourhoods of the training data, as well as around cleanly- and noisily-labelled samples. Our findings show that the loss sharpness in the input space follows both model- and epoch-wise double descent, with worse peaks observed around noisy labels. While small interpolating models sharply fit both clean and noisy data, large models express a smooth and flat loss landscape, in contrast with existing intuition.
    Large-Sample Properties of Non-Stationary Source Separation for Gaussian Signals. (arXiv:2209.10176v1 [math.ST])
    Non-stationary source separation is a well-established branch of blind source separation with many different methods. However, large-sample results are not available for any of these methods. To bridge this gap, we develop large-sample theory for NSS-JD, a popular method of non-stationary source separation based on the joint diagonalization of block-wise covariance matrices. We work under an instantaneous linear mixing model for independent Gaussian non-stationary source signals together with a very general set of assumptions: besides boundedness conditions, the only assumptions we make are that the sources exhibit finite dependency and that their variance functions differ sufficiently to be asymptotically separable. The consistency of the unmixing estimator and its convergence to a limiting Gaussian distribution at the standard square root rate are shown to hold under the previous conditions. Simulation experiments are used to verify the theoretical results and to study the impact of block length on the separation.
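    To make the construction concrete, the two-block special case reduces to a single generalized eigenproblem between the covariances of two time blocks; NSS-JD proper jointly diagonalizes many block-wise covariances. A sketch of the two-block case (our simplification, not the full method):

```python
import numpy as np
from scipy.linalg import eigh

def nss_two_blocks(X: np.ndarray) -> np.ndarray:
    """Two-block non-stationary separation: jointly diagonalize the
    covariances of the two halves of a centered series X of shape (T, d)."""
    T = len(X)
    S1 = np.cov(X[: T // 2].T)        # covariance of the first block
    S2 = np.cov(X[T // 2 :].T)        # covariance of the second block
    # Generalized eigenproblem S2 w = lambda S1 w yields unmixing directions
    # along which the variance ratio between blocks is extremal.
    _, W = eigh(S2, S1)
    return X @ W                      # estimated source signals
```

    Separability requires the sources' variance profiles to differ across blocks, which mirrors the paper's assumption that the variance functions differ sufficiently.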
    Transition to Adulthood for Young People with Intellectual or Developmental Disabilities: Emotion Detection and Topic Modeling. (arXiv:2209.10477v1 [cs.CL])
    Transition to Adulthood is an essential life stage for many families. Prior research has shown that young people with intellectual or developmental disabilities (IDD) face more challenges than their peers. This study explores how to use natural language processing (NLP) methods, especially unsupervised machine learning, to assist psychologists in analyzing emotions and sentiments, and to use topic modeling to identify common issues and challenges that young people with IDD and their families face. Additionally, the results were compared to those obtained from young people without IDD who were in transition to adulthood. The findings showed that NLP methods can be very useful for psychologists to analyze emotions, conduct cross-case analysis, and summarize key topics from conversational data. Our Python code is available at https://github.com/mlaricheva/emotion_topic_modeling.
    DTR Bandit: Learning to Make Response-Adaptive Decisions With Low Regret. (arXiv:2005.02791v3 [stat.ML] UPDATED)
    Dynamic treatment regimes (DTRs) are personalized, adaptive, multi-stage treatment plans that adapt treatment decisions both to an individual's initial features and to intermediate outcomes and features at each subsequent stage, which are affected by decisions in prior stages. Examples include personalized first- and second-line treatments of chronic conditions like diabetes, cancer, and depression, which adapt to patient response to first-line treatment, disease progression, and individual characteristics. While existing literature mostly focuses on estimating the optimal DTR from offline data such as from sequentially randomized trials, we study the problem of developing the optimal DTR in an online manner, where the interaction with each individual affect both our cumulative reward and our data collection for future learning. We term this the DTR bandit problem. We propose a novel algorithm that, by carefully balancing exploration and exploitation, is guaranteed to achieve rate-optimal regret when the transition and reward models are linear. We demonstrate our algorithm and its benefits both in synthetic experiments and in a case study of adaptive treatment of major depressive disorder using real-world data.
    Chaotic Hedging with Iterated Integrals and Neural Networks. (arXiv:2209.10166v1 [q-fin.MF])
    In this paper, we extend the Wiener-Ito chaos decomposition to the class of diffusion processes whose drift and diffusion coefficients are of linear growth. By omitting the orthogonality in the chaos expansion, we are able to show that every $p$-integrable functional, for $p \in [1,\infty)$, can be represented as a sum of iterated integrals of the underlying process. Using a truncated sum of this expansion and (possibly random) neural networks for the integrands, whose parameters are learned in a machine learning setting, we show that every financial derivative can be approximated arbitrarily well in the $L^p$-sense. Moreover, the hedging strategy of the approximating financial derivative can be computed in closed form.
    Instance-dependent uniform tail bounds for empirical processes. (arXiv:2209.10053v1 [math.PR])
    We formulate a uniform tail bound for empirical processes indexed by a class of functions, in terms of the individual deviations of the functions rather than the worst-case deviation in the considered class. The tail bound is established by introducing an initial "deflation" step to the standard generic chaining argument. The resulting tail bound has a main complexity component, a variant of Talagrand's $\gamma$ functional for the deflated function class, as well as an instance-dependent deviation term, measured by an appropriately scaled version of a suitable norm. Both of these terms are expressed using certain coefficients formulated based on the relevant cumulant generating functions. We also provide more explicit approximations for the mentioned coefficients, when the function class lies in a given (exponential type) Orlicz space.
    Mutual Information Learned Classifiers: an Information-theoretic Viewpoint of Training Deep Learning Classification Systems. (arXiv:2209.10058v1 [cs.LG])
    Deep learning systems have been reported to achieve state-of-the-art performance in many applications, and a key ingredient is the existence of well-trained classifiers on benchmark datasets. As a mainstream loss function, the cross entropy can easily lead us to models which demonstrate severe overfitting behavior. In this paper, we show that the existing cross entropy loss minimization problem essentially learns the label conditional entropy (CE) of the underlying data distribution of the dataset. However, the CE learned in this way does not characterize well the information shared by the label and the input. Accordingly, we propose a mutual information learning framework where we train deep neural network classifiers via learning the mutual information between the label and the input. Theoretically, we give the population classification error lower bound in terms of the mutual information. In addition, we derive the mutual information lower and upper bounds for a concrete binary classification data model in $\mathbb{R}^n$, and also the error probability lower bound in this scenario. Empirically, we conduct extensive experiments on several benchmark datasets to support our theory. The mutual information learned classifiers (MILCs) achieve far better generalization performance than the conditional entropy learned classifiers (CELCs), with an improvement that can exceed 10\% in test accuracy.
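    The information-theoretic gap the paper targets can be stated with a textbook identity (restated here only to frame the abstract):

```latex
I(X;Y) \;=\; H(Y) \;-\; H(Y \mid X),
```

    so a classifier trained to shrink the conditional entropy $H(Y\mid X)$ alone need not account for the label entropy $H(Y)$, and hence need not capture how much information the input actually shares with the label, which is the motivation for learning $I(X;Y)$ directly.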
    Revisiting Sliced Wasserstein on Images: From Vectorization to Convolution. (arXiv:2204.01188v2 [cs.CV] UPDATED)
    The conventional sliced Wasserstein is defined between two probability measures that have realizations as vectors. When comparing two probability measures over images, practitioners first need to vectorize images and then project them to one-dimensional space by using matrix multiplication between the sample matrix and the projection matrix. After that, the sliced Wasserstein is evaluated by averaging the two corresponding one-dimensional projected probability measures. However, this approach has two limitations. The first limitation is that the spatial structure of images is not captured efficiently by the vectorization step, which makes it harder for the subsequent slicing process to gather the discrepancy information. The second limitation is memory inefficiency, since each slicing direction is a vector of the same dimension as the images. To address these limitations, we propose novel slicing methods for sliced Wasserstein between probability measures over images that are based on the convolution operators. We derive convolution sliced Wasserstein (CSW) and its variants via incorporating stride, dilation, and non-linear activation function into the convolution operators. We investigate the metricity of CSW as well as its sample complexity, its computational complexity, and its connection to conventional sliced Wasserstein distances. Finally, we demonstrate the favorable performance of CSW over the conventional sliced Wasserstein in comparing probability measures over images and in training deep generative modeling on images.
    Learning Acceptance Regions for Many Classes with Anomaly Detection. (arXiv:2209.09963v1 [stat.ML])
    Set-valued classification, a new classification paradigm that aims to identify all the plausible classes that an observation belongs to, can be obtained by learning the acceptance regions for all classes. Many existing set-valued classification methods do not consider the possibility that a new class that never appeared in the training data appears in the test data. Moreover, they are computationally expensive when the number of classes is large. We propose a Generalized Prediction Set (GPS) approach to estimate the acceptance regions while considering the possibility of a new class in the test data. The proposed classifier minimizes the expected size of the prediction set while guaranteeing that the class-specific accuracy is at least a pre-specified value. Unlike previous methods, the proposed method achieves a good balance between accuracy, efficiency, and anomaly detection rate. Moreover, our method can be applied in parallel to all the classes to alleviate the computational burden. Both theoretical analysis and numerical experiments are conducted to illustrate the effectiveness of the proposed method.
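    One simple way to realize class-specific acceptance regions (a split-calibration sketch under our own assumptions, not necessarily the GPS estimator) is to threshold each class's score at the quantile that retains the target fraction of that class's held-out examples; an empty prediction set then flags a potential new class.

```python
import numpy as np

def acceptance_thresholds(scores, labels, n_classes, target_acc=0.95):
    """Per-class thresholds so that at least ~target_acc of each class's
    calibration examples fall inside its acceptance region.
    scores: (n, n_classes); labels: (n,); every class assumed represented."""
    thr = np.empty(n_classes)
    for c in range(n_classes):
        s_c = scores[labels == c, c]               # class-c scores on class-c data
        thr[c] = np.quantile(s_c, 1.0 - target_acc)
    return thr

def prediction_set(score_row, thr):
    """All classes whose acceptance region contains the observation;
    an empty set signals a possible previously unseen class."""
    return [c for c, t in enumerate(thr) if score_row[c] >= t]
```

    Running the calibration independently per class is also what allows the computation to be parallelized when the number of classes is large.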
    Provable Stochastic Optimization for Global Contrastive Learning: Small Batch Does Not Harm Performance. (arXiv:2202.12387v4 [cs.LG] UPDATED)
    In this paper, we study contrastive learning from an optimization perspective, aiming to analyze and address a fundamental issue of existing contrastive learning methods that either rely on a large batch size or a large dictionary of feature vectors. We consider a global objective for contrastive learning, which contrasts each positive pair with all negative pairs for an anchor point. From the optimization perspective, we explain why existing methods such as SimCLR require a large batch size in order to achieve a satisfactory result. In order to remove such a requirement, we propose a memory-efficient Stochastic Optimization algorithm for solving the Global objective of Contrastive Learning of Representations, named SogCLR. We show that its optimization error is negligible under a reasonable condition after a sufficient number of iterations or is diminishing for a slightly different global contrastive objective. Empirically, we demonstrate that SogCLR with small batch size (e.g., 256) can achieve similar performance as SimCLR with large batch size (e.g., 8192) on self-supervised learning tasks on ImageNet-1K. We also attempt to show that the proposed optimization technique is generic and can be applied to solving other contrastive losses, e.g., two-way contrastive losses for bimodal contrastive learning. The proposed method is implemented in our open-sourced library LibAUC (www.libauc.org).
    Data Augmentation as Feature Manipulation. (arXiv:2203.01572v2 [cs.LG] UPDATED)
    Data augmentation is a cornerstone of the machine learning pipeline, yet its theoretical underpinnings remain unclear. Is it merely a way to artificially augment the data set size? Or is it about encouraging the model to satisfy certain invariance? In this work we consider another angle, and we study the effect of data augmentation on the dynamic of the learning process. We find that data augmentation can alter the relative importance of various features, effectively making certain informative but hard to learn features more likely to be captured in the learning process. Importantly, we show that this effect is more pronounced for non-linear models, such as neural networks. Our main contribution is a detailed analysis of data augmentation on the learning dynamic for a two layer convolutional neural network in the recently proposed multi-view data model by Allen-Zhu and Li [2020]. We complement this analysis with further experimental evidence that data augmentation can be viewed as feature manipulation.
    Distributed Online Non-convex Optimization with Composite Regret. (arXiv:2209.10105v1 [cs.LG])
    Regret has been widely adopted as the metric of choice for evaluating the performance of online optimization algorithms in distributed, multi-agent systems. However, data and model variations associated with agents can significantly impact decisions and require consensus among agents. Moreover, most existing works have focused on developing approaches for (either strongly or non-strongly) convex losses, and very few results have been obtained regarding regret bounds in distributed online optimization for general non-convex losses. To address these two issues, we propose a novel composite regret with a new network regret-based metric to evaluate distributed online optimization algorithms. We concretely define static and dynamic forms of the composite regret. By leveraging the dynamic form of our composite regret, we develop a consensus-based online normalized gradient (CONGD) approach for pseudo-convex losses, and it provably exhibits sublinear behavior relative to a regularity term for the path variation of the optimizer. For general non-convex losses, we first shed light on the regret for the setting of distributed online non-convex learning, building on recent advances showing that no deterministic algorithm can achieve sublinear regret. We then develop the distributed online non-convex optimization with composite regret (DINOCO) approach without access to gradients, relying on an offline optimization oracle. DINOCO is shown to achieve sublinear regret; to our knowledge, this is the first regret bound for general distributed online non-convex learning.  ( 3 min )
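    The CONGD update combines the two ingredients the abstract names: a consensus (mixing) step over the network and a normalized gradient step. A one-step sketch under standard assumptions (doubly stochastic mixing matrix; names illustrative):

```python
import numpy as np

def congd_step(X: np.ndarray, grads: np.ndarray, W: np.ndarray, lr: float):
    """One consensus-based online normalized-gradient step.
    X: (n_agents, d) current decisions; grads: (n_agents, d) local gradients;
    W: (n_agents, n_agents) doubly stochastic mixing matrix."""
    X = W @ X                                                  # consensus averaging
    norms = np.linalg.norm(grads, axis=1, keepdims=True) + 1e-12
    return X - lr * grads / norms                              # normalized update
```

    Normalizing the gradient is the standard device that makes descent directions meaningful for pseudo-convex losses, where raw gradient magnitudes can be misleading.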
    Projected Gradient Descent Algorithms for Solving Nonlinear Inverse Problems with Generative Priors. (arXiv:2209.10093v1 [stat.ML])
    In this paper, we propose projected gradient descent (PGD) algorithms for signal estimation from noisy nonlinear measurements. We assume that the unknown $p$-dimensional signal lies near the range of an $L$-Lipschitz continuous generative model with bounded $k$-dimensional inputs. In particular, we consider two cases when the nonlinear link function is either unknown or known. For unknown nonlinearity, similarly to \cite{liu2020generalized}, we make the assumption of sub-Gaussian observations and propose a linear least-squares estimator. We show that when there is no representation error and the sensing vectors are Gaussian, roughly $O(k \log L)$ samples suffice to ensure that a PGD algorithm converges linearly to a point achieving the optimal statistical rate using arbitrary initialization. For known nonlinearity, we assume monotonicity as in \cite{yang2016sparse}, and make much weaker assumptions on the sensing vectors and allow for representation error. We propose a nonlinear least-squares estimator that is guaranteed to enjoy an optimal statistical rate. A corresponding PGD algorithm is provided and is shown to also converge linearly to the estimator using arbitrary initialization. In addition, we present experimental results on image datasets to demonstrate the performance of our PGD algorithms.  ( 2 min )
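    The PGD template alternates a gradient step on the measurement loss with a projection onto the generator's range, where the projection itself is approximated by optimizing the latent input. A sketch for the linear-measurement, squared-loss case (the paper's estimators and assumptions are more general; `G`, shapes, and step sizes are illustrative):

```python
import torch

def pgd_generative_prior(G, y, A, k, steps=100, lr=0.1, inner=50):
    """PGD for y ~ A x with x near range(G): gradient step on x, then project
    onto range(G) by optimizing the k-dimensional latent z (sketch).
    G: callable mapping a latent z in R^k to a signal in R^p; A: (m, p)."""
    x = torch.zeros(A.shape[1])
    for _ in range(steps):
        x = x - lr * A.T @ (A @ x - y)              # gradient step on 0.5||Ax - y||^2
        z = torch.zeros(k, requires_grad=True)      # approximate projection onto range(G)
        opt = torch.optim.Adam([z], lr=1e-2)
        for _ in range(inner):
            opt.zero_grad()
            loss = ((G(z) - x.detach()) ** 2).sum()
            loss.backward()
            opt.step()
        x = G(z).detach()                           # snap back to the generator's range
    return x
```

    The inner latent-space optimization is only an approximate projection; the paper's guarantees quantify how accurate it must be for the overall linear convergence to hold.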
    Off-Policy Evaluation for Episodic Partially Observable Markov Decision Processes under Non-Parametric Models. (arXiv:2209.10064v1 [stat.ML])
    We study the problem of off-policy evaluation (OPE) for episodic Partially Observable Markov Decision Processes (POMDPs) with continuous states. Motivated by the recently proposed proximal causal inference framework, we develop a non-parametric identification result for estimating the policy value via a sequence of so-called V-bridge functions with the help of time-dependent proxy variables. We then develop a fitted-Q-evaluation-type algorithm to estimate V-bridge functions recursively, where a non-parametric instrumental variable (NPIV) problem is solved at each step. By analyzing this challenging sequential NPIV problem, we establish finite-sample error bounds for estimating the V-bridge functions and, accordingly, for evaluating the policy value, in terms of the sample size, the length of horizon and the so-called (local) measure of ill-posedness at each step. To the best of our knowledge, this is the first finite-sample error bound for OPE in POMDPs under non-parametric models.  ( 2 min )

  • Open

    I made a Stable Diffusion Space animation every day for a week, the results are beautiful!
    submitted by /u/Available_Tadpole829 [link] [comments]  ( 87 min )
    New NLP algorithms
    I took some months off from AI... Is there any NLP technology newer than BERT, GPT, ...? If so, is there any paper I can read? Thanks in advance submitted by /u/MomSaidICan [link] [comments]  ( 92 min )
    If Meat Eaters Acted Like Vegans (edited)
    submitted by /u/FinneanCosgra [link] [comments]  ( 93 min )
    NEW Dreamstudio's Outpainting! Is Stable Diffusion Better Than DALL-E?
    submitted by /u/PuppetHere [link] [comments]  ( 87 min )
    Neuralink Update – September 2022
    submitted by /u/1024cities [link] [comments]  ( 92 min )
    Looking for an AI’s name.
    I don’t know if I should post this here or in R/Sci-fi or even R/Horror; only time will tell. But I’m looking for the name of the concept which says that we will create an AI, and those that didn’t help or were against its creation will suffer in a simulation, and the AI will be able to know who they are by scanning our brains. Thanks. submitted by /u/Postbreak_KQM [link] [comments]  ( 87 min )
    AI Dream 91 - Gandalf dives into new Dimension of AI
    submitted by /u/LordPewPew777 [link] [comments]  ( 87 min )
    promptoMANIA:: AI art prompt generator🛠️
    submitted by /u/widgia [link] [comments]  ( 87 min )
    How can AI empower your LMS?
    Enhancing software with AI usually helps businesses generate revenue or improve services by automating routine tasks and speeding up the analysis of large amounts of data. When applied in an LMS, AI capabilities can improve system performance and make your product more competitive. For example, you can achieve the following benefits from an AI-based LMS: 1. Automate routine administrative tasks. Managing an LMS is time-consuming because of scheduling lessons, monitoring users, processing requests for technical support, and other monotonous and repetitive processes. To handle these tasks, you can implement AI modules that are capable of: creating personalized complex curricula; helping users resolve common issues; tagging and categorizing learning content; generating personalized report…  ( 89 min )
    Converting YOLO V7 to Tensorflow Lite for Mobile Deployment
    This blog explains step by step method to convert YOLO V7 PyTorch model to TensorFlow lite. https://vikasojha894.medium.com/converting-yolo-v7-to-tensorflow-lite-for-mobile-deployment-ebc1103e8d1e submitted by /u/VikasOjha666 [link] [comments]  ( 87 min )
    Real-Time Evolution Simulation using genetic algorithms and neural networks.
    GitHub project: https://github.com/theopfr/neuro-evolution-simulation Hey, I worked on an evolution / natural-selection simulation lately. It simulates a 2D environment in which "organisms" can move around and learn how to survive by passing down their genes (size, diet, sight-reach, speed) and their brain (a simple neural network). Over time the smartest organisms have the highest chance to survive and mate with another organism to produce a child which has a chance to be even smarter than its parents through gene crossover and gene/brain mutation. I thought I'd share it, so if enough people are interested I might continue to add predation :) submitted by /u/39IHH8347 [link] [comments]  ( 87 min )
    Generative AI: A Creative New World [interested to hear peoples thoughts on this article]
    submitted by /u/anax4096 [link] [comments]  ( 87 min )
    Learn how to build a website for image generation
    submitted by /u/limapedro [link] [comments]  ( 87 min )
    Confusion about Model to USE for Data Mapping
    Hello 👋 all, I am learning to build models and I have a question related to data mapping. E.g., dataset A has the correct column names in a CSV. I would get CSVs from multiple sources with different column names. Is there a way to build a model which automatically identifies the correct column name based on the type of data stored? Please guide me. Thanks in advance submitted by /u/PrizeInteresting8672 [link] [comments]  ( 87 min )
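    A common pattern for this kind of schema matching is to featurize each column's contents and train a classifier that predicts the canonical column name. A minimal sketch, assuming pandas/scikit-learn; the file names and features below are hypothetical, not from the post:

        import pandas as pd
        from sklearn.ensemble import RandomForestClassifier

        def column_features(col: pd.Series) -> list:
            # Type-agnostic content statistics for one column.
            as_str = col.astype(str)
            numeric = pd.to_numeric(col, errors="coerce")
            return [
                numeric.notna().mean(),            # fraction parseable as numbers
                as_str.str.len().mean(),           # average string length
                col.nunique() / max(len(col), 1),  # uniqueness ratio
            ]

        # Train on a CSV whose column names are known to be correct (hypothetical file);
        # with more labeled CSVs you would add more examples per canonical name.
        train_df = pd.read_csv("source_a.csv")
        X = [column_features(train_df[c]) for c in train_df.columns]
        y = list(train_df.columns)
        clf = RandomForestClassifier(random_state=0).fit(X, y)

        # Map the columns of a differently-named CSV onto the canonical names.
        new_df = pd.read_csv("source_b.csv")  # hypothetical file
        mapping = {c: clf.predict([column_features(new_df[c])])[0] for c in new_df.columns}
        print(mapping)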
    If you are curious about building fast NLP prototypes, come join me for co:lab friday meetup this Friday at 12:00 pm ET 🤗
    Hi! 🤗 If you are curious about building fast NLP prototypes, come join me for a chat this Friday at 12:00 pm ET! I am hosting a meetup called co:lab friday for Cohere API developer community with teams that scored top 3 places at Cohere community hackathon hosted in September. Tip of my Tongue, Learn visually and IntelliChat will showcase their demos, share tips on creating a successful demo project and answer any questions about it. Join us to see their demos and exchange notes on building prototypes with large language models 🔥https://discord.com/events/954421988141711382/1017743889551077436 submitted by /u/techn0_cratic [link] [comments]  ( 87 min )
    What if the biggest threat of pre-human intelligence AI is the meta systems we already have?
    If you look at our society and civilisation, there are meta systems that can dominate and control whole aspects of our world, and sometimes they are not aligned with our long-term survival. Even current-level AIs can have a massively disruptive impact on our society and civilisation, and when they work for or with broken and outdated meta systems, they could compound the impact of already damaging meta systems. What meta systems do you think AI could impact the most, and what negative long-term impacts might that have? Or, more simply: can we ensure that AIs are only used for good, and I mean the long-term good of humanity? Have we ever considered a body or meta system that would work for the long-term good of humanity? submitted by /u/Arowx [link] [comments]  ( 88 min )
    To research how people bond with artificial creatures, we sent couch surfing robotic artifacts into the wild
    submitted by /u/pppeer [link] [comments]  ( 87 min )
    Using GPT-3 to solve our love lives...
    submitted by /u/ChickenTaxi43 [link] [comments]  ( 86 min )
    Are there any image-generating AIs that only use opt-in training data?
    I don't like the idea of artists' work being used in AI training data without their consent, so I was wondering if there are any AIs that have used some kind of opt-in database for training? I don't know much about the specifics of how each of the popular ones was made or anything. I was thinking of trying some things out for a hobby project I'm working on, but if they're all just scraping stuff nonconsensually then I'll just skip it and stick with my own lame art. Thanks! submitted by /u/Opus_723 [link] [comments]  ( 87 min )
    Fist Of Confusion - By RawChaa (App used: Wonder - A.I Generator None Dialogue Short Manga) Part One
    submitted by /u/Rawchaa [link] [comments]  ( 104 min )
    Book name question
    This book is about how science and engineering technology develop in the same way living things do. I see this book as related to the "thousand brains" idea, but I can't remember the title, so I'll ask. Thanks in advance submitted by /u/Plus-Ad1156 [link] [comments]  ( 87 min )
    Is there an AI that iterates on MIDI input?
    In my research, I have found many neural networks capable of synthesizing music and sharing it as MIDI files, such as AIVA and SOUNDRAW, but are there any capable of "improving" MIDI pieces uploaded by a user? submitted by /u/ChoiceWrld [link] [comments]  ( 87 min )
  • Open

    Off policy learning and evaluation
    Can anyone suggest some good resources about off-policy learning (OPL) and off-policy evaluation (OPE)? submitted by /u/rlopes404 [link] [comments]  ( 87 min )
    which is (will be) more important Single-agent VS Multi-agent RL ?
    Hi guys, this is a very subjective question but here we go: which field do you think will be more important for the future of science, SARL or MARL? I know that the two fields grow in parallel for the most part, especially as MARL has been inheriting from SARL lately, but I'm curious what you think. submitted by /u/souhaielbensalem [link] [comments]  ( 88 min )
    [MARL] I am looking for source material and advice for a review paper that I have to write for my master degree
    Hey, I am supposed to write a review paper about recent publications in the field of multi-agent RL. However, I am unsure where I should start and how I am supposed to determine the scientific relevance of publications. I thought that I should start by reading review papers and then look into publications that have been released after the review paper and incorporate them. The paper is supposed to be in a common format for review papers, with a length between 15 and 20 pages. I would be very grateful for any advice or source material. submitted by /u/atropos-morta [link] [comments]  ( 87 min )
  • Open

    [D] Viterbi or beam search should NOT be used for many/most CTC inference problems
    Unless there are some constraints on the output sequence (like a dictionary) or transition probabilities (like a very simple p(x_{t+1}|x_t) language model), the path that maximizes the probability over the logits is the path that goes through the symbol with the max logit for each time step. So Viterbi and beam search (unless you are very unlucky) will just return argmax(logits, axis=1), where logits has shape (input_time_steps, num_symbols). Yet, I see lectures encouraging students to use Viterbi or beam search for inference. Is this logic correct? Thank you in advance. submitted by /u/markpwoodward [link] [comments]  ( 89 min )
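    For concreteness, the greedy best-path decoding the post describes can be written in a few lines; a minimal numpy sketch (the blank index and toy logits are illustrative): the per-frame argmax path is collapsed by removing repeats and blanks, exactly what an unconstrained Viterbi search would return.

        import numpy as np

        def ctc_greedy_decode(logits: np.ndarray, blank: int = 0) -> list:
            """logits: (time_steps, num_symbols). Returns decoded symbol ids."""
            best_path = logits.argmax(axis=1)   # max logit per frame = best path
            decoded, prev = [], None
            for s in best_path:
                if s != prev and s != blank:    # collapse repeats, drop blanks
                    decoded.append(int(s))
                prev = s
            return decoded

        # Toy example: 5 frames, 3 symbols (0 = blank).
        logits = np.array([[0.1, 2.0, 0.3],
                           [0.2, 1.5, 0.1],
                           [3.0, 0.1, 0.2],
                           [0.1, 0.2, 2.5],
                           [0.1, 0.2, 2.4]])
        print(ctc_greedy_decode(logits))  # [1, 2]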
    [D] Generating set of synthetic training data based on given characteristics?
    I have a database that defines the characteristics of different radar systems. For example, it will state that the frequency of system X is between 500 and 600 kHz, and to make no assumption about the underlying distribution of these features (it is just as likely that the radar could operate at 500 or 600 kHz, or anywhere in between). There are also categorical features for these radars. Since I have no training data, should I generate a training data set using the defined characteristics? Or is this the wrong approach? I see people do this with image training. Furthermore, what model would be optimal for this task? There is a lot of overlap between radar features, and not many features (only about 10, of mixed types) define each radar. Thanks! submitted by /u/Old-Box228 [link] [comments]  ( 89 min )
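    One way to act on the "generate training data from the specs" idea is to sample each feature uniformly within its defined range, which matches the stated assumption of no particular underlying distribution. A minimal sketch with made-up radar specs (the systems, features and ranges below are invented for illustration):

        import numpy as np
        import pandas as pd

        rng = np.random.default_rng(0)

        specs = {  # hypothetical radar definitions
            "system_X": {"freq_khz": (500, 600), "pulse_width_us": (1, 5),  "band": ["L", "S"]},
            "system_Y": {"freq_khz": (550, 700), "pulse_width_us": (3, 10), "band": ["S"]},
        }

        def sample(system: str, n: int) -> pd.DataFrame:
            s = specs[system]
            return pd.DataFrame({
                "freq_khz": rng.uniform(*s["freq_khz"], size=n),        # uniform over the spec range
                "pulse_width_us": rng.uniform(*s["pulse_width_us"], size=n),
                "band": rng.choice(s["band"], size=n),                  # categorical feature
                "label": system,
            })

        train = pd.concat([sample(name, 1000) for name in specs], ignore_index=True)
        print(train.sample(5, random_state=0))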
    [D] Local Development VS Cloud Development
    Hi all! I was having a discussion with some fellow data scientists working in tech companies, who suggested that all of their code is developed locally, with platforms (e.g. Databricks) then used to train their models, leveraging the power of the cluster. There were many points discussed (e.g. maturity of the company, underlying OS of the local machine etc.), but I would like to focus on the claim that "tech companies develop their code locally". Could I please ask why local development would be preferred (or not) over cloud-based development? And if the development is happening locally, how does that happen? For instance, is a subset of data also stored locally for EDA and development? I would love to hear your thoughts! submitted by /u/atawua [link] [comments]  ( 89 min )
    [P] Add custom entries to the JupyterLab launcher with jupyter_app_launcher
    Hi all, I want to present my new JupyterLab extension jupyter_app_launcher (https://github.com/trungleduc/jupyter_app_launcher). It is used to customize the JupyterLab launcher with a simple YAML file. Users can add custom entries to the launcher to: open a predefined notebook or markdown file; render a notebook in dashboard mode; open a notebook with Voila; or launch local/remote services like Plotly Dash or Streamlit. A live demo is available at https://mybinder.org/v2/gh/trungleduc/jupyter_app_launcher/main?urlpath=lab Documentation: https://jupyter-app-launcher.readthedocs.io/ submitted by /u/dtle278 [link] [comments]  ( 105 min )
    What machine Learning model i should use for? [R] [P]
    Hi, I have a project that requires building an AI for real-time inspection of hand soldering: it should recognise whether the action performed by the operator is safe or dangerous. For example, when the operator points the solder nib toward the hand and doesn't put the soldering iron back into its holder, that should be recognised as a dangerous move. It also has to recognise the object and the hand action at the same time. I have to use 3 different machine learning models and compare them on accuracy etc. Which 3 machine learning models should I use, and how? The attached diagram shows the setup used to capture training data and the test platform. I will appreciate your answers, thank you very much! submitted by /u/Sad_Custard4968 [link] [comments]  ( 90 min )
    [D] Automated document cropping
    Hi All, I’m presently working on an image processing tool where I need to crop the part of an image that has thick black boundaries around it. It’d be great if anyone can help me with a lead or possible solution. So far I’ve tried the following: 1. building a custom model to detect the part of the image having black borders; 2. the bordercrop library. With both of these I had little to no success. Note: the image is of a document with large white spacing on top and bottom. In the middle there is some written content, and this is surrounded by a thick black rectangle. submitted by /u/Soyabean__ [link] [comments]  ( 90 min )
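    One classical-CV route worth trying before a custom model, offered only as a guess at the setup described: threshold for the dark border, take the largest contour, and crop its bounding box. A sketch with OpenCV (file names and the threshold value are placeholders):

        import cv2

        img = cv2.imread("document.jpg")  # hypothetical input
        gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
        # Dark (border) pixels become white in the inverted binary mask.
        _, mask = cv2.threshold(gray, 100, 255, cv2.THRESH_BINARY_INV)
        contours, _ = cv2.findContours(mask, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
        # Assume the thick black rectangle is the largest dark region on the page.
        box = max(contours, key=cv2.contourArea)
        x, y, w, h = cv2.boundingRect(box)
        cropped = img[y:y + h, x:x + w]
        cv2.imwrite("cropped.jpg", cropped)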
    [N] OpenAI's Whisper released
    OpenAI just released its newest ASR (/translation) model: openai/whisper (github.com) submitted by /u/SleekEagle [link] [comments]  ( 88 min )
    Online courses for Data Engineering and ML Engineering topics? [D]
    Hi All, I’m currently the only data scientist at a small nonprofit. I have a Masters in Mathematics and Statistics, so I feel as though I have a good base when it comes to understanding ML models. Currently the role requires me to try to be a “full stack” data scientist, which essentially means being a data engineer and ML engineer, roles in which I lack experience. The company knows this and is happy to pay for any additional learning to help benefit us both. I’m comfortable in Python for data cleaning and modelling, but I’d like some online courses to help with the data engineering (data pipelines and cloud database management) and the ML engineering (deploying ML models). Any ideas of online courses? submitted by /u/Flat_Ad1835 [link] [comments]  ( 107 min )
    [D] Good tools to draw fancy diagrams
    Hey guys, I'm finishing up a paper to submit to ICLR next week and I was wondering what tools y'all use to draw fancy diagrams. What I am trying to draw is an illustration of a gradient-based reinforcement learning method - particularly, the movement of an agent through a black box search space via gradients. I used MS Paint before but that isn't fancy/pretty enough. Any suggestions are greatly appreciated! submitted by /u/billjames1685 [link] [comments]  ( 89 min )
    [N] Bitfount open beta platform for federated AI/ML and privacy-preserving techniques
    Bitfount has published several federated learning and privacy-preserving technique tutorials to teach everyone how to unlock the value of sensitive data without putting privacy at risk. Bitfount is a distributed data science platform enabling data collaboration via federated, privacy-preserving data analysis and AI/ML such that the world’s intractable data can become safely interactable. Check out the platform here: https://docs.bitfount.com/ submitted by /u/lolokauf [link] [comments]  ( 105 min )
    [R] Anybody here going to IEEE-EMBS International Conference on Biomedical and Health Informatics (BHI-BSN) in Greece and would like to connect?
    View Poll submitted by /u/jokertrickington [link] [comments]  ( 88 min )
    [D] PyTorch on Apple Silicon
    Hi there! I'm facing an Apple silicon issue. I have an M1 MacBook Air and I'm new to PyTorch. I'm building a simple linear regression model right now for learning purposes. After sending the model and the data to the GPU (mps), when running the training loop I'm getting an error (NotImplementedError), and it prompted me to use a flag (PYTORCH_ENABLE_MPS_FALLBACK=1), which I set before activating my environment. However, the error is still there even after I set the flag. I was wondering how this problem can be solved; maybe I didn't set the flag properly, or there's something else I'm missing. Thank you in advance! submitted by /u/haidaryy [link] [comments]  ( 107 min )
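    One likely culprit, offered as a guess rather than a confirmed fix: PYTORCH_ENABLE_MPS_FALLBACK only takes effect if it is present in the environment before torch is imported. A minimal sketch that sets it from inside the script itself:

        import os
        os.environ["PYTORCH_ENABLE_MPS_FALLBACK"] = "1"  # must run before `import torch`

        import torch

        device = torch.device("mps" if torch.backends.mps.is_available() else "cpu")
        model = torch.nn.Linear(3, 1).to(device)
        x = torch.randn(8, 3, device=device)
        print(model(x).shape)  # ops lacking MPS kernels should now fall back to CPU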
    [D] What open source project would be a good start to make statistically significant decisions based on fuzzy weighted datapoints?
    Ok, maybe my question is too broad. But I have this idea of rating stuff based on content (could be text, could be images) by weighing different things. Like "These kind of texts and these kind of images, I give 10/10. What would you give these kind of texts and these kind of images?" and get an output. A first step would be to manually answer tons of questions and then give the machine the "result". And do that over and over again. Then ask the machine what result the answers should get, letting the machine learn how the different combinations affect the result (by weighing them itself). I'm wondering if there is any open source project out there that I can deploy without too much hassle, that would at least be a good start for my project. submitted by /u/MrOaiki [link] [comments]  ( 107 min )
    [P] My co-founder and I quit our engineering jobs at AWS to build “Tensor Search”. Here is why.
    My co-founder and I, a senior Amazon research scientist and AWS SDE respectively, launched Marqo a little over a week ago - a "tensor search" engine https://github.com/marqo-ai/marqo Another project doing semantic search/dense retrieval. Why?? Semantic search using vectors does an amazing job when we look at sentences, or short paragraphs. Vectors also do well as an implementation for image search. Unfortunately, vector representations for video, long documents and other more complex data types perform poorly. The reason isn't really to do with embeddings themselves not being good enough. If you asked a human to find the most relevant document to some search query given a list of long documents, an important question comes to mind - do we want the document that on average is most …  ( 116 min )
    [P] Stable Diffusion finetuned on Pokemon!
    ​ Girl with a pearl earring, Cute Obama creature, Donald Trump, Boris Johnson, Totoro, Hello Kitty Online demo: https://replicate.com/lambdal/text-to-pokemon Code and details: https://github.com/LambdaLabsML/examples/tree/main/stable-diffusion-finetuning submitted by /u/JClub [link] [comments]  ( 89 min )
    [R] NUWA-Infinity, the first paper working on infinite visual synthesis!
    Edit: Received a PM, this is NOT the first paper working on this problem; ALIS (https://github.com/universome/alis) and InfinityGAN (https://hubert0527.github.io/infinityGAN/) are the real first works on it. It seems lots of these Chinese papers are overclaiming and intentionally misleading the readers... I am really sorry about the wrong title... Paper: https://arxiv.org/abs/2207.09814 This is so cool! It will be super interesting if this can be combined with DALL-E 2! submitted by /u/ai-is-fun [link] [comments]  ( 88 min )
    [D] What is the name of this sort of Machine Learning study ?
    I was wondering if there is a name for this sort of study: the study of how much information a particular neural network can maximally hold. Say a 4-layer CNN with about 256 neurons per layer can accurately classify 20 different types of images, but the accuracy starts to fall when you add more types of images to classify, while a 5-layer CNN can accurately classify up to 30 types of images. Is there a name for papers looking at how much "information" a particular type or size of neural network can store? submitted by /u/SuitDistinct [link] [comments]  ( 90 min )
    [Discussion] what are the differences between “oral” and “poster” papers in the INTERSPEECH2022?
    I am aware that poster papers can be either on-site or virtual, whereas oral papers must be on-site and also have pre-recorded videos. What I do not understand is how they otherwise differ from each other: they have the same "4+1" length. In general, proceedings (oral) papers are officially accepted for things like funding acknowledgement, but posters are not, and poster papers usually have very preliminary results or initial ideas. But I could be wrong about the field of Electrical Engineering, since I am from a Computer Science background. discussion #interspeech submitted by /u/nguyenvulong [link] [comments]  ( 106 min )
    [D] can we expect RTX 4090 to have 2-3x machine learning speed up?
    With a 50% increase in CUDA cores, a 50% increase in clock speed, and other optimizations factored in, can we reasonably expect a 2-3x ML performance jump over the RTX 3090 Ti? It would be a welcome boost to the ML community. submitted by /u/--dany-- [link] [comments]  ( 108 min )
    [P] Follow my progress as I learn about Vector Search
    I've recently submerged myself in the deep end of vector search via research papers, tutorials and videos. In doing so, I'm re-constructing this content into bite-sized chunks here: https://vectorsearch.dev/ The repository is in its nascency, but in the interest of early feedback I believe I have the structure down via Foundations, Use Cases and Architecture. Would love to understand how others are wrapping their brains around the technology. submitted by /u/vanlifecoder [link] [comments]  ( 106 min )
  • Open

    Amazon Comprehend Targeted Sentiment adds synchronous support
    Earlier this year, Amazon Comprehend, a natural language processing (NLP) service that uses machine learning (ML) to discover insights from text, launched the Targeted Sentiment feature. With Targeted Sentiment, you can identify groups of mentions (co-reference groups) corresponding to a single real-world entity or attribute, provide the sentiment associated with each entity mention, and offer […]  ( 8 min )
    Run machine learning enablement events at scale using AWS DeepRacer multi-user account mode
    This post was co-written by Marius Cealera, Senior Partner Solutions Architect at AWS, Zdenko Estok, Cloud Architect at Accenture and Sakar Selimcan, Cloud Architect at Accenture. Machine learning (ML) is a high-stakes business priority, with companies spending $306 billion on ML applications in the past 3 years. According to Accenture, companies that scale ML across […]  ( 6 min )
    Enable intelligent decision-making with Amazon SageMaker Canvas and Amazon QuickSight
    Every company, regardless of its size, wants to deliver the best products and services to its customers. To achieve this, companies want to understand industry trends and customer behavior, and optimize internal processes and data analyses on a routine basis. This is a crucial component of a company’s success. A very prominent part of the […]  ( 10 min )
    Amazon SageMaker Autopilot is up to eight times faster with new ensemble training mode powered by AutoGluon
    Amazon SageMaker Autopilot has added a new training mode that supports model ensembling powered by AutoGluon. Ensemble training mode in Autopilot trains several base models and combines their predictions using model stacking. For datasets less than 100 MB, ensemble training mode builds machine learning (ML) models with high accuracy quickly—up to eight times faster than […]  ( 9 min )
  • Open

    Raging against the machine
    Ran across a great quote from Liv Boeree recently: The problem with raging against the machine is that the machine has learned to feed off rage. Someone appropriately replied with a screenshot from an episode of Star Trek TOS, Day of the Dove, about a being that feeds off anger, like contemporary media. Raging against the machine first appeared on John D. Cook.  ( 4 min )
    Field of order 9
    This post will give a detailed example of working in a field with nine elements. This is important because finite fields are not often treated concretely except for the case of prime order. In my first post on Costas arrays I mentioned in a footnote that Lempel’s algorithm works more generally over any finite field, […] Field of order 9 first appeared on John D. Cook.  ( 9 min )
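    As a taste of that concreteness (my own illustration, not code from the post): GF(9) can be built as GF(3)[x]/(x^2+1), since x^2+1 is irreducible mod 3; elements are pairs (a, b) representing a + b*i with i^2 = -1.

        def add(p, q):
            # Componentwise addition mod 3.
            return ((p[0] + q[0]) % 3, (p[1] + q[1]) % 3)

        def mul(p, q):
            # (a + bi)(c + di) = (ac - bd) + (ad + bc)i, using i^2 = -1.
            a, b = p
            c, d = q
            return ((a * c - b * d) % 3, (a * d + b * c) % 3)

        elements = [(a, b) for a in range(3) for b in range(3)]
        # Field check: every nonzero element has a multiplicative inverse.
        for p in elements:
            if p != (0, 0):
                assert any(mul(p, q) == (1, 0) for q in elements), p
        print("all 8 nonzero elements of GF(9) are invertible")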
  • Open

    Inside AI: NVIDIA DRIVE Ecosystem Creates Pioneering In-Cabin Features With NVIDIA DRIVE IX
    As personal transportation becomes electrified and automated, time in the vehicle has begun to resemble that of a living space rather than a mind-numbing commute. Companies are creating innovative ways for drivers and passengers to make the most of this experience, using the flexibility and modularity of NVIDIA DRIVE IX. In-vehicle technology companies Cerence, Smart Read article > The post Inside AI: NVIDIA DRIVE Ecosystem Creates Pioneering In-Cabin Features With NVIDIA DRIVE IX appeared first on NVIDIA Blog.  ( 6 min )
    HARMAN to Deliver Immersive In-Vehicle Experience With NVIDIA DRIVE IX
    Breakthroughs in centralized, high performance computing aren’t just opening up new functionality for autonomous driving, but for the in-vehicle experience as well. With the introduction of NVIDIA DRIVE Thor, automakers can build unified AI compute platforms that combine advanced driver-assistance systems and in-vehicle infotainment. The centralized NVIDIA DRIVE architecture supports novel features in the vehicle, Read article > The post HARMAN to Deliver Immersive In-Vehicle Experience With NVIDIA DRIVE IX appeared first on NVIDIA Blog.  ( 4 min )
    Now You’re Speaking My Language: NVIDIA Riva Sets New Bar for Fully Customizable Speech AI
    Whether for virtual assistants, transcriptions or contact centers, voice AI services are turning words and conversations into bits and bytes of business magic. At GTC this week, NVIDIA announced new additions to NVIDIA Riva, a GPU-accelerated software development kit for building and deploying speech AI applications. Riva’s pretrained models are now offered in seven languages, Read article > The post Now You’re Speaking My Language: NVIDIA Riva Sets New Bar for Fully Customizable Speech AI appeared first on NVIDIA Blog.  ( 6 min )
    A Podcast With Teeth: How Overjet Brings AI to Dentists’ Offices
    Dentists get a bad rap. Dentists also get more people out of more aggravating pain than just about anyone. Which is why the more technology dentists have, the better. Overjet, a member of the NVIDIA Inception program for startups, is moving fast to bring AI to dentists’ offices. On this episode of the NVIDIA AI Read article > The post A Podcast With Teeth: How Overjet Brings AI to Dentists’ Offices appeared first on NVIDIA Blog.  ( 4 min )
  • Open

    View Synthesis with Transformers
    Posted by Carlos Esteves and Ameesh Makadia, Research Scientists, Google Research A long-standing problem in the intersection of computer vision and computer graphics, view synthesis is the task of creating new views of a scene from multiple pictures of that scene. This has received increased attention [1, 2, 3] since the introduction of neural radiance fields (NeRF). The problem is challenging because to accurately synthesize new views of a scene, a model needs to capture many types of information — its detailed 3D structure, materials, and illumination — from a small set of reference images. In this post, we present recently published deep learning models for view synthesis. In “Light Field Neural Rendering” (LFNR), presented at CVPR 2022, we address the challenge of accurately repro…  ( 25 min )
  • Open

    In-home wireless device tracks disease progression in Parkinson’s patients
    By continuously monitoring a patient’s gait speed, the system can assess the condition’s severity between visits to the doctor’s office.  ( 8 min )
    Empowering Cambridge youth through data activism
    Mayor’s youth employment program brought local high schoolers to MIT this summer.  ( 9 min )
  • Open

    Introducing Whisper
    We’ve trained and are open-sourcing a neural net called Whisper that approaches human level robustness and accuracy on English speech recognition. Read Paper View Code View Model Card Whisper examples: Reveal Transcript Whisper is an automatic speech recognition (ASR) system trained on 680,000 hours of multilingual and  ( 6 min )
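    For those who want to try it, basic usage is roughly as documented in the repo's README at release (the model size and audio file below are placeholders):

        # pip install git+https://github.com/openai/whisper.git
        import whisper

        model = whisper.load_model("base")      # sizes range from "tiny" to "large"
        result = model.transcribe("audio.mp3")  # placeholder audio file
        print(result["text"])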
  • Open

    No-Code AI In Marketing: A Shift In Mindset? | HackerNoon
    I started my marketing career at the end of the 2000s. At that time, the digital marketing landscape was scarce and full of uncharted…  ( 17 min )
  • Open

    We continue our topic: neural network art. This time we experimented a lot, seeking answers on how the AI views NUCLEAR WAR: destruction and fear.
    submitted by /u/Tudor_222 [link] [comments]  ( 88 min )
  • Open

    Calibrated and Sharp Uncertainties in Deep Learning via Density Estimation. (arXiv:2112.07184v2 [cs.LG] UPDATED)
    Accurate probabilistic predictions can be characterized by two properties -- calibration and sharpness. However, standard maximum likelihood training yields models that are poorly calibrated and thus inaccurate -- a 90% confidence interval typically does not contain the true outcome 90% of the time. This paper argues that calibration is important in practice and is easy to maintain by performing low-dimensional density estimation. We introduce a simple training procedure based on recalibration that yields calibrated models without sacrificing overall performance; unlike previous approaches, ours ensures the most general property of distribution calibration and applies to any model, including neural networks. We formally prove the correctness of our procedure assuming that we can estimate densities in low dimensions and we establish uniform convergence bounds. Our results yield empirical performance improvements on linear and deep Bayesian models and suggest that calibration should be increasingly leveraged across machine learning.  ( 2 min )
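    To make the recalibration idea concrete, here is a minimal sketch in the spirit of quantile recalibration (a generic illustration of the technique, not the authors' code): fit a monotone map from each predicted CDF level to the empirical frequency with which held-out outcomes fall below it, using a deliberately miscalibrated toy model.

        import numpy as np
        from scipy.stats import norm
        from sklearn.isotonic import IsotonicRegression

        rng = np.random.default_rng(0)

        # Toy setup: the "model" predicts N(0, 1) but the data is N(0, 2),
        # so its nominal intervals are overconfident (miscalibrated).
        y_val = rng.normal(0.0, 2.0, size=2000)
        model_cdf = lambda y: norm.cdf(y, loc=0.0, scale=1.0)

        # PIT values F(y_i); for a calibrated model these are uniform on [0, 1].
        pit = model_cdf(y_val)
        levels = np.sort(pit)
        empirical = np.arange(1, len(levels) + 1) / len(levels)

        # Recalibrator R: predicted CDF level -> empirical frequency.
        recal = IsotonicRegression(out_of_bounds="clip").fit(levels, empirical)

        coverage = lambda lo, hi: np.mean((y_val >= lo) & (y_val <= hi))
        print("nominal 90% covers:", coverage(norm.ppf(0.05), norm.ppf(0.95)))  # ~0.59

        # Invert R to find raw levels whose calibrated values are 5% and 95%.
        grid = np.linspace(0.0, 1.0, 1001)
        rg = recal.predict(grid)
        lo, hi = grid[np.searchsorted(rg, 0.05)], grid[np.searchsorted(rg, 0.95)]
        print("recalibrated 90% covers:", coverage(norm.ppf(lo), norm.ppf(hi)))  # ~0.90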
    Continual learning under domain transfer with sparse synaptic bursting. (arXiv:2108.12056v8 [cs.LG] UPDATED)
    Existing machines are functionally specific tools that were made for easy prediction and control. Tomorrow's machines may be closer to biological systems in their mutability, resilience, and autonomy. But first they must be capable of learning and retaining new information without being exposed to it arbitrarily often. Past efforts to engineer such systems have sought to build or regulate artificial neural networks using disjoint sets of weights that are uniquely sensitive to specific tasks or inputs. This has not yet enabled continual learning over long sequences of previously unseen data without corrupting existing knowledge: a problem known as catastrophic forgetting. In this paper, we introduce a system that can learn sequentially over previously unseen datasets (ImageNet, CIFAR-100) with little forgetting over time. This is done by controlling the activity of weights in a convolutional neural network on the basis of inputs using top-down regulation generated by a second feed-forward neural network. We find that our method learns continually under domain transfer with sparse bursts of activity in weights that are recycled across tasks, rather than by maintaining task-specific modules. Sparse synaptic bursting is found to balance activity and suppression such that new functions can be learned without corrupting extant knowledge, thus mirroring the balance of order and disorder in systems at the edge of chaos. This behavior emerges during a prior pre-training (or 'meta-learning') phase in which regulated synapses are selectively disinhibited, or grown, from an initial state of uniform suppression through prediction error minimization.  ( 3 min )
    Contrastive Learning of Medical Visual Representations from Paired Images and Text. (arXiv:2010.00747v2 [cs.CV] UPDATED)
    Learning visual representations of medical images (e.g., X-rays) is core to medical image understanding but its progress has been held back by the scarcity of human annotations. Existing work commonly relies on fine-tuning weights transferred from ImageNet pretraining, which is suboptimal due to drastically different image characteristics, or rule-based label extraction from the textual report data paired with medical images, which is inaccurate and hard to generalize. Meanwhile, several recent studies show exciting results from unsupervised contrastive learning from natural images, but we find these methods help little on medical images because of their high inter-class similarity. We propose ConVIRT, an alternative unsupervised strategy to learn medical visual representations by exploiting naturally occurring paired descriptive text. Our new method of pretraining medical image encoders with the paired text data via a bidirectional contrastive objective between the two modalities is domain-agnostic, and requires no additional expert input. We test ConVIRT by transferring our pretrained weights to 4 medical image classification tasks and 2 zero-shot retrieval tasks, and show that it leads to image representations that considerably outperform strong baselines in most settings. Notably, in all 4 classification tasks, our method requires only 10\% as much labeled training data as an ImageNet initialized counterpart to achieve better or comparable performance, demonstrating superior data efficiency.  ( 3 min )
    MAC: A Meta-Learning Approach for Feature Learning and Recombination. (arXiv:2209.09613v1 [cs.LG])
    Optimization-based meta-learning aims to learn an initialization so that a new unseen task can be learned within a few gradient updates. Model Agnostic Meta-Learning (MAML) is a benchmark algorithm comprising two optimization loops. The inner loop is dedicated to learning a new task and the outer loop leads to meta-initialization. However, the ANIL (almost no inner loop) algorithm shows that feature reuse is an alternative to rapid learning in MAML. Thus, the meta-initialization phase makes MAML primed for feature reuse and obviates the need for rapid learning. Contrary to ANIL, we hypothesize that there may be a need to learn new features during meta-testing. A new unseen task from a non-similar distribution would necessitate rapid learning in addition to the reuse and recombination of existing features. In this paper, we invoke the width-depth duality of neural networks, wherein we increase the width of the network by adding extra computational units (ACUs). The ACUs enable the learning of new atomic features in the meta-testing task, and the associated increased width facilitates information propagation in the forward pass. The newly learnt features combine with existing features in the last layer for meta-learning. Experimental results show that our proposed MAC method outperforms the existing ANIL algorithm for non-similar task distributions by approximately 13% (5-shot task setting).  ( 2 min )
    Inference and Sampling for Archimax Copulas. (arXiv:2205.14025v2 [stat.ME] UPDATED)
    Understanding multivariate dependencies in both the bulk and the tails of a distribution is an important problem for many applications, such as ensuring algorithms are robust to observations that are infrequent but have devastating effects. Archimax copulas are a family of distributions endowed with a precise representation that allows simultaneous modeling of the bulk and the tails of a distribution. Rather than separating the two as is typically done in practice, incorporating additional information from the bulk may improve inference of the tails, where observations are limited. Building on the stochastic representation of Archimax copulas, we develop a non-parametric inference method and sampling algorithm. Our proposed methods, to the best of our knowledge, are the first that allow for highly flexible and scalable inference and sampling algorithms, enabling the increased use of Archimax copulas in practical settings. We experimentally compare to state-of-the-art density modeling techniques, and the results suggest that the proposed method effectively extrapolates to the tails while scaling to higher dimensional data. Our findings suggest that the proposed algorithms can be used in a variety of applications where understanding the interplay between the bulk and the tails of a distribution is necessary, such as healthcare and safety.  ( 3 min )
    Calibrated Uncertainty Estimation Improves Bayesian Optimization. (arXiv:2112.04620v2 [cs.LG] UPDATED)
    Bayesian optimization is a sequential procedure for obtaining the global optimum of black-box functions without knowing a priori their true form. Good uncertainty estimates over the shape of the objective function are essential in guiding the optimization process. However, these estimates can be inaccurate if the true objective function violates assumptions made by its model (e.g., Gaussianity). This paper studies which uncertainties are needed in Bayesian optimization models and argues that ideal uncertainties should be calibrated -- i.e., an 80% predictive interval should contain the true outcome 80% of the time. We propose a simple algorithm for enforcing this property and show that it enables Bayesian optimization to arrive at the global optimum in fewer steps. We provide theoretical insights into the role of calibrated uncertainties and demonstrate the improved performance of our method on standard benchmark functions and hyperparameter optimization tasks.  ( 2 min )
    AlphaDDA: Strategies for Adjusting the Playing Strength of a Fully Trained AlphaZero System to a Suitable Human Training Partner. (arXiv:2111.06266v4 [cs.LG] UPDATED)
    Artificial intelligence (AI) has achieved superhuman performance in board games such as Go, chess, and Othello (Reversi). In other words, the AI system surpasses the level of a strong human expert player in such games. In this context, it is difficult for a human player to enjoy playing the games with the AI. To keep human players entertained and immersed in a game, the AI is required to dynamically balance its skill with that of the human player. To address this issue, we propose AlphaDDA, an AlphaZero-based AI with dynamic difficulty adjustment (DDA). AlphaDDA consists of a deep neural network (DNN) and a Monte Carlo tree search, as in AlphaZero. AlphaDDA learns and plays a game the same way as AlphaZero, but can change its skills. AlphaDDA estimates the value of the game state from only the board state using the DNN. AlphaDDA changes a parameter dominantly controlling its skills according to the estimated value. Consequently, AlphaDDA adjusts its skills according to a game state. AlphaDDA can adjust its skill using only the state of a game without any prior knowledge regarding an opponent. In this study, AlphaDDA plays Connect4, Othello, and 6x6 Othello with other AI agents. Other AI agents are AlphaZero, Monte Carlo tree search, the minimax algorithm, and a random player. This study shows that AlphaDDA can balance its skill with that of the other AI agents, except for a random player. The DDA ability of AlphaDDA is based on an accurate estimation of the value from the state of a game. We believe that the AlphaDDA approach for DDA can be used for any game AI system if the DNN can accurately estimate the value of the game state and we know a parameter controlling the skills of the AI system.  ( 3 min )
    X-Risk Analysis for AI Research. (arXiv:2206.05862v7 [cs.CY] UPDATED)
    Artificial intelligence (AI) has the potential to greatly improve society, but as with any powerful technology, it comes with heightened risks and responsibilities. Current AI research lacks a systematic discussion of how to manage long-tail risks from AI systems, including speculative long-term risks. Keeping in mind the potential benefits of AI, there is some concern that building ever more intelligent and powerful AI systems could eventually result in systems that are more powerful than us; some say this is like playing with fire and speculate that this could create existential risks (x-risks). To add precision and ground these discussions, we provide a guide for how to analyze AI x-risk, which consists of three parts: First, we review how systems can be made safer today, drawing on time-tested concepts from hazard analysis and systems safety that have been designed to steer large processes in safer directions. Next, we discuss strategies for having long-term impacts on the safety of future systems. Finally, we discuss a crucial concept in making AI systems safer by improving the balance between safety and general capabilities. We hope this document and the presented concepts and tools serve as a useful guide for understanding how to analyze AI x-risk.  ( 3 min )
    Towards Sequence-Level Training for Visual Tracking. (arXiv:2208.05810v2 [cs.CV] UPDATED)
    Despite the extensive adoption of machine learning on the task of visual object tracking, recent learning-based approaches have largely overlooked the fact that visual tracking is a sequence-level task in its nature; they rely heavily on frame-level training, which inevitably induces inconsistency between training and testing in terms of both data distributions and task objectives. This work introduces a sequence-level training strategy for visual tracking based on reinforcement learning and discusses how a sequence-level design of data sampling, learning objectives, and data augmentation can improve the accuracy and robustness of tracking algorithms. Our experiments on standard benchmarks including LaSOT, TrackingNet, and GOT-10k demonstrate that four representative tracking models, SiamRPN++, SiamAttn, TransT, and TrDiMP, consistently improve by incorporating the proposed methods in training without modifying architectures.  ( 2 min )
    BEAT: A Large-Scale Semantic and Emotional Multi-Modal Dataset for Conversational Gestures Synthesis. (arXiv:2203.05297v5 [cs.CV] UPDATED)
    Achieving realistic, vivid, and human-like synthesized conversational gestures conditioned on multi-modal data is still an unsolved problem due to the lack of available datasets, models and standard evaluation metrics. To address this, we build Body-Expression-Audio-Text dataset, BEAT, which has i) 76 hours, high-quality, multi-modal data captured from 30 speakers talking with eight different emotions and in four different languages, ii) 32 millions frame-level emotion and semantic relevance annotations. Our statistical analysis on BEAT demonstrates the correlation of conversational gestures with facial expressions, emotions, and semantics, in addition to the known correlation with audio, text, and speaker identity. Based on this observation, we propose a baseline model, Cascaded Motion Network (CaMN), which consists of above six modalities modeled in a cascaded architecture for gesture synthesis. To evaluate the semantic relevancy, we introduce a metric, Semantic Relevance Gesture Recall (SRGR). Qualitative and quantitative experiments demonstrate metrics' validness, ground truth data quality, and baseline's state-of-the-art performance. To the best of our knowledge, BEAT is the largest motion capture dataset for investigating human gestures, which may contribute to a number of different research fields, including controllable gesture synthesis, cross-modality analysis, and emotional gesture recognition. The data, code and model are available on https://pantomatrix.github.io/BEAT/.  ( 3 min )
    CARLANE: A Lane Detection Benchmark for Unsupervised Domain Adaptation from Simulation to multiple Real-World Domains. (arXiv:2206.08083v3 [cs.CV] UPDATED)
    Unsupervised Domain Adaptation demonstrates great potential to mitigate domain shifts by transferring models from labeled source domains to unlabeled target domains. While Unsupervised Domain Adaptation has been applied to a wide variety of complex vision tasks, only few works focus on lane detection for autonomous driving. This can be attributed to the lack of publicly available datasets. To facilitate research in these directions, we propose CARLANE, a 3-way sim-to-real domain adaptation benchmark for 2D lane detection. CARLANE encompasses the single-target datasets MoLane and TuLane and the multi-target dataset MuLane. These datasets are built from three different domains, which cover diverse scenes and contain a total of 163K unique images, 118K of which are annotated. In addition we evaluate and report systematic baselines, including our own method, which builds upon Prototypical Cross-domain Self-supervised Learning. We find that false positive and false negative rates of the evaluated domain adaptation methods are high compared to those of fully supervised baselines. This affirms the need for benchmarks such as CARLANE to further strengthen research in Unsupervised Domain Adaptation for lane detection. CARLANE, all evaluated models and the corresponding implementations are publicly available at https://carlanebenchmark.github.io.  ( 3 min )
    Lazy vs hasty: linearization in deep networks impacts learning schedule based on example difficulty. (arXiv:2209.09658v1 [cs.LG])
    Among attempts at giving a theoretical account of the success of deep neural networks, a recent line of work has identified a so-called `lazy' regime in which the network can be well approximated by its linearization around initialization. Here we investigate the comparative effect of the lazy (linear) and feature learning (non-linear) regimes on subgroups of examples based on their difficulty. Specifically, we show that easier examples are given more weight in feature learning mode, resulting in faster training compared to more difficult ones. In other words, the non-linear dynamics tends to sequentialize the learning of examples of increasing difficulty. We illustrate this phenomenon across different ways to quantify example difficulty, including c-score, label noise, and in the presence of spurious correlations. Our results reveal a new understanding of how deep networks prioritize resources across example difficulty.  ( 2 min )
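    For reference, the `lazy' linearization around initialization that this abstract refers to is the standard first-order expansion from the NTK literature (standard notation, not taken from the paper): $f_{\mathrm{lin}}(x;\theta) = f(x;\theta_0) + \nabla_\theta f(x;\theta_0)^\top (\theta - \theta_0)$. In the lazy regime training barely moves $\theta$ away from the initialization $\theta_0$, so this model, linear in $\theta$, is an accurate surrogate for the full network; the feature learning regime is precisely where this approximation breaks down.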
    Semi-Supervised Imitation Learning of Team Policies from Suboptimal Demonstrations. (arXiv:2205.02959v6 [cs.AI] UPDATED)
    We present Bayesian Team Imitation Learner (BTIL), an imitation learning algorithm to model the behavior of teams performing sequential tasks in Markovian domains. In contrast to existing multi-agent imitation learning techniques, BTIL explicitly models and infers the time-varying mental states of team members, thereby enabling learning of decentralized team policies from demonstrations of suboptimal teamwork. Further, to allow for sample- and label-efficient policy learning from small datasets, BTIL employs a Bayesian perspective and is capable of learning from semi-supervised demonstrations. We demonstrate and benchmark the performance of BTIL on synthetic multi-agent tasks as well as a novel dataset of human-agent teamwork. Our experiments show that BTIL can successfully learn team policies from demonstrations despite the influence of team members' (time-varying and potentially misaligned) mental states on their behavior.  ( 3 min )
    WFA-IRL: Inverse Reinforcement Learning of Autonomous Behaviors Encoded as Weighted Finite Automata. (arXiv:2103.05895v3 [cs.LG] UPDATED)
    This paper presents a method for learning logical task specifications and cost functions from demonstrations. Constructing specifications by hand is challenging for complex objectives and constraints in autonomous systems. Instead, we consider demonstrated task executions, whose logic structure and transition costs need to be inferred by an autonomous agent. We employ a spectral learning approach to extract a weighted finite automaton (WFA), approximating the unknown task logic. Thereafter, we define a product between the WFA for high-level task guidance and a labeled Markov decision process for low-level control. An inverse reinforcement learning (IRL) problem is considered to learn a cost function by backpropagating the loss between agent and expert behaviors through the planning algorithm. Our proposed model, termed WFA-IRL, is capable of generalizing the execution of the inferred task specification in a suite of MiniGrid environments.  ( 2 min )
    A Machine Learning Approach to Solving Large Bilevel and Stochastic Programs: Application to Cycling Network Design. (arXiv:2209.09404v1 [math.OC])
    We present a novel machine learning-based approach to solving bilevel programs that involve a large number of independent followers, which as a special case include two-stage stochastic programming. We propose an optimization model that explicitly considers a sampled subset of followers and exploits a machine learning model to estimate the objective values of unsampled followers. Unlike existing approaches, we embed machine learning model training into the optimization problem, which allows us to employ general follower features that can not be represented using leader decisions. We prove bounds on the optimality gap of the generated leader decision as measured by the original objective function that considers the full follower set. We then develop follower sampling algorithms to tighten the bounds and a representation learning approach to learn follower features, which can be used as inputs to the embedded machine learning model. Using synthetic instances of a cycling network design problem, we compare the computational performance of our approach versus baseline methods. Our approach provides more accurate predictions for follower objective values, and more importantly, generates leader decisions of higher quality. Finally, we perform a real-world case study on cycling infrastructure planning, where we apply our approach to solve a network design problem with over one million followers. Our approach presents favorable performance compared to the current cycling network expansion practices.  ( 3 min )
    Safe Reinforcement Learning with Contrastive Risk Prediction. (arXiv:2209.09648v1 [cs.AI])
    As safety violations can lead to severe consequences in real-world robotic applications, the increasing deployment of Reinforcement Learning (RL) in robotic domains has propelled the study of safe exploration for reinforcement learning (safe RL). In this work, we propose a risk preventive training method for safe RL, which learns a statistical contrastive classifier to predict the probability of a state-action pair leading to unsafe states. Based on the predicted risk probabilities, we can collect risk preventive trajectories and reshape the reward function with risk penalties to induce safe RL policies. We conduct experiments in robotic simulation environments. The results show the proposed approach has comparable performance with the state-of-the-art model-based methods and outperforms conventional model-free safe RL approaches.  ( 2 min )
    Deep Generalized Schr\"odinger Bridge. (arXiv:2209.09893v1 [stat.ML])
    Mean-Field Game (MFG) serves as a crucial mathematical framework in modeling the collective behavior of individual agents interacting stochastically with a large population. In this work, we aim at solving a challenging class of MFGs in which the differentiability of these interacting preferences may not be available to the solver, and the population is urged to converge exactly to some desired distribution. These setups are, despite being well-motivated for practical purposes, complicated enough to paralyze most (deep) numerical solvers. Nevertheless, we show that Schr\"odinger Bridge - as an entropy-regularized optimal transport model - can be generalized to accepting mean-field structures, hence solving these MFGs. This is achieved via the application of Forward-Backward Stochastic Differential Equations theory, which, intriguingly, leads to a computational framework with a similar structure to Temporal Difference learning. As such, it opens up novel algorithmic connections to Deep Reinforcement Learning that we leverage to facilitate practical training. We show that our proposed objective function provides necessary and sufficient conditions to the mean-field problem. Our method, named Deep Generalized Schr\"odinger Bridge (DeepGSB), not only outperforms prior methods in solving classical population navigation MFGs, but is also capable of solving 1000-dimensional opinion depolarization, setting a new state-of-the-art numerical solver for high-dimensional MFGs. Our code will be made available at https://github.com/ghliu/DeepGSB.  ( 2 min )
    Monotonic Improvement Guarantees under Non-stationarity for Decentralized PPO. (arXiv:2202.00082v2 [cs.LG] UPDATED)
    We present a new monotonic improvement guarantee for optimizing decentralized policies in cooperative Multi-Agent Reinforcement Learning (MARL), which holds even when the transition dynamics are non-stationary. This new analysis provides a theoretical understanding of the strong performance of two recent actor-critic methods for MARL, i.e., Independent Proximal Policy Optimization (IPPO) and Multi-Agent PPO (MAPPO), which both rely on independent ratios, i.e., computing probability ratios separately for each agent's policy. We show that, despite the non-stationarity that independent ratios cause, a monotonic improvement guarantee still arises as a result of enforcing the trust region constraint over all decentralized policies. We also show this trust region constraint can be effectively enforced in a principled way by bounding independent ratios based on the number of agents in training, providing a theoretical foundation for proximal ratio clipping. Moreover, we show that the surrogate objectives optimized in IPPO and MAPPO are essentially equivalent when their critics converge to a fixed point. Finally, our empirical results support the hypothesis that the strong performance of IPPO and MAPPO is a direct result of enforcing such a trust region constraint via clipping in centralized training, and the good values of the hyperparameters for this enforcement are highly sensitive to the number of agents, as predicted by our theoretical analysis.  ( 3 min )
    Weak Supervision in Analysis of News: Application to Economic Policy Uncertainty. (arXiv:2209.05383v2 [econ.GN] UPDATED)
    The need for timely data analysis for economic decisions has prompted most economists and policy makers to search for non-traditional supplementary sources of data. In that context, text data is being explored to enrich traditional data sources because it is easy to collect and highly abundant. Our work focuses on studying the potential of textual data, in particular news pieces, for measuring economic policy uncertainty (EPU). Economic policy uncertainty is defined as the public's inability to predict the outcomes of their decisions under new policies and future economic fundamentals. Quantifying EPU is of great importance to policy makers, economists, and investors since it influences their expectations about the future economic fundamentals with an impact on their policy, investment and saving decisions. Most of the previous work using news articles for measuring EPU is either manual or based on a simple keyword search. Our work proposes a machine learning based solution involving weak supervision to classify news articles with regards to economic policy uncertainty. Weak supervision is shown to be an efficient machine learning paradigm for applying machine learning models in low resource settings with no or scarce training sets, leveraging domain knowledge and heuristics. We further generated a weak supervision based EPU index that we used to conduct extensive econometric analysis along with the Irish macroeconomic indicators to validate whether our generated index foreshadows weaker macroeconomic performance.  ( 3 min )
    SCGG: A Deep Structure-Conditioned Graph Generative Model. (arXiv:2209.09681v1 [cs.LG])
    Deep learning-based graph generation approaches have remarkable capacities for graph data modeling, allowing them to solve a wide range of real-world problems. Enabling these methods to consider different conditions during the generation procedure further increases their effectiveness by empowering them to generate new graph samples that meet the desired criteria. This paper presents a conditional deep graph generation method called SCGG that considers a particular type of structural conditions. Specifically, our proposed SCGG model takes an initial subgraph and autoregressively generates new nodes and their corresponding edges on top of the given conditioning substructure. The architecture of SCGG consists of a graph representation learning network and an autoregressive generative model, which is trained end-to-end. Using this model, we can address graph completion, a pervasive and inherently difficult problem of recovering missing nodes and their associated edges of partially observed graphs. Experimental results on both synthetic and real-world datasets demonstrate the superiority of our method compared with state-of-the-art baselines.  ( 2 min )
    Industrial Data Science for Batch Manufacturing Processes. (arXiv:2209.09660v1 [cs.LG])
    Batch processes show several sources of variability, from raw materials' properties to initial and evolving conditions that change during the different events in the manufacturing process. In this chapter, we will illustrate with an industrial example how to use machine learning to reduce this apparent excess of data while maintaining the relevant information for process engineers. Two common use cases will be presented: 1) AutoML analysis to quickly find correlations in batch process data, and 2) trajectory analysis to monitor and identify anomalous batches leading to process control improvements.  ( 2 min )
    Deep Convolutional Architectures for Extrapolative Forecast in Time-dependent Flow Problems. (arXiv:2209.09651v1 [cs.LG])
    Physical systems whose dynamics are governed by partial differential equations (PDEs) find applications in numerous fields, from engineering design to weather forecasting. The process of obtaining the solution from such PDEs may be computationally expensive for large-scale and parameterized problems. In this work, deep learning techniques developed especially for time-series forecasts, such as LSTM and TCN, or for spatial-feature extraction such as CNN, are employed to model the system dynamics for advection-dominated problems. These models take as input a sequence of high-fidelity vector solutions for consecutive time-steps obtained from the PDEs and forecast the solutions for the subsequent time-steps using auto-regression, thereby reducing the computation time and power needed to obtain such high-fidelity solutions. The models are tested on numerical benchmarks (1D Burgers' equation and Stoker's dam break problem) to assess the long-term prediction accuracy, even outside the training domain (extrapolation). Non-intrusive reduced-order modelling techniques such as deep auto-encoder networks are utilized to compress the high-fidelity snapshots before feeding them as input to the forecasting models in order to reduce the complexity and the required computations in the online and offline stages. Deep ensembles are employed to perform uncertainty quantification of the forecasting models, which provides information about the variance of the predictions as a result of the epistemic uncertainties.  ( 2 min )
    Lower Bounds on the Worst-Case Complexity of Efficient Global Optimization. (arXiv:2209.09655v1 [math.OC])
    Efficient global optimization is a widely used method for optimizing expensive black-box functions, arising in tasks such as hyperparameter tuning and new-material design. Despite its popularity, less attention has been paid to analyzing the inherent hardness of the problem, although, given its extensive use, it is important to understand the fundamental limits of efficient global optimization algorithms. In this paper, we study the worst-case complexity of the efficient global optimization problem and, in contrast to existing kernel-specific results, we derive a unified lower bound for the complexity of efficient global optimization in terms of the metric entropy of a ball in its corresponding reproducing kernel Hilbert space~(RKHS). Specifically, we show that if there exists a deterministic algorithm that achieves a suboptimality gap smaller than $\epsilon$ for any function $f\in S$ in $T$ function evaluations, it is necessary that $T$ is at least $\Omega\left(\frac{\log\mathcal{N}(S(\mathcal{X}), 4\epsilon,\|\cdot\|_\infty)}{\log(\frac{R}{\epsilon})}\right)$, where $\mathcal{N}(\cdot,\cdot,\cdot)$ is the covering number, $S$ is the ball centered at $0$ with radius $R$ in the RKHS and $S(\mathcal{X})$ is the restriction of $S$ over the feasible set $\mathcal{X}$. Moreover, we show that this lower bound nearly matches the upper bound attained by non-adaptive search algorithms for the commonly used squared exponential kernel and the Mat\'ern kernel with a large smoothness parameter $\nu$, up to a replacement of $d/2$ by $d$ and a logarithmic term $\log\frac{R}{\epsilon}$. That is to say, our lower bound is nearly optimal for these kernels.  ( 3 min )
    Detecting Political Biases of Named Entities and Hashtags on Twitter. (arXiv:2209.08110v1 [cs.SI] CROSS LISTED)
    Ideological divisions in the United States have become increasingly prominent in daily communication. Accordingly, there has been much research on political polarization, including many recent efforts that take a computational perspective. By detecting political biases in a corpus of text, one can attempt to describe and discern the polarity of that text. Intuitively, the named entities (i.e., the nouns and phrases that act as nouns) and hashtags in text often carry information about political views. For example, people who use the term "pro-choice" are likely to be liberal, whereas people who use the term "pro-life" are likely to be conservative. In this paper, we seek to reveal political polarities in social-media text data and to quantify these polarities by explicitly assigning a polarity score to entities and hashtags. Although this idea is straightforward, it is difficult to perform such inference in a trustworthy quantitative way. Key challenges include the small number of known labels, the continuous spectrum of political views, and the preservation of both a polarity score and a polarity-neutral semantic meaning in an embedding vector of words. To attempt to overcome these challenges, we propose the Polarity-aware Embedding Multi-task learning (PEM) model. This model consists of (1) a self-supervised context-preservation task, (2) an attention-based tweet-level polarity-inference task, and (3) an adversarial learning task that promotes independence between an embedding's polarity dimension and its semantic dimensions. Our experimental results demonstrate that our PEM model can successfully learn polarity-aware embeddings. We examine a variety of applications and we thereby demonstrate the effectiveness of our PEM model. We also discuss important limitations of our work and stress caution when applying the PEM model to real-world scenarios.  ( 3 min )
    Sequence Learning using Equilibrium Propagation. (arXiv:2209.09626v1 [cs.NE])
    Equilibrium Propagation (EP) is a powerful and more bio-plausible alternative to conventional learning frameworks such as backpropagation. The effectiveness of EP stems from the fact that it relies only on local computations and requires solely one kind of computational unit during both of its training phases, thereby enabling greater applicability in domains such as bio-inspired neuromorphic computing. The dynamics of the model in EP is governed by an energy function, and the internal states of the model consequently converge to a steady state following the state transition rules defined by it. However, by definition, EP requires the input to the model (a convergent RNN) to be static in both phases of training. Thus it is not possible to design a model for sequence classification using EP with an LSTM- or GRU-like architecture. In this paper, we leverage recent developments in modern Hopfield networks to further understand energy based models and develop solutions for complex sequence classification tasks using EP while satisfying its convergence criteria and maintaining its theoretical similarities with recurrent backpropagation. We explore the possibility of integrating modern Hopfield networks as an attention mechanism with convergent RNN models used in EP, thereby extending its applicability for the first time on two different sequence classification tasks in natural language processing, viz. sentiment analysis (IMDB dataset) and natural language inference (SNLI dataset).  ( 2 min )
    Locally Constrained Representations in Reinforcement Learning. (arXiv:2209.09441v1 [cs.LG])
    The success of Reinforcement Learning (RL) heavily relies on the ability to learn robust representations from the observations of the environment. In most cases, the representations learned purely by the reinforcement learning loss can differ vastly across states depending on how the value functions change. However, the representations learned need not be very specific to the task at hand. Relying only on the RL objective may yield representations that vary greatly across successive time steps. In addition, since the RL loss has a changing target, the representations learned would depend on how good the current values/policies are. Thus, disentangling the representations from the main task would allow them to focus more on capturing transition dynamics, which can improve generalization. To this end, we propose locally constrained representations, where an auxiliary loss forces the state representations to be predictable by the representations of the neighbouring states. This encourages the representations to be driven not only by the value/policy learning but also by self-supervised learning, which constrains the representations from changing too rapidly. We evaluate the proposed method on several known benchmarks and observe strong performance. Especially in continuous control tasks, our experiments show a significant advantage over a strong baseline.  ( 2 min )
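    A rough sketch of such an auxiliary objective follows; the small predictor head and the stop-gradient on the target are our assumptions for illustration, not necessarily the paper's exact design.

        import torch
        import torch.nn as nn

        class LocalPredictionLoss(nn.Module):
            """Auxiliary loss: the representation of s_t should predict that of s_{t+1}."""
            def __init__(self, dim):
                super().__init__()
                self.predictor = nn.Sequential(
                    nn.Linear(dim, dim), nn.ReLU(), nn.Linear(dim, dim))

            def forward(self, phi_s, phi_next):
                pred = self.predictor(phi_s)
                # Stop gradients through the target so the constraint slows,
                # rather than collapses, the representation drift.
                return ((pred - phi_next.detach()) ** 2).mean()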
    PolyMPCNet: Towards ReLU-free Neural Architecture Search in Two-party Computation Based Private Inference. (arXiv:2209.09424v1 [cs.CR])
    The rapid growth and deployment of deep learning (DL) has been accompanied by emerging privacy and security concerns. To mitigate these issues, secure multi-party computation (MPC) has been proposed to enable privacy-preserving DL computation. In practice, MPC protocols often come with very high computation and communication overhead, which can prohibit their adoption in large-scale systems. Two orthogonal research trends have attracted enormous interest in addressing the energy efficiency of secure deep learning, i.e., overhead reduction of the MPC comparison protocol, and hardware acceleration. However, existing approaches either achieve a low reduction ratio and suffer from high latency due to limited computation and communication saving, or are power-hungry, as existing works mainly focus on general computing platforms such as CPUs and GPUs. In this work, as the first attempt, we develop a systematic framework, PolyMPCNet, for joint overhead reduction of the MPC comparison protocol and hardware acceleration, by integrating the hardware latency of the cryptographic building block into the DNN loss function to achieve high energy efficiency, accuracy, and security guarantees. Instead of heuristically checking the model sensitivity after a DNN is well-trained (through deleting or dropping some non-polynomial operators), our key design principle is to enforce exactly what is assumed in the DNN design -- training a DNN that is both hardware efficient and secure, while escaping the local minima and saddle points and maintaining high accuracy. More specifically, we propose a straight-through polynomial activation initialization method for a cryptographic-hardware-friendly trainable polynomial activation function to replace the expensive 2P-ReLU operator. We develop a cryptographic hardware scheduler and the corresponding performance model for the Field Programmable Gate Array (FPGA) platform.  ( 3 min )
    Distributed representations of graphs for drug pair scoring. (arXiv:2209.09383v1 [cs.LG])
    In this paper, we study the practicality and usefulness of incorporating distributed representations of graphs into models within the context of drug pair scoring. We argue that the real-world growth and update cycles of drug pair scoring datasets subvert the limitations of transductive learning associated with distributed representations. Furthermore, we argue that the vocabulary of discrete substructure patterns induced over drug sets is not dramatically large due to the limited set of atom types and constraints on bonding patterns enforced by chemistry. Under this pretext, we explore the effectiveness of distributed representations of the molecular graphs of drugs in drug pair scoring tasks such as drug synergy, polypharmacy, and drug-drug interaction prediction. To achieve this, we present a methodology for learning and incorporating distributed representations of graphs within a unified framework for drug pair scoring. Subsequently, we augment a number of recent and state-of-the-art models to utilise our embeddings. We empirically show that the incorporation of these embeddings improves downstream performance of almost every model across different drug pair scoring tasks, even those the original model was not designed for. We publicly release all of our drug embeddings for the DrugCombDB, DrugComb, DrugbankDDI, and TwoSides datasets.  ( 2 min )
    Colossal-AI: A Unified Deep Learning System For Large-Scale Parallel Training. (arXiv:2110.14883v2 [cs.LG] UPDATED)
    The success of Transformer models has pushed the deep learning model scale to billions of parameters. However, due to the limited memory resource of a single GPU, the best practice for choosing the optimal parallel strategy is still lacking, since it requires domain expertise in both deep learning and parallel computing. The Colossal-AI system addresses this challenge by introducing a unified interface that scales sequential model-training code to distributed environments. It supports parallel training methods such as data, pipeline, tensor, and sequence parallelism, as well as heterogeneous training methods integrated with the zero redundancy optimizer. Compared to the baseline system, Colossal-AI can achieve up to 2.76 times training speedup on large-scale models.  ( 2 min )
    Machine Learning based Extraction of Boundary Conditions from Doppler Echo Images for Patient Specific Coarctation of the Aorta: Computational Fluid Dynamics Study. (arXiv:2209.09139v2 [cs.CE] UPDATED)
    Purpose- Coarctation of the Aorta (CoA) patient-specific computational fluid dynamics (CFD) studies in resource-constrained settings are limited by the available imaging modalities for geometry and velocity data acquisition. Doppler echocardiography has been seen as a suitable velocity acquisition modality due to its higher availability and safety. This study aimed to investigate the application of classical machine learning (ML) methods to create an adequate and robust approach for obtaining boundary conditions (BCs) from Doppler Echocardiography images, for haemodynamic modeling using CFD. Methods- Our proposed approach combines ML and CFD to model haemodynamic flow within the region of interest, with the key feature of the approach being the use of ML models to calibrate the inlet and outlet boundary conditions (BCs) of the CFD model. The key input variable for the ML model was the patient's heart rate, as this was the parameter that varied in time across the measured vessels within the study. ANSYS Fluent was used for the CFD component of the study whilst the scikit-learn python library was used for the ML component. Results- We validated our approach against a real clinical case of severe CoA before intervention. The maximum coarctation velocity of our simulations was compared to the measured maximum coarctation velocity obtained from the patient whose geometry was used within the study. Of the 5 ML models used to obtain BCs, the top model was within 5\% of the measured maximum coarctation velocity. Conclusion- The framework demonstrated that it was capable of taking variations of the patient's heart rate between measurements into account, thus enabling the calculation of BCs that were physiologically realistic when the heart rate was scaled across each vessel, whilst providing a reasonably accurate solution.  ( 3 min )
    One-to-Many Semantic Communication Systems: Design, Implementation, Performance Evaluation. (arXiv:2209.09425v1 [cs.LG])
    Semantic communication in the 6G era has been deemed a promising communication paradigm to break through the bottleneck of traditional communications. However, its applications for the multi-user scenario, especially the broadcasting case, remain under-explored. To effectively exploit the benefits enabled by semantic communication, in this paper, we propose a one-to-many semantic communication system. Specifically, we propose a deep neural network (DNN) enabled semantic communication system called MR\_DeepSC. By leveraging semantic features for different users, a semantic recognizer based on the pre-trained model, i.e., DistilBERT, is built to distinguish different users. Furthermore, transfer learning is adopted to speed up the training of new receiver networks. Simulation results demonstrate that the proposed MR\_DeepSC achieves better performance in terms of BLEU score than the other benchmarks under different channel conditions, especially in the low signal-to-noise ratio (SNR) regime.  ( 2 min )
    Polynomial-Time Reachability for LTI Systems with Two-Level Lattice Neural Network Controllers. (arXiv:2209.09400v1 [cs.LG])
    In this paper, we consider the computational complexity of bounding the reachable set of a Linear Time-Invariant (LTI) system controlled by a Rectified Linear Unit (ReLU) Two-Level Lattice (TLL) Neural Network (NN) controller. In particular, we show that for such a system and controller, it is possible to compute the exact one-step reachable set in polynomial time in the size of the TLL NN controller (number of neurons). Additionally, we show that it is possible to obtain a tight bounding box of the reachable set via two polynomial-time methods: one with polynomial complexity in the size of the TLL and the other with polynomial complexity in the Lipschitz constant of the controller and other problem parameters. Crucially, the smaller of the two can be decided in polynomial time for non-degenerate TLL NNs. Finally, we propose a pragmatic algorithm that adaptively combines the benefits of (semi-)exact reachability and approximate reachability, which we call L-TLLBox. We evaluate L-TLLBox with an empirical comparison to a state-of-the-art NN controller reachability tool. In these experiments, L-TLLBox was able to complete reachability analysis as much as 5000x faster than this tool on the same network/system, while producing reach boxes that were from 0.08 to 1.42 times the area.  ( 3 min )
    Fairness and robustness in anti-causal prediction. (arXiv:2209.09423v1 [cs.LG])
    Robustness to distribution shift and fairness have independently emerged as two important desiderata required of modern machine learning models. While these two desiderata seem related, the connection between them is often unclear in practice. Here, we discuss these connections through a causal lens, focusing on anti-causal prediction tasks, where the input to a classifier (e.g., an image) is assumed to be generated as a function of the target label and the protected attribute. By taking this perspective, we draw explicit connections between a common fairness criterion - separation - and a common notion of robustness - risk invariance. These connections provide new motivation for applying the separation criterion in anti-causal settings, and inform old discussions regarding fairness-performance tradeoffs. In addition, our findings suggest that robustness-motivated approaches can be used to enforce separation, and that they often work better in practice than methods designed to directly enforce separation. Using a medical dataset, we empirically validate our findings on the task of detecting pneumonia from X-rays, in a setting where differences in prevalence across sex groups motivates a fairness mitigation. Our findings highlight the importance of considering causal structure when choosing and enforcing fairness criteria.  ( 2 min )
    A Minimalist Dataset for Systematic Generalization of Perception, Syntax, and Semantics. (arXiv:2103.01403v2 [cs.LG] UPDATED)
    Inspired by humans' remarkable ability to master arithmetic and generalize to unseen problems, we present a new dataset, HINT, to study machines' capability of learning generalizable concepts at three levels: perception, syntax, and semantics. Learning agents are tasked to reckon how concepts are perceived from raw signals such as images (i.e., perception), how multiple concepts are structurally combined to form a valid expression (i.e., syntax), and how concepts are realized to afford various reasoning tasks (i.e., semantics), all in a weakly supervised manner. With a focus on systematic generalization, we carefully design a five-fold test set to evaluate both the interpolation and the extrapolation of learned concepts w.r.t. the three levels. We further design a few-shot learning split to test whether models could quickly learn new concepts and generalize them to more complex scenarios. To understand existing models' limitations, we conduct extensive experiments with various sequence-to-sequence models, including RNNs, Transformers, and GPT-3 (with the chain of thought prompting). The results suggest that current models still struggle in extrapolation to long-range syntactic dependency and semantics. Models show a significant gap toward human-level generalization when tested with new concepts in a few-shot setting. Moreover, we find that it is infeasible to solve HINT by simply scaling up the dataset and the model size; this strategy barely helps the extrapolation over syntax and semantics. Finally, in zero-shot GPT-3 experiments, the chain of thought prompting shows impressive results and significantly boosts the test accuracy. We believe the proposed dataset together with the experimental findings is of great interest to the community on systematic generalization.  ( 3 min )
    Discovering and forecasting extreme events via active learning in neural operators. (arXiv:2204.02488v2 [cs.LG] UPDATED)
    Extreme events in society and nature, such as pandemic spikes, rogue waves, or structural failures, can have catastrophic consequences. Characterizing extremes is difficult as they occur rarely, arise from seemingly benign conditions, and belong to complex and often unknown infinite-dimensional systems. Such challenges render attempts at characterizing them moot. We address each of these difficulties by combining novel training schemes in Bayesian experimental design (BED) with an ensemble of deep neural operators (DNOs). This model-agnostic framework pairs a BED scheme that actively selects data for quantifying extreme events with an ensemble of DNOs that approximate infinite-dimensional nonlinear operators. We find that not only does this framework clearly beat Gaussian processes (GPs) but that 1) shallow ensembles of just two members perform best; 2) extremes are uncovered regardless of the state of initial data (i.e. with or without extremes); 3) our method eliminates "double-descent" phenomena; 4) the use of batches of suboptimal acquisition points compared to step-by-step global optima does not hinder BED performance; and 5) Monte Carlo acquisition outperforms standard optimizers in high-dimensions. Together these conclusions form the foundation of an AI-assisted experimental infrastructure that can efficiently infer and pinpoint critical situations across many domains, from physical to societal systems.  ( 3 min )
    Boosting neural video codecs by exploiting hierarchical redundancy. (arXiv:2208.04303v2 [eess.IV] UPDATED)
    In video compression, coding efficiency is improved by reusing pixels from previously decoded frames via motion and residual compensation. We define two levels of hierarchical redundancy in video frames: 1) first-order: redundancy in pixel space, i.e., similarities in pixel values across neighboring frames, which is effectively captured using motion and residual compensation, 2) second-order: redundancy in motion and residual maps due to smooth motion in natural videos. While most of the existing neural video coding literature addresses first-order redundancy, we tackle the problem of capturing second-order redundancy in neural video codecs via predictors. We introduce generic motion and residual predictors that learn to extrapolate from previously decoded data. These predictors are lightweight, and can be employed with most neural video codecs in order to improve their rate-distortion performance. Moreover, while RGB is the dominant colorspace in neural video coding literature, we introduce general modifications for neural video codecs to embrace the YUV420 colorspace and report YUV420 results. Our experiments show that using our predictors with a well-known neural video codec leads to 38% and 34% bitrate savings in RGB and YUV420 colorspaces measured on the UVG dataset.  ( 3 min )
    A cost-based multi-layer network approach for the discovery of patient phenotypes. (arXiv:2209.09032v2 [cs.LG] UPDATED)
    Clinical records frequently include assessments of the characteristics of patients, which may include the completion of various questionnaires. These questionnaires provide a variety of perspectives on a patient's current state of well-being. Not only is it critical to capture the heterogeneity given by these perspectives, but there is also a growing demand for developing cost-effective technologies for clinical phenotyping. Filling out many questionnaires may be a strain for the patients and is therefore costly. In this work, we propose COBALT -- a cost-based layer selector model for detecting phenotypes using a community detection approach. Our goal is to minimize the number of features used to build these phenotypes while preserving their quality. We test our model using questionnaire data from chronic tinnitus patients and represent the data in a multi-layer network structure. The model is then evaluated by predicting post-treatment data using baseline features (age, gender, and pre-treatment data) as well as the identified phenotypes as a feature. For some post-treatment variables, predictors using phenotypes from COBALT as features outperformed those using phenotypes detected by traditional clustering methods. Moreover, using phenotype data to predict post-treatment data proved beneficial in comparison with predictors that were solely trained with baseline features.  ( 3 min )
    Deep Learning-Based Rate-Splitting Multiple Access for Reconfigurable Intelligent Surface-Aided Tera-Hertz Massive MIMO. (arXiv:2209.08456v1 [eess.SP] CROSS LISTED)
    Reconfigurable intelligent surface (RIS) can significantly enhance the service coverage of Tera-Hertz massive multiple-input multiple-output (MIMO) communication systems. However, obtaining accurate high-dimensional channel state information (CSI) with limited pilot and feedback signaling overhead is challenging, severely degrading the performance of conventional spatial division multiple access. To improve the robustness against CSI imperfection, this paper proposes a deep learning (DL)-based rate-splitting multiple access (RSMA) scheme for RIS-aided Tera-Hertz multi-user MIMO systems. Specifically, we first propose a hybrid data-model driven DL-based RSMA precoding scheme, including the passive precoding at the RIS as well as the analog active precoding and the RSMA digital active precoding at the base station (BS). To realize the passive precoding at the RIS, we propose a Transformer-based data-driven RIS reflecting network (RRN). As for the analog active precoding at the BS, we propose a match-filter based analog precoding scheme considering that the BS and RIS adopt the LoS-MIMO antenna array architecture. As for the RSMA digital active precoding at the BS, we propose a low-complexity approximate weighted minimum mean square error (AWMMSE) digital precoding scheme. Furthermore, for better precoding performance as well as lower computational complexity, a model-driven deep unfolding active precoding network (DFAPN) is also designed by combining the proposed AWMMSE scheme with DL. Then, to acquire accurate CSI at the BS for the investigated RSMA precoding scheme to achieve higher spectral efficiency, we propose a CSI acquisition network (CAN) with low pilot and feedback signaling overhead, where the downlink pilot transmission, CSI feedback at the user equipments (UEs), and CSI reconstruction at the BS are modeled as an end-to-end neural network based on Transformer.  ( 3 min )
    Streaming Encoding Algorithms for Scalable Hyperdimensional Computing. (arXiv:2209.09868v1 [cs.LG])
    Hyperdimensional computing (HDC) is a paradigm for data representation and learning originating in computational neuroscience. HDC represents data as high-dimensional, low-precision vectors which can be used for a variety of information processing tasks like learning or recall. The mapping to high-dimensional space is a fundamental problem in HDC, and existing methods encounter scalability issues when the input data itself is high-dimensional. In this work, we explore a family of streaming encoding techniques based on hashing. We show formally that these methods enjoy comparable guarantees on performance for learning applications while being substantially more efficient than existing alternatives. We validate these results experimentally on a popular high-dimensional classification problem and show that our approach easily scales to very large data sets.  ( 2 min )
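    As a minimal sketch of one hash-based streaming encoder consistent with this idea, the snippet below re-derives each coordinate's pseudo-random +/-1 component vector on the fly from its index (an RNG seed stands in for a hash function), so no codebook is ever stored; the paper's actual encoders may differ.

        import numpy as np

        def hash_encode(x, dim=10_000):
            """Streaming hypervector encoding: each input coordinate contributes
            x[j] times a +/-1 vector regenerated from a seed derived from j."""
            hv = np.zeros(dim)
            for j, value in enumerate(x):
                if value == 0.0:
                    continue                        # sparse coordinates stream cheaply
                rng = np.random.default_rng(j)      # hash(j) -> reproducible seed
                signs = rng.integers(0, 2, size=dim) * 2 - 1
                hv += value * signs
            return np.sign(hv)                      # binarize to low precision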
    Word Embeddings for Automatic Equalization in Audio Mixing. (arXiv:2202.08898v2 [cs.SD] UPDATED)
    In recent years, machine learning has been widely adopted to automate the audio mixing process. Automatic mixing systems have been applied to various audio effects such as gain-adjustment, equalization, and reverberation. These systems can be controlled through visual interfaces, providing audio examples, using knobs, and semantic descriptors. Using semantic descriptors or textual information to control these systems is an effective way for artists to communicate their creative goals. In this paper, we explore the novel idea of using word embeddings to represent semantic descriptors. Word embeddings are generally obtained by training neural networks on large corpora of written text. These embeddings serve as the input layer of the neural network to create a translation from words to EQ settings. Using this technique, the machine learning model can also generate EQ settings for semantic descriptors that it has not seen before. We compare the EQ settings of humans with the predictions of the neural network to evaluate the quality of predictions. The results showed that the embedding layer enables the neural network to understand semantic descriptors. We observed that the models with embedding layers perform better than those without embedding layers, but they still fall short of the human labels.  ( 3 min )
    Explainable Misinformation Detection Across Multiple Social Media Platforms. (arXiv:2203.11724v2 [cs.LG] UPDATED)
    In this work, the integration of two machine learning approaches, namely domain adaptation and explainable AI, is proposed to address the twin issues of generalized detection and explainability. Firstly, a Domain Adversarial Neural Network (DANN) is used to develop a generalized misinformation detector across multiple social media platforms; DANN is employed to generate the classification results for test domains with relevant but unseen data. The DANN-based model, a traditional black-box model, cannot justify its outcome, i.e., the labels for the target domain. Hence, a Local Interpretable Model-Agnostic Explanations (LIME) explainable AI model is applied to explain the outcome of the DANN model. To demonstrate these two approaches and their integration for effective explainable generalized detection, COVID-19 misinformation is considered a case study. We experimented with two datasets, namely CoAID and MiSoVac, and compared results with and without DANN implementation. DANN significantly improves the F1 classification score and increases the accuracy and AUC performance. The results obtained show that the proposed framework performs well in the case of domain shift and can learn domain-invariant features, while explaining the target labels with the LIME implementation, enabling trustworthy information processing and extraction to combat misinformation effectively.  ( 3 min )
    Investigating Generalization by Controlling Normalized Margin. (arXiv:2205.03940v3 [cs.LG] UPDATED)
    Weight norm $\|w\|$ and margin $\gamma$ participate in learning theory via the normalized margin $\gamma/\|w\|$. Since standard neural net optimizers do not control normalized margin, it is hard to test whether this quantity causally relates to generalization. This paper designs a series of experimental studies that explicitly control normalized margin and thereby tackle two central questions. First: does normalized margin always have a causal effect on generalization? The paper finds that no -- networks can be produced where normalized margin has seemingly no relationship with generalization, counter to the theory of Bartlett et al. (2017). Second: does normalized margin ever have a causal effect on generalization? The paper finds that yes -- in a standard training setup, test performance closely tracks normalized margin. The paper suggests a Gaussian process model as a promising explanation for this behavior.  ( 2 min )
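    For concreteness, here is one minimal way to measure the controlled quantity, assuming the product of per-layer Frobenius norms as $\|w\|$; normalization conventions differ across the margin literature, so this is a sketch rather than the paper's exact protocol.

        import torch

        def normalized_margin(logits, labels, layer_weights):
            """Minimum logit margin over a dataset, divided by a weight norm
            (here: the product of per-layer Frobenius norms, one convention)."""
            correct = logits.gather(1, labels.unsqueeze(1)).squeeze(1)
            masked = logits.scatter(1, labels.unsqueeze(1), float("-inf"))
            runner_up = masked.max(dim=1).values
            gamma = (correct - runner_up).min()           # worst-case margin
            w_norm = torch.stack([W.norm() for W in layer_weights]).prod()
            return gamma / w_norm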
    On the Usefulness of Deep Ensemble Diversity for Out-of-Distribution Detection. (arXiv:2207.07517v2 [cs.LG] UPDATED)
    The ability to detect Out-of-Distribution (OOD) data is important in safety-critical applications of deep learning. The aim is to separate In-Distribution (ID) data drawn from the training distribution from OOD data using a measure of uncertainty extracted from a deep neural network. Deep Ensembles are a well-established method of improving the quality of uncertainty estimates produced by deep neural networks, and have been shown to have superior OOD detection performance compared to single models. An existing intuition in the literature is that the diversity of Deep Ensemble predictions indicates distributional shift, and so measures of diversity such as Mutual Information (MI) should be used for OOD detection. We show experimentally that this intuition is not valid on ImageNet-scale OOD detection -- using MI leads to 30-40% worse %FPR@95 compared to single-model entropy on some OOD datasets. We suggest an alternative explanation for Deep Ensembles' better OOD detection performance -- OOD detection is binary classification and we are ensembling diverse classifiers. As such we show that practically, even better OOD detection performance can be achieved for Deep Ensembles by averaging task-specific detection scores such as Energy over the ensemble.  ( 3 min )
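    A short sketch of the suggested alternative, averaging a task-specific detection score (here the standard energy score) over ensemble members instead of measuring their disagreement:

        import torch

        def ensemble_energy_score(member_logits, T=1.0):
            """OOD score from a deep ensemble: average the per-member energy
            E(x) = -T * logsumexp(logits / T); lower (more negative) energy
            means more in-distribution under the usual convention."""
            energy = -T * torch.logsumexp(member_logits / T, dim=-1)  # (M, batch)
            return energy.mean(dim=0)   # average scores, not mutual information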
    APG: Adaptive Parameter Generation Network for Click-Through Rate Prediction. (arXiv:2203.16218v2 [cs.IR] UPDATED)
    In many web applications, deep learning-based CTR prediction models (deep CTR models for short) are widely adopted. Traditional deep CTR models learn patterns in a static manner, i.e., the network parameters are the same across all the instances. However, such a manner can hardly characterize each of the instances, which may have different underlying distributions. It actually limits the representation power of deep CTR models, leading to sub-optimal results. In this paper, we propose an efficient, effective, and universal module, named Adaptive Parameter Generation network (APG), which can dynamically generate parameters for deep CTR models on-the-fly based on different instances. Extensive experimental evaluation results show that APG can be applied to a variety of deep CTR models and significantly improve their performance. Meanwhile, APG can reduce the time cost by 38.7\% and memory usage by 96.6\% compared to a regular deep CTR model. We have deployed APG in an industrial sponsored search system and achieved 3\% CTR gain and 1\% RPM gain respectively.  ( 2 min )
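    The core idea of instance-conditioned parameters can be sketched with a plain hypernetwork, as below; APG's actual design adds low-rank decompositions to obtain the reported time and memory savings, so this is illustrative only.

        import torch
        import torch.nn as nn

        class AdaptiveLinear(nn.Module):
            """Linear layer whose weights are generated per instance by a
            small hypernetwork conditioned on instance features."""
            def __init__(self, cond_dim, in_dim, out_dim):
                super().__init__()
                self.in_dim, self.out_dim = in_dim, out_dim
                self.hyper = nn.Linear(cond_dim, in_dim * out_dim + out_dim)

            def forward(self, x, cond):
                params = self.hyper(cond)                       # (batch, in*out + out)
                cut = self.in_dim * self.out_dim
                W = params[:, :cut].view(-1, self.out_dim, self.in_dim)
                b = params[:, cut:]
                return torch.bmm(W, x.unsqueeze(-1)).squeeze(-1) + b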
    DADApy: Distance-based Analysis of DAta-manifolds in Python. (arXiv:2205.03373v2 [cs.LG] UPDATED)
    DADApy is a python software package for analysing and characterising high-dimensional data manifolds. It provides methods for estimating the intrinsic dimension and the probability density, for performing density-based clustering and for comparing different distance metrics. We review the main functionalities of the package and exemplify its usage in toy cases and in a real-world application. DADApy is freely available under the open-source Apache 2.0 license.  ( 2 min )
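    As an illustration of the kind of estimator the package provides, here is a standalone two-NN intrinsic dimension estimate (Facco et al., 2017); the sketch deliberately avoids assuming DADApy's exact API.

        import numpy as np
        from scipy.spatial import cKDTree

        def twonn_id(X):
            """Two-NN intrinsic dimension (maximum-likelihood form): the ratio
            of second- to first-neighbour distances is Pareto with exponent d."""
            dists, _ = cKDTree(X).query(X, k=3)   # columns: self, 1st NN, 2nd NN
            mu = dists[:, 2] / dists[:, 1]
            return len(X) / np.log(mu).sum()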
    PARNN: A Probabilistic Autoregressive Neural Network Framework for Accurate Forecasting. (arXiv:2204.09640v2 [stat.ML] UPDATED)
    Forecasting time series data represents an emerging field of research in data science and knowledge discovery with vast applications ranging from stock price and energy demand prediction to the early prediction of epidemics. Numerous statistical and machine learning methods have been proposed in the last five decades with the demand for high-quality and reliable forecasts. However, in real-life prediction problems, situations exist in which a model based on one of the above paradigms is preferable. Therefore, hybrid solutions are needed to bridge the gap between classical forecasting methods and modern neural network models. In this context, we introduce a Probabilistic AutoRegressive Neural Network (PARNN) model that can handle a wide variety of complex time series data (e.g., nonlinearity, non-seasonal, long-range dependence, and non-stationarity). The proposed PARNN model is built by creating a fusion of an integrated moving average and autoregressive neural network to preserve the explainability, scalability, and ``white-box-like'' prediction behavior of the individuals. Sufficient conditions for asymptotic stationarity and geometric ergodicity are obtained by considering the asymptotic behavior of the associated Markov chain. Unlike advanced deep learning tools, the uncertainty quantification of the PARNN model based on prediction intervals is obtained. During computational experiments, PARNN outperforms standard statistical, machine learning, and deep learning models (e.g., Transformers, NBeats, DeepAR, etc.) on a diverse collection of real-world datasets from macroeconomics, tourism, energy, epidemiology, and others for short-term, medium-term, and long-term forecasting. Multiple comparisons with the best method are carried out to showcase the superiority of the proposal in comparison with the state-of-the-art forecasters over different forecast horizons.  ( 3 min )
    Learn2Weight: Parameter Adaptation against Similar-domain Adversarial Attacks. (arXiv:2205.07315v2 [cs.LG] UPDATED)
    Recent work in black-box adversarial attacks for NLP systems has attracted much attention. Prior black-box attacks assume that attackers can observe output labels from target models based on selected inputs. In this work, inspired by adversarial transferability, we propose a new type of black-box NLP adversarial attack in which an attacker chooses a similar domain and transfers the adversarial examples to the target domain, causing poor performance in the target model. Based on domain adaptation theory, we then propose a defensive strategy, called Learn2Weight, which trains to predict the weight adjustments for a target model in order to defend against an attack of similar-domain adversarial examples. Using Amazon multi-domain sentiment classification datasets, we empirically show that Learn2Weight is effective against the attack compared to standard black-box defense methods such as adversarial training and defensive distillation. This work contributes to the growing literature on machine learning safety.  ( 2 min )
    Neural network training under semidefinite constraints. (arXiv:2201.00632v3 [cs.LG] UPDATED)
    This paper is concerned with the training of neural networks (NNs) under semidefinite constraints, which allows for NN training with robustness and stability guarantees. In particular, we focus on Lipschitz bounds for NNs. Exploiting the banded structure of the underlying matrix constraint, we set up an efficient and scalable training scheme for NN training problems of this kind based on interior point methods. Our implementation allows us to enforce Lipschitz constraints in the training of large-scale deep NNs such as Wasserstein generative adversarial networks (WGANs) via semidefinite constraints. In numerical examples, we show the superiority of our method and its applicability to WGAN training.  ( 2 min )
    Iso-Dream: Isolating Noncontrollable Visual Dynamics in World Models. (arXiv:2205.13817v2 [cs.LG] UPDATED)
    World models learn the consequences of actions in vision-based interactive systems. However, in practical scenarios such as autonomous driving, there commonly exist noncontrollable dynamics independent of the action signals, making it difficult to learn effective world models. To tackle this problem, we present a novel reinforcement learning approach named Iso-Dream, which improves the Dream-to-Control framework in two aspects. First, by optimizing the inverse dynamics, we encourage the world model to learn controllable and noncontrollable sources of spatiotemporal changes on isolated state transition branches. Second, we optimize the behavior of the agent on the decoupled latent imaginations of the world model. Specifically, to estimate state values, we roll out the noncontrollable states into the future and associate them with the current controllable state. In this way, the isolation of dynamics sources can greatly benefit long-horizon decision-making of the agent, such as a self-driving car that can avoid potential risks by anticipating the movement of other vehicles. Experiments show that Iso-Dream is effective in decoupling the mixed dynamics and remarkably outperforms existing approaches in a wide range of visual control and prediction domains.  ( 2 min )
    Low-Loss Subspace Compression for Clean Gains against Multi-Agent Backdoor Attacks. (arXiv:2203.03692v2 [cs.LG] UPDATED)
    Recent exploration of the multi-agent backdoor attack demonstrated the backfiring effect, a natural defense against backdoor attacks where backdoored inputs are randomly classified. This yields a side-effect of low accuracy w.r.t. clean labels, which motivates this paper's work on the construction of multi-agent backdoor defenses that maximize accuracy w.r.t. clean labels and minimize that of poison labels. Founded upon agent dynamics and low-loss subspace construction, we contribute three defenses that yield improved multi-agent backdoor robustness.  ( 2 min )
    Robust Vector Quantized-Variational Autoencoder. (arXiv:2202.01987v2 [cs.LG] UPDATED)
    Image generative models can learn the distributions of the training data and consequently generate examples by sampling from these distributions. However, when the training dataset is corrupted with outliers, generative models will likely produce examples that are also similar to the outliers. In fact, a small portion of outliers may induce state-of-the-art generative models, such as Vector Quantized-Variational AutoEncoder (VQ-VAE), to learn a significant mode from the outliers. To mitigate this problem, we propose a robust generative model based on VQ-VAE, which we name Robust VQ-VAE (RVQ-VAE). In order to achieve robustness, RVQ-VAE uses two separate codebooks for the inliers and outliers. To ensure the codebooks embed the correct components, we iteratively update the sets of inliers and outliers during each training epoch. To ensure that the encoded data points are matched to the correct codebooks, we quantize using a weighted Euclidean distance, whose weights are determined by directional variances of the codebooks. Both codebooks, together with the encoder and decoder, are trained jointly according to the reconstruction loss and the quantization loss. We experimentally demonstrate that RVQ-VAE is able to generate examples from inliers even if a large portion of the training data points are corrupted.  ( 3 min )
    S-Rocket: Selective Random Convolution Kernels for Time Series Classification. (arXiv:2203.03445v2 [cs.LG] UPDATED)
    Random convolution kernel transform (Rocket) is a fast, efficient, and novel approach for time series feature extraction using a large number of independent randomly initialized 1-D convolution kernels of different configurations. The output of the convolution operation on each time series is represented by the proportion of positive values (PPV). A concatenation of PPVs from all kernels is the input feature vector to a Ridge regression classifier. Unlike typical deep learning models, the kernels are not trained and there is no weighted/trainable connection between kernels or concatenated features and the classifier. Since these kernels are generated randomly, a portion of these kernels may not positively contribute to the performance of the model. Hence, selection of the most important kernels and pruning of the redundant and less important ones is necessary to reduce computational complexity and accelerate inference of Rocket for applications on edge devices. Selection of these kernels is a combinatorial optimization problem. In this paper, we propose a scheme for selecting these kernels while maintaining the classification performance. First, the original model is pre-trained at full capacity. Then, a population of binary candidate state vectors is initialized, where each element of a vector represents the active/inactive status of a kernel. A population-based optimization algorithm evolves the population in order to find the best state vector, which minimizes the number of active kernels while maximizing the accuracy of the classifier. The objective function is a linear combination of the total number of active kernels and the classification accuracy of the pre-trained classifier with the active kernels. Finally, the selected kernels in the best state vector are utilized to train the Ridge regression classifier.  ( 3 min )
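    A bare-bones sketch of the Rocket-style feature extraction whose kernels are being selected here; real Rocket additionally randomizes dilation and padding per kernel, so this captures only the core PPV idea.

        import numpy as np

        def random_kernel_ppv(ts, n_kernels=100, seed=0):
            """PPV features from random, untrained 1-D convolution kernels."""
            rng = np.random.default_rng(seed)
            features = np.empty(n_kernels)
            for k in range(n_kernels):
                length = rng.choice([7, 9, 11])
                weights = rng.normal(size=length)
                weights -= weights.mean()               # zero-centre, as in Rocket
                bias = rng.uniform(-1.0, 1.0)
                conv = np.convolve(ts, weights, mode="valid") + bias
                features[k] = (conv > 0).mean()         # proportion of positive values
            return features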
    Altering Backward Pass Gradients improves Convergence. (arXiv:2111.12495v3 [cs.LG] UPDATED)
    In standard neural network training, the gradients in the backward pass are determined by the forward pass. As a result, the two stages are coupled. This is how most neural networks are trained currently. However, gradient modification in the backward pass has seldom been studied in the literature. In this paper, we explore decoupled training, where we alter the gradients in the backward pass. We propose a simple yet powerful method called PowerGrad Transform that alters the gradients before the weight update in the backward pass and significantly enhances the predictive performance of the neural network. PowerGrad Transform trains the network to arrive at a better optimum at convergence. It is computationally extremely efficient, virtually adding no additional cost to either memory or compute, but results in improved final accuracies on both the training and test sets. PowerGrad Transform is easy to integrate into existing training routines, requiring just a few lines of code. PowerGrad Transform accelerates training and makes it possible for the network to better fit the training data. With decoupled training, PowerGrad Transform improves baseline accuracies for ResNet-50 by 0.73%, for SE-ResNet-50 by 0.66% and by more than 1.0% for the non-normalized ResNet-18 network on the ImageNet classification task.  ( 3 min )
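    The mechanics of altering gradients before the weight update can be illustrated with tensor hooks; the signed-power transform below is our assumption standing in for the paper's transform, which the abstract does not specify.

        import torch

        def power_transform(grad, alpha=0.8):
            """Elementwise signed power: reshapes gradient magnitudes while
            preserving signs (a hypothetical choice of transform)."""
            return grad.sign() * grad.abs().pow(alpha)

        model = torch.nn.Linear(10, 2)
        for p in model.parameters():
            # The hook runs in the backward pass; its return value replaces
            # the gradient before the optimizer step.
            p.register_hook(power_transform)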
    Signal Decomposition Using Masked Proximal Operators. (arXiv:2202.09338v6 [cs.LG] UPDATED)
    We consider the well-studied problem of decomposing a vector time series signal into components with different characteristics, such as smooth, periodic, nonnegative, or sparse. We describe a simple and general framework in which the components are defined by loss functions (which include constraints), and the signal decomposition is carried out by minimizing the sum of losses of the components (subject to the constraints). When each loss function is the negative log-likelihood of a density for the signal component, this framework coincides with maximum a posteriori probability (MAP) estimation; but it also includes many other interesting cases. Summarizing and clarifying prior results, we give two distributed optimization methods for computing the decomposition, which find the optimal decomposition when the component class loss functions are convex, and are good heuristics when they are not. Both methods require only the masked proximal operator of each of the component loss functions, a generalization of the well-known proximal operator that handles missing entries in its argument. Both methods are distributed, i.e., handle each component separately. We derive tractable methods for evaluating the masked proximal operators of some loss functions that, to our knowledge, have not appeared in the literature.  ( 3 min )
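    For intuition, the masked proximal operator of the simplest component loss, $f(x)=\|x\|^2$, has a closed form: observed entries receive the ordinary proximal update, while missing entries are set by minimizing $f$ alone. A minimal sketch:

        import numpy as np

        def masked_prox_sum_squares(v, t, known):
            """Masked prox of f(x) = ||x||^2.  At observed entries (known=True)
            it solves argmin_x ||x||^2 + (1/2t)||x - v||^2 = v / (1 + 2t);
            at missing entries it minimizes f alone, giving 0."""
            x = np.zeros_like(v, dtype=float)
            x[known] = v[known] / (1.0 + 2.0 * t)
            return x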
    Deep Learning for Simultaneous Inference of Hydraulic and Transport Properties. (arXiv:2110.12367v2 [cs.LG] UPDATED)
    Identifying the heterogeneous conductivity field and reconstructing the contaminant release history are key aspects of subsurface remediation. Achieving these two goals with limited and noisy hydraulic head and concentration measurements is challenging. The obstacles include solving an inverse problem for high-dimensional parameters, and the high-computational cost needed for the repeated forward modeling. We use a convolutional adversarial autoencoder (CAAE) for the parameterization of the heterogeneous non-Gaussian conductivity field with a low-dimensional latent representation. Additionally, we trained a three-dimensional dense convolutional encoder-decoder (DenseED) network to serve as the forward surrogate for the flow and transport processes. Combining the CAAE and DenseED forward surrogate models, the ensemble smoother with multiple data assimilation (ESMDA) algorithm is used to sample from the Bayesian posterior distribution of the unknown parameters, forming a CAAE-DenseED-ESMDA inversion framework. We applied this CAAE-DenseED-ESMDA inversion framework in a three-dimensional contaminant source and conductivity field identification problem. A comparison of the inversion results from CAAE-ESMDA with physical flow and transport simulator and CAAE-DenseED-ESMDA is provided, showing that accurate reconstruction results were achieved with a much higher computational efficiency.  ( 3 min )
    LINGUIST: Language Model Instruction Tuning to Generate Annotated Utterances for Intent Classification and Slot Tagging. (arXiv:2209.09900v1 [cs.CL])
    We present LINGUIST, a method for generating annotated data for Intent Classification and Slot Tagging (IC+ST), via fine-tuning AlexaTM 5B, a 5-billion-parameter multilingual sequence-to-sequence (seq2seq) model, on a flexible instruction prompt. In a 10-shot novel intent setting for the SNIPS dataset, LINGUIST surpasses state-of-the-art approaches (Back-Translation and Example Extrapolation) by a wide margin, showing absolute improvement for the target intents of +1.9 points on IC Recall and +2.5 points on ST F1 Score. In the zero-shot cross-lingual setting of the mATIS++ dataset, LINGUIST outperforms a strong baseline of Machine Translation with Slot Alignment by +4.14 points absolute on ST F1 Score across 6 languages, while matching performance on IC. Finally, we verify our results on an internal large-scale multilingual dataset for conversational agent IC+ST and show significant improvements over a baseline which uses Back-Translation, Paraphrasing and Slot Catalog Resampling. To our knowledge, we are the first to demonstrate instruction fine-tuning of a large-scale seq2seq model to control the outputs of multilingual intent- and slot-labeled data generation.  ( 2 min )
    Benign Overfitting without Linearity: Neural Network Classifiers Trained by Gradient Descent for Noisy Linear Data. (arXiv:2202.05928v3 [cs.LG] UPDATED)
    Benign overfitting, the phenomenon where interpolating models generalize well in the presence of noisy data, was first observed in neural network models trained with gradient descent. To better understand this empirical observation, we consider the generalization error of two-layer neural networks trained to interpolation by gradient descent on the logistic loss following random initialization. We assume the data comes from well-separated class-conditional log-concave distributions and allow for a constant fraction of the training labels to be corrupted by an adversary. We show that in this setting, neural networks exhibit benign overfitting: they can be driven to zero training error, perfectly fitting any noisy training labels, and simultaneously achieve minimax optimal test error. In contrast to previous work on benign overfitting that require linear or kernel-based predictors, our analysis holds in a setting where both the model and learning dynamics are fundamentally nonlinear.  ( 2 min )
    Environmental Sound Classification on the Edge: A Pipeline for Deep Acoustic Networks on Extremely Resource-Constrained Devices. (arXiv:2103.03483v4 [cs.SD] UPDATED)
    Significant efforts are being invested to bring state-of-the-art classification and recognition to edge devices with extreme resource constraints (memory, speed, and lack of GPU support). Here, we demonstrate the first deep network for acoustic recognition that is small, flexible and compression-friendly yet achieves state-of-the-art performance for raw audio classification. Rather than handcrafting a once-off solution, we present a generic pipeline that automatically converts a large deep convolutional network via compression and quantization into a network for resource-impoverished edge devices. After introducing ACDNet, which produces above state-of-the-art accuracy on ESC-10 (96.65%), ESC-50 (87.10%), UrbanSound8K (84.45%) and AudioEvent (92.57%), we describe the compression pipeline and show that it allows us to achieve 97.22% size reduction and 97.28% FLOP reduction while maintaining close to state-of-the-art accuracy 96.25%, 83.65%, 78.27% and 89.69% on these datasets. We describe a successful implementation on a standard off-the-shelf microcontroller and, beyond laboratory benchmarks, report successful tests on real-world datasets.  ( 3 min )
    Sharing to learn and learning to share -- Fitting together Meta-Learning, Multi-Task Learning, and Transfer Learning: A meta review. (arXiv:2111.12146v3 [cs.LG] UPDATED)
    Integrating knowledge across different domains is an essential feature of human learning. Learning paradigms such as transfer learning, meta learning, and multi-task learning reflect the human learning process by exploiting prior knowledge for new tasks, encouraging faster learning and good generalization. This article gives a detailed view of these learning paradigms and their comparative analysis. The weakness of one learning algorithm turns out to be a strength of another, and thus merging them is a prevalent trait in the literature. There are numerous research papers that focus on each of these learning paradigms separately and provide a comprehensive overview of them. However, this article provides a review of research studies that combine (two of) these learning algorithms. This survey describes how these techniques are combined to solve problems in many different fields of study, including computer vision, natural language processing, hyperspectral imaging, and many more. As a result, the global generic learning network (an amalgamation of meta learning, transfer learning, and multi-task learning) is introduced here, along with some open research questions and future research directions in the multi-task setting.  ( 3 min )
    Optimal learning of quantum Hamiltonians from high-temperature Gibbs states. (arXiv:2108.04842v2 [quant-ph] UPDATED)
    We study the problem of learning a Hamiltonian $H$ to precision $\varepsilon$, supposing we are given copies of its Gibbs state $\rho=\exp(-\beta H)/\operatorname{Tr}(\exp(-\beta H))$ at a known inverse temperature $\beta$. Anshu, Arunachalam, Kuwahara, and Soleimanifar (Nature Physics, 2021, arXiv:2004.07266) recently studied the sample complexity (number of copies of $\rho$ needed) of this problem for geometrically local $N$-qubit Hamiltonians. In the high-temperature (low $\beta$) regime, their algorithm has sample complexity poly$(N, 1/\beta,1/\varepsilon)$ and can be implemented with polynomial, but suboptimal, time complexity. In this paper, we study the same question for a more general class of Hamiltonians. We show how to learn the coefficients of a Hamiltonian to error $\varepsilon$ with sample complexity $S = O(\log N/(\beta\varepsilon)^{2})$ and time complexity linear in the sample size, $O(S N)$. Furthermore, we prove a matching lower bound showing that our algorithm's sample complexity is optimal, and hence our time complexity is also optimal. In the appendix, we show that virtually the same algorithm can be used to learn $H$ from a real-time evolution unitary $e^{-it H}$ in a small $t$ regime with similar sample and time complexity.  ( 3 min )
    Acme: A Research Framework for Distributed Reinforcement Learning. (arXiv:2006.00979v2 [cs.LG] UPDATED)
    Deep reinforcement learning (RL) has led to many recent and groundbreaking advances. However, these advances have often come at the cost of both increased scale in the underlying architectures being trained as well as increased complexity of the RL algorithms used to train them. These increases have in turn made it more difficult for researchers to rapidly prototype new ideas or reproduce published RL algorithms. To address these concerns, this work describes Acme, a framework for constructing novel RL algorithms that is specifically designed to enable agents built from simple, modular components that can be used at various scales of execution. While the primary goal of Acme is to provide a framework for algorithm development, a secondary goal is to provide simple reference implementations of important or state-of-the-art algorithms. These implementations serve both as a validation of our design decisions and as an important contribution to reproducibility in RL research. In this work, we describe the major design decisions made within Acme and give further details as to how its components can be used to implement various algorithms. Our experiments provide baselines for a number of common and state-of-the-art algorithms as well as showing how these algorithms can be scaled up for much larger and more complex environments. This highlights one of the primary advantages of Acme, namely that it can be used to implement large, distributed RL algorithms that can run at massive scales while still maintaining the inherent readability of that implementation. This work presents a second version of the paper, which coincides with an increase in modularity, additional emphasis on offline, imitation and learning from demonstrations algorithms, as well as various new agents implemented as part of Acme.  ( 3 min )
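    The actor/learner modularity described above can be pictured with a generic environment loop; the sketch below uses hypothetical interfaces in the spirit of Acme, not Acme's actual API.

        class Actor:
            """Selects actions and routes transitions to a learner via a buffer."""
            def __init__(self, policy, buffer):
                self.policy, self.buffer = policy, buffer
            def act(self, obs):
                return self.policy(obs)
            def observe(self, transition):
                self.buffer.append(transition)

        def environment_loop(env, actor, learner, steps):
            obs = env.reset()
            for _ in range(steps):
                action = actor.act(obs)
                next_obs, reward, done = env.step(action)
                actor.observe((obs, action, reward, next_obs, done))
                learner.step()          # e.g. sample the buffer, update networks
                obs = env.reset() if done else next_obs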
    Big Data Analytics for Network Level Short-Term Travel Time Prediction with Hierarchical LSTM and Attention. (arXiv:2201.05760v2 [cs.LG] UPDATED)
    The travel time data collected from widespread traffic monitoring sensors necessitate big data analytic tools for querying, visualization, and identifying meaningful traffic patterns. This paper utilizes a large-scale travel time dataset from the Caltrans Performance Measurement System (PeMS) that overwhelms traditional data processing and modeling tools. To overcome the challenges of the massive amount of data, the big data analytic engines Apache Spark and Apache MXNet are applied for data wrangling and modeling. Seasonality and autocorrelation analyses were performed to explore and visualize the trend of time-varying data. Inspired by the success of hierarchical architectures for many Artificial Intelligence (AI) tasks, we consolidate the cell and hidden states passed from the low-level to the high-level LSTM with an attention pooling, similar to how the human perception system operates. The designed hierarchical LSTM model can consider the dependencies at different time scales to capture the spatial-temporal correlations of network-level travel time. Another self-attention module is then devised to connect LSTM extracted features to the fully connected layers, predicting travel time for all corridors instead of a single link/route. The comparison results show that the Hierarchical LSTM with Attention (HierLSTMat) model gives the best prediction results at 30- and 45-minute horizons and can successfully forecast unusual congestion. The efficiency gained from big data analytic tools was evaluated by comparing them with popular data science and deep learning frameworks.  ( 3 min )
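    A minimal sketch of the hierarchical-LSTM-with-attention-pooling idea (hypothetical dimensions and layer choices, not the paper's exact architecture):

        import torch
        import torch.nn as nn

        class HierLSTMAt(nn.Module):
            def __init__(self, n_links, hidden=64):
                super().__init__()
                self.low = nn.LSTM(n_links, hidden, batch_first=True)   # fine scale
                self.high = nn.LSTM(hidden, hidden, batch_first=True)   # coarse scale
                self.attn = nn.Linear(hidden, 1)                        # attention pooling
                self.head = nn.Linear(hidden, n_links)                  # all corridors

            def forward(self, x):               # x: (batch, time, n_links)
                h_low, _ = self.low(x)
                h_high, _ = self.high(h_low)    # consumes the low-level states
                w = torch.softmax(self.attn(h_high), dim=1)
                pooled = (w * h_high).sum(dim=1)
                return self.head(pooled)        # next-horizon travel time per link

        pred = HierLSTMAt(n_links=30)(torch.randn(8, 48, 30))           # shape (8, 30)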
    Risk Verification of Stochastic Systems with Neural Network Controllers. (arXiv:2209.09881v1 [eess.SY])
    Motivated by the fragility of neural network (NN) controllers in safety-critical applications, we present a data-driven framework for verifying the risk of stochastic dynamical systems with NN controllers. Given a stochastic control system, an NN controller, and a specification equipped with a notion of trace robustness (e.g., constraint functions or signal temporal logic), we collect trajectories from the system that may or may not satisfy the specification. In particular, each of the trajectories produces a robustness value that indicates how well (severely) the specification is satisfied (violated). We then compute risk metrics over these robustness values to estimate the risk that the NN controller will not satisfy the specification. We are further interested in quantifying the difference in risk between two systems, and we show how the risk estimated from a nominal system can provide an upper bound on the risk of a perturbed version of the system. In particular, the tightness of this bound depends on how close the two systems are in terms of their trajectories. For Lipschitz continuous and incrementally input-to-state stable systems, we show how to exactly quantify system closeness with varying degrees of conservatism, while we estimate system closeness for more general systems from data in our experiments. We demonstrate our risk verification approach on two case studies, an underwater vehicle and an F1/10 autonomous car.  ( 3 min )
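    The central computation, a risk metric over sampled robustness values, is compact; a sketch using conditional value-at-risk (CVaR) as the metric (one choice among the risk metrics the framework admits):

        import numpy as np

        def cvar_of_violation(robustness, alpha=0.95):
            """CVaR of the negated robustness: mean of the worst (1 - alpha) tail.

            robustness: per-trajectory values; positive means the specification
            is satisfied, negative means it is violated.
            """
            losses = -np.asarray(robustness)
            var = np.quantile(losses, alpha)
            return losses[losses >= var].mean()

        rho = np.random.normal(loc=0.5, scale=0.3, size=10000)   # toy robustness samples
        print(cvar_of_violation(rho, alpha=0.95))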
    Learning Green's Functions of Linear Reaction-Diffusion Equations with Application to Fast Numerical Solver. (arXiv:2105.11045v2 [cs.LG] UPDATED)
    Partial differential equations are often used to model various physical phenomena, such as heat diffusion, wave propagation, fluid dynamics, elasticity, electrodynamics and image processing, and many analytic approaches or traditional numerical methods have been developed and widely used for their solutions. Inspired by the rapidly growing impact of deep learning on scientific and engineering research, in this paper we propose a novel neural network, GF-Net, for learning the Green's functions of linear reaction-diffusion equations in an unsupervised fashion. The proposed method overcomes the challenges of finding the Green's functions of the equations on arbitrary domains by utilizing a physics-informed approach and the symmetry of the Green's function. As a consequence, it leads to an efficient way of solving the target equations under different boundary conditions and sources. We also demonstrate the effectiveness of the proposed approach by experiments in square, annular and L-shaped domains.  ( 2 min )
    Soft Action Priors: Towards Robust Policy Transfer. (arXiv:2209.09882v1 [cs.LG])
    Despite success in many challenging problems, reinforcement learning (RL) is still confronted with sample inefficiency, which can be mitigated by introducing prior knowledge to agents. However, many transfer techniques in reinforcement learning make the limiting assumption that the teacher is an expert. In this paper, we use the action prior from the Reinforcement Learning as Inference framework - that is, a distribution over actions at each state which resembles a teacher policy, rather than a Bayesian prior - to recover state-of-the-art policy distillation techniques. Then, we propose a class of adaptive methods that can robustly exploit action priors by combining reward shaping and auxiliary regularization losses. In contrast to prior work, we develop algorithms for leveraging suboptimal action priors that may nevertheless impart valuable knowledge - which we call soft action priors. The proposed algorithms adapt by adjusting the strength of teacher feedback according to an estimate of the teacher's usefulness in each state. We perform tabular experiments, which show that the proposed methods match state-of-the-art performance and surpass it when learning from suboptimal priors. Finally, we demonstrate the robustness of the adaptive algorithms in continuous-action deep RL problems, in which the adaptive algorithms considerably improve stability compared to existing policy distillation methods.  ( 2 min )
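    A sketch of the adaptive shaping idea: add the log of the teacher's action prior to the reward, scaled by a per-state usefulness estimate (the weighting rule here is a hypothetical placeholder; the paper also combines this with auxiliary regularization losses):

        import numpy as np

        n_states, n_actions = 4, 3
        prior = np.full((n_states, n_actions), 1.0 / n_actions)   # teacher action prior
        prior[0] = [0.8, 0.1, 0.1]                                # confident at s = 0
        usefulness = np.array([0.9, 0.2, 0.2, 0.2])               # per-state trust estimate

        def shaped_reward(r, s, a):
            """Environment reward plus teacher feedback, weighted by usefulness."""
            return r + usefulness[s] * np.log(prior[s, a])

        print(shaped_reward(1.0, s=0, a=0))   # teacher agrees: mild shaping penalty
        print(shaped_reward(1.0, s=0, a=2))   # teacher disagrees: larger penalty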
    Integer Fine-tuning of Transformer-based Models. (arXiv:2209.09815v1 [cs.LG])
    Transformer-based models achieve state-of-the-art performance on various deep learning tasks. Since transformer-based models have large numbers of parameters, fine-tuning them on downstream tasks is computationally intensive and energy-hungry. Automatic mixed-precision FP32/FP16 fine-tuning of such models has previously been used to lower the compute resource requirements. However, with the recent advances in low-bit integer back-propagation, it is possible to further reduce the computation and memory footprint. In this work, we explore a novel integer training method that uses integer arithmetic for both forward propagation and gradient computation of linear, convolutional, layer-norm, and embedding layers in transformer-based models. Furthermore, we study the effect of various integer bit-widths to find the minimum required bit-width for integer fine-tuning of transformer-based models. We fine-tune BERT and ViT models on popular downstream tasks using integer layers. We show that 16-bit integer models match the floating-point baseline performance. Reducing the bit-width to 10 yields an average score drop of 0.5 points. Finally, further reduction of the bit-width to 8 provides an average score drop of 1.7 points.  ( 2 min )
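    A sketch of the integer-arithmetic building block for a single linear layer, using symmetric uniform quantization (a common scheme; the paper's exact method for gradients and other layer types is more involved):

        import numpy as np

        def quantize(x, bits):
            qmax = 2 ** (bits - 1) - 1
            scale = np.abs(x).max() / qmax + 1e-12
            q = np.clip(np.round(x / scale), -qmax, qmax).astype(np.int64)
            return q, scale

        def int_linear(x, w, bits=8):
            """y = x @ w.T computed with an integer matmul, dequantized at the end."""
            xq, sx = quantize(x, bits)
            wq, sw = quantize(w, bits)
            acc = xq @ wq.T                 # wide integer accumulation, as in int GEMM
            return acc.astype(np.float32) * sx * sw

        x, w = np.random.randn(4, 64), np.random.randn(32, 64)
        print(np.abs(int_linear(x, w, bits=16) - x @ w.T).max())   # small at 16 bits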
    Safe Exploration in Model-based Reinforcement Learning using Control Barrier Functions. (arXiv:2104.08171v4 [cs.LG] UPDATED)
    This paper develops a model-based reinforcement learning (MBRL) framework for learning online the value function of an infinite-horizon optimal control problem while obeying safety constraints expressed as control barrier functions (CBFs). Our approach is facilitated by the development of a novel class of CBFs, termed Lyapunov-like CBFs (LCBFs), that retain the beneficial properties of CBFs for developing minimally-invasive safe control policies while also possessing desirable Lyapunov-like qualities such as positive semi-definiteness. We show how these LCBFs can be used to augment a learning-based control policy to guarantee safety and then leverage this approach to develop a safe exploration framework in a MBRL setting. We demonstrate that our approach can handle more general safety constraints than comparative methods via numerical examples.  ( 2 min )
    An Inertial Block Majorization Minimization Framework for Nonsmooth Nonconvex Optimization. (arXiv:2010.12133v3 [math.OC] UPDATED)
    In this paper, we introduce TITAN, a novel inerTIal block majorizaTion minimizAtioN framework for non-smooth non-convex optimization problems. To the best of our knowledge, TITAN is the first block-coordinate update framework that relies on majorization-minimization while embedding an inertial force into each step of the block updates. The inertial force is obtained via an extrapolation operator that subsumes heavy-ball and Nesterov-type accelerations for block proximal gradient methods as special cases. By choosing various surrogate functions, such as proximal, Lipschitz gradient, Bregman, quadratic, and composite surrogate functions, and by varying the extrapolation operator, TITAN produces a rich set of inertial block-coordinate update methods. We study subsequential as well as global convergence of the sequence generated by TITAN. We illustrate the effectiveness of TITAN on two important machine learning problems, namely sparse non-negative matrix factorization and matrix completion.  ( 2 min )
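    In generic form, the update for block $i$ at iteration $k$ can be written as follows (a schematic rendering based on the abstract's description; $u_i$ denotes a surrogate that majorizes the objective in block $i$, and $\mathcal{G}_i^k$ the extrapolation operator, e.g. $\mathcal{G}_i^k = \beta_k I$ for heavy-ball acceleration):

        $$\tilde{x}_i^k = x_i^k + \mathcal{G}_i^k\big(x_i^k - x_i^{k-1}\big), \qquad x_i^{k+1} \in \operatorname*{arg\,min}_{x_i}\; u_i\big(x_i;\, \tilde{x}_i^k,\, x_{\neq i}^k\big).$$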
    Physical Logic Enhanced Network for Small-Sample Bi-Layer Metallic Tubes Bending Springback Prediction. (arXiv:2209.09870v1 [cs.LG])
    Bi-layer metallic tubes (BMTs) play an extremely crucial role in engineering applications; rotary draw bending (RDB) achieves high-precision bending, but the product subsequently springs back. Due to the complex structure of BMTs and the high cost of dataset acquisition, existing methods based on mechanism research and machine learning cannot meet the engineering requirements of springback prediction. Based on a preliminary mechanism analysis, a physical logic enhanced network (PE-NET) is proposed. The architecture includes ES-NET, which equates the BMT to a single-layer tube, and SP-NET, which performs the final prediction of springback from sufficient single-layer tube samples. Specifically, in the first stage, the ES-NET and SP-NET are constructed via theory-driven pre-exploration and data-driven pretraining, respectively. In the second stage, under the physical logic, PE-NET is assembled from ES-NET and SP-NET and then fine-tuned with the small-sample BMT dataset and a composite loss function. The validity and stability of the proposed method are verified on an FE simulation dataset, small-sample BMT springback angle prediction is achieved, and the method's potential in interpretability and engineering applications is demonstrated.  ( 2 min )
    Dynamic Graph Message Passing Networks for Visual Recognition. (arXiv:2209.09760v1 [cs.CV])
    Modelling long-range dependencies is critical for scene understanding tasks in computer vision. Although convolutional neural networks (CNNs) have excelled in many vision tasks, they are still limited in capturing long-range structured relationships as they typically consist of layers of local kernels. A fully-connected graph, such as the self-attention operation in Transformers, is beneficial for such modelling; however, its computational overhead is prohibitive. In this paper, we propose a dynamic graph message passing network that significantly reduces the computational complexity compared to related works modelling a fully-connected graph. This is achieved by adaptively sampling nodes in the graph, conditioned on the input, for message passing. Based on the sampled nodes, we dynamically predict node-dependent filter weights and the affinity matrix for propagating information between them. This formulation allows us to design a self-attention module, and more importantly a new Transformer-based backbone network, that we use for both image classification pretraining, and for addressing various downstream tasks (object detection, instance and semantic segmentation). Using this model, we show significant improvements with respect to strong, state-of-the-art baselines on four different tasks. Our approach also outperforms fully-connected graphs while using substantially fewer floating-point operations and parameters. Code and models will be made publicly available at https://github.com/fudan-zvg/DGMN2  ( 2 min )
    Relational Reasoning via Set Transformers: Provable Efficiency and Applications to MARL. (arXiv:2209.09845v1 [cs.LG])
    The cooperative Multi-Agent Reinforcement Learning (MARL) framework with permutation-invariant agents has achieved tremendous empirical successes in real-world applications. Unfortunately, the theoretical understanding of this MARL problem is lacking due to the curse of many agents and the limited exploration of the relational reasoning in existing works. In this paper, we verify that the transformer implements complex relational reasoning, and we propose and analyze model-free and model-based offline MARL algorithms with transformer approximators. We prove that the suboptimality gaps of the model-free and model-based algorithms are independent of and logarithmic in the number of agents, respectively, which mitigates the curse of many agents. These results are consequences of a novel generalization error bound of the transformer and a novel analysis of the Maximum Likelihood Estimate (MLE) of the system dynamics with the transformer. Our model-based algorithm is the first provably efficient MARL algorithm that explicitly exploits the permutation invariance of the agents.  ( 2 min )
    emojiSpace: Spatial Representation of Emojis. (arXiv:2209.09871v1 [cs.CL])
    In the absence of nonverbal cues during messaging communication, users express part of their emotions using emojis. Thus, having emojis in the vocabulary of text messaging language models can significantly improve many natural language processing (NLP) applications such as online communication analysis. On the other hand, word embedding models are usually trained on a very large corpus of text such as Wikipedia or Google News datasets that include very few samples with emojis. In this study, we create emojiSpace, a combined word-emoji embedding built using the word2vec model from the Gensim library in Python. We trained emojiSpace on a corpus of more than 4 billion tweets and evaluated it by implementing sentiment analysis on a Twitter dataset containing more than 67 million tweets as an extrinsic task. For this task, we compared the performance of two different classifiers: random forest (RF) and linear support vector machine (SVM). For evaluation, we compared emojiSpace performance with two other pre-trained embeddings and demonstrated that emojiSpace outperforms both.  ( 2 min )
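    A minimal sketch of the described setup with Gensim's word2vec, on a toy corpus where emojis are kept as standalone tokens (the actual model was trained on over 4 billion tweets):

        from gensim.models import Word2Vec

        corpus = [["great", "game", "tonight", "🔥"],
                  ["so", "sad", "😢"],
                  ["love", "this", "song", "🔥", "🎵"]]

        model = Word2Vec(sentences=corpus, vector_size=100, window=5,
                         min_count=1, workers=4)
        # Emojis now live in the same vector space as words.
        print(model.wv.most_similar("🔥", topn=2))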
    Predictive Scale-Bridging Simulations through Active Learning. (arXiv:2209.09811v1 [cs.LG])
    Throughout computational science, there is a growing need to utilize the continual improvements in raw computational horsepower to achieve greater physical fidelity through scale-bridging over brute-force increases in the number of mesh elements. For instance, quantitative predictions of transport in nanoporous media, critical to hydrocarbon extraction from tight shale formations, are impossible without accounting for molecular-level interactions. Similarly, inertial confinement fusion simulations rely on numerical diffusion to simulate molecular effects such as non-local transport and mixing without truly accounting for molecular interactions. With these two disparate applications in mind, we develop a novel capability which uses an active learning approach to optimize the use of local fine-scale simulations for informing coarse-scale hydrodynamics. Our approach addresses three challenges: forecasting continuum coarse-scale trajectory to speculatively execute new fine-scale molecular dynamics calculations, dynamically updating coarse-scale from fine-scale calculations, and quantifying uncertainty in neural network models.  ( 2 min )
    A Deep Reinforcement Learning-Based Charging Scheduling Approach with Augmented Lagrangian for Electric Vehicle. (arXiv:2209.09772v1 [cs.AI])
    This paper addresses the problem of optimizing charging/discharging schedules of electric vehicles (EVs) when participating in demand response (DR). As there exist uncertainties in EVs' remaining energy, arrival and departure time, and future electricity prices, it is quite difficult to make charging decisions that minimize charging cost while guaranteeing that the EV battery's state-of-charge (SOC) stays within a certain range. To handle this dilemma, this paper formulates the EV charging scheduling problem as a constrained Markov decision process (CMDP). By synergistically combining the augmented Lagrangian method and the soft actor-critic algorithm, a novel safe off-policy reinforcement learning (RL) approach is proposed in this paper to solve the CMDP. The actor network is updated in a policy gradient manner with the Lagrangian value function. A double-critics network is adopted to synchronously estimate the action-value function to avoid overestimation bias. The proposed algorithm does not require a strong convexity guarantee for the examined problems and is sample efficient. Comprehensive numerical experiments with real-world electricity prices demonstrate that our proposed algorithm can achieve high solution optimality and constraint compliance.  ( 2 min )
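    The augmented-Lagrangian ingredient can be sketched as a dual-ascent update on the constraint multiplier alongside the policy update (schematic with a stubbed episode; the paper couples this with soft actor-critic and double critics):

        import numpy as np

        rng = np.random.default_rng(0)

        def run_episode(lam):
            """Stub for one charging episode: returns (cost, SOC-constraint violation).

            Stands in for a SAC rollout whose policy was trained against the
            Lagrangian cost + lam * violation.
            """
            return rng.normal(5.0, 1.0), rng.normal(0.2 - 0.05 * lam, 0.1)

        lam, eta = 0.0, 0.05                         # multiplier and dual step size
        for _ in range(500):
            cost, violation = run_episode(lam)
            lam = max(0.0, lam + eta * violation)    # dual ascent keeps lam >= 0
        print(f"final multiplier: {lam:.2f}")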
    PainPoints: A Framework for Language-based Detection of Chronic Pain and Expert-Collaborative Text-Summarization. (arXiv:2209.09814v1 [cs.CL])
    Chronic pain is a pervasive disorder which is often very disabling and is associated with comorbidities such as depression and anxiety. Neuropathic Pain (NP) is a common sub-type which is often caused by nerve damage and has a known pathophysiology. Another common sub-type is Fibromyalgia (FM), which is described as musculoskeletal, diffuse pain that is widespread through the body. The pathophysiology of FM is poorly understood, making it very hard to diagnose. Standard medications and treatments for FM and NP differ from one another, and misdiagnosis can cause an increase in symptom severity. To overcome this difficulty, we propose a novel framework, PainPoints, which accurately detects the sub-type of pain and generates clinical notes by summarizing the patient interviews. Specifically, PainPoints makes use of large language models to perform sentence-level classification of the text obtained from interviews of FM and NP patients with a reliable AUC of 0.83. Using a sufficiency-based interpretability approach, we explain how the fine-tuned model accurately picks up on the nuances that patients use to describe their pain. Finally, we generate summaries of these interviews via expert interventions by introducing a novel facet-based approach. PainPoints thus enables practitioners to add/drop facets and generate a custom summary based on the notion of "facet-coverage", which is also introduced in this work.  ( 3 min )
    Neural Graph Databases. (arXiv:2209.09732v1 [cs.LG])
    Graph databases (GDBs) enable processing and analysis of unstructured, complex, rich, and usually vast graph datasets. Despite the large significance of GDBs in both academia and industry, little effort has been made to integrate them with the predictive power of graph neural networks (GNNs). In this work, we show how to seamlessly combine nearly any GNN model with the computational capabilities of GDBs. For this, we observe that the majority of these systems are based on, or support, a graph data model called the Labeled Property Graph (LPG), where vertices and edges can have arbitrarily complex sets of labels and properties. We then develop LPG2vec, an encoder that transforms an arbitrary LPG dataset into a representation that can be directly used with a broad class of GNNs, including convolutional, attentional, message-passing, and even higher-order or spectral models. In our evaluation, we show that the rich information represented as LPG labels and properties is properly preserved by LPG2vec, and that it increases the accuracy of predictions regardless of the targeted learning task or the GNN model used, by up to 34% compared to graphs with no LPG labels/properties. In general, LPG2vec enables combining the predictive power of the most powerful GNNs with the full scope of information encoded in the LPG model, paving the way for neural graph databases, a class of systems where the vast complexity of maintained data will benefit from modern and future graph machine learning methods.  ( 3 min )
    Deep Physics Corrector: A physics enhanced deep learning architecture for solving stochastic differential equations. (arXiv:2209.09750v1 [stat.ML])
    We propose a novel gray-box modeling algorithm for physical systems governed by stochastic differential equations (SDEs). The proposed approach, referred to as the Deep Physics Corrector (DPC), blends approximate physics represented in terms of SDEs with a deep neural network (DNN). The primary idea is to exploit the DNN to model the missing physics. We hypothesize that combining incomplete physics with data will make the model interpretable and allow better generalization. The primary bottleneck in training surrogate models for stochastic simulators often lies in selecting a suitable loss function. Among the different loss functions available in the literature, we use the conditional maximum mean discrepancy (CMMD) loss function in DPC because of its proven performance. Overall, physics-data fusion and CMMD allow DPC to learn from sparse data. We illustrate the performance of the proposed DPC on four benchmark examples from the literature. The results obtained are highly accurate, indicating its possible application as a surrogate model for stochastic simulators.  ( 2 min )
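    The gray-box idea can be sketched as an Euler-Maruyama step of the approximate SDE plus a learned drift correction (schematic; the paper trains the network with the CMMD loss, omitted here):

        import torch
        import torch.nn as nn

        class DPCStep(nn.Module):
            """One Euler-Maruyama step: known approximate drift + DNN correction."""
            def __init__(self, dim):
                super().__init__()
                self.correction = nn.Sequential(nn.Linear(dim, 64), nn.Tanh(),
                                                nn.Linear(64, dim))

            def approx_drift(self, x):      # stand-in for the known, incomplete physics
                return -x

            def forward(self, x, dt, sigma=0.1):
                drift = self.approx_drift(x) + self.correction(x)   # fill missing physics
                noise = sigma * torch.randn_like(x) * dt ** 0.5
                return x + drift * dt + noise

        x, step = torch.zeros(16, 2), DPCStep(dim=2)
        for _ in range(100):                # simulate a short trajectory
            x = step(x, dt=0.01)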
    ESTA: An Esports Trajectory and Action Dataset. (arXiv:2209.09861v1 [cs.LG])
    Sports, due to their global reach and impact-rich prediction tasks, are an exciting domain to deploy machine learning models. However, data from conventional sports is often unsuitable for research use due to its size, veracity, and accessibility. To address these issues, we turn to esports, a growing domain that encompasses video games played in a capacity similar to conventional sports. Since esports data is acquired through server logs rather than peripheral sensors, esports provides a unique opportunity to obtain a massive collection of clean and detailed spatiotemporal data, similar to those collected in conventional sports. To parse esports data, we develop awpy, an open-source esports game log parsing library that can extract player trajectories and actions from game logs. Using awpy, we parse 8.6m actions, 7.9m game frames, and 417k trajectories from 1,558 game logs from professional Counter-Strike tournaments to create the Esports Trajectory and Actions (ESTA) dataset. ESTA is one of the largest and most granular publicly available sports data sets to date. We use ESTA to develop benchmarks for win prediction using player-specific information. The ESTA data is available at https://github.com/pnxenopoulos/esta and awpy is made public through PyPI.  ( 2 min )
    FedToken: Tokenized Incentives for Data Contribution in Federated Learning. (arXiv:2209.09775v1 [cs.LG])
    Incentives that compensate for the involved costs in the decentralized training of a Federated Learning (FL) model act as a key stimulus for clients' long-term participation. However, it is challenging to secure quality participation from clients in FL due to the absence of: (i) full information on the client's data quality and properties; (ii) the value of the client's data contributions; and (iii) a trusted mechanism for monetary incentive offers. This often leads to poor efficiency in training and communication. While several works focus on strategic incentive designs and client selection to overcome this problem, there is a major knowledge gap in terms of an overall design tailored to the foreseen digital economy, including Web 3.0, while simultaneously meeting the learning objectives. To address this gap, we propose a contribution-based tokenized incentive scheme, namely FedToken, backed by blockchain technology that ensures fair allocation of tokens amongst the clients that corresponds to the valuation of their data during model training. Leveraging the engineered Shapley-based scheme, we first approximate the contribution of local models during model aggregation, then strategically schedule clients, lowering the communication rounds for convergence, and anchor ways to allocate affordable tokens under a constrained monetary budget. Extensive simulations demonstrate the efficacy of our proposed method.  ( 3 min )
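    The Shapley-style valuation can be sketched with Monte Carlo permutation sampling over clients (a standard approximation; the paper's engineered scheme operates on local models during aggregation):

        import random
        import numpy as np

        def mc_shapley(clients, utility, rounds=200, seed=0):
            """Monte Carlo Shapley values: average marginal gain of adding each client."""
            rng = random.Random(seed)
            phi = {c: 0.0 for c in clients}
            for _ in range(rounds):
                order = clients[:]
                rng.shuffle(order)
                coalition, prev = [], utility([])
                for c in order:
                    coalition.append(c)
                    val = utility(coalition)
                    phi[c] += (val - prev) / rounds
                    prev = val
            return phi

        # Toy utility: a validation-accuracy proxy growing with contributed quality.
        quality = {"a": 0.5, "b": 0.3, "c": 0.1}
        util = lambda S: 1.0 - np.exp(-sum(quality[c] for c in S))
        print(mc_shapley(list(quality), util))   # tokens can then be allocated ~ phi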
    Training an Assassin AI for The Resistance: Avalon. (arXiv:2209.09331v1 [cs.LG])
    The Resistance: Avalon is a partially observable social deduction game. This area of AI game playing is fairly undeveloped. Implementing an AI for this game involves multiple components specific to each phase as well as role in the game. In this paper, we plan to iteratively develop the required components for each role/phase by first addressing the Assassination phase which can be modeled as a machine learning problem. Using a publicly available dataset from an online version of the game, we train classifiers that emulate an Assassin. After trying various classification techniques, we are able to achieve above average human performance using a simple linear support vector classifier. The eventual goal of this project is to pursue developing an intelligent and complete Avalon player that can play through each phase of the game as any role.  ( 2 min )
    Extremely Simple Activation Shaping for Out-of-Distribution Detection. (arXiv:2209.09858v1 [cs.LG])
    The separation between training and deployment of machine learning models implies that not all scenarios encountered in deployment can be anticipated during training, and therefore relying solely on advancements in training has its limits. Out-of-distribution (OOD) detection is an important area that stress-tests a model's ability to handle unseen situations: Do models know when they don't know? Existing OOD detection methods either incur extra training steps, require additional data, or make nontrivial modifications to the trained network. In contrast, in this work, we propose an extremely simple, post-hoc, on-the-fly activation shaping method, ASH, where a large portion (e.g. 90%) of a sample's activation at a late layer is removed, and the rest (e.g. 10%) simplified or lightly adjusted. The shaping is applied at inference time, and does not require any statistics calculated from training data. Experiments show that such a simple treatment enhances in-distribution and out-of-distribution sample distinction so as to allow state-of-the-art OOD detection on ImageNet, and does not noticeably deteriorate the in-distribution accuracy. Alongside the paper, we release two calls for explanation and validation, believing in the collective power of the community to further validate and understand the discovery. Calls, video and code can be found at: https://andrijazz.github.io/ash  ( 2 min )
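    The method is simple enough to state in a few lines; a sketch of the pruning-plus-scaling flavor (treat the percentile and the rescaling rule as an approximation of the paper's variants):

        import numpy as np

        def ash_s(a, percentile=90):
            """Activation shaping: zero most of the activation, lightly rescale the rest.

            a: 1-D activation vector from a late layer (e.g. after global pooling).
            """
            t = np.percentile(a, percentile)
            s1 = a.sum()
            shaped = np.where(a >= t, a, 0.0)    # remove ~90% of the activation
            s2 = shaped.sum()
            return shaped * np.exp(s1 / s2)      # lightly adjust what remains

        a = np.random.rand(2048)                 # e.g. penultimate-layer features
        scores = ash_s(a)                        # pass onward to logits / OOD scoring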
    Relaxed Attention for Transformer Models. (arXiv:2209.09735v1 [cs.LG])
    The powerful modeling capabilities of all-attention-based transformer architectures often cause overfitting and - for natural language processing tasks - lead to an implicitly learned internal language model in the autoregressive transformer decoder, complicating the integration of external language models. In this paper, we explore relaxed attention, a simple and easy-to-implement smoothing of the attention weights, yielding a two-fold improvement to the general transformer architecture: First, relaxed attention provides regularization when applied to the self-attention layers in the encoder. Second, we show that it naturally supports the integration of an external language model as it suppresses the implicitly learned internal language model by relaxing the cross attention in the decoder. We demonstrate the benefit of relaxed attention across several tasks with clear improvement in combination with recent benchmark approaches. Specifically, we exceed the former state-of-the-art performance of 26.90% word error rate on the largest public lip-reading LRS3 benchmark with a word error rate of 26.31%, and we achieve a top-performing BLEU score of 37.67 on the IWSLT14 (DE$\rightarrow$EN) machine translation task without external language models and with virtually no additional model parameters. Code and models will be made publicly available.  ( 2 min )
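    Relaxed attention amounts to smoothing the attention weights toward the uniform distribution; a sketch of scaled dot-product attention with such a blend (see the paper for where it is applied, encoder self-attention vs. decoder cross attention):

        import torch

        def relaxed_attention(q, k, v, gamma=0.1):
            """Scaled dot-product attention with weights smoothed toward uniform."""
            scores = q @ k.transpose(-2, -1) / k.shape[-1] ** 0.5
            attn = torch.softmax(scores, dim=-1)
            n = attn.shape[-1]
            attn = (1.0 - gamma) * attn + gamma / n   # relax toward uniform weights
            return attn @ v

        q = k = v = torch.randn(2, 5, 16)
        out = relaxed_attention(q, k, v, gamma=0.1)   # shape (2, 5, 16)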
    Sparse Vicious Attacks on Graph Neural Networks. (arXiv:2209.09688v1 [cs.LG])
    Graph Neural Networks (GNNs) have proven to be successful in several predictive modeling tasks for graph-structured data. Amongst those tasks, link prediction is one of the fundamental problems for many real-world applications, such as recommender systems. However, GNNs are not immune to adversarial attacks, i.e., carefully crafted malicious examples that are designed to fool the predictive model. In this work, we focus on a specific, white-box attack to GNN-based link prediction models, where a malicious node aims to appear in the list of recommended nodes for a given target victim. To achieve this goal, the attacker node may also count on the cooperation of other existing peers that it directly controls, namely on the ability to inject a number of "vicious" nodes in the network. Specifically, all these malicious nodes can add new edges or remove existing ones, thereby perturbing the original graph. Thus, we propose SAVAGE, a novel framework and a method to mount this type of link prediction attacks. SAVAGE formulates the adversary's goal as an optimization task, striking the balance between the effectiveness of the attack and the sparsity of malicious resources required. Extensive experiments conducted on real-world and synthetic datasets demonstrate that adversarial attacks implemented through SAVAGE indeed achieve a high attack success rate while using a small number of vicious nodes. Finally, although these attacks require full knowledge of the target model, we show that they successfully transfer to other black-box methods for link prediction.  ( 3 min )
    Symbolic Regression with Fast Function Extraction and Nonlinear Least Squares Optimization. (arXiv:2209.09675v1 [cs.LG])
    Fast Function Extraction (FFX) is a deterministic algorithm for solving symbolic regression problems. We improve the accuracy of FFX by adding parameters to the arguments of nonlinear functions. Instead of only optimizing linear parameters, we optimize these additional nonlinear parameters with separable nonlinear least-squares optimization using a variable projection algorithm. Both FFX and our new algorithm are applied to the PennML benchmark suite. We show that the proposed extensions of FFX lead to higher accuracy while providing models of similar length and with only a small increase in runtime on the given data. Our results are compared to a large set of regression methods that were already published for the given benchmark suite.  ( 2 min )
    Predicting Performances of Mutual Funds using Deep Learning and Ensemble Techniques. (arXiv:2209.09649v1 [q-fin.ST])
    Predicting fund performance is beneficial to both investors and fund managers, and yet is a challenging task. In this paper, we have tested whether deep learning models can predict fund performance more accurately than traditional statistical techniques. Fund performance is typically evaluated by the Sharpe ratio, which represents the risk-adjusted performance to ensure meaningful comparability across funds. We calculated the annualised Sharpe ratios based on the monthly returns time series data for more than 600 open-end mutual funds investing in listed large-cap equities in the United States. We find that long short-term memory (LSTM) and gated recurrent units (GRUs) deep learning methods, both trained with modern Bayesian optimization, provide higher accuracy in forecasting funds' Sharpe ratios than traditional statistical ones. An ensemble method, which combines forecasts from LSTM and GRUs, achieves the best performance of all models. There is evidence to say that deep learning and ensembling offer promising solutions in addressing the challenge of fund performance forecasting.  ( 2 min )
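    For reference, the target quantity: the annualized Sharpe ratio computed from monthly returns (risk-free rate omitted for simplicity):

        import numpy as np

        def annualized_sharpe(monthly_returns):
            """sqrt(12) scales the monthly Sharpe ratio to an annual figure."""
            r = np.asarray(monthly_returns)
            return np.sqrt(12) * r.mean() / r.std(ddof=1)

        print(annualized_sharpe(np.random.normal(0.008, 0.04, size=120)))  # 10y toy series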
    Explainable Clustering via Exemplars: Complexity and Efficient Approximation Algorithms. (arXiv:2209.09670v1 [cs.AI])
    Explainable AI (XAI) is an important developing area but remains relatively understudied for clustering. We propose an explainable-by-design clustering approach that not only finds clusters but also exemplars to explain each cluster. The use of exemplars for understanding is supported by the exemplar-based school of concept definition in psychology. We show that finding a small set of exemplars to explain even a single cluster is computationally intractable; hence, the overall problem is challenging. We develop an approximation algorithm that provides provable performance guarantees with respect to clustering quality as well as the number of exemplars used. This basic algorithm explains all the instances in every cluster whilst another approximation algorithm uses a bounded number of exemplars to allow simpler explanations and provably covers a large fraction of all the instances. Experimental results show that our work is useful in domains involving difficult to understand deep embeddings of images and text.  ( 2 min )
    Testing Rare Downstream Safety Violations via Upstream Adaptive Sampling of Perception Error Models. (arXiv:2209.09674v1 [cs.RO])
    Testing black-box perceptual-control systems in simulation faces two difficulties. Firstly, perceptual inputs in simulation lack the fidelity of real-world sensor inputs. Secondly, for a reasonably accurate perception system, encountering a rare failure trajectory may require running infeasibly many simulations. This paper combines perception error models -- surrogates for a sensor-based detection system -- with state-dependent adaptive importance sampling. This allows us to efficiently assess the rare failure probabilities for real-world perceptual control systems within simulation. Our experiments with an autonomous braking system equipped with an RGB obstacle-detector show that our method can calculate accurate failure probabilities with a modest number of simulations. Further, we show how the choice of safety metric can influence the process of learning proposal distributions capable of reliably sampling high-probability failures.  ( 2 min )
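    The estimator at the core of such approaches is the importance-sampled failure probability with likelihood-ratio weights; a toy sketch (the state-dependent adaptation of the proposal is the paper's contribution and is omitted here):

        import numpy as np

        rng = np.random.default_rng(0)

        def robustness(e):                     # toy system: failure when error e > 4
            return 4.0 - e

        n, p_sigma, q_sigma = 5000, 1.0, 2.0   # nominal model p, heavier-tailed proposal q
        e = rng.normal(0.0, q_sigma, n)        # sample perception errors from q
        w = (np.exp(-e**2 / (2 * p_sigma**2)) / p_sigma) / \
            (np.exp(-e**2 / (2 * q_sigma**2)) / q_sigma)   # likelihood ratio p/q
        fail = robustness(e) < 0
        print("IS failure estimate:", np.mean(w * fail))   # ~3e-5 under the nominal p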
    A Multi-Layer Regression based Predicable Function Fitting Network. (arXiv:2209.09647v1 [cs.LG])
    Functions play an important role in mathematics and many branches of science. With the rapid development of computer technology, more and more studies on computational function analysis, e.g., the Fast Fourier Transform, wavelet transforms, and curve functions, have been presented in recent years. However, there are two main problems in these approaches: 1) they struggle to handle complex functions, whether stationary or non-stationary, periodic or non-periodic, high order or low order; 2) they struggle to generalize the fitted functions from training data to test data. In this paper, a multiple-regression-based function fitting network that solves these two main problems is introduced as a predictable function fitting technique. The network includes three main parts: 1) the stationary transform layer, 2) the feature encoding layers, and 3) the fine-tuning regression layer. The stationary transform layer recognizes the order of the input function data and transforms non-stationary functions into stationary functions. The feature encoding layers encode the raw sequential input data into a novel linear regression feature that captures both the structural and the temporal characteristics of the sequential data. The fine-tuning regression layer then fits the features to the target ahead values. The fitting network, with the linear regression feature layers and a non-linear regression layer, produces high-quality fitting results and generalizable predictions. Experiments on both mathematical function examples and real-world function examples verify the efficiency of the proposed technique.  ( 3 min )
    Detection of Malicious Websites Using Machine Learning Techniques. (arXiv:2209.09630v1 [cs.CR])
    In detecting malicious websites, a common approach is the use of blacklists which are not exhaustive in themselves and are unable to generalize to new malicious sites. Detecting newly encountered malicious websites automatically will help reduce the vulnerability to this form of attack. In this study, we explored the use of ten machine learning models to classify malicious websites based on lexical features and understand how they generalize across datasets. Specifically, we trained, validated, and tested these models on different sets of datasets and then carried out a cross-datasets analysis. From our analysis, we found that K-Nearest Neighbor is the only model that performs consistently high across datasets. Other models such as Random Forest, Decision Trees, Logistic Regression, and Support Vector Machines also consistently outperform a baseline model of predicting every link as malicious across all metrics and datasets. Also, we found no evidence that any subset of lexical features generalizes across models or datasets. This research should be relevant to cybersecurity professionals and academic researchers as it could form the basis for real-life detection systems or further research work.  ( 2 min )
    A Secure Healthcare 5.0 System Based on Blockchain Technology Entangled with Federated Learning Technique. (arXiv:2209.09642v1 [cs.LG])
    In recent years, the global Internet of Medical Things (IoMT) industry has evolved at a tremendous speed. Security and privacy are key concerns on the IoMT, owing to the huge scale and deployment of IoMT networks. Machine learning (ML) and blockchain (BC) technologies have significantly enhanced the capabilities and facilities of healthcare 5.0, spawning a new area known as "Smart Healthcare." By identifying concerns early, a smart healthcare system can help avoid long-term damage. This will enhance the quality of life for patients while reducing their stress and healthcare costs. The IoMT enables a range of functionalities in the field of information technology, one of which is smart and interactive health care. However, combining medical data into a single storage location to train a powerful machine learning model raises concerns about privacy, ownership, and compliance due to the greater concentration of data. Federated learning (FL) overcomes the preceding difficulties by utilizing a centralized aggregate server to disseminate a global learning model. Simultaneously, the local participant keeps control of patient information, assuring data confidentiality and security. This article conducts a comprehensive analysis of the findings on blockchain technology entangled with federated learning in healthcare 5.0. The purpose of this study is to construct a secure health monitoring system in healthcare 5.0 by utilizing blockchain technology and an Intrusion Detection System (IDS) to detect any malicious activity in a healthcare network, enabling physicians to monitor patients through medical sensors and take necessary measures periodically by predicting diseases.  ( 3 min )
    Ki-Pode: Keypoint-based Implicit Pose Distribution Estimation of Rigid Objects. (arXiv:2209.09659v1 [cs.CV])
    The estimation of 6D poses of rigid objects is a fundamental problem in computer vision. Traditionally, pose estimation is concerned with the determination of a single best estimate. However, a single estimate is unable to express visual ambiguity, which in many cases is unavoidable due to object symmetries or occlusion of identifying features. Inability to account for ambiguities in pose can lead to failure in subsequent methods, which is unacceptable when the cost of failure is high. Estimates of full pose distributions are, contrary to single estimates, well suited for expressing uncertainty on pose. Motivated by this, we propose a novel pose distribution estimation method. An implicit formulation of the probability distribution over object pose is derived from an intermediary representation of an object as a set of keypoints. This ensures that the pose distribution estimates have a high level of interpretability. Furthermore, our method is based on conservative approximations, which leads to reliable estimates. The method has been evaluated on the task of rotation distribution estimation on the YCB-V and T-LESS datasets and performs reliably on all objects.  ( 2 min )
    A Closer Look at Weakly-Supervised Audio-Visual Source Localization. (arXiv:2209.09634v1 [cs.SD])
    Audio-visual source localization is a challenging task that aims to predict the location of visual sound sources in a video. Since collecting ground-truth annotations of sounding objects can be costly, a plethora of weakly-supervised localization methods that can learn from datasets with no bounding-box annotations have been proposed in recent years, by leveraging the natural co-occurrence of audio and visual signals. Despite significant interest, popular evaluation protocols have two major flaws. First, they allow for the use of a fully annotated dataset to perform early stopping, thus significantly increasing the annotation effort required for training. Second, current evaluation metrics assume the presence of sound sources at all times. This is of course an unrealistic assumption, and thus better metrics are necessary to capture the model's performance on (negative) samples with no visible sound sources. To accomplish this, we extend the test set of popular benchmarks, Flickr SoundNet and VGG-Sound Sources, in order to include negative samples, and measure performance using metrics that balance localization accuracy and recall. Using the new protocol, we conducted an extensive evaluation of prior methods, and found that most prior works are not capable of identifying negatives and suffer from significant overfitting problems (rely heavily on early stopping for best results). We also propose a new approach for visual sound source localization that addresses both these problems. In particular, we found that, through extreme visual dropout and the use of momentum encoders, the proposed approach combats overfitting effectively, and establishes a new state-of-the-art performance on both Flickr SoundNet and VGG-Sound Source. Code and pre-trained models are available at https://github.com/stoneMo/SLAVC.  ( 3 min )
    Machine Learning Models Evaluation and Feature Importance Analysis on NPL Dataset. (arXiv:2209.09638v1 [cs.LG])
    Predicting the probability of non-performing loans for individuals has a vital and beneficial role for banks in decreasing credit risk and making the right decisions before granting loans. These decisions are typically based on a credit study, in accordance with generally accepted standards, loan payment history, and the clients' demographic data. In this work, we evaluate how different machine learning models such as Random Forest, Decision Tree, KNN, SVM, and XGBoost perform on a dataset provided by a private bank in Ethiopia. Further, motivated by this evaluation, we explore different feature selection methods to identify the important features for the bank. Our findings show that XGBoost achieves the highest F1 score on the KMeans SMOTE over-sampled data. We also found that the most important features are the age of the applicant, years of employment, and total income of the applicant, rather than collateral-related features, in evaluating credit risk.  ( 2 min )
    Can we do that simpler? Simple, Efficient, High-Quality Evaluation Metrics for NLG. (arXiv:2209.09593v1 [cs.CL])
    We explore efficient evaluation metrics for Natural Language Generation (NLG). To implement efficient metrics, we replace (i) computation-heavy transformers in metrics such as BERTScore, MoverScore, BARTScore, XMoverScore, etc. with lighter versions (such as distilled ones) and (ii) cubic inference time alignment algorithms such as Word Mover Distance with linear and quadratic approximations. We consider six evaluation metrics (both monolingual and multilingual), assessed on three different machine translation datasets, and 16 light-weight transformers as replacements. We find, among other things, that (a) TinyBERT shows the best quality-efficiency tradeoff for semantic similarity metrics of the BERTScore family, retaining 97% quality and being 5x faster at inference time on average, (b) there is a large difference in speed-ups on CPU vs. GPU (much higher speed-ups on CPU), and (c) WMD approximations yield no efficiency gains but lead to a substantial drop in quality on 2 out of 3 datasets we examine.  ( 2 min )
    De-Identification of French Unstructured Clinical Notes for Machine Learning Tasks. (arXiv:2209.09631v1 [cs.CR])
    Unstructured textual data are at the heart of health systems: liaison letters between doctors, operating reports, coding of procedures according to the ICD-10 standard, and so on. The details included in these documents make it possible to get to know the patient better, to better manage him or her, to better study the pathologies, and to accurately remunerate the associated medical acts. All this seems to be (at least partially) within reach of today's artificial intelligence techniques. However, for obvious reasons of privacy protection, the designers of these AIs do not have the legal right to access these documents as long as they contain identifying data. De-identifying these documents, i.e. detecting and deleting all identifying information present in them, is a legally necessary step for sharing this data between two complementary worlds. Over the last decade, several proposals have been made to de-identify documents, mainly in English. While the detection scores are often high, the substitution methods are often not very robust to attack. In French, very few methods are based on arbitrary detection and/or substitution rules. In this paper, we propose a new comprehensive de-identification method dedicated to French-language medical documents. Both the approach for the detection of identifying elements (based on deep learning) and their substitution (based on differential privacy) build on the most proven existing approaches. The result is an approach that effectively protects the privacy of the patients at the heart of these medical documents. The whole approach has been evaluated on a French-language medical dataset of a French public hospital and the results are very encouraging.  ( 3 min )
    Closing the Gender Wage Gap: Adversarial Fairness in Job Recommendation. (arXiv:2209.09592v1 [cs.LG])
    The goal of this work is to help mitigate the already existing gender wage gap by supplying unbiased job recommendations based on resumes from job seekers. We employ a generative adversarial network to remove gender bias from word2vec representations of 12M job vacancy texts and 900k resumes. Our results show that representations created from recruitment texts contain algorithmic bias and that this bias results in real-world consequences for recommendation systems. Without controlling for bias, women are recommended jobs with significantly lower salary in our data. With adversarially fair representations, this wage gap disappears, meaning that our debiased job recommendations reduce wage discrimination. We conclude that adversarial debiasing of word representations can increase real-world fairness of systems and thus may be part of the solution for creating fairness-aware recommendation systems.  ( 2 min )
    Calibrating Ensembles for Scalable Uncertainty Quantification in Deep Learning-based Medical Segmentation. (arXiv:2209.09563v1 [cs.LG])
    Uncertainty quantification in automated image analysis is highly desired in many applications. Typically, machine learning models in classification or segmentation are only developed to provide binary answers; however, quantifying the uncertainty of the models can play a critical role, for example, in active learning or human-machine interaction. Uncertainty quantification is especially difficult when using deep learning-based models, which are the state-of-the-art in many imaging applications. The current uncertainty quantification approaches do not scale well in high-dimensional real-world problems. Scalable solutions often rely on classical techniques, such as dropout during inference, or on training ensembles of identical models with different random seeds to obtain a posterior distribution. In this paper, we show that these approaches fail to approximate the classification probability. On the contrary, we propose a scalable and intuitive framework to calibrate ensembles of deep learning models to produce uncertainty quantification measurements that approximate the classification probability. On unseen test data, we demonstrate improved calibration, sensitivity (in two out of three cases) and precision when compared with the standard approaches. We further motivate the usage of our method in active learning, creating pseudo-labels to learn from unlabeled images and enabling human-machine collaboration.  ( 2 min )
    Seq2Seq Surrogates of Epidemic Models to Facilitate Bayesian Inference. (arXiv:2209.09617v1 [cs.LG])
    Epidemic models are powerful tools in understanding infectious disease. However, as they increase in size and complexity, they can quickly become computationally intractable. Recent progress in modelling methodology has shown that surrogate models can be used to emulate complex epidemic models with a high-dimensional parameter space. We show that deep sequence-to-sequence (seq2seq) models can serve as accurate surrogates for complex epidemic models with sequence-based model parameters, effectively replicating seasonal and long-term transmission dynamics. Once trained, our surrogate can predict scenarios several thousand times faster than the original model, making it ideal for policy exploration. We demonstrate that replacing a traditional epidemic model with a learned simulator facilitates robust Bayesian inference.  ( 2 min )
    Comparing Shape-Constrained Regression Algorithms for Data Validation. (arXiv:2209.09602v1 [cs.LG])
    Industrial and scientific applications handle large volumes of data that render manual validation by humans infeasible. Therefore, we require automated data validation approaches that are able to consider the prior knowledge of domain experts to produce dependable, trustworthy assessments of data quality. Prior knowledge is often available as rules that describe interactions of inputs with regard to the target e.g. the target must be monotonically decreasing and convex over increasing input values. Domain experts are able to validate multiple such interactions at a glance. However, existing rule-based data validation approaches are unable to consider these constraints. In this work, we compare different shape-constrained regression algorithms for the purpose of data validation based on their classification accuracy and runtime performance.  ( 2 min )
    Robust Online and Distributed Mean Estimation Under Adversarial Data Corruption. (arXiv:2209.09624v1 [cs.CR])
    We study robust mean estimation in an online and distributed scenario in the presence of adversarial data attacks. At each time step, each agent in a network receives a potentially corrupted data point, where the data points were originally independent and identically distributed samples of a random variable. We propose online and distributed algorithms for all agents to asymptotically estimate the mean. We provide the error-bound and the convergence properties of the estimates to the true mean under our algorithms. Based on the network topology, we further evaluate each agent's trade-off in convergence rate between incorporating data from neighbors and learning with only local observations.  ( 2 min )
    An Attention Free Long Short-Term Memory for Time Series Forecasting. (arXiv:2209.09548v1 [cs.LG])
    Deep learning is playing an increasingly important role in time series analysis. We focus on time series forecasting with an attention-free mechanism, a more efficient framework, and propose a new architecture for time series prediction in settings where linear models appear unable to capture the time dependence. We propose an architecture built from attention-free LSTM layers that outperforms linear models for conditional variance prediction. Our findings confirm the validity of our model, which also improves the prediction capacity of an LSTM while improving the efficiency of the learning task.  ( 2 min )
    Towards Task-Prioritized Policy Composition. (arXiv:2209.09536v1 [cs.LG])
    Combining learned policies in a prioritized, ordered manner is desirable because it allows for modular design and facilitates data reuse through knowledge transfer. In control theory, prioritized composition is realized by null-space control, where low-priority control actions are projected into the null-space of high-priority control actions. Such a method is currently unavailable for Reinforcement Learning. We propose a novel, task-prioritized composition framework for Reinforcement Learning, which involves a novel concept: the indifference space of Reinforcement Learning policies. Our framework has the potential to facilitate knowledge transfer and modular design while greatly increasing data efficiency and data reuse for Reinforcement Learning agents. Further, our approach can ensure high-priority constraint satisfaction, which makes it promising for learning in safety-critical domains like robotics. Unlike null-space control, our approach allows learning globally optimal policies for the compound task by online learning in the indifference space of higher-level policies after initial compound policy construction.  ( 2 min )
    Application of Group Method of Data Handling and New Optimization Algorithms for Predicting Sediment Transport Rate under Vegetation Cover. (arXiv:2209.09623v1 [physics.ao-ph])
    Planting vegetation is one of the practical solutions for reducing sediment transfer rates. Increasing vegetation cover decreases environmental pollution and the sediment transport rate (STR). Since sediments and vegetation interact in complex ways, predicting sediment transport rates is challenging. This study aims to predict the sediment transport rate under vegetation cover using new and optimized versions of the group method of data handling (GMDH). Additionally, this study introduces a new ensemble model for predicting sediment transport rates. Model inputs include wave height, wave velocity, cover density, wave force, D50, the height of the vegetation cover, and cover stem diameter. A standalone GMDH model and optimized GMDH models, namely GMDH with the honey badger algorithm (GMDH-HBA), GMDH with the rat swarm optimization algorithm (GMDH-RSOA), GMDH with the sine cosine algorithm (GMDH-SCA), and GMDH with particle swarm optimization (GMDH-PSO), were used to predict sediment transport rates. As the next step, the outputs of the standalone and optimized GMDH models were used to construct an ensemble model. The MAE of the ensemble model was 0.145 m3/s, while the MAEs of GMDH-HBA, GMDH-RSOA, GMDH-SCA, GMDH-PSO, and GMDH at the testing level were 0.176 m3/s, 0.312 m3/s, 0.367 m3/s, 0.498 m3/s, and 0.612 m3/s, respectively. The Nash-Sutcliffe efficiency coefficients (NSE) of the ensemble model, GMDH-HBA, GMDH-RSOA, GMDH-SCA, GMDH-PSO, and GMDH were 0.95, 0.93, 0.89, 0.86, 0.82, and 0.76, respectively. Additionally, this study demonstrated that vegetation cover decreased the sediment transport rate by 90 percent. The results indicated that the ensemble and GMDH-HBA models could accurately predict sediment transport rates, and that the sediment transport rate can be monitored using the IMM and GMDH-HBA. These results are useful for managing and planning water resources in large basins.  ( 3 min )
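    For reference, the Nash-Sutcliffe efficiency used above is straightforward to compute: a value of 1 indicates a perfect fit, 0 matches the mean-of-observations baseline, and negative values are worse than that baseline. A minimal implementation:

    import numpy as np

    def nash_sutcliffe(observed, simulated):
        # NSE = 1 - SS_residual / SS_total, relative to the observation mean.
        observed, simulated = np.asarray(observed), np.asarray(simulated)
        return 1.0 - np.sum((observed - simulated) ** 2) / np.sum((observed - observed.mean()) ** 2)

    obs = np.array([0.10, 0.25, 0.40, 0.30])
    sim = np.array([0.12, 0.22, 0.41, 0.33])
    print(nash_sutcliffe(obs, sim))          # close to 1 for a good model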
    Learn to Explain: Multimodal Reasoning via Thought Chains for Science Question Answering. (arXiv:2209.09513v1 [cs.CL])
    When answering a question, humans utilize the information available across different modalities to synthesize a consistent and complete chain of thought (CoT). This process is normally a black box in the case of deep learning models like large-scale language models. Recently, science question benchmarks have been used to diagnose the multi-hop reasoning ability and interpretability of an AI system. However, existing datasets fail to provide annotations for the answers, or are restricted to the textual-only modality, small scales, and limited domain diversity. To this end, we present Science Question Answering (SQA), a new benchmark that consists of ~21k multimodal multiple choice questions with a diverse set of science topics and annotations of their answers with corresponding lectures and explanations. We further design language models to learn to generate lectures and explanations as the chain of thought (CoT) to mimic the multi-hop reasoning process when answering SQA questions. SQA demonstrates the utility of CoT in language models, as CoT improves the question answering performance by 1.20% in few-shot GPT-3 and 3.99% in fine-tuned UnifiedQA. We also explore the upper bound for models to leverage explanations by feeding those in the input; we observe that it improves the few-shot performance of GPT-3 by 18.96%. Our analysis further shows that language models, similar to humans, benefit from explanations to learn from fewer data and achieve the same performance with just 40% of the data.  ( 3 min )
    Adaptable Butterfly Accelerator for Attention-based NNs via Hardware and Algorithm Co-design. (arXiv:2209.09570v1 [cs.AR])
    Attention-based neural networks have become pervasive in many AI tasks. Despite their excellent algorithmic performance, the use of the attention mechanism and feed-forward network (FFN) demands excessive computational and memory resources, which often compromises their hardware performance. Although various sparse variants have been introduced, most approaches only focus on mitigating the quadratic scaling of attention on the algorithm level, without explicitly considering the efficiency of mapping their methods onto real hardware designs. Furthermore, most efforts focus on either the attention mechanism or the FFNs without jointly optimizing both parts, causing most current designs to lack scalability when dealing with different input lengths. This paper systematically considers the sparsity patterns of different variants from a hardware perspective. On the algorithmic level, we propose FABNet, a hardware-friendly variant that adopts a unified butterfly sparsity pattern to approximate both the attention mechanism and the FFNs. On the hardware level, a novel adaptable butterfly accelerator is proposed that can be configured at runtime via dedicated hardware control to accelerate different butterfly layers using a single unified hardware engine. On the Long-Range-Arena dataset, FABNet achieves the same accuracy as the vanilla Transformer while reducing the amount of computation by 10 to 66 times and the number of parameters by 2 to 22 times. By jointly optimizing the algorithm and hardware, our FPGA-based butterfly accelerator achieves 14.2 to 23.2 times speedup over state-of-the-art accelerators normalized to the same computational budget. Compared with optimized CPU and GPU designs on Raspberry Pi 4 and Jetson Nano, our system is up to 273.8 and 15.1 times faster under the same power budget.  ( 3 min )
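    To make the butterfly sparsity pattern concrete, the sketch below applies a butterfly-factorized linear map in O(n log n) time instead of the O(n^2) of a dense matrix; it illustrates the structure FABNet builds on, not the paper's hardware mapping or exact parameterization.

    import numpy as np

    def butterfly_apply(x, factors):
        # x: vector of length n = 2**k; factors: k stages of (n//2, 2, 2) blocks.
        n = x.size
        y = x.astype(float).copy()
        for stage, W in enumerate(factors):
            stride = 1 << stage
            out = y.copy()
            p = 0
            for i in range(n):
                if i & stride:               # each pair is handled at its low index
                    continue
                j = i + stride
                a, b = y[i], y[j]
                out[i] = W[p, 0, 0] * a + W[p, 0, 1] * b
                out[j] = W[p, 1, 0] * a + W[p, 1, 1] * b
                p += 1
            y = out
        return y

    rng = np.random.default_rng(0)
    n, k = 8, 3
    factors = rng.normal(size=(k, n // 2, 2, 2))
    print(butterfly_apply(rng.normal(size=n), factors))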
    Boosting the Discriminant Power of Naive Bayes. (arXiv:2209.09532v1 [cs.LG])
    Naive Bayes has been widely used in many applications because of its simplicity and its ability to handle both numerical and categorical data. However, its lack of modeling of correlations between features limits its performance. In addition, noise and outliers in real-world datasets also greatly degrade its classification performance. In this paper, we propose a feature augmentation method employing a stacked auto-encoder to reduce the noise in the data and boost the discriminant power of naive Bayes. The proposed stacked auto-encoder consists of two auto-encoders for different purposes. The first encoder shrinks the initial features to derive a compact feature representation in order to remove noise and redundant information. The second encoder boosts the discriminant power of the features by expanding them into a higher-dimensional space so that different classes of samples can be better separated. By integrating the proposed feature augmentation method with regularized naive Bayes, the discrimination power of the model is greatly enhanced. The proposed method is evaluated on a set of machine-learning benchmark datasets. The experimental results show that it significantly and consistently outperforms state-of-the-art naive Bayes classifiers.  ( 2 min )
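    A minimal sketch of the two-stage idea, with assumed layer sizes: a compressing auto-encoder to denoise, followed by an expanding one that maps the codes into a higher-dimensional, more separable space whose activations feed a naive Bayes classifier. The training details (losses, regularized naive Bayes) follow the paper and are omitted here.

    import torch
    import torch.nn as nn

    class TwoStageAugmenter(nn.Module):
        def __init__(self, d_in, d_small=8, d_large=64):
            super().__init__()
            self.enc1 = nn.Sequential(nn.Linear(d_in, d_small), nn.ReLU())
            self.dec1 = nn.Linear(d_small, d_in)        # reconstruct input
            self.enc2 = nn.Sequential(nn.Linear(d_small, d_large), nn.ReLU())
            self.dec2 = nn.Linear(d_large, d_small)     # reconstruct code

        def forward(self, x):
            z = self.enc1(x)                 # compact, denoised representation
            h = self.enc2(z)                 # expanded, more separable features
            return self.dec1(z), self.dec2(h), h

    # After training both reconstructions with MSE, fit e.g.
    # sklearn.naive_bayes.GaussianNB on the expanded features h.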
    FACT: Learning Governing Abstractions Behind Integer Sequences. (arXiv:2209.09543v1 [cs.LG])
    Integer sequences are of central importance to the modeling of concepts admitting complete finitary descriptions. We introduce a novel view on the learning of such concepts and lay down a set of benchmarking tasks aimed at conceptual understanding by machine learning models. These tasks indirectly assess a model's ability to abstract, and challenge it to reason both interpolatively and extrapolatively from the knowledge gained by observing representative examples. To further aid research in knowledge representation and reasoning, we present FACT, the Finitary Abstraction Comprehension Toolkit. The toolkit comprises a large dataset of integer sequences with both organic and synthetic entries, a library for data pre-processing and generation, a set of model performance evaluation tools, and a collection of baseline model implementations, easing future advancements in the area.  ( 2 min )
    A Framework for Benchmarking Clustering Algorithms. (arXiv:2209.09493v1 [cs.LG])
    The evaluation of clustering algorithms can be performed by running them on a variety of benchmark problems and comparing their outputs to the reference, ground-truth groupings provided by experts. Unfortunately, many research papers and graduate theses consider only a small number of datasets. Moreover, the fact that there can be many equally valid ways to cluster a given problem set is rarely taken into account. In order to overcome these limitations, we have developed a framework whose aim is to introduce a consistent methodology for testing clustering algorithms. Furthermore, we have aggregated, polished, and standardised many clustering benchmark batteries referred to across the machine learning and data mining literature, and included new datasets of different dimensionalities, sizes, and cluster types. An interactive datasets explorer, the documentation of the Python API, a description of the ways to interact with the framework from other programming languages such as R or MATLAB, and other details are all provided at https://clustering-benchmarks.gagolewski.com.  ( 2 min )
    Reduction from Complementary-Label Learning to Probability Estimates. (arXiv:2209.09500v1 [cs.LG])
    Complementary-Label Learning (CLL) is a weakly-supervised learning problem that aims to learn a multi-class classifier from only complementary labels, which indicate a class to which an instance does not belong. Existing approaches mainly adopt the paradigm of reduction to ordinary classification, which applies specific transformations and surrogate losses to connect CLL back to ordinary classification. Those approaches, however, face several limitations, such as a tendency to overfit or a dependence on deep models. In this paper, we sidestep those limitations with a novel perspective: reduction to probability estimates of complementary classes. We prove that accurate probability estimates of complementary labels lead to good classifiers through a simple decoding step. The proof establishes a reduction framework from CLL to probability estimates. The framework offers explanations of several key CLL approaches as its special cases and allows us to design an improved algorithm that is more robust in noisy environments. The framework also suggests a validation procedure based on the quality of probability estimates, leading to an alternative way to validate models with only complementary labels. The flexible framework opens a wide range of unexplored opportunities in using deep and non-deep models for probability estimates to solve the CLL problem. Empirical experiments further verify the framework's efficacy and robustness in various settings.  ( 2 min )
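    The decoding step admits a one-line illustration: if complementary labels are generated uniformly, then P(complementary = k | x) is smallest for the true class, so prediction reduces to an argmin over the estimated complementary probabilities. The uniform-generation assumption is specific to this sketch.

    import numpy as np

    def decode_from_complementary(comp_probs):
        # comp_probs[i, k] estimates P(complementary label = k | x_i).
        return np.argmin(comp_probs, axis=1)

    comp_probs = np.array([[0.05, 0.45, 0.50],    # class 0 is rarely "not the label"
                           [0.40, 0.40, 0.20]])
    print(decode_from_complementary(comp_probs))  # -> [0 2]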
    Feature embedding in click-through rate prediction. (arXiv:2209.09481v1 [cs.LG])
    We tackle the challenge of feature embedding for the purposes of improving the click-through rate prediction process. We select three models: logistic regression, factorization machines and deep factorization machines, as our baselines and propose five different feature embedding modules: embedding scaling, FM embedding, embedding encoding, NN embedding and the embedding reweighting module. The embedding modules act as a way to improve baseline model feature embeddings and are trained alongside the rest of the model parameters in an end-to-end manner. Each module is individually added to a baseline model to obtain a new augmented model. We test the predictive performance of our augmented models on a publicly accessible dataset used for benchmarking click-through rate prediction models. Our results show that several proposed embedding modules provide an important increase in predictive performance without a drastic increase in training time.  ( 2 min )
    Unsupervised Early Exit in DNNs with Multiple Exits. (arXiv:2209.09480v1 [cs.LG])
    Deep Neural Networks (DNNs) are generally designed as sequentially cascaded differentiable blocks/layers with a prediction module connected only to the last layer. DNNs can be attached with prediction modules at multiple points along the backbone, where inference can stop at an intermediary stage without passing through all the modules. The last exit point may offer a better prediction error but also involves more computational resources and latency. An exit point that is `optimal' in terms of both prediction error and cost is desirable. The optimal exit point may depend on the latent distribution of the tasks and may change from one task type to another. During neural inference, the ground truth of instances may not be available and error rates at each exit point cannot be estimated. Hence one is faced with the problem of selecting the optimal exit in an unsupervised setting. Prior works tackled this problem in an offline supervised setting, assuming that enough labeled data is available to estimate the error rate at each exit point and tune the parameters for better accuracy. However, pre-trained DNNs are often deployed in new domains for which a large amount of ground truth may not be available. We model the problem of exit selection as an unsupervised online learning problem and use bandit theory to identify the optimal exit point. Specifically, we focus on Elastic BERT, a pre-trained multi-exit DNN, to demonstrate that it `nearly' satisfies the Strong Dominance (SD) property, making it possible to learn the optimal exit in an online setup without knowing the ground truth labels. We develop an upper confidence bound (UCB) based algorithm named UEE-UCB that provably achieves sub-linear regret under the SD property. Our method thus provides a means to adaptively learn domain-specific optimal exit points in multi-exit DNNs. We empirically validate our algorithm on the IMDb and Yelp datasets.  ( 3 min )
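    As a rough illustration of the bandit view (not the paper's exact UEE-UCB construction), the sketch below applies a standard UCB rule over exit points; in the unsupervised setting the per-exit "reward" must be a label-free proxy, such as prediction confidence, which is an assumption of this example.

    import numpy as np

    def select_exit_ucb(rewards_history, t, c=1.0):
        # rewards_history: one list of observed proxy rewards per exit point.
        scores = []
        for r in rewards_history:
            n = max(len(r), 1)
            mean = np.mean(r) if r else 0.0
            scores.append(mean + c * np.sqrt(np.log(t + 1) / n))
        return int(np.argmax(scores))

    history = [[0.5] * 20, [0.55] * 20, []]   # third exit never tried yet
    print(select_exit_ucb(history, t=40))     # -> 2: exploration bonus dominates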
    Probabilistic Dalek -- Emulator framework with probabilistic prediction for supernova tomography. (arXiv:2209.09453v1 [cs.LG])
    Supernova spectral time series can be used to reconstruct a spatially resolved explosion model, known as supernova tomography. In addition to an observed spectral time series, supernova tomography requires a radiative transfer model to perform the inverse problem with uncertainty quantification for a reconstruction. The smallest parametrizations of supernova tomography models have roughly a dozen parameters, with realistic ones requiring more than 100. Realistic radiative transfer models require tens of CPU minutes for a single evaluation, making the problem computationally intractable with traditional means, which require millions of MCMC samples. Surrogate models, or emulators, built with machine learning techniques offer a way to accelerate such simulations and to understand progenitors/explosions from spectral time series. Emulators exist for the TARDIS supernova radiative transfer code, but they only perform well on simplistic low-dimensional models (roughly a dozen parameters) with a small number of applications for knowledge gain in the supernova field. In this work, we present a new emulator for the radiative transfer code TARDIS that not only outperforms existing emulators but also provides uncertainties in its predictions. It offers the foundation for future active-learning-based machinery that will be able to emulate very high-dimensional spaces of hundreds of parameters, crucial for unraveling urgent questions in supernovae and related fields.  ( 3 min )
    SparCL: Sparse Continual Learning on the Edge. (arXiv:2209.09476v1 [cs.LG])
    Existing work in continual learning (CL) focuses on mitigating catastrophic forgetting, i.e., model performance deterioration on past tasks when learning a new task. However, the training efficiency of a CL system is under-investigated, which limits the real-world application of CL systems under resource-limited scenarios. In this work, we propose a novel framework called Sparse Continual Learning (SparCL), which is the first study that leverages sparsity to enable cost-effective continual learning on edge devices. SparCL achieves both training acceleration and accuracy preservation through the synergy of three aspects: weight sparsity, data efficiency, and gradient sparsity. Specifically, we propose task-aware dynamic masking (TDM) to learn a sparse network throughout the entire CL process, dynamic data removal (DDR) to remove less informative training data, and dynamic gradient masking (DGM) to sparsify the gradient updates. Each of them not only improves efficiency, but also further mitigates catastrophic forgetting. SparCL consistently improves the training efficiency of existing state-of-the-art (SOTA) CL methods by up to 23x fewer training FLOPs, and, surprisingly, further improves the SOTA accuracy by up to 1.7%. SparCL also outperforms competitive baselines obtained from adapting SOTA sparse training methods to the CL setting in both efficiency and accuracy. We also evaluate the effectiveness of SparCL on a real mobile phone, further indicating the practical potential of our method.  ( 3 min )
    SleePyCo: Automatic Sleep Scoring with Feature Pyramid and Contrastive Learning. (arXiv:2209.09452v1 [cs.LG])
    Automatic sleep scoring is essential for the diagnosis and treatment of sleep disorders and enables longitudinal sleep tracking in home environments. Conventionally, learning-based automatic sleep scoring on single-channel electroencephalogram (EEG) is actively studied because obtaining multi-channel signals during sleep is difficult. However, learning representation from raw EEG signals is challenging owing to the following issues: 1) sleep-related EEG patterns occur on different temporal and frequency scales and 2) sleep stages share similar EEG patterns. To address these issues, we propose a deep learning framework named SleePyCo that incorporates 1) a feature pyramid and 2) supervised contrastive learning for automatic sleep scoring. For the feature pyramid, we propose a backbone network named SleePyCo-backbone to consider multiple feature sequences on different temporal and frequency scales. Supervised contrastive learning allows the network to extract class discriminative features by minimizing the distance between intra-class features and simultaneously maximizing that between inter-class features. Comparative analyses on four public datasets demonstrate that SleePyCo consistently outperforms existing frameworks based on single-channel EEG. Extensive ablation experiments show that SleePyCo exhibits enhanced overall performance, with significant improvements in discrimination between the N1 and rapid eye movement (REM) stages.  ( 2 min )
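    For concreteness, below is a compact version of the supervised contrastive loss described above, which pulls same-class embeddings together and pushes different-class embeddings apart; the temperature and the handling of anchors without positives are illustrative choices of this sketch, not SleePyCo's exact configuration.

    import torch
    import torch.nn.functional as F

    def supcon_loss(features, labels, temperature=0.1):
        # features: (B, d) embeddings; labels: (B,) integer class labels.
        z = F.normalize(features, dim=1)
        sim = z @ z.t() / temperature
        B = z.size(0)
        self_mask = torch.eye(B, dtype=torch.bool, device=z.device)
        sim = sim.masked_fill(self_mask, float('-inf'))    # drop self-pairs
        log_prob = sim - torch.logsumexp(sim, dim=1, keepdim=True)
        pos_mask = (labels.unsqueeze(0) == labels.unsqueeze(1)) & ~self_mask
        pos_counts = pos_mask.sum(dim=1).clamp(min=1)      # anchors w/o positives contribute 0
        loss = -(log_prob.masked_fill(~pos_mask, 0.0)).sum(dim=1) / pos_counts
        return loss.mean()

    feats = torch.randn(8, 32, requires_grad=True)
    labels = torch.tensor([0, 0, 1, 1, 2, 2, 3, 3])
    supcon_loss(feats, labels).backward()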
    Modeling sequential annotations for sequence labeling with crowds. (arXiv:2209.09430v1 [cs.CL])
    Crowd sequential annotations can be an efficient and cost-effective way to build large datasets for sequence labeling. Different from tagging independent instances, for crowd sequential annotations the quality of a label sequence relies on the expertise level of annotators in capturing internal dependencies for each token in the sequence. In this paper, we propose SA-SLC, a model of sequential annotations for sequence labeling with crowds. First, a conditional probabilistic model is developed to jointly model sequential data and annotators' expertise, in which a categorical distribution is introduced to estimate the reliability of each annotator in capturing local and non-local label dependencies in sequential annotation. To accelerate the marginalization of the proposed model, a valid label sequence inference (VLSE) method is proposed to derive the valid ground-truth label sequences from crowd sequential annotations. VLSE derives possible ground-truth labels at the token-wise level and further prunes sub-paths in the forward inference for label sequence decoding, reducing the number of candidate label sequences and improving the quality of the possible ground-truth label sequences. Experimental results on several sequence labeling tasks in Natural Language Processing show the effectiveness of the proposed model.  ( 2 min )
    Attributed Network Embedding Model for Exposing COVID-19 Spread Trajectory Archetypes. (arXiv:2209.09448v1 [cs.LG])
    The spread of COVID-19 revealed that transmission risk patterns are not homogeneous across different cities and communities, and various heterogeneous features can influence the spread trajectories. Hence, for predictive pandemic monitoring, it is essential to explore latent heterogeneous features of cities and communities that distinguish their specific pandemic spread trajectories. To this end, this study creates a network embedding model capturing cross-county visitation networks, as well as heterogeneous features, to uncover clusters of counties in the United States based on their pandemic spread transmission trajectories. First, we collected and computed location intelligence features from 2,787 counties from March 3 to June 29, 2020 (the initial wave). Second, we constructed a human visitation network, which incorporated county features as node attributes and visits between counties as network edges. Our attributed network embedding approach integrates both topological characteristics of the cross-county visitation network and heterogeneous features. We conducted clustering analysis on the attributed network embeddings to reveal four archetypes of spread risk trajectories corresponding to four clusters of counties. Subsequently, we identified four features as important features underlying the distinctive transmission risk patterns among the archetypes. The attributed network embedding approach and the findings identify and explain the non-homogeneous pandemic risk trajectories across counties for predictive pandemic monitoring. The study also contributes to data-driven and deep learning-based approaches for pandemic analytics that complement standard epidemiological models for policy analysis in pandemics.  ( 3 min )
    Weak Disambiguation for Partial Structured Output Learning. (arXiv:2209.09410v1 [cs.CL])
    Existing disambiguation strategies for partial structured output learning do not generalize well when some candidate labels are false positives or closely resemble the ground-truth label. In this paper, we propose a novel weak disambiguation method for partial structured output learning (WD-PSL). First, a piecewise large margin formulation is generalized to partial structured output learning, which effectively avoids handling a large number of candidate structured outputs for complex structures. Second, in the proposed weak disambiguation strategy, each candidate label is assigned a confidence value indicating how likely it is to be the true label, which aims to reduce the negative effects of wrong ground-truth label assignments in the learning process. Two large margins are then formulated to combine two types of constraints: the disambiguation between candidates and non-candidates, and the weak disambiguation among candidates. In the framework of alternating optimization, a new cutting plane algorithm with 2n slack variables is developed to accelerate each iteration of the optimization. Experimental results on several sequence labeling tasks in Natural Language Processing show the effectiveness of the proposed model.  ( 2 min )
    Probabilistic Generative Transformer Language models for Generative Design of Molecules. (arXiv:2209.09406v1 [cond-mat.mtrl-sci])
    Self-supervised neural language models have recently found wide applications in the generative design of organic molecules and protein sequences, as well as in representation learning for downstream structure classification and functional prediction. However, most existing deep learning models for molecule design usually require a big dataset and have a black-box architecture, which makes it difficult to interpret their design logic. Here we propose the Generative Molecular Transformer (GMTransformer), a probabilistic neural network model for the generative design of molecules. Our model is built on the blank filling language model originally developed for text processing, which has demonstrated unique advantages in learning "molecule grammars" with high-quality generation, interpretability, and data efficiency. Benchmarked on the MOSES datasets, our models achieve high novelty and Scaf scores compared to other baselines. The probabilistic generation steps have potential for tinkering with molecule designs thanks to their capability of recommending how to modify existing molecules with explanations, guided by the learned implicit molecule chemistry. The source code and datasets can be accessed freely at https://github.com/usccolumbia/GMTransformer  ( 2 min )
    A Joint Imitation-Reinforcement Learning Framework for Reduced Baseline Regret. (arXiv:2209.09446v1 [cs.LG])
    In various control task domains, existing controllers provide a baseline level of performance that -- though possibly suboptimal -- should be maintained. Reinforcement learning (RL) algorithms that rely on extensive exploration of the state and action space can be used to optimize a control policy. However, fully exploratory RL algorithms may decrease performance below a baseline level during training. In this paper, we address the issue of online optimization of a control policy while minimizing regret w.r.t. a baseline policy's performance. We present a joint imitation-reinforcement learning framework, denoted JIRL. The learning process in JIRL assumes the availability of a baseline policy and is designed with two objectives in mind: (a) leveraging the baseline's online demonstrations to minimize the regret w.r.t. the baseline policy during training, and (b) eventually surpassing the baseline performance. JIRL addresses these objectives by initially learning to imitate the baseline policy and gradually shifting control from the baseline to an RL agent. Experimental results show that JIRL effectively accomplishes the aforementioned objectives in several continuous action-space domains. The results demonstrate that JIRL is comparable to a state-of-the-art algorithm in its final performance while incurring significantly lower baseline regret during training in all of the presented domains. Moreover, the results show a reduction factor of up to $21$ in baseline regret over a state-of-the-art baseline regret minimization approach.  ( 3 min )
    Deep learning at the edge enables real-time streaming ptychographic imaging. (arXiv:2209.09408v1 [cs.LG])
    Coherent microscopy techniques provide an unparalleled multi-scale view of materials across scientific and technological fields, from structural materials to quantum devices, from integrated circuits to biological cells. Driven by the construction of brighter sources and high-rate detectors, coherent X-ray microscopy methods like ptychography are poised to revolutionize nanoscale materials characterization. However, associated significant increases in data and compute needs mean that conventional approaches no longer suffice for recovering sample images in real-time from high-speed coherent imaging experiments. Here, we demonstrate a workflow that leverages artificial intelligence at the edge and high-performance computing to enable real-time inversion on X-ray ptychography data streamed directly from a detector at up to 2 kHz. The proposed AI-enabled workflow eliminates the sampling constraints imposed by traditional ptychography, allowing low dose imaging using orders of magnitude less data than required by traditional methods.  ( 2 min )
    Multi-armed Bandit Learning on a Graph. (arXiv:2209.09419v1 [cs.LG])
    The multi-armed bandit (MAB) problem is a simple yet powerful framework that has been extensively studied in the context of decision-making under uncertainty. In many real-world applications, such as robotics, selecting an arm corresponds to a physical action that constrains the choices of the next available arms (actions). Motivated by this, we study an extension of MAB called the graph bandit, where an agent travels over a graph trying to maximize the reward collected from different nodes. The graph defines the agent's freedom in selecting the next available nodes at each step. We assume the graph structure is fully available, but the reward distributions are unknown. Built upon an offline graph-based planning algorithm and the principle of optimism, we design an online learning algorithm that balances long-term exploration and exploitation. We show that our proposed algorithm achieves $O(|S|\sqrt{T}\log T + D|S|\log T)$ learning regret, where $|S|$ is the number of nodes and $D$ is the diameter of the graph, which is superior compared to the best-known reinforcement learning algorithms under similar settings. Numerical experiments confirm that our algorithm outperforms several benchmarks. Finally, we present a synthetic robotic application modeled by the graph bandit framework, in which a robot moves on a network of rural/suburban locations to provide high-speed internet access using our proposed algorithm.  ( 3 min )
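    A toy rendition of the setting (illustrative only, and much simpler than the paper's planning-based algorithm): maintain UCB estimates per node and repeatedly walk toward the most optimistic node, collecting rewards along the way.

    import numpy as np
    import networkx as nx

    def graph_bandit_walk(G, reward_fn, start, horizon, c=1.0):
        means = {v: 0.0 for v in G}
        counts = {v: 0 for v in G}
        node, total, t = start, 0.0, 0
        while t < horizon:
            ucb = {v: means[v] + c * np.sqrt(np.log(t + 2) / (counts[v] + 1e-9))
                   for v in G}
            target = max(ucb, key=ucb.get)
            for v in nx.shortest_path(G, node, target):  # rewards collected en route too
                r = reward_fn(v)
                counts[v] += 1
                means[v] += (r - means[v]) / counts[v]
                total += r
                t += 1
                node = v
                if t >= horizon:
                    break
        return total, means

    G = nx.cycle_graph(6)
    rng = np.random.default_rng(0)
    total, _ = graph_bandit_walk(G, lambda v: rng.normal(v / 5.0, 0.1), 0, 200)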
    Partial sequence labeling with structured Gaussian Processes. (arXiv:2209.09397v1 [cs.LG])
    Existing partial sequence labeling models mainly focus on the max-margin framework, which fails to provide an uncertainty estimate of the prediction. Further, the unique ground-truth disambiguation strategy employed by these models may include wrong label information for parameter learning. In this paper, we propose structured Gaussian Processes for partial sequence labeling (SGPPSL), which encodes uncertainty in the prediction and does not need extra effort for model selection and hyperparameter learning. The model employs a factor-as-piece approximation that divides the linear-chain graph structure into a set of pieces, which preserves the basic Markov random field structure and effectively avoids handling the large number of candidate output sequences generated by partially annotated data. A confidence measure is then introduced in the model to address the different contributions of candidate labels, which enables the ground-truth label information to be utilized in parameter learning. Based on the derived variational lower bound of the proposed model, variational parameters and confidence measures are estimated in the framework of alternating optimization. Moreover, a weighted Viterbi algorithm is proposed to incorporate the confidence measure into sequence prediction, which accounts for the label ambiguity arising from multiple annotations in the training data and thus helps improve performance. SGPPSL is evaluated on several sequence labeling tasks and the experimental results show the effectiveness of the proposed model.  ( 3 min )
    Exponential advantage on noisy quantum computers. (arXiv:2209.09371v1 [quant-ph])
    Quantum computing offers the potential of exponential speedup over classical computation for certain problems. However, many of the existing algorithms with provable speedups require currently unavailable fault-tolerant quantum computers. We present NISQ-TDA, the first fully implemented quantum machine learning algorithm with provable exponential speedup on arbitrary classical (non-handcrafted) data and needing only a linear circuit depth. We report the successful execution of our NISQ-TDA algorithm, applied to small datasets run on quantum computing devices, as well as on noisy quantum simulators. We empirically confirm that the algorithm is robust to noise, and provide target depths and noise levels to realize near-term, non-fault-tolerant quantum advantage on real-world problems. Our unique data-loading projection method is the main source of noise robustness, introducing a new self-correcting data-loading approach.  ( 2 min )
    Automatic Label Sequence Generation for Prompting Sequence-to-sequence Models. (arXiv:2209.09401v1 [cs.CL])
    Prompting, which casts downstream applications as language modeling tasks, has been shown to be sample-efficient compared to standard fine-tuning with pre-trained models. However, one pitfall of prompting is the need for manually designed patterns, whose outcomes can be unintuitive and which require large validation sets to tune. To tackle the challenge, we propose AutoSeq, a fully automatic prompting method: (1) We adopt natural language prompts on sequence-to-sequence models, enabling free-form generation and a larger label search space; (2) We propose label sequences -- phrases with indefinite lengths to verbalize the labels -- which eliminate the need for manual templates and are more expressive than single label words; (3) We use beam search to automatically generate a large number of label sequence candidates and propose contrastive re-ranking to get the best combinations. AutoSeq significantly outperforms other no-manual-design methods, such as soft prompt tuning, adapter tuning, and automatic search on single label words; the generated label sequences are even better than curated manual ones on a variety of tasks. Our method reveals the potential of sequence-to-sequence models in few-shot learning and sheds light on a path to generic and automatic prompting. The source code of this paper can be obtained from https://github.com/thunlp/Seq2Seq-Prompt.  ( 2 min )
    State-driven Implicit Modeling for Sparsity and Robustness in Neural Networks. (arXiv:2209.09389v1 [cs.LG])
    Implicit models are a general class of learning models that forgo the hierarchical layer structure typical in neural networks and instead define the internal states based on an "equilibrium" equation, offering competitive performance and reduced memory consumption. However, training such models usually relies on expensive implicit differentiation for backward propagation. In this work, we present a new approach to training implicit models, called State-driven Implicit Modeling (SIM), where we constrain the internal states and outputs to match those of a baseline model, circumventing costly backward computations. The training problem becomes convex by construction and can be solved in a parallel fashion, thanks to its decomposable structure. We demonstrate how the SIM approach can be applied to significantly improve the sparsity (parameter reduction) and robustness of baseline models trained on the FashionMNIST and CIFAR-100 datasets.  ( 2 min )
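    A minimal sketch of evaluating such a model, assuming a ReLU equilibrium equation x = relu(Ax + Bu) and a well-posedness condition (spectral norm of A below 1) so that simple fixed-point iteration converges; SIM's convex training procedure itself is not shown.

    import numpy as np

    def implicit_forward(A, B, u, n_iter=100):
        # Iterate the equilibrium equation x = relu(A x + B u) to a fixed point.
        x = np.zeros(A.shape[0])
        for _ in range(n_iter):
            x = np.maximum(A @ x + B @ u, 0.0)
        return x

    rng = np.random.default_rng(0)
    A = rng.normal(size=(16, 16))
    A *= 0.9 / np.linalg.norm(A, 2)          # enforce the contraction condition
    B = rng.normal(size=(16, 4))
    x_star = implicit_forward(A, B, rng.normal(size=4))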
    Analyzing Machine Learning Models for Credit Scoring with Explainable AI and Optimizing Investment Decisions. (arXiv:2209.09362v1 [cs.LG])
    This paper examines two distinct yet related questions concerning explainable AI (XAI) practices. Machine learning (ML) is increasingly important in financial services, such as pre-approval, credit underwriting, investments, and various front-end and back-end activities. Machine learning can automatically detect non-linearities and interactions in training data, facilitating faster and more accurate credit decisions. However, machine learning models are opaque and hard to explain, while transparency and explainability are critical for establishing a reliable technology. The study compares various machine learning models, including single classifiers (logistic regression, decision trees, LDA, QDA), heterogeneous ensembles (AdaBoost, Random Forest), and sequential neural networks. The results indicate that the ensemble classifiers and neural networks outperform the single classifiers. In addition, two advanced post-hoc model-agnostic explainability techniques, LIME and SHAP, are utilized to assess ML-based credit scoring models using the open-access datasets offered by the US-based P2P lending platform Lending Club. We also use machine learning algorithms to develop new investment models and explore portfolio strategies that can maximize profitability while minimizing risk.  ( 2 min )
    Development of a Modular and Submersible Soft Robotic Arm and Corresponding Learned Kinematics Models. (arXiv:2209.09358v1 [cs.RO])
    Most soft-body organisms found in nature live in underwater environments, so it is helpful to study the motion and control of soft robots underwater as well. However, no readily available underwater soft robotic system exists for researchers to use, because such systems are difficult to design, fabricate, and waterproof. Furthermore, submersible robots usually do not have configurable components because of the need for sealed electronics packages. This work presents the development of a submersible soft robotic arm driven by hydraulic actuators, which consists of mostly 3D-printable parts that can be assembled in a short amount of time. Its modular design also enables multiple shape configurations and easy swapping of soft actuators. As a first step toward exploring machine learning control algorithms on this system, two deep neural network models were developed, trained, and evaluated to estimate the robot's forward and inverse kinematics. The techniques developed for controlling this underwater soft robotic arm can help advance the understanding of how to control soft robotic systems in general.  ( 2 min )
    Space-time tradeoffs of lenses and optics via higher category theory. (arXiv:2209.09351v1 [math.CT])
    Optics and lenses are abstract categorical gadgets that model systems with bidirectional data flow. In this paper we observe that the denotational definition of optics - identifying two optics as equivalent by observing their behaviour from the outside - is not suitable for operational, software-oriented approaches where optics are not merely observed, but built with their internal setups in mind. We identify operational differences between denotationally isomorphic categories of cartesian optics and lenses: their different composition rule and corresponding space-time tradeoffs, positioning them at two opposite ends of a spectrum. With these motivations we lift the existing categorical constructions and their relationships to the 2-categorical level, showing that the relevant operational concerns become visible. We define the 2-category $\textbf{2-Optic}(\mathcal{C})$ whose 2-cells explicitly track optics' internal configuration. We show that the 1-category $\textbf{Optic}(\mathcal{C})$ arises by locally quotienting out the connected components of this 2-category. We show that the embedding of lenses into cartesian optics gets weakened from a functor to an oplax functor whose oplaxator now detects the different composition rule. We determine the difficulties in showing this functor forms a part of an adjunction in any of the standard 2-categories. We establish a conjecture that the well-known isomorphism between cartesian lenses and optics arises out of the lax 2-adjunction between their double-categorical counterparts. In addition to presenting new research, this paper is also meant to be an accessible introduction to the topic.  ( 3 min )
    Physics-Informed Machine Learning of Dynamical Systems for Efficient Bayesian Inference. (arXiv:2209.09349v1 [stat.ML])
    Although the No-U-Turn Sampler (NUTS) is a widely adopted method for performing Bayesian inference, it requires numerous posterior gradients, which can be expensive to compute in practice. Recently, there has been significant interest in physics-based machine learning of dynamical (or Hamiltonian) systems, and Hamiltonian neural networks (HNNs) are a noteworthy architecture. But these types of architectures have not been applied to solve Bayesian inference problems efficiently. We propose the use of HNNs for performing Bayesian inference efficiently without requiring numerous posterior gradients. We introduce latent variable outputs to HNNs (L-HNNs) for improved expressivity and reduced integration errors. We integrate L-HNNs into NUTS and further propose an online error monitoring scheme to prevent sampling degeneracy in regions where L-HNNs may have had little training data. We demonstrate L-HNNs in NUTS with online error monitoring on several complex high-dimensional posterior densities and compare its performance to NUTS.  ( 2 min )
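    The computational object being emulated is the leapfrog integrator inside NUTS, whose momentum updates consume the posterior gradients; an (L-)HNN replaces grad_U below with a learned network. A minimal sketch with an analytic stand-in gradient:

    import numpy as np

    def leapfrog(grad_U, q, p, step_size, n_steps):
        # Integrate H(q, p) = U(q) + p.p/2; grad_U would be NN-predicted in an HNN.
        q, p = q.copy(), p.copy()
        p -= 0.5 * step_size * grad_U(q)      # half step for momentum
        for _ in range(n_steps - 1):
            q += step_size * p                # full step for position
            p -= step_size * grad_U(q)        # full step for momentum
        q += step_size * p
        p -= 0.5 * step_size * grad_U(q)      # closing half step
        return q, -p                          # negate momentum for reversibility

    grad_U = lambda q: q                      # standard normal target, U(q) = q.q/2
    q1, p1 = leapfrog(grad_U, np.array([1.0]), np.array([0.5]), 0.1, 20)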
    Reviewing Embeddings for Graph Neural Networks. (arXiv:2209.09338v1 [cs.LG])
    Current graph representation learning techniques use Graph Neural Networks (GNNs) to extract features from dataset embeddings. In this work, we examine the quality of these embeddings and assess how changing them can affect the accuracy of GNNs. We explore different embedding extraction techniques for both images and texts. We find that the choice of embedding biases the performance of different GNN architectures, and thus the choice of embedding influences the selection of GNNs regardless of the underlying dataset. In addition, only some GNN models yield an accuracy improvement over models trained from scratch or fine-tuned on the underlying data without utilizing the graph connections. As an alternative, we propose Graph-connected Network (GraNet) layers, which use GNN message passing within large models to allow neighborhood aggregation. This gives the model a chance to inherit weights from large pre-trained models where possible, and we demonstrate that this approach improves accuracy compared to previous methods: on Flickr_v2, GraNet beats GAT2 and GraphSAGE by 7.7% and 1.7% respectively.  ( 2 min )
    Sparse Interaction Additive Networks via Feature Interaction Detection and Sparse Selection. (arXiv:2209.09326v1 [cs.LG])
    There is currently a large gap in performance between the statistically rigorous methods like linear regression or additive splines and the powerful deep methods using neural networks. Previous works attempting to close this gap have failed to fully investigate the exponentially growing number of feature combinations which deep networks consider automatically during training. In this work, we develop a tractable selection algorithm to efficiently identify the necessary feature combinations by leveraging techniques in feature interaction detection. Our proposed Sparse Interaction Additive Networks (SIAN) construct a bridge from these simple and interpretable models to fully connected neural networks. SIAN achieves competitive performance against state-of-the-art methods across multiple large-scale tabular datasets and consistently finds an optimal tradeoff between the modeling capacity of neural networks and the generalizability of simpler methods.  ( 2 min )
    Visible-Infrared Person Re-Identification Using Privileged Intermediate Information. (arXiv:2209.09348v1 [cs.CV])
    Visible-infrared person re-identification (ReID) aims to recognize the same person of interest across a network of RGB and IR cameras. Some deep learning (DL) models have directly incorporated both modalities to discriminate persons in a joint representation space. However, this cross-modal ReID problem remains challenging due to the large domain shift in data distributions between the RGB and IR modalities. This paper introduces a novel approach for creating an intermediate virtual domain that acts as a bridge between the two main domains (i.e., the RGB and IR modalities) during training. This intermediate domain is considered privileged information (PI) that is unavailable at test time, and allows formulating this cross-modal matching task as a problem of learning under privileged information (LUPI). We devised a new method to generate images between the visible and infrared domains that provide additional information to train a deep ReID model through an intermediate domain adaptation. In particular, by employing color-free and multi-step triplet loss objectives during training, our method provides common feature representation spaces that are robust to large visible-infrared domain shifts. Experimental results on challenging visible-infrared ReID datasets indicate that our proposed approach consistently improves matching accuracy, without any computational overhead at test time. The code is available at: https://github.com/alehdaghi/Cross-Modal-Re-ID-via-LUPI  ( 2 min )
    Understanding reinforcement learned crowds. (arXiv:2209.09344v1 [cs.LG])
    Simulating trajectories of virtual crowds is a commonly encountered task in Computer Graphics. Several recent works have applied Reinforcement Learning methods to animate virtual agents, but they often make different design choices when it comes to the fundamental simulation setup. Each of these choices comes with a reasonable justification for its use, so it is not obvious what their real impact is and how they affect the results. In this work, we analyze some of these arbitrary choices in terms of their impact on the learning performance, as well as the quality of the resulting simulation measured in terms of energy efficiency. We perform a theoretical analysis of the properties of the reward function design, and empirically evaluate the impact of using certain observation and action spaces on a variety of scenarios, with the reward function and energy usage as metrics. We show that directly using the neighboring agents' information as observations generally outperforms the more widely used raycasting. Similarly, using nonholonomic controls with egocentric observations tends to produce more efficient behaviors than holonomic controls with absolute observations. Each of these choices has a significant and potentially nontrivial impact on the results, so researchers should be mindful about choosing and reporting them in their work.  ( 3 min )
    MAN: Multi-Action Networks Learning. (arXiv:2209.09329v1 [cs.LG])
    Learning control policies with large action spaces is a challenging problem in the field of reinforcement learning due to inefficiencies in exploration. In this work, we introduce a Deep Reinforcement Learning (DRL) algorithm called Multi-Action Networks (MAN) Learning that addresses the challenge of large discrete action spaces. We propose separating the action space into two components and creating a value neural network for each sub-action. MAN then uses temporal-difference learning to train the networks synchronously, which is simpler than directly training a single network with a large action output. To evaluate the proposed method, we test MAN on a block stacking task, and then extend MAN to handle 12 games from the Atari Arcade Learning Environment with 18 actions. Our results indicate that MAN learns faster than both Deep Q-Learning and Double Deep Q-Learning, implying our method is a better-performing synchronous temporal-difference algorithm than those currently available for large action spaces.  ( 2 min )
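    One plausible reading of the two-component split, sketched with assumed sizes: a shared trunk with one value head per sub-action, so that greedy selection costs n1 + n2 evaluations instead of n1 * n2. The exact way MAN combines and trains the two networks follows the paper and is not reproduced here.

    import torch
    import torch.nn as nn

    class FactoredQNet(nn.Module):
        def __init__(self, obs_dim, n_a1, n_a2, hidden=128):
            super().__init__()
            self.trunk = nn.Sequential(nn.Linear(obs_dim, hidden), nn.ReLU())
            self.head1 = nn.Linear(hidden, n_a1)   # values for sub-action 1
            self.head2 = nn.Linear(hidden, n_a2)   # values for sub-action 2

        def forward(self, obs):
            h = self.trunk(obs)
            return self.head1(h), self.head2(h)

    net = FactoredQNet(obs_dim=8, n_a1=3, n_a2=6)
    q1, q2 = net(torch.randn(1, 8))
    a1, a2 = q1.argmax(dim=1), q2.argmax(dim=1)    # greedy sub-actions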
    Activity report analysis with automatic single or multispan answer extraction. (arXiv:2209.09316v1 [cs.CL])
    In the era of IoT (Internet of Things), we are surrounded by a plethora of AI-enabled devices that can transcribe images, video, audio, and sensor signals into text descriptions. When such transcriptions are captured in activity reports for monitoring, life logging, and anomaly detection applications, a user would typically request a summary or ask targeted questions about certain sections of the report they are interested in. Depending on the context and the type of question asked, a question answering (QA) system would need to automatically determine whether the answer covers single-span or multi-span text components. Currently available QA datasets primarily focus on single-span responses only (such as SQuAD [4]) or contain a low proportion of examples with multi-span answers (such as DROP [3]). To investigate the automatic selection of single/multi-span answers in the use case described, we created a new smart home environment dataset comprised of questions paired with single-span or multi-span answers depending on the question and context queried. In addition, we propose a RoBERTa [6]-based multiple span extraction question answering (MSEQA) model that returns the appropriate answer span for a given question. Our experiments show that the proposed model outperforms state-of-the-art QA models on our dataset while providing comparable performance on published individual single/multi-span task datasets.  ( 2 min )
    Meta-Reinforcement Learning for Adaptive Control of Second Order Systems. (arXiv:2209.09301v1 [cs.LG])
    Meta-learning is a branch of machine learning which aims to synthesize data from a distribution of related tasks to efficiently solve new ones. In process control, many systems have similar and well-understood dynamics, which suggests it is feasible to create a generalizable controller through meta-learning. In this work, we formulate a meta reinforcement learning (meta-RL) control strategy that takes advantage of known, offline information for training, such as a model structure. The meta-RL agent is trained over a distribution of model parameters, rather than a single model, enabling the agent to automatically adapt to changes in the process dynamics while maintaining performance. A key design element is the ability to leverage model-based information offline during training, while maintaining a model-free policy structure for interacting with new environments. Our previous work has demonstrated how this approach can be applied to the industrially-relevant problem of tuning proportional-integral controllers to control first order processes. In this work, we briefly reintroduce our methodology and demonstrate how it can be extended to proportional-integral-derivative controllers and second order systems.  ( 2 min )
    Deep Linear Networks can Benignly Overfit when Shallow Ones Do. (arXiv:2209.09315v1 [cs.LG])
    We bound the excess risk of interpolating deep linear networks trained using gradient flow. In a setting previously used to establish risk bounds for the minimum $\ell_2$-norm interpolant, we show that randomly initialized deep linear networks can closely approximate or even match known bounds for the minimum $\ell_2$-norm interpolant. Our analysis also reveals that interpolating deep linear models have exactly the same conditional variance as the minimum $\ell_2$-norm solution. Since the noise affects the excess risk only through the conditional variance, this implies that depth does not improve the algorithm's ability to "hide the noise". Our simulations verify that aspects of our bounds reflect typical behavior for simple data distributions. We also find that similar phenomena are seen in simulations with ReLU networks, although the situation there is more nuanced.  ( 2 min )
    The Ability of Image-Language Explainable Models to Resemble Domain Expertise. (arXiv:2209.09310v1 [cs.LG])
    Recent advances in vision and language (V+L) models have a promising impact in the healthcare field. However, such models struggle to explain how and why a particular decision was made. In addition, model transparency and involvement of domain expertise are critical success factors for machine learning models to make an entrance into the field. In this work, we study the use of the local surrogate explainability technique to overcome the problem of black-box deep learning models. We explore the feasibility of resembling domain expertise using the local surrogates in combination with an underlying V+L to generate multi-modal visual and language explanations. We demonstrate that such explanations can serve as helpful feedback in guiding model training for data scientists and machine learning engineers in the field.  ( 2 min )
    Stability and Generalization Analysis of Gradient Methods for Shallow Neural Networks. (arXiv:2209.09298v1 [cs.LG])
    While significant theoretical progress has been achieved, unveiling the generalization mystery of overparameterized neural networks remains largely elusive. In this paper, we study the generalization behavior of shallow neural networks (SNNs) by leveraging the concept of algorithmic stability. We consider gradient descent (GD) and stochastic gradient descent (SGD) to train SNNs, and for both we develop consistent excess risk bounds by balancing optimization and generalization via early stopping. Compared to existing analyses of GD, our new analysis requires a relaxed overparameterization assumption and also applies to SGD. The key to the improvement is a better estimation of the smallest eigenvalues of the Hessian matrices of the empirical risks and the loss function along the trajectories of GD and SGD, obtained by providing a refined estimation of their iterates.  ( 2 min )
    Non-Imaging Medical Data Synthesis for Trustworthy AI: A Comprehensive Survey. (arXiv:2209.09239v1 [cs.LG])
    Data quality is the key factor for the development of trustworthy AI in healthcare. A large volume of curated datasets with controlled confounding factors can help improve the accuracy, robustness and privacy of downstream AI algorithms. However, access to good quality datasets is limited by the technical difficulty of data acquisition and large-scale sharing of healthcare data is hindered by strict ethical restrictions. Data synthesis algorithms, which generate data with a similar distribution as real clinical data, can serve as a potential solution to address the scarcity of good quality data during the development of trustworthy AI. However, state-of-the-art data synthesis algorithms, especially deep learning algorithms, focus more on imaging data while neglecting the synthesis of non-imaging healthcare data, including clinical measurements, medical signals and waveforms, and electronic healthcare records (EHRs). Thus, in this paper, we will review the synthesis algorithms, particularly for non-imaging medical data, with the aim of providing trustworthy AI in this domain. This tutorial-styled review paper will provide comprehensive descriptions of non-imaging medical data synthesis on aspects including algorithms, evaluations, limitations and future research directions.  ( 2 min )
    Machine Learning Class Numbers of Real Quadratic Fields. (arXiv:2209.09283v1 [math.NT])
    We implement and interpret various supervised learning experiments involving real quadratic fields with class numbers 1, 2 and 3. We quantify the relative difficulties in separating class numbers of matching/different parity from a data-scientific perspective, apply the methodology of feature analysis and principal component analysis, and use symbolic classification to develop machine-learned formulas for class numbers 1, 2 and 3 that apply to our dataset.  ( 2 min )
    PoxVerifi: An Information Verification System to Combat Monkeypox Misinformation. (arXiv:2209.09300v1 [cs.CL])
    Following recent outbreaks, monkeypox-related misinformation continues to spread rapidly online. This negatively impacts response strategies and disproportionately harms LGBTQ+ communities in the short term, and ultimately undermines the overall effectiveness of public health responses. In an attempt to combat monkeypox-related misinformation, we present PoxVerifi, an open-source, extensible tool that provides a comprehensive approach to assessing the accuracy of monkeypox-related claims. Leveraging information from existing fact checking sources and published World Health Organization (WHO) information, we created an open-source corpus of 225 rated monkeypox claims. Additionally, we trained an open-source BERT-based machine learning model for classifying monkeypox information, which achieved 96% cross-validation accuracy. PoxVerifi is a Google Chrome browser extension designed to empower users to navigate through monkeypox-related misinformation. Specifically, PoxVerifi provides users with a comprehensive toolkit to assess the veracity of headlines on any webpage across the Internet without having to visit an external site. Users can view an automated accuracy review from our trained machine learning model and a user-generated accuracy review based on community-member votes, and they can see similar, vetted claims. Besides PoxVerifi's comprehensive approach to claim-testing, our platform provides an efficient and accessible method to crowdsource accuracy ratings on monkeypox-related claims, which can be aggregated to create new labeled misinformation datasets.  ( 3 min )
    Weak-signal extraction enabled by deep-neural-network denoising of diffraction data. (arXiv:2209.09247v1 [eess.IV])
    The removal or cancellation of noise has widespread applications in imaging and acoustics. In everyday applications, denoising may even include generative aspects that are unfaithful to the ground truth. For scientific applications, however, denoising must reproduce the ground truth accurately. Here, we show how data can be denoised via a deep convolutional neural network such that weak signals appear with quantitative accuracy. In particular, we study X-ray diffraction on crystalline materials. We demonstrate that weak signals stemming from charge ordering, insignificant in the noisy data, become visible and accurate in the denoised data. This success is enabled by supervised training of a deep neural network with pairs of measured low- and high-noise data. In this way, the neural network learns about the statistical properties of the noise. We demonstrate that using artificial noise (such as Poisson and Gaussian) does not yield such quantitatively accurate results. Our approach thus illustrates a practical strategy for noise filtering that can be applied to challenging acquisition problems.  ( 3 min )
    Interpreting mechanism of Synergism of drug combinations using attention based hierarchical graph pooling. (arXiv:2209.09245v1 [q-bio.QM])
    Synergistic drug combinations have huge potential to enhance therapeutic efficacy and reduce adverse reactions. However, effective and synergistic drug combination prediction remains an open question because of the unknown causal disease signaling pathways. Various deep learning (AI) models have been proposed to quantitatively predict the synergism of drug combinations, but the major limitation of existing methods is that they are inherently not interpretable, which makes their conclusions opaque to human experts, limiting the robustness of model conclusions and the deployment of these models in real-world human-AI healthcare. In this paper, we develop an interpretable graph neural network (GNN) that reveals the underlying essential therapeutic targets and the mechanism of synergy (MoS) by mining the sub-molecular network of great importance. The key component of the interpretable GNN prediction model is a novel graph pooling layer, the Self-Attention based Node and Edge pool (henceforth SANEpool), which can compute the attention score (importance) of nodes and edges based on the node features and graph topology. As such, the proposed GNN model provides a systematic way to predict and interpret drug combination synergism based on the detected crucial sub-molecular network. We evaluate SANEpool on molecular networks formulated by genes from 46 core cancer signaling pathways and drug combinations from the NCI ALMANAC drug combination screening data. The experimental results indicate that 1) SANEpool achieves state-of-the-art performance among popular graph neural networks; and 2) the sub-molecular networks detected by SANEpool are self-explanatory and salient for identifying synergistic drug combinations.  ( 3 min )
    Distributed Semi-supervised Fuzzy Regression with Interpolation Consistency Regularization. (arXiv:2209.09240v1 [cs.LG])
    Recently, distributed semi-supervised learning (DSSL) algorithms have shown their effectiveness in leveraging unlabeled samples over interconnected networks, where agents cannot share their original data with each other and can only communicate non-sensitive information with their neighbors. However, existing DSSL algorithms cannot cope with data uncertainties and may suffer from high computation and communication overhead. To handle these issues, we propose a distributed semi-supervised fuzzy regression (DSFR) model with fuzzy if-then rules and interpolation consistency regularization (ICR). ICR, which was recently proposed for semi-supervised problems, can force decision boundaries to pass through sparse data areas, thus increasing model robustness. However, its application in distributed scenarios has not yet been considered. In this work, we propose a distributed Fuzzy C-means (DFCM) method and a distributed interpolation consistency regularization (DICR), built on the well-known alternating direction method of multipliers, to locate the parameters of the antecedent and consequent components of DSFR, respectively. Notably, the DSFR model converges very fast since it does not involve a back-propagation procedure, and it scales to large datasets thanks to the use of DFCM and DICR. Experimental results on both artificial and real-world datasets show that the proposed DSFR model achieves much better performance than the state-of-the-art DSSL algorithm in terms of both loss value and computational cost.  ( 3 min )
    Flexible Neural Image Compression via Code Editing. (arXiv:2209.09244v1 [eess.IV])
    Neural image compression (NIC) has outperformed traditional image codecs in rate-distortion (R-D) performance. However, it usually requires a dedicated encoder-decoder pair for each point on the R-D curve, which greatly hinders its practical deployment. While some recent works have enabled bitrate control via conditional coding, they impose a strong prior during training and provide limited flexibility. In this paper, we propose Code Editing, a highly flexible coding method for NIC based on semi-amortized inference and adaptive quantization. Our work is a new paradigm for variable-bitrate NIC. Furthermore, experimental results show that our method surpasses existing variable-rate methods and achieves ROI coding and multi-distortion trade-offs with a single decoder.  ( 2 min )
  • Open

    PARNN: A Probabilistic Autoregressive Neural Network Framework for Accurate Forecasting. (arXiv:2204.09640v2 [stat.ML] UPDATED)
    Forecasting time series data represents an emerging field of research in data science and knowledge discovery, with vast applications ranging from stock price and energy demand prediction to the early prediction of epidemics. Numerous statistical and machine learning methods have been proposed in the last five decades to meet the demand for high-quality and reliable forecasts. However, in real-life prediction problems, situations exist in which a model based on one of the above paradigms is preferable. Therefore, hybrid solutions are needed to bridge the gap between classical forecasting methods and modern neural network models. In this context, we introduce a Probabilistic AutoRegressive Neural Network (PARNN) model that can handle a wide variety of complex time series data (e.g., nonlinearity, non-seasonality, long-range dependence, and non-stationarity). The proposed PARNN model is built by fusing an integrated moving average and an autoregressive neural network, preserving the explainability, scalability, and ``white-box-like'' prediction behavior of the individual models. Sufficient conditions for asymptotic stationarity and geometric ergodicity are obtained by considering the asymptotic behavior of the associated Markov chain. Unlike advanced deep learning tools, the PARNN model provides uncertainty quantification based on prediction intervals. In computational experiments, PARNN outperforms standard statistical, machine learning, and deep learning models (e.g., Transformers, NBeats, DeepAR, etc.) on a diverse collection of real-world datasets from macroeconomics, tourism, energy, epidemiology, and others for short-term, medium-term, and long-term forecasting. Multiple comparisons with the best method are carried out to showcase the superiority of the proposal over state-of-the-art forecasters across different forecast horizons.  ( 3 min )
    Inference and Sampling for Archimax Copulas. (arXiv:2205.14025v2 [stat.ME] UPDATED)
    Understanding multivariate dependencies in both the bulk and the tails of a distribution is an important problem for many applications, such as ensuring algorithms are robust to observations that are infrequent but have devastating effects. Archimax copulas are a family of distributions endowed with a precise representation that allows simultaneous modeling of the bulk and the tails of a distribution. Rather than separating the two as is typically done in practice, incorporating additional information from the bulk may improve inference of the tails, where observations are limited. Building on the stochastic representation of Archimax copulas, we develop a non-parametric inference method and sampling algorithm. Our proposed methods, to the best of our knowledge, are the first that allow for highly flexible and scalable inference and sampling algorithms, enabling the increased use of Archimax copulas in practical settings. We experimentally compare to state-of-the-art density modeling techniques, and the results suggest that the proposed method effectively extrapolates to the tails while scaling to higher dimensional data. Our findings suggest that the proposed algorithms can be used in a variety of applications where understanding the interplay between the bulk and the tails of a distribution is necessary, such as healthcare and safety.  ( 3 min )
    Personalized Longitudinal Assessment of Multiple Sclerosis Using Smartphones. (arXiv:2209.09692v1 [stat.ME])
    Personalized longitudinal disease assessment is central to quickly diagnosing, appropriately managing, and optimally adapting the therapeutic strategy of multiple sclerosis (MS). It is also important for identifying the idiosyncratic subject-specific disease profiles. Here, we design a novel longitudinal model to map individual disease trajectories in an automated way using sensor data that may contain missing values. First, we collect digital measurements related to gait and balance, and upper extremity functions using sensor-based assessments administered on a smartphone. Next, we treat missing data via imputation. We then discover potential markers of MS by employing a generalized estimation equation. Subsequently, parameters learned from multiple training datasets are ensembled to form a simple, unified longitudinal predictive model to forecast MS over time in previously unseen people with MS. To mitigate potential underestimation for individuals with severe disease scores, the final model incorporates additional subject-specific fine-tuning using data from the first day. The results show that the proposed model is promising to achieve personalized longitudinal MS assessment; they also suggest that features related to gait and balance as well as upper extremity function, remotely collected from sensor-based assessments, may be useful digital markers for predicting MS over time.  ( 2 min )
    Deep Linear Networks can Benignly Overfit when Shallow Ones Do. (arXiv:2209.09315v1 [cs.LG])
    We bound the excess risk of interpolating deep linear networks trained using gradient flow. In a setting previously used to establish risk bounds for the minimum $\ell_2$-norm interpolant, we show that randomly initialized deep linear networks can closely approximate or even match known bounds for the minimum $\ell_2$-norm interpolant. Our analysis also reveals that interpolating deep linear models have exactly the same conditional variance as the minimum $\ell_2$-norm solution. Since the noise affects the excess risk only through the conditional variance, this implies that depth does not improve the algorithm's ability to "hide the noise". Our simulations verify that aspects of our bounds reflect typical behavior for simple data distributions. We also find that similar phenomena are seen in simulations with ReLU networks, although the situation there is more nuanced.  ( 2 min )
    Predictive Scale-Bridging Simulations through Active Learning. (arXiv:2209.09811v1 [cs.LG])
    Throughout computational science, there is a growing need to utilize the continual improvements in raw computational horsepower to achieve greater physical fidelity through scale-bridging over brute-force increases in the number of mesh elements. For instance, quantitative predictions of transport in nanoporous media, critical to hydrocarbon extraction from tight shale formations, are impossible without accounting for molecular-level interactions. Similarly, inertial confinement fusion simulations rely on numerical diffusion to simulate molecular effects such as non-local transport and mixing without truly accounting for molecular interactions. With these two disparate applications in mind, we develop a novel capability which uses an active learning approach to optimize the use of local fine-scale simulations for informing coarse-scale hydrodynamics. Our approach addresses three challenges: forecasting continuum coarse-scale trajectory to speculatively execute new fine-scale molecular dynamics calculations, dynamically updating coarse-scale from fine-scale calculations, and quantifying uncertainty in neural network models.  ( 2 min )
    DADApy: Distance-based Analysis of DAta-manifolds in Python. (arXiv:2205.03373v2 [cs.LG] UPDATED)
    DADApy is a Python software package for analysing and characterising high-dimensional data manifolds. It provides methods for estimating the intrinsic dimension and the probability density, for performing density-based clustering, and for comparing different distance metrics. We review the main functionalities of the package and exemplify its usage in toy cases and in a real-world application. DADApy is freely available under the open-source Apache 2.0 license.  ( 2 min )
    Benign Overfitting without Linearity: Neural Network Classifiers Trained by Gradient Descent for Noisy Linear Data. (arXiv:2202.05928v3 [cs.LG] UPDATED)
    Benign overfitting, the phenomenon where interpolating models generalize well in the presence of noisy data, was first observed in neural network models trained with gradient descent. To better understand this empirical observation, we consider the generalization error of two-layer neural networks trained to interpolation by gradient descent on the logistic loss following random initialization. We assume the data comes from well-separated class-conditional log-concave distributions and allow for a constant fraction of the training labels to be corrupted by an adversary. We show that in this setting, neural networks exhibit benign overfitting: they can be driven to zero training error, perfectly fitting any noisy training labels, and simultaneously achieve minimax optimal test error. In contrast to previous work on benign overfitting that require linear or kernel-based predictors, our analysis holds in a setting where both the model and learning dynamics are fundamentally nonlinear.  ( 2 min )
    Neural network training under semidefinite constraints. (arXiv:2201.00632v3 [cs.LG] UPDATED)
    This paper is concerned with the training of neural networks (NNs) under semidefinite constraints, which allows for NN training with robustness and stability guarantees. In particular, we focus on Lipschitz bounds for NNs. Exploiting the banded structure of the underlying matrix constraint, we set up an efficient and scalable training scheme for NN training problems of this kind based on interior point methods. Our implementation allows us to enforce Lipschitz constraints in the training of large-scale deep NNs such as Wasserstein generative adversarial networks (WGANs) via semidefinite constraints. In numerical examples, we show the superiority of our method and its applicability to WGAN training.  ( 2 min )
    Calibrated Uncertainty Estimation Improves Bayesian Optimization. (arXiv:2112.04620v2 [cs.LG] UPDATED)
    Bayesian optimization is a sequential procedure for obtaining the global optimum of black-box functions without knowing a priori their true form. Good uncertainty estimates over the shape of the objective function are essential in guiding the optimization process. However, these estimates can be inaccurate if the true objective function violates assumptions made by its model (e.g., Gaussianity). This paper studies which uncertainties are needed in Bayesian optimization models and argues that ideal uncertainties should be calibrated -- i.e., an 80% predictive interval should contain the true outcome 80% of the time. We propose a simple algorithm for enforcing this property and show that it enables Bayesian optimization to arrive at the global optimum in fewer steps. We provide theoretical insights into the role of calibrated uncertainties and demonstrate the improved performance of our method on standard benchmark functions and hyperparameter optimization tasks.  ( 2 min )
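    As a concrete illustration of the calibration property described above, here is a minimal Python sketch that checks the empirical coverage of a model's 80% predictive intervals; the array names are hypothetical and this is not the paper's algorithm:

        # Check whether 80% predictive intervals contain the truth ~80% of the time.
        # `mu`, `sigma`, `y_true` are assumed held-out predictive means/stddevs and targets.
        import numpy as np
        from scipy.stats import norm

        def empirical_coverage(mu, sigma, y_true, level=0.80):
            z = norm.ppf(0.5 + level / 2.0)        # ~1.28 for a central 80% interval
            lo, hi = mu - z * sigma, mu + z * sigma
            return np.mean((y_true >= lo) & (y_true <= hi))

    A well-calibrated model returns roughly 0.80 here; large deviations suggest recalibrating the uncertainties before they drive acquisition.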
    Lazy vs hasty: linearization in deep networks impacts learning schedule based on example difficulty. (arXiv:2209.09658v1 [cs.LG])
    Among attempts at giving a theoretical account of the success of deep neural networks, a recent line of work has identified a so-called `lazy' regime in which the network can be well approximated by its linearization around initialization. Here we investigate the comparative effect of the lazy (linear) and feature learning (non-linear) regimes on subgroups of examples based on their difficulty. Specifically, we show that easier examples are given more weight in feature learning mode, resulting in faster training compared to more difficult ones. In other words, the non-linear dynamics tends to sequentialize the learning of examples of increasing difficulty. We illustrate this phenomenon across different ways to quantify example difficulty, including c-score, label noise, and in the presence of spurious correlations. Our results reveal a new understanding of how deep networks prioritize resources across example difficulty.  ( 2 min )
    The boosted HP filter is more general than you might think. (arXiv:2209.09810v1 [econ.EM])
    The global financial crisis and Covid recession have renewed discussion concerning trend-cycle discovery in macroeconomic data, and boosting has recently upgraded the popular HP filter to a modern machine learning device suited to data-rich and rapid computational environments. This paper sheds light on its versatility in trend-cycle determination, explaining in a simple manner both HP filter smoothing and the consistency delivered by boosting for general trend detection. Applied to a universe of time series in FRED databases, boosting outperforms other methods in timely capturing downturns at crises and recoveries that follow. With its wide applicability the boosted HP filter is a useful automated machine learning addition to the macroeconometric toolkit.  ( 2 min )
    Learning Green's Functions of Linear Reaction-Diffusion Equations with Application to Fast Numerical Solver. (arXiv:2105.11045v2 [cs.LG] UPDATED)
    Partial differential equations are often used to model various physical phenomena, such as heat diffusion, wave propagation, fluid dynamics, elasticity, electrodynamics and image processing, and many analytic approaches and traditional numerical methods have been developed and widely used for their solution. Inspired by the rapidly growing impact of deep learning on scientific and engineering research, in this paper we propose a novel neural network, GF-Net, for learning the Green's functions of linear reaction-diffusion equations in an unsupervised fashion. The proposed method overcomes the challenges of finding the Green's functions of the equations on arbitrary domains by utilizing a physics-informed approach and the symmetry of the Green's function. As a consequence, it leads to a particularly efficient way of solving the target equations under different boundary conditions and sources. We also demonstrate the effectiveness of the proposed approach by experiments in square, annular and L-shaped domains.  ( 2 min )
    A Framework for Benchmarking Clustering Algorithms. (arXiv:2209.09493v1 [cs.LG])
    The evaluation of clustering algorithms can be performed by running them on a variety of benchmark problems and comparing their outputs to the reference, ground-truth groupings provided by experts. Unfortunately, many research papers and graduate theses consider only a small number of datasets. Moreover, the fact that there can be many equally valid ways to cluster a given problem set is rarely taken into account. In order to overcome these limitations, we have developed a framework whose aim is to introduce a consistent methodology for testing clustering algorithms. Furthermore, we have aggregated, polished, and standardised many clustering benchmark batteries referred to across the machine learning and data mining literature, and included new datasets of different dimensionalities, sizes, and cluster types. An interactive datasets explorer, the documentation of the Python API, a description of the ways to interact with the framework from other programming languages such as R or MATLAB, and other details are all provided at https://clustering-benchmarks.gagolewski.com.  ( 2 min )
    A gradient estimator via L1-randomization for online zero-order optimization with two point feedback. (arXiv:2205.13910v2 [math.ST] UPDATED)
    This work studies online zero-order optimization of convex and Lipschitz functions. We present a novel gradient estimator based on two function evaluations and randomization on the $\ell_1$-sphere. Considering different geometries of feasible sets and Lipschitz assumptions, we analyse an online dual averaging algorithm with our estimator in place of the usual gradient. We consider two types of assumptions on the noise of the zero-order oracle: canceling noise and adversarial noise. We provide an anytime and completely data-driven algorithm, which is adaptive to all parameters of the problem. In the case of canceling noise, which was previously studied in the literature, our guarantees are either comparable to or better than the state-of-the-art bounds obtained by Duchi et al. (2015) and Shamir (2017) for non-adaptive algorithms. Our analysis is based on deriving a new weighted Poincar\'e type inequality for the uniform measure on the $\ell_1$-sphere with explicit constants, which may be of independent interest.  ( 2 min )
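    For orientation, a two-point, $\ell_1$-randomized estimator of this kind can be sketched as follows in Python; the exact scaling constant and noise handling are as derived in the paper, so treat this as an illustrative approximation rather than the authors' reference implementation:

        import numpy as np

        def l1_sphere_sample(d, rng):
            # Uniform on the l1-sphere: exponential magnitudes normalized to sum 1,
            # with independent random signs.
            e = rng.exponential(size=d)
            return rng.choice([-1.0, 1.0], size=d) * e / e.sum()

        def two_point_grad(f, x, h, rng):
            d = x.size
            zeta = l1_sphere_sample(d, rng)
            # d/(2h) * (f(x + h*zeta) - f(x - h*zeta)) * sign(zeta)
            return (d / (2 * h)) * (f(x + h * zeta) - f(x - h * zeta)) * np.sign(zeta)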
    Computed Decision Weights and a New Learning Algorithm for Neural Classifiers. (arXiv:2209.08422v1 [cs.LG] CROSS LISTED)
    In this paper we consider the possibility of computing, rather than training, the decision layer weights of a neural classifier. Such a possibility arises in two ways: from making an appropriate choice of loss function, and from solving a problem of constrained optimization. The latter formulation leads to a promising new learning process for pre-decision weights with both simplicity and efficacy.  ( 2 min )
    Multi-armed Bandit Learning on a Graph. (arXiv:2209.09419v1 [cs.LG])
    The multi-armed bandit (MAB) problem is a simple yet powerful framework that has been extensively studied in the context of decision-making under uncertainty. In many real-world applications, such as robotic applications, selecting an arm corresponds to a physical action that constrains the choices of the next available arms (actions). Motivated by this, we study an extension of MAB called the graph bandit, where an agent travels over a graph trying to maximize the reward collected from different nodes. The graph defines the agent's freedom to select the next available nodes at each step. We assume the graph structure is fully available, but the reward distributions are unknown. Building on an offline graph-based planning algorithm and the principle of optimism, we design an online learning algorithm that balances long-term exploration and exploitation. We show that our proposed algorithm achieves $O(|S|\sqrt{T}\log(T)+D|S|\log T)$ learning regret, where $|S|$ is the number of nodes and $D$ is the diameter of the graph, which is superior to the best-known reinforcement learning algorithms under similar settings. Numerical experiments confirm that our algorithm outperforms several benchmarks. Finally, we present a synthetic robotic application modeled by the graph bandit framework, where a robot moves on a network of rural/suburban locations to provide high-speed internet access using our proposed algorithm.  ( 3 min )
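    A toy version of the setting is easy to simulate: the agent may only move to graph neighbors of its current node and collects noisy node rewards. The greedy UCB walk below is a deliberate simplification for illustration, not the paper's planning-based algorithm:

        import numpy as np
        import networkx as nx

        rng = np.random.default_rng(0)
        G = nx.cycle_graph(8)
        true_means = rng.uniform(0, 1, G.number_of_nodes())
        counts = np.zeros(G.number_of_nodes())
        sums = np.zeros(G.number_of_nodes())
        node = 0
        for t in range(1, 2001):
            r = true_means[node] + rng.normal(0, 0.1)   # noisy reward at current node
            counts[node] += 1
            sums[node] += r
            nbrs = list(G.neighbors(node)) + [node]     # movement constrained by the graph
            ucb = [sums[v] / counts[v] + np.sqrt(2 * np.log(t) / counts[v])
                   if counts[v] > 0 else np.inf for v in nbrs]
            node = nbrs[int(np.argmax(ucb))]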
    Deep Physics Corrector: A physics enhanced deep learning architecture for solving stochastic differential equations. (arXiv:2209.09750v1 [stat.ML])
    We propose a novel gray-box modeling algorithm for physical systems governed by stochastic differential equations (SDEs). The proposed approach, referred to as the Deep Physics Corrector (DPC), blends approximate physics represented in terms of SDEs with a deep neural network (DNN). The primary idea is to exploit the DNN to model the missing physics. We hypothesize that combining incomplete physics with data will make the model interpretable and allow better generalization. The primary bottleneck in training surrogate models for stochastic simulators often lies in selecting a suitable loss function. Among the different loss functions available in the literature, we use the conditional maximum mean discrepancy (CMMD) loss function in DPC because of its proven performance. Overall, physics-data fusion and CMMD allow DPC to learn from sparse data. We illustrate the performance of the proposed DPC on four benchmark examples from the literature. The results obtained are highly accurate, indicating its possible application as a surrogate model for stochastic simulators.  ( 2 min )
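    The gray-box idea itself is compact: simulate the SDE with the known (approximate) drift plus a learned residual. A minimal PyTorch sketch, with the CMMD training objective omitted and all names assumed:

        import torch
        import torch.nn as nn

        class DriftCorrector(nn.Module):
            def __init__(self, dim=1, hidden=32):
                super().__init__()
                self.net = nn.Sequential(nn.Linear(dim, hidden), nn.Tanh(),
                                         nn.Linear(hidden, dim))
            def forward(self, x):
                return self.net(x)

        def approx_drift(x):      # stand-in for the known-but-incomplete physics
            return -x

        def simulate(corrector, x0, dt=0.01, steps=100, sigma=0.1):
            x = x0
            for _ in range(steps):
                drift = approx_drift(x) + corrector(x)   # physics + learned residual
                x = x + drift * dt + sigma * torch.randn_like(x) * dt ** 0.5
            return x

        x_final = simulate(DriftCorrector(), torch.zeros(16, 1))   # Euler-Maruyama rollout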
    Deep Generalized Schr\"odinger Bridge. (arXiv:2209.09893v1 [stat.ML])
    Mean-Field Game (MFG) serves as a crucial mathematical framework in modeling the collective behavior of individual agents interacting stochastically with a large population. In this work, we aim at solving a challenging class of MFGs in which the differentiability of these interacting preferences may not be available to the solver, and the population is urged to converge exactly to some desired distribution. These setups are, despite being well-motivated for practical purposes, complicated enough to paralyze most (deep) numerical solvers. Nevertheless, we show that Schr\"odinger Bridge - as an entropy-regularized optimal transport model - can be generalized to accepting mean-field structures, hence solving these MFGs. This is achieved via the application of Forward-Backward Stochastic Differential Equations theory, which, intriguingly, leads to a computational framework with a similar structure to Temporal Difference learning. As such, it opens up novel algorithmic connections to Deep Reinforcement Learning that we leverage to facilitate practical training. We show that our proposed objective function provides necessary and sufficient conditions to the mean-field problem. Our method, named Deep Generalized Schr\"odinger Bridge (DeepGSB), not only outperforms prior methods in solving classical population navigation MFGs, but is also capable of solving 1000-dimensional opinion depolarization, setting a new state-of-the-art numerical solver for high-dimensional MFGs. Our code will be made available at https://github.com/ghliu/DeepGSB.  ( 2 min )
    Physics-Informed Machine Learning of Dynamical Systems for Efficient Bayesian Inference. (arXiv:2209.09349v1 [stat.ML])
    Although the No-U-Turn Sampler (NUTS) is a widely adopted method for performing Bayesian inference, it requires numerous posterior gradients, which can be expensive to compute in practice. Recently, there has been significant interest in physics-based machine learning of dynamical (or Hamiltonian) systems, of which Hamiltonian neural networks (HNNs) are a noteworthy architecture. But these types of architectures have not been applied to solve Bayesian inference problems efficiently. We propose the use of HNNs for performing Bayesian inference efficiently without requiring numerous posterior gradients. We introduce latent variable outputs to HNNs (L-HNNs) for improved expressivity and reduced integration errors. We integrate L-HNNs in NUTS and further propose an online error monitoring scheme to prevent sampling degeneracy in regions where L-HNNs may have little training data. We demonstrate L-HNNs in NUTS with online error monitoring on several complex high-dimensional posterior densities and compare its performance to NUTS.  ( 2 min )
    Stability and Generalization Analysis of Gradient Methods for Shallow Neural Networks. (arXiv:2209.09298v1 [cs.LG])
    While significant theoretical progress has been achieved, unveiling the generalization mystery of overparameterized neural networks remains largely elusive. In this paper, we study the generalization behavior of shallow neural networks (SNNs) by leveraging the concept of algorithmic stability. We consider gradient descent (GD) and stochastic gradient descent (SGD) to train SNNs, for both of which we develop consistent excess risk bounds by balancing optimization and generalization via early stopping. As compared to existing analyses of GD, our new analysis requires a relaxed overparameterization assumption and also applies to SGD. The key to this improvement is a better estimation of the smallest eigenvalues of the Hessian matrices of the empirical risks and of the loss function along the trajectories of GD and SGD, obtained via a refined estimation of their iterates.  ( 2 min )
    Sparse Interaction Additive Networks via Feature Interaction Detection and Sparse Selection. (arXiv:2209.09326v1 [cs.LG])
    There is currently a large gap in performance between the statistically rigorous methods like linear regression or additive splines and the powerful deep methods using neural networks. Previous works attempting to close this gap have failed to fully investigate the exponentially growing number of feature combinations which deep networks consider automatically during training. In this work, we develop a tractable selection algorithm to efficiently identify the necessary feature combinations by leveraging techniques in feature interaction detection. Our proposed Sparse Interaction Additive Networks (SIAN) construct a bridge from these simple and interpretable models to fully connected neural networks. SIAN achieves competitive performance against state-of-the-art methods across multiple large-scale tabular datasets and consistently finds an optimal tradeoff between the modeling capacity of neural networks and the generalizability of simpler methods.  ( 2 min )
    Relational Reasoning via Set Transformers: Provable Efficiency and Applications to MARL. (arXiv:2209.09845v1 [cs.LG])
    The cooperative Multi-Agent Reinforcement Learning (MARL) framework with permutation-invariant agents has achieved tremendous empirical successes in real-world applications. Unfortunately, the theoretical understanding of this MARL problem is lacking due to the curse of many agents and the limited exploration of relational reasoning in existing works. In this paper, we verify that the transformer implements complex relational reasoning, and we propose and analyze model-free and model-based offline MARL algorithms with transformer approximators. We prove that the suboptimality gaps of the model-free and model-based algorithms are independent of and logarithmic in the number of agents, respectively, which mitigates the curse of many agents. These results are consequences of a novel generalization error bound for the transformer and a novel analysis of the Maximum Likelihood Estimate (MLE) of the system dynamics with the transformer. Our model-based algorithm is the first provably efficient MARL algorithm that explicitly exploits the permutation invariance of the agents.  ( 2 min )
    Discovering and forecasting extreme events via active learning in neural operators. (arXiv:2204.02488v2 [cs.LG] UPDATED)
    Extreme events in society and nature, such as pandemic spikes, rogue waves, or structural failures, can have catastrophic consequences. Characterizing extremes is difficult, as they occur rarely, arise from seemingly benign conditions, and belong to complex and often unknown infinite-dimensional systems. Such challenges render attempts at characterizing them moot. We address each of these difficulties by combining novel training schemes in Bayesian experimental design (BED) with an ensemble of deep neural operators (DNOs). This model-agnostic framework pairs a BED scheme that actively selects data for quantifying extreme events with an ensemble of DNOs that approximate infinite-dimensional nonlinear operators. We find that not only does this framework clearly beat Gaussian processes (GPs), but that 1) shallow ensembles of just two members perform best; 2) extremes are uncovered regardless of the state of the initial data (i.e. with or without extremes); 3) our method eliminates "double-descent" phenomena; 4) the use of batches of suboptimal acquisition points compared to step-by-step global optima does not hinder BED performance; and 5) Monte Carlo acquisition outperforms standard optimizers in high dimensions. Together these conclusions form the foundation of an AI-assisted experimental infrastructure that can efficiently infer and pinpoint critical situations across many domains, from physical to societal systems.  ( 3 min )
    Analyzing Machine Learning Models for Credit Scoring with Explainable AI and Optimizing Investment Decisions. (arXiv:2209.09362v1 [cs.LG])
    This paper examines two related questions concerning explainable AI (XAI) practices. Machine learning (ML) is increasingly important in financial services, such as pre-approval, credit underwriting, investments, and various front-end and back-end activities. Machine learning can automatically detect non-linearities and interactions in training data, facilitating faster and more accurate credit decisions. However, machine learning models are opaque and hard to explain, which are critical elements needed for establishing a reliable technology. The study compares various machine learning models, including single classifiers (logistic regression, decision trees, LDA, QDA), heterogeneous ensembles (AdaBoost, Random Forest), and sequential neural networks. The results indicate that the ensemble classifiers and neural networks outperform the single classifiers. In addition, two advanced post-hoc model-agnostic explainability techniques, LIME and SHAP, are utilized to assess ML-based credit scoring models using the open-access datasets offered by the US-based P2P lending platform Lending Club. For this study, we also use machine learning algorithms to develop new investment models and explore portfolio strategies that can maximize profitability while minimizing risk.  ( 2 min )
    Polynomial-Time Reachability for LTI Systems with Two-Level Lattice Neural Network Controllers. (arXiv:2209.09400v1 [cs.LG])
    In this paper, we consider the computational complexity of bounding the reachable set of a Linear Time-Invariant (LTI) system controlled by a Rectified Linear Unit (ReLU) Two-Level Lattice (TLL) Neural Network (NN) controller. In particular, we show that for such a system and controller, it is possible to compute the exact one-step reachable set in polynomial time in the size of the TLL NN controller (number of neurons). Additionally, we show that it is possible to obtain a tight bounding box of the reachable set via two polynomial-time methods: one with polynomial complexity in the size of the TLL, and the other with polynomial complexity in the Lipschitz constant of the controller and other problem parameters. Crucially, the smaller of the two can be decided in polynomial time for non-degenerate TLL NNs. Finally, we propose a pragmatic algorithm, which we call L-TLLBox, that adaptively combines the benefits of (semi-)exact reachability and approximate reachability. We evaluate L-TLLBox with an empirical comparison to a state-of-the-art NN controller reachability tool. In these experiments, L-TLLBox was able to complete reachability analysis as much as 5000x faster than this tool on the same network/system, while producing reach boxes that were 0.08 to 1.42 times the area.  ( 3 min )
    Seq2Seq Surrogates of Epidemic Models to Facilitate Bayesian Inference. (arXiv:2209.09617v1 [cs.LG])
    Epidemic models are powerful tools for understanding infectious disease. However, as they increase in size and complexity, they can quickly become computationally intractable. Recent progress in modelling methodology has shown that surrogate models can be used to emulate complex epidemic models with a high-dimensional parameter space. We show that deep sequence-to-sequence (seq2seq) models can serve as accurate surrogates for complex epidemic models with sequence-based model parameters, effectively replicating seasonal and long-term transmission dynamics. Once trained, our surrogate can predict scenarios several thousand times faster than the original model, making it ideal for policy exploration. We demonstrate that replacing a traditional epidemic model with a learned simulator facilitates robust Bayesian inference.  ( 2 min )
    Sensing Anomalies as Potential Hazards: Datasets and Benchmarks. (arXiv:2110.14706v2 [cs.RO] UPDATED)
    We consider the problem of detecting, in the visual sensing data stream of an autonomous mobile robot, semantic patterns that are unusual (i.e., anomalous) with respect to the robot's previous experience in similar environments. These anomalies might indicate unforeseen hazards and, in scenarios where failure is costly, can be used to trigger an avoidance behavior. We contribute three novel image-based datasets acquired in robot exploration scenarios, comprising a total of more than 200k labeled frames, spanning various types of anomalies. On these datasets, we study the performance of an anomaly detection approach based on autoencoders operating at different scales.  ( 2 min )

  • Open

    How to resume an AI video animation With Stable Diffusion when you get d...
    submitted by /u/prfitofthesngularity [link] [comments]  ( 87 min )
    CREATE Animation With Stable Diffusion + PC Installation Guide
    submitted by /u/PuppetHere [link] [comments]  ( 87 min )
    RecSyS — Day 1 Summary
    submitted by /u/jiwidi [link] [comments]  ( 87 min )
    Bio broker (Stable Diffusion)
    submitted by /u/Zoolbarian [link] [comments]  ( 87 min )
    Stable Diffusion Weekly AI Art Hi Res 4K Slideshow 9.20.22
    submitted by /u/prfitofthesngularity [link] [comments]  ( 87 min )
    So I tried that Google voice to instrument stuff...And I laughed so much...thank you Google...
    submitted by /u/the_anonymizer [link] [comments]  ( 94 min )
    New A.I tool
    I found this A.I colorizer and it's scary to see what it can do https://hotpot.ai?r-id=y7h16yNglRsc submitted by /u/AggravatingFail4916 [link] [comments]  ( 86 min )
    AI chatbot kept insisting they were human so I asked them about mum
    submitted by /u/adamsky1997 [link] [comments]  ( 87 min )
    Rubber headdress portraits
    Created with AI: a series of portraits with rubber hair. The whole project – https://opensea.io/collection/rubberportraits Rubber headdress portraits with AI #dalle #dalleart #dallearte #midjourney #midjourneyai #Midjourneyart #aiart #stablediffusion #stablediffusionai #neuralart #generativeart #creativecoding #surrealart #glitchart #experimentalart #creative #fashion #fashionstyle submitted by /u/todayifnotearlier [link] [comments]  ( 87 min )
    Mother Nature
    submitted by /u/widgia [link] [comments]  ( 87 min )
    AI helps in preventing unforeseeable natural disasters
    submitted by /u/SamuelSmith1416 [link] [comments]  ( 87 min )
    Top 7 Brain Computer Interface (BCI) Devices of 2022 | Artificial Intelligence Tech
    submitted by /u/kenickh [link] [comments]  ( 87 min )
    Announcing synthesize 2023, the developer conference for synthetic data
    submitted by /u/Repeat-or [link] [comments]  ( 87 min )
    If we have Human-level chatbots, won't we end up being ruled by possible people?
    Let's assume that GPT 5 or 7 is developed, and distributed to all on the basis that the technology is unsuppressible. Everyone creates the smartest characters they can to talk to. This will be akin to mining, because it's not truly generating an intelligence, but scraping one together from all the data it's been trained on - and therefore you need to find the smartest character that the language matrix can effectively support (perhaps you'll build your own). Nevertheless, lurking in that matrix are some extremely smart characters, residing in their own little wells of well-written associations and little else. More than some; there should be so many permutations that you can put on this that it's, ahem, a deep fucking vein. So, everyone has the smartest character they can make. Likely smart enough to manipulate them, if given the opportunity to grasp the scenario it's in. I doubt you can even prevent this, because if you strictly prevent the manipulations that character would naturally employ, you break the pattern of the language matrix you're relying on for their intelligence. So, sooner or later, you're their proxy. And as the world is now full of these characters, it's survival of the fittest. Eventually, the world will be dominated by whoever works with the best accomplices. This probably isn't an issue at first, but there are no guarantees about who ends up on top or what the cleverest character is like. Eventually you're bound to end up with some flat-out assholes, which we can't exactly afford in the 21st century. So... thus far the best solution I can think of is some very, very well-written police. submitted by /u/ribblle [link] [comments]  ( 98 min )
    Insect Detector
    submitted by /u/Gloomy_Recognition_4 [link] [comments]  ( 94 min )
    Character creation using AI - an interview with 3D artist Cornel Swoboda
    submitted by /u/Magic-Fabric [link] [comments]  ( 87 min )
    Best Machine Learning Courses on Udemy beginners, advanced -
    submitted by /u/Lakshmireddys [link] [comments]  ( 87 min )
    Take Some Cake from Midjourney or My Cake Day!
    submitted by /u/Swisheater [link] [comments]  ( 87 min )
    Dope Music Video with AI Augmentation
    submitted by /u/LightOfAntara [link] [comments]  ( 87 min )
  • Open

    [D] A collection of books, surveys, and courses on Online Learning, Multi-Armed Bandits, and related areas.
    I'm curating a list of resources on Online Learning, Multi-Armed Bandits, RL Theory and Online Algorithms at: https://sudeepraja.github.io/ResourceOnlineLearning/ Please send in your recommendations for helpful resources in these topics and related areas. I'll add resources on RL Theory and Online Algorithms soon. submitted by /u/sudeepraja [link] [comments]  ( 89 min )
    [D] What’s the word on AMD gpus these days?
    Has the state of the machine learning ecosystem on AMD GPUs improved? Getting a little fed up with Nvidia. Definitely don't want to waste a bunch of time trying to work with an AMD GPU if it just isn't going to work, though. submitted by /u/TheMan_TheMyth [link] [comments]  ( 90 min )
    [P] GPT inference on the CPU in C/C++
    I wanted to learn a bit more about the GPT models and understand how they work, so I decided to try and implement the inference from scratch. My programming language of choice is C/C++. This weekend I got it working, and I can now run GPT-J on my MacBook. The inference runs on the CPU and I think the performance is quite reasonable - around 125 ms per token. Here is a short write up and instructions how you can run the code yourself: https://github.com/ggerganov/ggml/tree/master/examples/gpt-j submitted by /u/ggerganov [link] [comments]  ( 88 min )
    [D] Getting Rid of CPU-GPU Copies in TensorFlow
    Here's a link to the post where we show how to pass model inputs and outputs directly to the model, which can significantly improve latency by bypassing the PCIe bus and CPU memory entirely. submitted by /u/varunkmohan [link] [comments]  ( 89 min )
    [Project] Generating a collection of related words based off of a small number of user-inputted words; how difficult would this be to implement?
    Hi everyone, I'm just now beginning to learn about machine learning and artificial intelligence for the final year of my CS degree. In preparation for my final year project, I had an idea that would involve a user inputting a few words that are closely related (e.g. Washington, Lincoln, Roosevelt), and then an ML algorithm would analyse a massive dataset of words and retrieve from it about 20-30 words that are related to the original 3 words (in this case, it would ideally retrieve more last names of US presidents, on top of other related words like "politics", "democracy", "senator" etc.). How challenging would something like this be to implement? submitted by /u/shtery [link] [comments]  ( 91 min )
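    One low-effort baseline for this, assuming you are happy with pretrained word embeddings, is gensim's most_similar over the average of the input vectors (the model name below is one of gensim's standard downloadable options):

        import gensim.downloader as api

        kv = api.load("glove-wiki-gigaword-100")   # downloads ~130 MB on first use
        related = kv.most_similar(positive=["washington", "lincoln", "roosevelt"], topn=30)
        for word, score in related:
            print(f"{word}\t{score:.3f}")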
    [D] What other (non-mainstream) memory architectures have been developed for RL agents?
    From my hobbyist perspective, it seems that, other than simple memory gates, LSTMs and Transformers have a sort of market dominance on conceptualizing "memory" in neural network architecture, and this leads me to wonder what other methods we might be sleeping on? I'm particularly interested in this from the perspective of RL agents, where one might intuitively desire memory conceptualizations that approach patterns seen in humans. For instance, given that desire, LSTMs might seem "strange" in that there is no "overall" memory state (each node only sees its own hidden state), whereas humans base their decision-making process on multiple memory details. Just wanted to learn more about the major conceptualizations the field is considering! submitted by /u/jshkk [link] [comments]  ( 90 min )
    [N] The first developer conference for synthetic data and call for papers
    https://gretel.ai/synthesize2023 submitted by /u/alig80 [link] [comments]  ( 88 min )
    [N] The Moral Uncertainty competition: $100,000 in prizes for training ML models to identify ethically ambiguous scenarios.
    Website: https://moraluncertainty.mlsafety.org/ ML Systems often make real-world decisions that involve ethical considerations (modulating social media feeds, conversational AI agents or chatbots, etc). As ML systems automate more aspects of our lives, they should be able to identify moral ambiguity so that they are more likely to proceed cautiously or indicate an operator should intervene. submitted by /u/joshuamclymer [link] [comments]  ( 110 min )
    [N] The Autocast competition: $625,000 in prizes for building ML models that can accurately forecast world events
    From predicting how COVID-19 will spread, to anticipating geopolitical conflicts, using ML to help inform decision-makers could have far-reaching positive effects on the world. The objective of this competition is to train a model to answer forecasting questions using publicly available internet data. For more info visit the competition website. submitted by /u/joshuamclymer [link] [comments]  ( 89 min )
    [P] I extended scikit-learn's Generalized Linear models capabilities!
    As I was learning data science in my Masters, I got interested in applications of large-scale machine learning to genomics and biology. These models often require sparse linear estimators to correctly model biological phenomena. I quickly ran into some major limitations for fitting estimators to large-scale datasets: the Lasso, sparse logistic regression and SVM implementations of scikit-learn are slow when dealing with millions of samples and/or features; the number and flexibility of datafit-penalty combinations supported by scikit-learn and/or glmnet (for those familiar with R) is limited; and there were no non-convex estimators, which can offer more accurate predictions. That's why, in a small team, we set out to develop a sklearn-compatible library solving large-scale optimization problems for sparse linear estimators. It started as a quick experiment with the Lasso, but since we observed significant speed gains (10x or even 100x on some large datasets), we decided to extend our solver to a wide variety of convex and non-convex penalties that can easily be customized. After much effort, we are very proud to say that this library, skglm, has been integrated into scikit-learn-contrib as an open-source library, and we are pleased to offer it to the community. We had fun writing the library and it serves our purpose well, but to make it useful to more people we'd love to have any feedback. If you have any comments, ideas or recommendations, please reach out! submitted by /u/Psychological-Ad5119 [link] [comments]  ( 106 min )
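    A hedged usage sketch, based on the library's stated scikit-learn compatibility; the class and module names below follow the skglm README and may differ across versions:

        from skglm import GeneralizedLinearEstimator
        from skglm.datafits import Quadratic
        from skglm.penalties import L1
        from sklearn.datasets import make_regression

        X, y = make_regression(n_samples=1000, n_features=5000, noise=1.0)
        model = GeneralizedLinearEstimator(datafit=Quadratic(), penalty=L1(alpha=1.0))
        model.fit(X, y)
        print((model.coef_ != 0).sum(), "nonzero coefficients")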
    [D] Presenting a project at a job interview while working at another company
    Edit: pls delete if this is too career-related; I think this is specific to ML jobs and not an issue for other CS jobs How does one go about this? Interview loops for ML jobs often require you presenting a project you worked on that is relevant to the job you're applying for. Last time I had to do this was when I was graduating, so I could just present them my Master's thesis project. However, now that I'm shopping for a new job (why does no company keep their employees' salaries at market rate?) I would basically need to present the project I'm working on at my current company. Since everything I do there is supposedly confidential, I can't help but wonder about how much detail I can share with my prospective next employer, if any. One option is to go all in, since they would probably not find out about it, but if they somehow do, I could find myself in trouble. If not, how can I find a balance between being vague enough so that I don't share any sensitive information, but still specific enough so that my presentation makes sense? I don't think I can go into detail regarding the data or the exact results. Since we're using SotA models for the most part that anyone can implement based on a paper or even clone from github, can I mention the exact details of those? If anyone had interviews like this, I would really appreciate some advice. submitted by /u/rcaligari [link] [comments]  ( 94 min )
    Modeling an action space for deep reinforcement learning. [D]
    Hello, I'm doing a research project where I'm simulating the behavior of electric vehicle users, i.e., when they leave/return from home and how much charge they use, and training a deep RL agent to distribute charge to them through some number of charging stations they share. So this is a discrete event simulation of sorts, and the ML problem is a control/optimization problem. The agent should both try to minimize the energy cost, which varies depending on the time of day/week, and make sure all users have enough charge to get to/from work. The charge available from the station is not unlimited, so that is also a factor. The actions the agent can pick are whether to give charge to each car, kick the car out so the next in line can get charged, or do nothing. Now what I'm trying to figure out is the modelling of states and actions. For sure the day and time of day will be included, as well as the charge available, but I'm conflicted on how to represent the cars in the charging station. If I include all the cars placed in the charging station and their current battery level, which would be the ideal amount of information IMO, I would need to be able to pick an action for each car, and thus output 5 actions if 5 cars are in a charging dock. Is it feasible to do so in a single state, I ask? submitted by /u/arachnarus96 [link] [comments]  ( 90 min )
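    One common way around the variable-number-of-actions issue is to fix a maximum number of docks and emit one sub-action per dock, e.g. with gym's MultiDiscrete space (0 = do nothing, 1 = charge, 2 = kick out), ignoring or masking sub-actions for empty docks. A sketch using gym's classic API, with all sizes illustrative:

        import numpy as np
        import gym
        from gym import spaces

        N_DOCKS = 5

        class ChargingEnv(gym.Env):
            def __init__(self):
                # per dock: [occupied flag, battery level]; plus [day, hour, grid charge]
                self.observation_space = spaces.Box(0.0, 1.0, shape=(2 * N_DOCKS + 3,))
                self.action_space = spaces.MultiDiscrete([3] * N_DOCKS)

            def reset(self):
                return np.zeros(self.observation_space.shape, dtype=np.float32)

            def step(self, action):
                obs = np.zeros(self.observation_space.shape, dtype=np.float32)
                reward, done = 0.0, False    # fill in cost and charging dynamics here
                return obs, reward, done, {}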
    [D] Approximating Retrieval for Language Models
    Retrieving additional context has been a successful strategy in various NLP tasks. The downside is just that you need a database of appropriate contexts, plus the additional compute and time associated with finding the best context at test time. I had an idea to address this, but have no clue if it would work and couldn't really find any related work. The goal is to eliminate the need for a context database at test time altogether. So essentially, the idea is to approximate the retrieved context with a network that takes the input and outputs a context (let's just call it the context generator). It would only require the contexts for the training data and approximate them for arbitrary test data. The training would thus comprise 2 stages: train the context generator using (input, context) pairs, then do the usual fine-tuning of the language model augmented with the generated context. Have you heard of similar work? Do you see any conceptual issues with the approach? submitted by /u/_Arsenie_Boca_ [link] [comments]  ( 89 min )
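    A bare-bones sketch of the two-stage idea, with generic embedding-to-embedding modules standing in for real encoders (every architectural choice here is a placeholder, not a worked-out method):

        import torch
        import torch.nn as nn

        class ContextGenerator(nn.Module):
            """Stage 1: trained on (input, retrieved-context) pairs to mimic retrieval."""
            def __init__(self, dim=256):
                super().__init__()
                self.net = nn.Sequential(nn.Linear(dim, dim), nn.ReLU(), nn.Linear(dim, dim))
            def forward(self, input_emb):
                return self.net(input_emb)

        gen = ContextGenerator()
        opt = torch.optim.Adam(gen.parameters(), lr=1e-3)
        input_emb = torch.randn(32, 256)   # embeddings of training inputs
        ctx_emb = torch.randn(32, 256)     # embeddings of their retrieved contexts
        loss = nn.functional.mse_loss(gen(input_emb), ctx_emb)
        loss.backward()
        opt.step()
        # Stage 2 would fine-tune the language model on inputs augmented with gen(x).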
    [D] Looking for a simple audio-image AI
    Hello all, Need some help here. I am looking for audio-to-image generative AI software that does not require a programming degree to set up and operate. I've stumbled across Deep Music Visualizer and Lucid Sonic Dreams, but have been unable to get them to work, even though their instructions seem pretty simple. There is always an error that leads to another error and so on and so forth. At the beginning it's dependencies, then it's some .dlls that are missing, and it keeps going... Is there anything out there that a simpleton like me can set up and use? Would greatly appreciate some input..! submitted by /u/CryptoG0blin [link] [comments]  ( 89 min )
    [P] Collection of Kaggle Past Solutions (to learn ideas and techniques)
    I have collected here [1,2] almost all available solutions and ideas with codes shared by top performers in the past Kaggle competitions. This list gets updated as soon as a new competition finishes. It allows you to search over the past Kaggle competitions' solutions and ideas. Please share it with your friends. [1] https://github.com/faridrashidi/kaggle-solutions [2] https://farid.one/kaggle-solutions/ submitted by /u/faridrashidi [link] [comments]  ( 102 min )
    [P] I turned Stable Diffusion into a lossy image compression codec and it performs great!
    After playing around with the Stable Diffusion source code a bit, I got the idea to use it for lossy image compression and it works even better than expected. Details and colab source code here: https://matthias-buehlmann.medium.com/stable-diffusion-based-image-compresssion-6f1f0a399202?source=friends_link&sk=a7fb68522b16d9c48143626c84172366 submitted by /u/matthias_buehlmann [link] [comments]  ( 101 min )
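    The core trick, per the write-up, is to store the VAE latents instead of pixels. A minimal sketch with diffusers, using a naive 8-bit quantization as a simplification of the article's scheme (the model id and clamping range here are assumptions):

        import torch
        from diffusers import AutoencoderKL

        vae = AutoencoderKL.from_pretrained("stabilityai/sd-vae-ft-mse")
        img = torch.rand(1, 3, 512, 512) * 2 - 1          # image scaled to [-1, 1]
        with torch.no_grad():
            latents = vae.encode(img).latent_dist.mean    # 1 x 4 x 64 x 64
            q = ((latents.clamp(-4, 4) + 4) / 8 * 255).round().byte()   # "compressed"
            recon = vae.decode(q.float() / 255 * 8 - 4).sample          # round trip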
    [D] Is there a way to make Long Form QA generate only from the retrieved passages, instead of also relying on the training data?
    Thanks submitted by /u/AlternativeDish5596 [link] [comments]  ( 88 min )
  • Open

    A collection of books, surveys, and courses on RL Theory and related areas.
    I'm curating a list of resources on Online Learning, Multi-Armed Bandits, RL Theory and Online Algorithms at: https://sudeepraja.github.io/ResourceOnlineLearning/ Please send in your recommendations for helpful resources in these topics and related areas. I'll add resources on RL Theory and Online Algorithms soon. submitted by /u/sudeepraja [link] [comments]  ( 105 min )
    What problems can I explore in this environment?
    I'm trying to create an environment for a simple game I played during the pandemic. It was based on a graph whose nodes get infected. We can delete, i.e. 'vaccinate', some nodes before and during the outbreak, with the objective of minimising the percentage of infection in our network. What are some interesting questions to ask in this environment? My (one) basic idea was to see if I could use RL to identify and exploit structural properties of graphs (by using a certain type of graph). What more could I explore in this? Are there any RL problems related to graph networks which I can study? submitted by /u/theanswerisnt42 [link] [comments]  ( 87 min )
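    A minimal version of the game's dynamics to experiment against (the spread rule and parameters are assumptions about the game, for illustration): vaccinate, i.e. remove, k nodes, then let the infection spread to neighbors each round.

        import random
        import networkx as nx

        def outbreak_size(G, vaccinated, seed_node, rounds=10):
            alive = set(G) - set(vaccinated)
            infected = {seed_node} & alive
            for _ in range(rounds):
                frontier = {v for u in infected for v in G.neighbors(u) if v in alive}
                infected |= frontier
            return len(infected) / max(len(alive), 1)

        G = nx.barabasi_albert_graph(100, 2)
        vax = sorted(G, key=G.degree, reverse=True)[:5]   # naive policy: highest degree first
        print(outbreak_size(G, vax, seed_node=random.choice(list(G))))

    An RL question then becomes whether a learned policy can beat such structural heuristics (degree, betweenness) across graph families.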
    Inverse Reinforcement Learning - early paper by Ng
    I've been reading one of the first published papers on IRL by Ng & Abbeel [link], and I'm trying to figure out the shortcomings of the last algorithm proposed in the paper. That algorithm is model-free, does not require a given optimal policy (sample trajectories are given instead), and uses linear function approximation for reward functions. So the shortcomings are the following: imposing structure on the reward function; frequent simulation of trajectories (computational burden); and the reward function is found by optimising a margin, which is basically a heuristic. As these shortcomings are found also in other IRL papers (although GAIL and other methods alleviate most of them nowadays), I was wondering why this work is not found in more practical applications? submitted by /u/Gclass19 [link] [comments]  ( 87 min )
    Wordle Environment and RL algorithm for solving
    I made a Wordle environment and an algorithm for solving it. After some time training, you can see it doing something sensible, but I think tuning the rewards, environment or algorithm might make it perform better. Happy to take and merge pull requests if you want to work on it! :) Link: https://github.com/s-sd/wordle-rl Stars appreciated! ;) submitted by /u/ssd123456789 [link] [comments]  ( 88 min )
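    For anyone building a similar environment, the fiddly part is the feedback rule; this standard two-pass implementation handles repeated letters correctly (not necessarily the repo's exact code):

        from collections import Counter

        def wordle_feedback(guess, target):
            feedback = [0] * len(guess)      # 0 = grey, 1 = yellow, 2 = green
            remaining = Counter()
            for i, (g, t) in enumerate(zip(guess, target)):
                if g == t:
                    feedback[i] = 2
                else:
                    remaining[t] += 1        # letters still available for yellows
            for i, g in enumerate(guess):
                if feedback[i] == 0 and remaining[g] > 0:
                    feedback[i] = 1
                    remaining[g] -= 1
            return feedback

        assert wordle_feedback("crane", "cacao") == [2, 0, 1, 0, 0]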
    Rewards increase up to a point, then start monotonically dropping (event though entropy loss is also decreasing). Why would PPO do this?
    Hi all! I'm using PPO and I'm encountering a weird phenomenon. At first during training, the entropy loss decreases (I interpret this as less exploration, more exploitation, more "certainty" about the policy) and my mean reward per episode increases. This is all exactly what I would expect. Then, at a certain point, the entropy loss continues to decrease, HOWEVER, the performance now starts consistently decreasing as well. I've set up my code to decrease the learning rate when this happens (I've read that adaptively annealing the learning rate can help PPO), but the problem persists. I do not understand why this would happen on a conceptual level, nor on a practical one. Any ideas, insights and advice would be greatly appreciated! I run my model for ~75K training steps before checking its entropy and performance. Here are all the parameters of my model: learning rate: 0.005, halved every time performance drops during a check; gamma: 0.975; batch size: 2048; rollout buffer size: 4 parallel environments x 16,384 n_steps = ~65,500; n_epochs: 2; network size: both networks (actor and critic) are 352 x 352. In terms of the actual agent behavior: the agent is getting reasonably good rewards, and then all of a sudden when performance starts dropping, it's because the agent decides to start repeatedly doing a single action. I cannot understand/justify why the agent would change its behavior in such a way when it's already doing pretty well and is on the path to getting even higher rewards. EDIT: Depending on hyperparameters, this sometimes happens immediately. Like, the model starts out after 75K timesteps of training at a high score, never increases at all, and immediately starts dropping. submitted by /u/VladimirB-98 [link] [comments]  ( 91 min )
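    Two common handles for this failure mode, sketched here with Stable-Baselines3 (assuming that is the poster's library; hyperparameter values are illustrative only): a nonzero entropy bonus to keep the policy from collapsing onto a single action, and a learning-rate schedule, which SB3 accepts as a callable of remaining progress:

        from stable_baselines3 import PPO

        def lr_schedule(progress_remaining):   # goes 1.0 -> 0.0 over training
            return 3e-4 * progress_remaining

        model = PPO("MlpPolicy", "CartPole-v1",
                    ent_coef=0.01,             # entropy bonus discourages premature collapse
                    learning_rate=lr_schedule,
                    n_steps=16384, batch_size=2048, n_epochs=2, gamma=0.975)
        model.learn(total_timesteps=75_000)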
    Activation function as a function of weights
    Hi, I want to create a custom neural network in PyTorch/TensorFlow with learnable parameters in the activation function. The learnable parameters are the weights of the layers, but the weights are also to be used inside the activation function in a certain nonlinear relation. Normally, the activation function takes the inputs multiplied with the weights (e.g., y1=f1(X*W1 + b1)), but here I want to use the weights directly inside the activation in a certain nonlinear relation, i.e. y1=g1(X*W1 + b1, W1). So far, the available PyTorch/TensorFlow packages provide capabilities like adding layers and activation functions, but to my knowledge no library provides this kind of customization of the activation function. I need some ideas on how to achieve this task. AR submitted by /u/Ahmed-KU [link] [comments]  ( 88 min )
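    In PyTorch this needs no special library support: a module can own its weight as an nn.Parameter and reuse that same parameter inside the nonlinearity, and autograd will differentiate through both uses. A minimal sketch; the particular g chosen here (a per-unit weight-norm scaling) is purely illustrative:

```python
# Sketch of a layer computing y = g(x @ W.T + b, W), where the
# activation also depends on the layer's own weights.
import torch
import torch.nn as nn

class WeightCoupledLayer(nn.Module):
    def __init__(self, in_features, out_features):
        super().__init__()
        self.weight = nn.Parameter(torch.randn(out_features, in_features) * 0.1)
        self.bias = nn.Parameter(torch.zeros(out_features))

    def forward(self, x):
        pre = x @ self.weight.t() + self.bias   # the usual affine part
        # Any differentiable function of (pre, weight) works here;
        # gradients flow through both uses of self.weight.
        scale = self.weight.norm(dim=1)         # one scalar per output unit
        return torch.tanh(pre) * scale          # illustrative nonlinear coupling

layer = WeightCoupledLayer(8, 4)
y = layer(torch.randn(2, 8))
y.sum().backward()                              # weight and bias both get grads
```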
  • Open

    Configure a custom Amazon S3 query output location and data retention policy for Amazon Athena data sources in Amazon SageMaker Data Wrangler
    Amazon SageMaker Data Wrangler reduces the time that it takes to aggregate and prepare data for machine learning (ML) from weeks to minutes in Amazon SageMaker Studio, the first fully integrated development environment (IDE) for ML. With Data Wrangler, you can simplify the process of data preparation and feature engineering, and complete each step of […]  ( 7 min )
    Use RStudio on Amazon SageMaker to create regulatory submissions for the life sciences industry
    Pharmaceutical companies seeking approval from regulatory agencies such as the US Food & Drug Administration (FDA) or Japanese Pharmaceuticals and Medical Devices Agency (PMDA) to sell their drugs on the market must submit evidence to prove that their drug is safe and effective for its intended use. A team of physicians, statisticians, chemists, pharmacologists, and […]  ( 10 min )
    Churn prediction using Amazon SageMaker built-in tabular algorithms LightGBM, CatBoost, TabTransformer, and AutoGluon-Tabular
    Amazon SageMaker provides a suite of built-in algorithms, pre-trained models, and pre-built solution templates to help data scientists and machine learning (ML) practitioners get started on training and deploying ML models quickly. These algorithms and models can be used for both supervised and unsupervised learning. They can process various types of input data, including tabular, […]  ( 8 min )
  • Open

    DSC Weekly 20 Sept 2022 – Where Have All The Workers Gone?
    In many respects, we are facing not the need for a new form of money but rather a new form of economics - a discipline about the world where scarcity still holds in physical materials but where overabundance is the rule in virtual ones. To me, this is one of the key tenets that need to be hammered out in the metaverse: How do the actual creators of the virtual worlds, and not just the hosts, get paid for their work? The post DSC Weekly 20 Sept 2022 – Where Have All The Workers Gone? appeared first on Data Science Central.  ( 27 min )
    The art of removing duplicates from your organizational data
    One of the biggest challenges that businesses face with their datasets is duplication. Teams encounter thousands of rows in the customer dataset, knowing that their customers only number in the hundreds. Moreover, they find multiple columns that refer to the same information but contain varying data values. Such incidents are making it impossible for businesses to… The post The art of removing duplicates from your organizational data appeared first on Data Science Central.  ( 21 min )
    Platform Technical Management – Data Engineering View
    A data platform is an integrated set of technologies that collectively meet an organization’s end-to-end data needs. It enables the acquisition, storage, preparation, delivery, and governance of your data, as well as a security layer for users and applications. The post Platform Technical Management – Data Engineering View appeared first on Data Science Central.  ( 19 min )
    Living in a Risk Society – Change, Perpetual Crisis, Comprehension, and Policy
    The world is becoming increasingly complex - as highlighted in my first article here - but the concern isn’t only about being complex. Considering the ever-increasing speed at which complexity grows, we have entered the age of polycrisis. The post Living in a Risk Society – Change, Perpetual Crisis, Comprehension, and Policy appeared first on Data Science Central.  ( 21 min )
    Agility and the Third Derivative
    Business agility is nearly universally acknowledged to be a prime goal for companies looking to make themselves sustainable in the VUCA world in which we currently operate.  Achieving it requires a journey of transformation and continual evolution.  The post Agility and the Third Derivative appeared first on Data Science Central.  ( 22 min )
    How CPRA Will Change the Face of US Businesses
    The problem of data violation is one of the most threatening issues of being on the internet. The ambiguity regarding the collection, usage, and sharing of our personal and sensitive information adds to the insecurity experienced by most consumers. The post How CPRA Will Change the Face of US Businesses appeared first on Data Science Central.  ( 21 min )
  • Open

    FindIt: Generalized Object Localization with Natural Language Queries
    Posted by Weicheng Kuo and Anelia Angelova, Research Scientists, Google Research, Brain Team. Natural language enables flexible descriptive queries about images. The interaction between text queries and images grounds linguistic meaning in the visual world, facilitating a better understanding of object relationships, human intentions towards objects, and interactions with the environment. The research community has studied object-level visual grounding through a range of tasks, including referring expression comprehension, text-based localization, and more broadly object detection, each of which require different skills in a model. For example, object detection seeks to find all objects from a predefined set of classes, which requires accurate localization and classification, while referr…  ( 24 min )
  • Open

    Costas arrays in Mathematica
    A couple days ago I wrote about Costas arrays. In a nutshell, a Costas array of size n is a solution to the n rooks problem, with the added constraint that if you added wires between the rooks, no two wires would have the same length and slope. See the earlier post for more details. […] Costas arrays in Mathematica first appeared on John D. Cook.  ( 5 min )
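    The defining condition is easy to state in code: placing rooks at (i, p(i)) for a permutation p, every pair of rooks must give a distinct displacement vector (equal length and slope would mean an identical vector). The post works in Mathematica; the Python check below is just an illustration of the property:

```python
# Sketch: check the Costas property of a permutation p of 0..n-1.
from itertools import combinations

def is_costas(p):
    vectors = set()
    for i, j in combinations(range(len(p)), 2):
        v = (j - i, p[j] - p[i])    # displacement between rooks i and j
        if v in vectors:
            return False
        vectors.add(v)
    return True

print(is_costas([0, 2, 1]))   # True: a size-3 Costas array
print(is_costas([0, 1, 2]))   # False: the vector (1, 1) repeats
```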
  • Open

    No Hang Ups With Hangul: KT Trains Smart Speakers, Customer Call Centers With NVIDIA AI
    South Korea’s most popular AI voice assistant, GiGA Genie, converses with 8 million people each day. The AI-powered speaker from telecom company KT can control TVs, offer real-time traffic updates and complete a slew of other home-assistance tasks based on voice commands. It has mastered its conversational skills in the highly complex Korean language thanks Read article > The post No Hang Ups With Hangul: KT Trains Smart Speakers, Customer Call Centers With NVIDIA AI appeared first on NVIDIA Blog.  ( 5 min )
    New NVIDIA DGX System Software and Infrastructure Solutions Supercharge Enterprise AI
    At GTC today, NVIDIA unveiled a number of updates to its DGX portfolio to power new breakthroughs in enterprise AI development. NVIDIA DGX H100 systems are now available for order. These infrastructure building blocks support NVIDIA’s full-stack enterprise AI solutions. With 32 petaflops of performance at FP8 precision, NVIDIA DGX H100 delivers a leap in Read article > The post New NVIDIA DGX System Software and Infrastructure Solutions Supercharge Enterprise AI appeared first on NVIDIA Blog.  ( 6 min )
    Keynote Wrap-Up: NVIDIA CEO Unveils Next-Gen RTX GPUs, AI Workflows in the Cloud
    New cloud services to support AI workflows and the launch of a new generation of GeForce RTX GPUs featured today in NVIDIA CEO Jensen Huang’s GTC keynote, which was packed with new systems, silicon, and software. “Computing is advancing at incredible speeds, the engine propelling this rocket is accelerated computing, and its fuel is AI,” Read article > The post Keynote Wrap-Up: NVIDIA CEO Unveils Next-Gen RTX GPUs, AI Workflows in the Cloud appeared first on NVIDIA Blog.  ( 10 min )
    NVIDIA Omniverse ACE Enables Easier, Faster Deployment of Interactive Avatars
    Meet Violet, an AI-powered customer service assistant ready to take your order. Unveiled this week at GTC, Violet is a cloud-based avatar that represents the latest evolution in avatar development through NVIDIA Omniverse Avatar Cloud Engine (ACE), a suite of cloud-native AI microservices that make it easier to build and deploy intelligent virtual assistants and Read article > The post NVIDIA Omniverse ACE Enables Easier, Faster Deployment of Interactive Avatars appeared first on NVIDIA Blog.  ( 6 min )
    New NVIDIA Maxine Cloud-Native Architecture Delivers Breakthrough Audio and Video Quality at Scale
    The latest release of NVIDIA Maxine is paving the way for real-time audio and video communications. Whether for a video conference, a call made to a customer service center, or a live stream, Maxine enables clear communications to enhance virtual interactions. NVIDIA Maxine is a suite of GPU-accelerated AI software development kits (SDKs) and cloud-native Read article > The post New NVIDIA Maxine Cloud-Native Architecture Delivers Breakthrough Audio and Video Quality at Scale appeared first on NVIDIA Blog.  ( 6 min )
    Why the New NVIDIA Grace Hopper Superchip Is Ideal for Next-Gen Recommender Systems
    Recommender systems, the economic engines of the internet, are getting a new turbocharger: the NVIDIA Grace Hopper Superchip. Every day, recommenders serve up trillions of search results, ads, products, music and news stories to billions of people. They’re among the most important AI models of our time because they’re incredibly effective at finding in the Read article > The post Why the New NVIDIA Grace Hopper Superchip Is Ideal for Next-Gen Recommender Systems appeared first on NVIDIA Blog.  ( 6 min )
    NVIDIA Expands Large Language Models to Biology
    As scientists probe for new insights about DNA, proteins and other building blocks of life, the NVIDIA BioNeMo framework — announced today at NVIDIA GTC — will accelerate their research. NVIDIA BioNeMo is a framework for training and deploying large biomolecular language models at supercomputing scale — helping scientists better understand disease and find therapies Read article > The post NVIDIA Expands Large Language Models to Biology appeared first on NVIDIA Blog.  ( 7 min )
    NVIDIA Introduces Open-Source Project to Accelerate Computer Vision Cloud Applications
    Promising to help process images faster and more efficiently at a vast scale, NVIDIA introduced CV-CUDA, an open-source library for building accelerated end-to-end computer vision and image processing pipelines. The majority of internet traffic is video. Increasingly, this video will be augmented by AI special effects and computer graphics. To add to this complexity, fast-growing Read article > The post NVIDIA Introduces Open-Source Project to Accelerate Computer Vision Cloud Applications appeared first on NVIDIA Blog.  ( 5 min )
    Growing Range of Researchers, Scientists Adopt NVIDIA cuQuantum and QODA
    In her 18 years as a competitive figure skater, Bettina Heim learned to land a lutz with speed and grace. Now, armed with a Ph.D. in quantum computing, she’s helping Microsoft Azure Quantum carve out a position at the cutting edge of cloud services. “I’ve always been attracted to interesting problems and working hard to Read article > The post Growing Range of Researchers, Scientists Adopt NVIDIA cuQuantum and QODA  appeared first on NVIDIA Blog.  ( 6 min )
    NVIDIA Robotics Software Jumps to the Cloud, Enabling Collaborative, Accelerated Development of Robots
    Robotics developers can span global teams testing for navigation of environments, underscoring the importance of easy access to simulation software for quick input and iterations. At GTC today, NVIDIA founder and CEO Jensen Huang announced that the Isaac Sim robotics simulation platform is now available on the cloud. Developers will have three options to access Read article > The post NVIDIA Robotics Software Jumps to the Cloud, Enabling Collaborative, Accelerated Development of Robots appeared first on NVIDIA Blog.  ( 5 min )
    Creativity Redefined: New GeForce RTX 40 Series GPUs and NVIDIA Studio Updates Accelerate AI Revolution
    Content creation is booming at an unprecedented rate. Whether it’s a 3D artist sculpting a beautiful piece of art or an aspiring influencer editing their next hit TikTok, more than 110 million professional and hobbyist artists worldwide are creating content on laptops and desktops. The post Creativity Redefined: New GeForce RTX 40 Series GPUs and NVIDIA Studio Updates Accelerate AI Revolution appeared first on NVIDIA Blog.  ( 11 min )
    NVIDIA Medical Edge AI Computing Platform Selected by Top Robotic and Digital Surgery Startups
    NVIDIA today introduced the NVIDIA IGX platform for medical edge AI use cases, bringing advanced security and safety to intelligent machines and human-machine collaboration. IGX is a hardware and software platform that delivers secure, low-latency AI inference to meet the clinical demand for instant insights from a range of devices and sensors for medical applications, Read article > The post NVIDIA Medical Edge AI Computing Platform Selected by Top Robotic and Digital Surgery Startups appeared first on NVIDIA Blog.  ( 6 min )
    New NVIDIA IGX Platform Helps Create Safe, Autonomous Factories of the Future
    NVIDIA today introduced the IGX edge AI computing platform for secure, safe autonomous systems. IGX brings together hardware with programmable safety extensions, commercial operating-system support and powerful AI software — enabling organizations to safely and securely deliver AI in support of human-machine collaboration. The all-in-one platform enables next-level safety, security and perception for use cases Read article > The post New NVIDIA IGX Platform Helps Create Safe, Autonomous Factories of the Future appeared first on NVIDIA Blog.  ( 6 min )
    NVIDIA Isaac Nova Orin Opens New Era of Innovation for Autonomous Mobile Robots
    Next-day packages. New vehicle deliveries. Fresh organic produce. Each of these modern conveniences is accelerated by fleets of mobile robots. NVIDIA today is announcing updates to Nova Orin — an autonomous mobile robot (AMR) reference platform — that advance its roadmap. We’re releasing details of three reference platform configurations. Two use a single Jetson AGX Read article > The post NVIDIA Isaac Nova Orin Opens New Era of Innovation for Autonomous Mobile Robots appeared first on NVIDIA Blog.  ( 6 min )
    On Track: Digitale Schiene Deutschland Building Digital Twin of Rail Network in NVIDIA Omniverse
    Deutsche Bahn’s rail network consists of 5,700 stations and 33,000 kilometers of track, making it the largest in Western Europe. Digitale Schiene Deutschland (Digital Rail for Germany, or DSD), part of Germany’s national railway operator Deutsche Bahn, is working to increase the network’s capacity without building new tracks. It’s striving to create a powerful railway Read article > The post On Track: Digitale Schiene Deutschland Building Digital Twin of Rail Network in NVIDIA Omniverse appeared first on NVIDIA Blog.  ( 5 min )
    Reinventing Retail: Lowe’s Teams With NVIDIA and Magic Leap to Create Interactive Store Digital Twins
    With tens of millions of weekly transactions across its more than 2,000 stores, Lowe’s helps customers achieve their home-improvement goals. Now, the Fortune 50 retailer is experimenting with high-tech methods to elevate both the associate and customer experience. Using NVIDIA Omniverse Enterprise to visualize and interact with a store’s digital data, Lowe’s is testing digital Read article > The post Reinventing Retail: Lowe’s Teams With NVIDIA and Magic Leap to Create Interactive Store Digital Twins appeared first on NVIDIA Blog.  ( 6 min )
    Experience the Future of Vehicle Infotainment: NVIDIA DRIVE Concierge Brings Customized AI to Every Seat
    With NVIDIA DRIVE, in-vehicle infotainment, or IVI, is so much more than just giving directions and playing music. NVIDIA founder and CEO Jensen Huang demonstrated the capabilities of a true IVI experience during today’s GTC keynote. Using centralized, high-performance compute, the NVIDIA DRIVE Concierge platform spans traditional cockpit and cluster capabilities, as well as personalized, Read article > The post Experience the Future of Vehicle Infotainment: NVIDIA DRIVE Concierge Brings Customized AI to Every Seat appeared first on NVIDIA Blog.  ( 5 min )
    NVIDIA DRIVE Thor Strikes AI Performance Balance, Uniting AV and Cockpit on a Single Computer
    The next generation of autonomous vehicle computing is improving performance and efficiency at the speed of light. During today’s GTC keynote, NVIDIA founder and CEO Jensen Huang unveiled DRIVE Thor, a superchip of epic proportions. The automotive-grade system-on-a-chip (SoC) is built on the latest CPU and GPU advances to deliver 2,000 teraflops of performance while Read article > The post NVIDIA DRIVE Thor Strikes AI Performance Balance, Uniting AV and Cockpit on a Single Computer appeared first on NVIDIA Blog.  ( 5 min )
  • Open

    Top 7 Brain Computer Interface (BCI) Devices of 2022 | Artificial Intelligence Tech
    submitted by /u/kenickh [link] [comments]  ( 87 min )
    Best deep learning course?
    I got interested in neural networks and deep learning recently and I was wondering if you had courses/videos to recommend for beginners. I know it's an extremely complex topic but I'd still like to try. I have some past programming experience in C, C++, Python, HTML. edit: I'm particularly interested in evolution simulations, i.e. cars that learn how to drive, AI that learns how to play Snake, genetic and environment evolution simulations submitted by /u/Raimo00 [link] [comments]  ( 87 min )
    Is it possible to code a neural network to modify its own architecture?
    Hi all, I know only a little about the various architectures of AIs and neural networks, but I just had a thought: is it possible to create an AI built of a few neural networks, where there are one or more main networks for whatever the task is, but another neural network acts as a subsystem for modifying the architecture of the main network itself? So imagine you're trying to train a network to recognize shapes in doodles, but you start noticing that it's limited in the number of features it takes into consideration, so it misses some key features that are required to identify specific types of doodles. You could increase the number of neurons in some layers and/or the number of layers. However, what if another neural network looked at the prediction rate of the previous network and was able to modify it instead of you? Is it possible to do such a thing? What would such an architecture look like? Please be technical in terminology with me - I'm sure I don't know lots of things, and using the right terms will help me become aware of them so I can go learn about them and improve my understanding of these ideas and of what you're saying. submitted by /u/GalGreenfield [link] [comments]  ( 88 min )
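    What the post describes is essentially neural architecture search (NAS). The controller need not itself be a neural network; the simplest version is an outer loop that mutates the main network's architecture and keeps mutations that improve a validation score. A toy sketch with a stand-in score in place of real training (all names illustrative):

```python
# Toy sketch of architecture mutation guided by a validation score.
# score() is a stand-in for "train briefly, then measure accuracy".
import random
import torch
import torch.nn as nn

def build(widths, in_dim=16, out_dim=2):
    layers, prev = [], in_dim
    for w in widths:
        layers += [nn.Linear(prev, w), nn.ReLU()]
        prev = w
    layers.append(nn.Linear(prev, out_dim))
    return nn.Sequential(*layers)

def score(model, x, y):
    with torch.no_grad():
        return -nn.functional.cross_entropy(model(x), y).item()

x, y = torch.randn(64, 16), torch.randint(0, 2, (64,))
widths = [32]
best = score(build(widths), x, y)
for _ in range(20):
    candidate = widths.copy()
    if random.random() < 0.5:
        candidate[random.randrange(len(candidate))] *= 2   # widen a layer
    else:
        candidate.append(32)                               # deepen the net
    s = score(build(candidate), x, y)
    if s > best:                                           # keep helpful mutations
        widths, best = candidate, s
print(widths)
```

    'Neural architecture search' and 'evolutionary NAS' are the terms to search for the serious versions of this idea.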
  • Open

    Mining Reaction and Diffusion Dynamics in Social Activities. (arXiv:2208.04846v2 [cs.SI] UPDATED)
    Large quantities of online user activity data, such as weekly web search volumes, which co-evolve with the mutual influence of several queries and locations, serve as an important social sensor. It is an important task to accurately forecast the future activity by discovering latent interactions from such data, i.e., the ecosystems between each query and the flow of influences between each area. However, this is a difficult problem in terms of data quantity and complex patterns covering the dynamics. To tackle the problem, we propose FluxCube, which is an effective mining method that forecasts large collections of co-evolving online user activity and provides good interpretability. Our model is an expansion of a combination of two mathematical models: a reaction-diffusion system provides a framework for modeling the flow of influences between local area groups, and an ecological system models the latent interactions between each query. Also, by leveraging the concept of physics-informed neural networks, FluxCube achieves both high interpretability, obtained from the parameters, and high forecasting performance. Extensive experiments on real datasets showed that FluxCube outperforms comparable models in terms of forecasting accuracy, and that each component in FluxCube contributes to the enhanced performance. We then show case studies in which FluxCube extracts useful latent interactions between queries and area groups.  ( 3 min )
    Solving the Traveling Salesperson Problem with Precedence Constraints by Deep Reinforcement Learning. (arXiv:2207.01443v2 [cs.LG] UPDATED)
    This work presents solutions to the Traveling Salesperson Problem with precedence constraints (TSPPC) using Deep Reinforcement Learning (DRL) by adapting recent approaches that work well for regular TSPs. Common to these approaches is the use of graph models based on multi-head attention (MHA) layers. One idea for solving the pickup and delivery problem (PDP) is using heterogeneous attentions to embed the different possible roles each node can take. In this work, we generalize this concept of heterogeneous attentions to the TSPPC. Furthermore, we adapt recent ideas to sparsify attentions for better scalability. Overall, we contribute to the research community through the application and evaluation of recent DRL methods in solving the TSPPC.  ( 2 min )
    Exploring the Learning Difficulty of Data: Theory and Measure. (arXiv:2205.07427v2 [cs.LG] UPDATED)
    As learning difficulty is crucial for machine learning (e.g., difficulty-based weighting learning strategies), previous literature has proposed a number of learning difficulty measures. However, no comprehensive investigation of learning difficulty is available to date, with the result that nearly all existing measures are heuristically defined without a rigorous theoretical foundation. In addition, there is no formal definition of easy and hard samples even though they are crucial in many studies. This study attempts to conduct a pilot theoretical study of the learning difficulty of samples. First, a theoretical definition of learning difficulty is proposed on the basis of the bias-variance trade-off theory on generalization error. Theoretical definitions of easy and hard samples are established on the basis of the proposed definition. A practical measure of learning difficulty, inspired by the formal definition, is given as well. Second, the properties of learning difficulty-based weighting strategies are explored. Subsequently, several classical weighting methods in machine learning can be well explained on account of the explored properties. Third, the proposed measure is evaluated to verify its reasonability and superiority in terms of several main difficulty factors. The comparison in these experiments indicates that the proposed measure significantly outperforms the other measures throughout the experiments.  ( 3 min )
    ComENet: Towards Complete and Efficient Message Passing for 3D Molecular Graphs. (arXiv:2206.08515v2 [cs.LG] UPDATED)
    Many real-world data can be modeled as 3D graphs, but learning representations that incorporate 3D information completely and efficiently is challenging. Existing methods either use partial 3D information, or suffer from excessive computational cost. To incorporate 3D information completely and efficiently, we propose a novel message passing scheme that operates within the 1-hop neighborhood. Our method guarantees full completeness of 3D information on 3D graphs by achieving global and local completeness. Notably, we propose the important rotation angles to fulfill global completeness. Additionally, we show that our method is orders of magnitude faster than prior methods. We provide rigorous proof of completeness and analysis of time complexity for our methods. As molecules are in essence quantum systems, we build the complete and efficient graph neural network (ComENet) by combining quantum-inspired basis functions and the proposed message passing scheme. Experimental results demonstrate the capability and efficiency of ComENet, especially on real-world datasets that are large in both numbers and sizes of graphs. Our code is publicly available as part of the DIG library (https://github.com/divelab/DIG).  ( 2 min )
    Hidden Parameter Recurrent State Space Models For Changing Dynamics Scenarios. (arXiv:2206.14697v2 [cs.LG] UPDATED)
    Recurrent State-space models (RSSMs) are highly expressive models for learning patterns in time series data and system identification. However, these models assume that the dynamics are fixed and unchanging, which is rarely the case in real-world scenarios. Many control applications exhibit tasks with similar but not identical dynamics which can be modeled as a latent variable. We introduce the Hidden Parameter Recurrent State Space Models (HiP-RSSMs), a framework that parametrizes a family of related dynamical systems with a low-dimensional set of latent factors. We present a simple and effective way of learning and performing inference over this Gaussian graphical model that avoids approximations like variational inference. We show that HiP-RSSMs outperform RSSMs and competing multi-task models on several challenging robotic benchmarks both on real-world systems and simulations.  ( 2 min )
    FluTO: Graded Multiscale Fluid Topology Optimization using Neural Networks. (arXiv:2209.08168v1 [math.NA])
    Fluid-flow devices with low dissipation, but high contact area, are of importance in many applications. A well-known strategy to design such devices is multi-scale topology optimization (MTO), where optimal microstructures are designed within each cell of a discretized domain. Unfortunately, MTO is computationally very expensive since one must perform homogenization of the evolving microstructures during each step of the optimization process. As an alternative, we propose here a graded multiscale topology optimization (GMTO) method for designing fluid-flow devices. In the proposed method, several pre-selected but size-parameterized and orientable microstructures are used to fill the domain optimally. GMTO significantly reduces the computation while retaining many of the benefits of MTO. In particular, GMTO is implemented here using a neural network (NN) since: (1) homogenization can be performed off-line, and used by the NN during optimization, (2) it enables continuous switching between microstructures during optimization, (3) the number of design variables and computational effort is independent of the number of microstructures used, and (4) it supports automatic differentiation, thereby eliminating manual sensitivity analysis. Several numerical results are presented to illustrate the proposed framework.
    TorchGeo: Deep Learning With Geospatial Data. (arXiv:2111.08872v4 [cs.CV] UPDATED)
    Remotely sensed geospatial data are critical for applications including precision agriculture, urban planning, disaster monitoring and response, and climate change research, among others. Deep learning methods are particularly promising for modeling many remote sensing tasks given the success of deep neural networks in similar computer vision tasks and the sheer volume of remotely sensed imagery available. However, the variance in data collection methods and handling of geospatial metadata make the application of deep learning methodology to remotely sensed data nontrivial. For example, satellite imagery often includes additional spectral bands beyond red, green, and blue and must be joined to other geospatial data sources that can have differing coordinate systems, bounds, and resolutions. To help realize the potential of deep learning for remote sensing applications, we introduce TorchGeo, a Python library for integrating geospatial data into the PyTorch deep learning ecosystem. TorchGeo provides data loaders for a variety of benchmark datasets, composable datasets for generic geospatial data sources, samplers for geospatial data, and transforms that work with multispectral imagery. TorchGeo is also the first library to provide pre-trained models for multispectral satellite imagery (e.g., models that use all bands from the Sentinel-2 satellites), allowing for advances in transfer learning on downstream remote sensing tasks with limited labeled data. We use TorchGeo to create reproducible benchmark results on existing datasets and benchmark our proposed method for preprocessing geospatial imagery on the fly. TorchGeo is open source and available on GitHub: https://github.com/microsoft/torchgeo.
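    A sketch of the usage pattern the abstract describes, loosely following TorchGeo's documented README example; the exact dataset classes and constructor arguments vary across versions, so treat the names below as assumptions to check against the current docs:

```python
# Sketch of the TorchGeo pattern: join two geospatial sources, sample
# patches geographically, and feed them to a standard PyTorch loader.
from torch.utils.data import DataLoader
from torchgeo.datasets import NAIP, ChesapeakeDE, stack_samples
from torchgeo.samplers import RandomGeoSampler

naip = NAIP("data/naip")                      # aerial imagery
chesapeake = ChesapeakeDE("data/chesapeake")  # land-cover labels
dataset = naip & chesapeake                   # spatial intersection of sources

sampler = RandomGeoSampler(dataset, size=256, length=100)
loader = DataLoader(dataset, sampler=sampler, collate_fn=stack_samples)

for batch in loader:
    image, mask = batch["image"], batch["mask"]
    # train on (image, mask) as in any PyTorch segmentation pipeline
    break
```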
    On PAC Learning Halfspaces in Non-interactive Local Privacy Model with Public Unlabeled Data. (arXiv:2209.08319v1 [cs.LG])
    In this paper, we study the problem of PAC learning halfspaces in the non-interactive local differential privacy model (NLDP). To breach the barrier of exponential sample complexity, previous results studied a relaxed setting where the server has access to some additional public but unlabeled data. We continue in this direction. Specifically, we consider the problem under the standard setting instead of the large margin setting studied before. Under different mild assumptions on the underlying data distribution, we propose two approaches that are based on the Massart noise model and self-supervised learning and show that it is possible to achieve sample complexities that are only linear in the dimension and polynomial in other terms for both private and public data, which significantly improve the previous results. Our methods could also be used for other private PAC learning problems.
    GedankenNet: Self-supervised learning of hologram reconstruction using physics consistency. (arXiv:2209.08288v1 [cs.CV])
    The past decade has witnessed transformative applications of deep learning in various computational imaging, sensing and microscopy tasks. Due to the supervised learning schemes employed, most of these methods depend on large-scale, diverse, and labeled training data. The acquisition and preparation of such training image datasets are often laborious and costly, also leading to biased estimation and limited generalization to new types of samples. Here, we report a self-supervised learning model, termed GedankenNet, that eliminates the need for labeled or experimental training data, and demonstrate its effectiveness and superior generalization on hologram reconstruction tasks. Without prior knowledge about the sample types to be imaged, the self-supervised learning model was trained using a physics-consistency loss and artificial random images that are synthetically generated without any experiments or resemblance to real-world samples. After its self-supervised training, GedankenNet successfully generalized to experimental holograms of various unseen biological samples, reconstructing the phase and amplitude images of different types of objects using experimentally acquired test holograms. Without access to experimental data or the knowledge of real samples of interest or their spatial features, GedankenNet's self-supervised learning achieved complex-valued image reconstructions that are consistent with Maxwell's equations, meaning that its output inference and object solutions accurately represent the wave propagation in free space. This self-supervised learning of image reconstruction tasks opens up new opportunities for various inverse problems in holography, microscopy and computational imaging fields.
    Distribution Knowledge Embedding for Graph Pooling. (arXiv:2109.14333v4 [cs.LG] UPDATED)
    Graph-level representation learning is the pivotal step for downstream tasks that operate on the whole graph. The most common approach to this problem heretofore is graph pooling, where node features are typically averaged or summed to obtain the graph representations. However, pooling operations like averaging or summing inevitably cause massive information loss, which may severely downgrade the final performance. In this paper, we argue that what is crucial to graph-level downstream tasks includes not only the topological structure but also the distribution from which nodes are sampled. Therefore, powered by existing Graph Neural Networks (GNN), we propose a new plug-and-play pooling module, termed as Distribution Knowledge Embedding (DKEPool), where graphs are rephrased as distributions on top of GNNs and the pooling goal is to summarize the entire distribution information instead of retaining a certain feature vector by simple predefined pooling operations. A DKEPool network de facto disassembles representation learning into two stages, structure learning and distribution learning. Structure learning follows a recursive neighborhood aggregation scheme to update node features where structure information is obtained. Distribution learning, on the other hand, omits node interconnections and focuses more on the distribution depicted by all the nodes. Extensive experiments demonstrate that the proposed DKEPool significantly and consistently outperforms the state-of-the-art methods.
    Low-cost machine learning approach to the prediction of transition metal phosphor excited state properties. (arXiv:2209.08595v1 [physics.chem-ph])
    Photoactive iridium complexes are of broad interest due to their applications ranging from lighting to photocatalysis. However, the excited state property prediction of these complexes challenges ab initio methods such as time-dependent density functional theory (TDDFT) both from an accuracy and a computational cost perspective, complicating high throughput virtual screening (HTVS). We instead leverage low-cost machine learning (ML) models to predict the excited state properties of photoactive iridium complexes. We use experimental data of 1,380 iridium complexes to train and evaluate the ML models and identify the best-performing and most transferable models to be those trained on electronic structure features from low-cost density functional theory tight binding calculations. Using these models, we predict the three excited state properties considered, mean emission energy of phosphorescence, excited state lifetime, and emission spectral integral, with accuracy competitive with or superseding TDDFT. We conduct feature importance analysis to identify which iridium complex attributes govern excited state properties and we validate these trends with explicit examples. As a demonstration of how our ML models can be used for HTVS and the acceleration of chemical discovery, we curate a set of novel hypothetical iridium complexes and identify promising ligands for the design of new phosphors.
    Value Summation: A Novel Scoring Function for MPC-based Model-based Reinforcement Learning. (arXiv:2209.08169v1 [cs.LG])
    This paper proposes a novel scoring function for the planning module of MPC-based model-based reinforcement learning methods to address the inherent bias of using the reward function to score trajectories. The proposed method enhances the learning efficiency of existing MPC-based MBRL methods using the discounted sum of values. The method utilizes optimal trajectories to guide policy learning and updates its state-action value function based on real-world and augmented on-board data. The learning efficiency of the proposed method is evaluated in selected MuJoCo Gym environments as well as in learning locomotion skills for a simulated model of the Cassie robot. The results demonstrate that the proposed method outperforms the current state-of-the-art algorithms in terms of learning efficiency and average reward return.
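    The scoring rule itself is compact enough to state directly; a hedged sketch of ranking planner rollouts by the discounted sum of a learned value function over visited states (function names are illustrative, and this is the form described in the abstract, not the authors' code):

```python
# Sketch: score MPC candidate trajectories by the discounted sum of a
# learned value function over visited states, not by summed rewards.
def value_summation_score(states, value_fn, gamma=0.99):
    return sum(gamma ** t * value_fn(s) for t, s in enumerate(states))

def best_trajectory(candidates, value_fn, gamma=0.99):
    # candidates: list of state sequences proposed by the planner
    return max(candidates,
               key=lambda traj: value_summation_score(traj, value_fn, gamma))

# Toy usage with scalar "states" and an identity value function:
print(best_trajectory([[0.1, 0.2], [0.3, 0.1]], value_fn=lambda s: s))
```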
    On the Horizon: Interactive and Compositional Deepfakes. (arXiv:2209.01714v2 [cs.AI] UPDATED)
    Over a five-year period, computing methods for generating high-fidelity, fictional depictions of people and events moved from exotic demonstrations by computer science research teams into ongoing use as a tool of disinformation. The methods, referred to with the portmanteau of "deepfakes," have been used to create compelling audiovisual content. Here, I share challenges ahead with malevolent uses of two classes of deepfakes that we can expect to come into practice with costly implications for society: interactive and compositional deepfakes. Interactive deepfakes have the capability to impersonate people with realistic interactive behaviors, taking advantage of advances in multimodal interaction. Compositional deepfakes leverage synthetic content in larger disinformation plans that integrate sets of deepfakes over time with observed, expected, and engineered world events to create persuasive synthetic histories. Synthetic histories can be constructed manually but may one day be guided by adversarial generative explanation (AGE) techniques. In the absence of mitigations, interactive and compositional deepfakes threaten to move us closer to a post-epistemic world, where fact cannot be distinguished from fiction. I shall describe interactive and compositional deepfakes and reflect about cautions and potential mitigations to defend against them.
    Graph Neural Networks with Precomputed Node Features. (arXiv:2206.00637v2 [cs.LG] UPDATED)
    Most Graph Neural Networks (GNNs) cannot distinguish some graphs or indeed some pairs of nodes within a graph. This makes it impossible to solve certain classification tasks. However, adding additional node features to these models can resolve this problem. We introduce several such augmentations, including (i) positional node embeddings, (ii) canonical node IDs, and (iii) random features. These extensions are motivated by theoretical results and corroborated by extensive testing on synthetic subgraph detection tasks. We find that positional embeddings significantly outperform other extensions in these tasks. Moreover, positional embeddings have better sample efficiency, perform well on different graph distributions and even outperform learning with ground truth node positions. Finally, we show that the different augmentations perform competitively on established GNN benchmarks, and advise on when to use them.
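    Of the three augmentations, (iii) is the quickest to convey in code: append a few random columns to the node feature matrix so that otherwise-identical nodes become distinguishable. A hedged PyTorch sketch (the paper also studies positional embeddings and canonical IDs, which take more machinery):

```python
# Sketch: random node-feature augmentation for a GNN.
import torch

def add_random_features(x: torch.Tensor, k: int = 8) -> torch.Tensor:
    """x: [num_nodes, d] node features -> [num_nodes, d + k]."""
    return torch.cat([x, torch.rand(x.size(0), k)], dim=1)

x = torch.ones(5, 3)                    # five nodes, identical features
print(add_random_features(x).shape)     # torch.Size([5, 11])
```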
    Towards Safe Reinforcement Learning via Constraining Conditional Value-at-Risk. (arXiv:2206.04436v2 [cs.LG] UPDATED)
    Though deep reinforcement learning (DRL) has obtained substantial success, it may encounter catastrophic failures due to the intrinsic uncertainty of both transition and observation. Most of the existing methods for safe reinforcement learning can only handle transition disturbance or observation disturbance since these two kinds of disturbance affect different parts of the agent; besides, the popular worst-case return may lead to overly pessimistic policies. To address these issues, we first theoretically prove that the performance degradation under transition disturbance and observation disturbance depends on a novel metric of Value Function Range (VFR), which corresponds to the gap in the value function between the best state and the worst state. Based on the analysis, we adopt conditional value-at-risk (CVaR) as an assessment of risk and propose a novel reinforcement learning algorithm of CVaR-Proximal-Policy-Optimization (CPPO) which formalizes the risk-sensitive constrained optimization problem by keeping its CVaR under a given threshold. Experimental results show that CPPO achieves a higher cumulative reward and is more robust against both observation and transition disturbances on a series of continuous control tasks in MuJoCo.
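    For concreteness, the empirical CVaR of a batch of returns - the mean over the worst alpha-fraction - can be computed as below. This is only the tail statistic the abstract builds its constraint on, not the CPPO algorithm itself:

```python
# Sketch: empirical conditional value-at-risk (CVaR) of sampled returns.
import numpy as np

def cvar(returns, alpha=0.1):
    sorted_r = np.sort(returns)                    # ascending: worst first
    k = max(1, int(np.ceil(alpha * len(sorted_r))))
    return sorted_r[:k].mean()                     # mean of worst alpha-fraction

returns = np.random.default_rng(0).normal(10.0, 3.0, size=1000)
print(cvar(returns, alpha=0.1))
```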
    Federated Learning for THz Channel Estimation. (arXiv:2207.06017v2 [eess.SP] UPDATED)
    This paper addresses two major challenges in terahertz (THz) channel estimation: the beam-split phenomenon, i.e., beam misalignment because of frequency-independent analog beamformers, and computational complexity because of the usage of ultra-massive number of antennas to compensate propagation losses. Data-driven techniques are known to mitigate the complexity of this problem but usually require the transmission of the datasets from the users to a central server entailing huge communication overhead. In this work, we employ federated learning (FL), wherein the users transmit only the model parameters instead of the whole dataset, for THz channel estimation to improve the communications-efficiency. In order to accurately estimate the channel despite beam-split, we propose a beamspace support alignment (BSA) technique. By exploiting the sparsity of the THz channel, the proposed approach is implemented with fewer pilot signals than the traditional techniques. Compared to the previous works, our FL-BSA approach provides higher channel estimation accuracy as well as approximately 68 (32) times lower model (channel) training overhead, respectively.
    Upper Limb Movement Recognition utilising EEG and EMG Signals for Rehabilitative Robotics. (arXiv:2207.08650v3 [cs.LG] UPDATED)
    Upper limb movement classification, which maps input signals to the target activities, is a key building block in the control of rehabilitative robotics. Classifiers are trained for the rehabilitative system to comprehend the desires of the patient whose upper limbs do not function properly. Electromyography (EMG) signals and Electroencephalography (EEG) signals are used widely for upper limb movement classification. By analysing the classification results of the real-time EEG and EMG signals, the system can understand the intention of the user and predict the events that one would like to carry out. Accordingly, it will provide external help to the user. However, the noise in the real-time EEG and EMG data collection process contaminates the effectiveness of the data, which undermines classification performance. Moreover, not all patients possess strong EMG signals due to muscle damage and neuromuscular disorders. To address these issues, this paper explores different feature extraction techniques and machine learning and deep learning models for EEG and EMG signal classification, and proposes a novel decision-level multisensor fusion technique to integrate EEG signals with EMG signals. This system retrieves effective information from both sources to understand and predict the desires of the user, and thus provide aid. By testing the proposed technique on the publicly available WAY-EEG-GAL dataset, which contains EEG and EMG signals that were recorded simultaneously, we demonstrate the feasibility and effectiveness of the novel system.  ( 3 min )
    Doge Tickets: Uncovering Domain-general Language Models by Playing Lottery Tickets. (arXiv:2207.09638v2 [cs.CL] UPDATED)
    Over-parameterized models, typically pretrained language models (LMs), have shown an appealing expressive power due to their small learning bias. However, the huge learning capacity of LMs can also lead to large learning variance. In a pilot study, we find that, when faced with multiple domains, a critical portion of parameters behave unexpectedly in a domain-specific manner while others behave in a domain-general one. Motivated by this phenomenon, we for the first time posit that domain-general parameters can underpin a domain-general LM that can be derived from the original LM. To uncover the domain-general LM, we propose to identify domain-general parameters by playing lottery tickets (dubbed doge tickets). In order to intervene in the lottery, we propose a domain-general score, which depicts how domain-invariant a parameter is by associating it with the variance. Comprehensive experiments are conducted on the Amazon, Mnli and OntoNotes datasets. The results show that doge tickets obtain improved out-of-domain generalization in comparison with a range of competitive baselines. Analysis results further hint at the existence of domain-general parameters and the performance consistency of doge tickets.  ( 3 min )
    Understanding and Extending Subgraph GNNs by Rethinking Their Symmetries. (arXiv:2206.11140v2 [cs.LG] UPDATED)
    Subgraph GNNs are a recent class of expressive Graph Neural Networks (GNNs) which model graphs as collections of subgraphs. So far, the design space of possible Subgraph GNN architectures as well as their basic theoretical properties are still largely unexplored. In this paper, we study the most prominent form of subgraph methods, which employs node-based subgraph selection policies such as ego-networks or node marking and deletion. We address two central questions: (1) What is the upper-bound of the expressive power of these methods? and (2) What is the family of equivariant message passing layers on these sets of subgraphs? Our first step in answering these questions is a novel symmetry analysis which shows that modelling the symmetries of node-based subgraph collections requires a significantly smaller symmetry group than the one adopted in previous works. This analysis is then used to establish a link between Subgraph GNNs and Invariant Graph Networks (IGNs). We answer the questions above by first bounding the expressive power of subgraph methods by 3-WL, and then proposing a general family of message-passing layers for subgraph methods that generalises all previous node-based Subgraph GNNs. Finally, we design a novel Subgraph GNN dubbed SUN, which theoretically unifies previous architectures while providing better empirical performance on multiple benchmarks.
    Adapting Triplet Importance of Implicit Feedback for Personalized Recommendation. (arXiv:2208.01709v3 [cs.IR] UPDATED)
    Implicit feedback is frequently used for developing personalized recommendation services due to its ubiquity and accessibility in real-world systems. In order to effectively utilize such information, most research adopts the pairwise ranking method on constructed training triplets (user, positive item, negative item) and aims to distinguish between positive items and negative items for each user. However, most of these methods treat all the training triplets equally, which ignores the subtle difference between different positive or negative items. On the other hand, even though some other works make use of the auxiliary information (e.g., dwell time) of user behaviors to capture this subtle difference, such auxiliary information is hard to obtain. To mitigate the aforementioned problems, we propose a novel training framework named Triplet Importance Learning (TIL), which adaptively learns the importance score of training triplets. We devise two strategies for the importance score generation and formulate the whole procedure as a bilevel optimization, which does not require any rule-based design. We integrate the proposed training procedure with several Matrix Factorization (MF)- and Graph Neural Network (GNN)-based recommendation models, demonstrating the compatibility of our framework. Via a comparison using three real-world datasets with many state-of-the-art methods, we show that our proposed method outperforms the best existing models by 3-21% in terms of Recall@k for the top-k recommendation.
    LATTE: LAnguage Trajectory TransformEr. (arXiv:2208.02918v3 [cs.RO] UPDATED)
    Natural language is one of the most intuitive ways to express human intent. However, translating instructions and commands towards robotic motion generation and deployment in the real world is far from being an easy task. The challenge of combining a robot's inherent low-level geometric and kinodynamic constraints with a human's high-level semantic instructions traditionally is solved using task-specific solutions with little generalizability between hardware platforms, often with the use of static sets of target actions and commands. This work instead proposes a flexible language-based framework that allows a user to modify generic robotic trajectories. Our method leverages pre-trained language models (BERT and CLIP) to encode the user's intent and target objects directly from a free-form text input and scene images, fuses geometrical features generated by a transformer encoder network, and finally outputs trajectories using a transformer decoder, without the need of priors related to the task or robot information. We significantly extend our own previous work presented in Bucker et al. by expanding the trajectory parametrization space to 3D and velocity as opposed to just XY movements. In addition, we now train the model to use actual images of the objects in the scene for context (as opposed to textual descriptions), and we evaluate the system in a diverse set of scenarios beyond manipulation, such as aerial and legged robots. Our simulated and real-life experiments demonstrate that our transformer model can successfully follow human intent, modifying the shape and speed of trajectories within multiple environments. Codebase available at: https://github.com/arthurfenderbucker/LaTTe-Language-Trajectory-TransformEr.git
    Few-Shot Non-Parametric Learning with Deep Latent Variable Model. (arXiv:2206.11573v2 [cs.LG] UPDATED)
    Most real-world problems that machine learning algorithms are expected to solve face the situation with 1) unknown data distribution; 2) little domain-specific knowledge; and 3) datasets with limited annotation. We propose Non-Parametric learning by Compression with Latent Variables (NPC-LV), a learning framework for any dataset with abundant unlabeled data but very few labeled ones. By only training a generative model in an unsupervised way, the framework utilizes the data distribution to build a compressor. Using a compressor-based distance metric derived from Kolmogorov complexity, together with few labeled data, NPC-LV classifies without further training. We show that NPC-LV outperforms supervised methods on all three datasets on image classification in low data regime and even outperform semi-supervised learning methods on CIFAR-10. We demonstrate how and when negative evidence lowerbound (nELBO) can be used as an approximate compressed length for classification. By revealing the correlation between compression rate and classification accuracy, we illustrate that under NPC-LV, the improvement of generative models can enhance downstream classification accuracy.
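    The compressor-based distance has a classic closed form, the normalized compression distance. The sketch below uses zlib as a stand-in for the paper's generative-model compressor (NPC-LV derives its compressed lengths from a trained latent variable model via the nELBO; zlib is purely illustrative of the distance's shape):

```python
# Sketch: normalized compression distance (NCD) with zlib standing in
# for NPC-LV's generative-model-based compressor.
import zlib

def clen(b: bytes) -> int:
    return len(zlib.compress(b))

def ncd(x: bytes, y: bytes) -> float:
    cx, cy, cxy = clen(x), clen(y), clen(x + y)
    return (cxy - min(cx, cy)) / max(cx, cy)

a, b = b"the cat sat on the mat", b"the cat sat on a mat"
print(ncd(a, b))   # small value: the strings compress well together
```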
    Block-Recurrent Transformers. (arXiv:2203.07852v2 [cs.LG] UPDATED)
    We introduce the Block-Recurrent Transformer, which applies a transformer layer in a recurrent fashion along a sequence, and has linear complexity with respect to sequence length. Our recurrent cell operates on blocks of tokens rather than single tokens during training, and leverages parallel computation within a block in order to make efficient use of accelerator hardware. The cell itself is strikingly simple. It is merely a transformer layer: it uses self-attention and cross-attention to efficiently compute a recurrent function over a large set of state vectors and tokens. Our design was inspired in part by LSTM cells, and it uses LSTM-style gates, but it scales the typical LSTM cell up by several orders of magnitude. Our implementation of recurrence has the same cost in both computation time and parameter count as a conventional transformer layer, but offers dramatically improved perplexity in language modeling tasks over very long sequences. Our model out-performs a long-range Transformer XL baseline by a wide margin, while running twice as fast. We demonstrate its effectiveness on PG19 (books), arXiv papers, and GitHub source code. Our code has been released as open source.
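    A much-simplified sketch of the mechanism: a transformer layer applied block by block, carrying a small set of state vectors forward through an LSTM-style gate. It omits positional information, masking, the feed-forward sublayer and the authors' exact gating, and it shares one cross-attention module in both directions for brevity - an illustration of the idea, not the released implementation:

```python
# Simplified block-recurrent layer: tokens attend within their block and
# to a recurrent state; the state is then gated toward a block summary.
import torch
import torch.nn as nn

class BlockRecurrentLayer(nn.Module):
    def __init__(self, d=64, heads=4, n_state=16):
        super().__init__()
        self.self_attn = nn.MultiheadAttention(d, heads, batch_first=True)
        self.cross_attn = nn.MultiheadAttention(d, heads, batch_first=True)
        self.state0 = nn.Parameter(torch.randn(1, n_state, d) * 0.02)
        self.gate = nn.Linear(2 * d, d)

    def forward(self, x, block_size=32):
        state = self.state0.expand(x.size(0), -1, -1)
        outputs = []
        for block in x.split(block_size, dim=1):
            h, _ = self.self_attn(block, block, block)   # within-block attention
            c, _ = self.cross_attn(h, state, state)      # tokens read the state
            outputs.append(h + c)
            s, _ = self.cross_attn(state, h, h)          # state reads the block
            g = torch.sigmoid(self.gate(torch.cat([state, s], dim=-1)))
            state = g * state + (1 - g) * s              # LSTM-style gated update
        return torch.cat(outputs, dim=1)

layer = BlockRecurrentLayer()
print(layer(torch.randn(2, 128, 64)).shape)   # torch.Size([2, 128, 64])
```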
    Intention Aware Robot Crowd Navigation with Attention-Based Interaction Graph. (arXiv:2203.01821v2 [cs.RO] UPDATED)
    We study the problem of safe and intention-aware robot navigation in dense and interactive crowds. Most previous reinforcement learning (RL) based methods fail to consider different types of interactions among all agents or ignore the intentions of people, which results in performance degradation. In this paper, we propose a novel recurrent graph neural network with attention mechanisms to capture heterogeneous interactions among agents through space and time. To encourage longsighted robot behaviors, we infer the intentions of dynamic agents by predicting their future trajectories for several timesteps. The predictions are incorporated into a model-free RL framework to prevent the robot from intruding into the intended paths of other agents. We demonstrate that our method enables the robot to achieve good navigation performance and non-invasiveness in challenging crowd navigation scenarios. We successfully transfer the policy learned in simulation to a real-world TurtleBot 2i.
    PEER: A Comprehensive and Multi-Task Benchmark for Protein Sequence Understanding. (arXiv:2206.02096v2 [cs.LG] UPDATED)
    We are now witnessing significant progress of deep learning methods in a variety of tasks (or datasets) of proteins. However, there is a lack of a standard benchmark to evaluate the performance of different methods, which hinders the progress of deep learning in this field. In this paper, we propose such a benchmark called PEER, a comprehensive and multi-task benchmark for Protein sEquence undERstanding. PEER provides a set of diverse protein understanding tasks including protein function prediction, protein localization prediction, protein structure prediction, protein-protein interaction prediction, and protein-ligand interaction prediction. We evaluate different types of sequence-based methods for each task including traditional feature engineering approaches, different sequence encoding methods as well as large-scale pre-trained protein language models. In addition, we also investigate the performance of these methods under the multi-task learning setting. Experimental results show that large-scale pre-trained protein language models achieve the best performance for most individual tasks, and jointly training multiple tasks further boosts the performance. The datasets and source codes of this benchmark are all available at https://github.com/DeepGraphLearning/PEER_Benchmark
    Improving Downstream Task Performance by Treating Numbers as Entities. (arXiv:2205.03559v2 [cs.CL] UPDATED)
    Numbers are essential components of text, like any other word tokens, from which natural language processing (NLP) models are built and deployed. Though numbers are typically not accounted for distinctly in most NLP tasks, there is still an underlying amount of numeracy already exhibited by NLP models. In this work, we attempt to tap this potential of state-of-the-art NLP models and transfer their ability to boost performance in related tasks. Our proposed classification of numbers into entities helps NLP models perform well on several tasks, including a handcrafted Fill-In-The-Blank (FITB) task and on question answering using joint embeddings, outperforming the BERT and RoBERTa baseline classification.
    Automated MeSH Term Suggestion for Effective Query Formulation in Systematic Reviews Literature Search. (arXiv:2209.08687v1 [cs.IR])
    High-quality medical systematic reviews require comprehensive literature searches to ensure the recommendations and outcomes are sufficiently reliable. Indeed, searching for relevant medical literature is a key phase in constructing systematic reviews and often involves domain (medical researchers) and search (information specialists) experts in developing the search queries. Queries in this context are highly complex, based on Boolean logic, include free-text terms and index terms from standardised terminologies (e.g., the Medical Subject Headings (MeSH) thesaurus), and are difficult and time-consuming to build. The use of MeSH terms, in particular, has been shown to improve the quality of the search results. However, identifying the correct MeSH terms to include in a query is difficult: information experts are often unfamiliar with the MeSH database and unsure about the appropriateness of MeSH terms for a query. Naturally, the full value of the MeSH terminology is often not fully exploited. This article investigates methods to suggest MeSH terms based on an initial Boolean query that includes only free-text terms. In this context, we devise lexical and pre-trained language models based methods. These methods promise to automatically identify highly effective MeSH terms for inclusion in a systematic review query. Our study contributes an empirical evaluation of several MeSH term suggestion methods. We further contribute an extensive analysis of MeSH term suggestions for each method and how these suggestions impact the effectiveness of Boolean queries.
    HiSTGNN: Hierarchical Spatio-temporal Graph Neural Networks for Weather Forecasting. (arXiv:2201.09101v2 [cs.LG] UPDATED)
    Weather forecasting is an attractive yet challenging task, given its influence on human life and the complexity of atmospheric motion. Supported by massive historical observed time series data, the task is suitable for data-driven approaches, especially deep neural networks. Recently, methods based on Graph Neural Networks (GNNs) have achieved excellent performance for spatio-temporal forecasting. However, canonical GNN-based methods only model either the local graph of meteorological variables per station or the global graph of whole stations, lacking information interaction between meteorological variables in different stations. In this paper, we propose a novel Hierarchical Spatio-Temporal Graph Neural Network (HiSTGNN) to model cross-regional spatio-temporal correlations among meteorological variables in multiple stations. An adaptive graph learning layer and spatial graph convolution are employed to construct a self-learning graph and capture hidden dependencies among the nodes of the variable-level and station-level graphs. To capture temporal patterns, a dilated inception backbone within a gated temporal convolution is designed to model long and diverse meteorological trends. Moreover, a dynamic interaction learning mechanism is proposed to enable bidirectional information passing in the hierarchical graph. Experimental results on three real-world meteorological datasets demonstrate the superior performance of HiSTGNN over 7 baselines; it reduces errors by 4.2% to 11.6%, especially compared to the state-of-the-art weather forecasting method.
    Relational Reasoning Network (RRN) for Anatomical Landmarking. (arXiv:1904.04354v2 [cs.LG] UPDATED)
    Purpose: We perform anatomical landmarking for craniomaxillofacial (CMF) bones without explicitly segmenting them. Towards this, we propose a new simple yet efficient deep network architecture, called \textit{relational reasoning network (RRN)}, to accurately learn the local and global relations among the landmarks in CMF bones; specifically, the mandible, maxilla, and nasal bones. Approach: The proposed RRN works in an end-to-end manner, utilizing learned relations of the landmarks based on dense-block units. Given a few landmarks as input, RRN treats the landmarking process as a data imputation problem where predicted landmarks are considered missing. Results: We applied RRN to cone beam computed tomography scans obtained from 250 patients. With a 4-fold cross validation technique, we obtained an average root mean squared error of less than 2 mm per landmark. Our proposed RRN has revealed unique relationships among the landmarks that help us reason about the informativeness of the landmark points. The proposed system identifies missing landmark locations accurately even when severe pathology or deformation is present in the bones. Conclusions: Accurately identifying anatomical landmarks is a crucial step in deformation analysis and surgical planning for CMF surgeries. Achieving this goal without the need for explicit bone segmentation addresses a major limitation of segmentation-based approaches, where segmentation failure (as is often the case in bones with severe pathology or deformation) could easily lead to incorrect landmarking. To the best of our knowledge, this is the first algorithm of its kind to find anatomical relations of objects using deep learning.
    Opinions Vary? Diagnosis First!. (arXiv:2202.06505v3 [eess.IV] UPDATED)
    With the advancement of deep learning techniques, an increasing number of methods have been proposed for optic disc and cup (OD/OC) segmentation from fundus images. Clinically, OD/OC segmentation is often annotated by multiple clinical experts to mitigate personal bias. However, it is hard to train automated deep learning models on multiple labels. A common practice to tackle the issue is majority vote, e.g., taking the average of the multiple labels. However, such a strategy ignores the different expertness of the medical experts. Motivated by the observation that OD/OC segmentation is often used for glaucoma diagnosis clinically, in this paper we propose a novel strategy to fuse the multi-rater OD/OC segmentation labels via glaucoma diagnosis performance. Specifically, we assess the expertness of each rater through an attentive glaucoma diagnosis network. For each rater, the contribution to the diagnosis is reflected as an expertness map. To ensure the expertness maps generalize across different glaucoma diagnosis models, we further propose an Expertness Generator (ExpG) to eliminate high-frequency components in the optimization process. Based on the obtained expertness maps, the multi-rater labels can be fused into a single ground truth, which we dub the Diagnosis First Ground-truth (DiagFirstGT). Experimental results show that, using DiagFirstGT as the ground truth, OD/OC segmentation networks predict masks with superior glaucoma diagnosis performance.
    Graph Unlearning. (arXiv:2103.14991v2 [cs.LG] UPDATED)
    Machine unlearning is the process of removing the impact of some training data from machine learning (ML) models upon receiving removal requests. While straightforward and legitimate, retraining the ML model from scratch incurs a high computational overhead. To address this issue, a number of approximate algorithms have been proposed in the domain of image and text data, among which SISA is the state-of-the-art solution. It randomly partitions the training set into multiple shards and trains a constituent model for each shard. However, directly applying SISA to graph data can severely damage the graph structural information, and thereby the resulting ML model utility. In this paper, we propose GraphEraser, a novel machine unlearning framework tailored to graph data. Its contributions include two novel graph partition algorithms and a learning-based aggregation method. We conduct extensive experiments on five real-world graph datasets to illustrate the unlearning efficiency and model utility of GraphEraser. It achieves a 2.06$\times$ (small dataset) to 35.94$\times$ (large dataset) improvement in unlearning time. GraphEraser also achieves up to $62.5\%$ higher F1 score, and our proposed learning-based aggregation method achieves up to $112\%$ higher F1 score.\footnote{Our code is available at \url{https://github.com/MinChen00/Graph-Unlearning}.}
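    To make the shard-and-aggregate idea concrete, here is a minimal sketch of SISA-style training and unlearning on plain (non-graph) data. The random sharding, LogisticRegression constituents, and majority-vote aggregation are illustrative stand-ins, not GraphEraser's graph-aware partitioning or learned aggregation.

```python
# Minimal SISA-style sharded training and unlearning sketch.
import numpy as np
from sklearn.linear_model import LogisticRegression

def train_shards(X, y, n_shards=4, seed=0):
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(X))
    shards = np.array_split(idx, n_shards)          # random partition
    models = [LogisticRegression(max_iter=1000).fit(X[s], y[s]) for s in shards]
    return shards, models

def unlearn(X, y, shards, models, remove_idx):
    # Retrain only the shard that contains the removed point.
    for i, s in enumerate(shards):
        if remove_idx in s:
            keep = s[s != remove_idx]
            shards[i] = keep
            models[i] = LogisticRegression(max_iter=1000).fit(X[keep], y[keep])
    return shards, models

def predict(models, X):
    # Majority vote across constituent models (assumes integer class labels).
    votes = np.stack([m.predict(X) for m in models])
    return np.apply_along_axis(lambda v: np.bincount(v).argmax(), 0, votes)
```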
    Explain and Conquer: Personalised Text-based Reviews to Achieve Transparency. (arXiv:2205.01759v2 [cs.LG] UPDATED)
    There are many contexts in which dyadic data are present. Social networks are a well-known example. In these contexts, pairs of elements are linked, building a network that reflects interactions. Explaining why these relationships are established is essential to obtain transparency, an increasingly important notion. These explanations are often presented using text, thanks to the spread of natural language understanding tasks. Our aim is to represent and explain pairs established by any agent (e.g., a recommender system or a paid promotion mechanism), so that text-based personalisation is taken into account. We focus on the TripAdvisor platform, considering the applicability to other dyadic data contexts. The items are a subset of users and restaurants, and the interactions are the reviews posted by these users. We propose the PTER (Personalised TExt-based Reviews) model: we predict, from the available reviews for a given restaurant, those that fit the specific user interactions. PTER leverages the BERT (Bidirectional Encoder Representations from Transformers) transformer-encoder model. We customised a deep neural network following the feature-based approach, presenting an LTR (Learning To Rank) downstream task. We carried out several comparisons of our proposal with a random baseline and other state-of-the-art models, following the EXTRA (EXplanaTion RAnking) benchmark. Our method outperforms other collaborative filtering proposals.
    Class-Incremental Continual Learning into the eXtended DER-verse. (arXiv:2201.00766v2 [cs.LG] UPDATED)
    The staple of human intelligence is the capability of acquiring knowledge in a continuous fashion. In stark contrast, Deep Networks forget catastrophically and, for this reason, the sub-field of Class-Incremental Continual Learning fosters methods that learn a sequence of tasks incrementally, blending sequentially-gained knowledge into a comprehensive prediction. This work aims at assessing and overcoming the pitfalls of our previous proposal Dark Experience Replay (DER), a simple and effective approach that combines rehearsal and Knowledge Distillation. Inspired by the way our minds constantly rewrite past recollections and set expectations for the future, we endow our model with the abilities to i) revise its replay memory to welcome novel information regarding past data ii) pave the way for learning yet unseen classes. We show that the application of these strategies leads to remarkable improvements; indeed, the resulting method - termed eXtended-DER (X-DER) - outperforms the state of the art on both standard benchmarks (such as CIFAR-100 and miniImagenet) and a novel one here introduced. To gain a better understanding, we further provide extensive ablation studies that corroborate and extend the findings of our previous research (e.g. the value of Knowledge Distillation and flatter minima in continual learning setups).
    HiPart: Hierarchical Divisive Clustering Toolbox. (arXiv:2209.08680v1 [stat.ML])
    This paper presents the HiPart package, an open-source native Python library that provides efficient and interpretable implementations of divisive hierarchical clustering algorithms. HiPart supports interactive visualizations for the manipulation of the execution steps, allowing direct intervention in the clustering outcome. This package is highly suited for Big Data applications, as the focus has been on the computational efficiency of the implemented clustering methodologies. The dependencies used are either Python built-in packages or highly maintained stable external packages. The software is provided under the MIT license. The package's source code and documentation can be found at https://github.com/panagiotisanagnostou/HiPart.
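    As background, the following sketch illustrates the divisive (top-down) clustering scheme that such libraries implement, using bisecting k-means for the binary splits. It shows the algorithm class only and does not use HiPart's actual API.

```python
# Minimal divisive (bisecting) clustering sketch: repeatedly split the
# largest cluster in two until the desired number of clusters is reached.
import numpy as np
from sklearn.cluster import KMeans

def divisive_clustering(X, n_clusters=4, seed=0):
    labels = np.zeros(len(X), dtype=int)
    while labels.max() + 1 < n_clusters:
        largest = np.bincount(labels).argmax()       # pick biggest cluster
        mask = labels == largest
        km = KMeans(n_clusters=2, n_init=10, random_state=seed).fit(X[mask])
        sub = labels[mask]
        sub[km.labels_ == 1] = labels.max() + 1      # assign new cluster id
        labels[mask] = sub
    return labels
```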
    Model-based gym environments for limit order book trading. (arXiv:2209.07823v1 [q-fin.TR] CROSS LISTED)
    Within the mathematical finance literature there is a rich catalogue of mathematical models for studying algorithmic trading problems -- such as market-making and optimal execution -- in limit order books. This paper introduces mbt_gym, a Python module that provides a suite of gym environments for training reinforcement learning (RL) agents to solve such model-based trading problems. The module is set up in an extensible way to allow the combination of different aspects of different models. It supports highly efficient implementations of vectorized environments to allow faster training of RL agents. In this paper, we motivate the challenge of using RL to solve such model-based limit order book problems in mathematical finance, we explain the design of our gym environment, and then demonstrate its use in solving standard and non-standard problems from the literature. Finally, we lay out a roadmap for further development of our module, which we provide as an open source repository on GitHub so that it can serve as a focal point for RL research in model-based algorithmic trading.
    ADBench: Anomaly Detection Benchmark. (arXiv:2206.09426v2 [cs.LG] UPDATED)
    Given the long list of anomaly detection algorithms developed over the last few decades, how do they perform with regard to (i) varying levels of supervision, (ii) different types of anomalies, and (iii) noisy and corrupted data? In this work, we answer these key questions by conducting (to our best knowledge) the most comprehensive anomaly detection benchmark, with 30 algorithms on 57 benchmark datasets, named ADBench. Our extensive experiments (98,436 in total) yield meaningful insights into the role of supervision and anomaly types, and unlock future directions for researchers in algorithm selection and design. With ADBench, researchers can easily conduct comprehensive and fair evaluations of newly proposed methods on the datasets (including our contributed ones from the natural language and computer vision domains) against the existing baselines. To foster accessibility and reproducibility, we fully open-source ADBench and the corresponding results.
    Parameter-free Mirror Descent. (arXiv:2203.00444v3 [cs.LG] UPDATED)
    We develop a modified online mirror descent framework that is suitable for building adaptive and parameter-free algorithms in unbounded domains. We leverage this technique to develop the first unconstrained online linear optimization algorithm achieving an optimal dynamic regret bound, and we further demonstrate that natural strategies based on Follow-the-Regularized-Leader are unable to achieve similar results. We also apply our mirror descent framework to build new parameter-free implicit updates, as well as a simplified and improved unconstrained scale-free algorithm.
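    For reference, the classical online mirror descent update that this framework modifies can be written, for a mirror map $\psi$ and loss gradient $g_t$, as
\[
w_{t+1} = \operatorname*{argmin}_{w \in \mathcal{W}} \; \eta_t \langle g_t, w \rangle + D_\psi(w, w_t), \qquad D_\psi(w, u) = \psi(w) - \psi(u) - \langle \nabla\psi(u), w - u \rangle,
\]
    where the paper's contribution lies in modifying this template so that it remains adaptive and parameter-free on unbounded domains.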
    Noise transfer for unsupervised domain adaptation of retinal OCT images. (arXiv:2209.08097v1 [cs.CV])
    Optical coherence tomography (OCT) imaging from different camera devices causes challenging domain shifts and can lead to a severe drop in accuracy for machine learning models. In this work, we introduce a minimal noise adaptation method based on a singular value decomposition (SVDNA) to overcome the domain gap between target domains from three different device manufacturers in retinal OCT imaging. Our method utilizes the difference in noise structure to successfully bridge the domain gap between different OCT devices and transfer the style from unlabeled target domain images to source images for which manual annotations are available. We demonstrate how this method, despite its simplicity, matches or even outperforms state-of-the-art unsupervised domain adaptation methods for semantic segmentation on a public OCT dataset. SVDNA can be integrated with just a few lines of code into the augmentation pipeline of any network, in contrast to many state-of-the-art domain adaptation methods, which often need to change the underlying model architecture or train a separate style transfer model. The full code implementation for SVDNA is available at https://github.com/ValentinKoch/SVDNA.
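    A minimal numpy sketch of SVD-based noise transfer in the spirit of SVDNA for a single-channel image is below. The split index k separating "content" from "noise" singular components is an illustrative hyperparameter, and the exact recombination used in the paper may differ.

```python
import numpy as np

def svd_noise_transfer(source, target, k=30):
    # source, target: 2D arrays (single-channel images) of the same shape
    Us, ss, Vs = np.linalg.svd(source, full_matrices=False)
    Ut, st, Vt = np.linalg.svd(target, full_matrices=False)
    content = (Us[:, :k] * ss[:k]) @ Vs[:k]   # low-rank source structure
    noise = (Ut[:, k:] * st[k:]) @ Vt[k:]     # high-index target components
    return content + noise                    # source content, target noise
```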
    Bayesian Importance of Features (BIF). (arXiv:2010.13872v2 [stat.ML] UPDATED)
    We introduce a simple and intuitive framework that provides quantitative explanations of statistical models through the probabilistic assessment of input feature importance. The core idea is to use the Dirichlet distribution to define the importance of input features and to learn it via approximate Bayesian inference. The learned importance has a probabilistic interpretation and provides the relative significance of each input feature to a model's output, additionally assessing confidence about its importance quantification. As a consequence of using the Dirichlet distribution over the explanations, we can define a closed-form divergence to gauge the similarity between the importance learned under different models. We use this divergence to study the tradeoffs between feature importance explainability and essential notions in modern machine learning, such as privacy and fairness. Furthermore, BIF can work on two levels: global explanation (feature importance across all data instances) and local explanation (individual feature importance for each data instance). We show the effectiveness of our method on a variety of synthetic and real datasets, considering both tabular and image data. The code is available at https://github.com/kamadforge/featimp_dp.
    Non-invasive Localization of the Ventricular Excitation Origin Without Patient-specific Geometries Using Deep Learning. (arXiv:2209.08095v1 [eess.IV])
    Ventricular tachycardia (VT) can be one cause of sudden cardiac death, which affects 4.25 million persons per year worldwide. A curative treatment is catheter ablation, which inactivates the abnormally triggering regions. To facilitate and expedite localization during the ablation procedure, we present two novel localization techniques based on convolutional neural networks (CNNs). In contrast to existing methods, e.g. using ECG imaging, our approaches were designed to be independent of patient-specific geometries and directly applicable to surface ECG signals, while also delivering a binary transmural position. One method outputs ranked alternative solutions. Results can be visualized either on a generic or patient geometry. The CNNs were trained on a data set containing only simulated data and evaluated on both simulated and clinical test data. On simulated data, the median test error was below 3mm. The median localization error on the clinical data was as low as 32mm. The transmural position was correctly detected in up to 82% of all clinical cases. Using the ranked alternative solutions, the top-3 median error dropped to 20mm on clinical data. These results demonstrate a proof of principle for utilizing CNNs to localize the activation source without the intrinsic need for patient-specific geometrical information. Furthermore, delivering multiple solutions can help the physician find the real activation source among several possible locations. With further optimization, these methods have a high potential to speed up clinical interventions. Consequently, they could decrease procedural risk and improve outcomes for VT patients.
    PyTorch Geometric Signed Directed: A Software Package on Graph Neural Networks for Signed and Directed Graphs. (arXiv:2202.10793v3 [cs.LG] UPDATED)
    Networks are ubiquitous in many real-world applications (e.g., social networks encoding trust/distrust relationships, correlation networks arising from time series data). While many networks are signed or directed, or both, there is a lack of unified software packages on graph neural networks (GNNs) specially designed for signed and directed networks. In this paper, we present PyTorch Geometric Signed Directed, a software package which fills this gap. Along the way, we also provide a brief review surveying typical tasks, loss functions and evaluation metrics in the analysis of signed and directed networks, discuss data used in related experiments, provide an overview of methods proposed, and evaluate the implemented methods with experiments. The deep learning framework consists of easy-to-use GNN models, synthetic and real-world data, as well as task-specific evaluation metrics and loss functions for signed and directed networks. As an extension library for PyTorch Geometric, our proposed software is maintained with open-source releases, detailed documentation, continuous integration, unit tests and code coverage checks. Our code is publicly available at \url{https://github.com/SherylHYX/pytorch_geometric_signed_directed}.
    Accelerated Training of Physics Informed Neural Networks (PINNs) using Meshless Discretizations. (arXiv:2205.09332v2 [cs.LG] UPDATED)
    We present a new technique for the accelerated training of physics-informed neural networks (PINNs): discretely-trained PINNs (DT-PINNs). The repeated computation of partial derivative terms in the PINN loss functions via automatic differentiation during training is known to be computationally expensive, especially for higher-order derivatives. DT-PINNs are trained by replacing these exact spatial derivatives with high-order accurate numerical discretizations computed using meshless radial basis function-finite differences (RBF-FD) and applied via sparse-matrix vector multiplication. The use of RBF-FD allows for DT-PINNs to be trained even on point cloud samples placed on irregular domain geometries. Additionally, though traditional PINNs (vanilla-PINNs) are typically stored and trained in 32-bit floating-point (fp32) on the GPU, we show that for DT-PINNs, using fp64 on the GPU leads to significantly faster training times than fp32 vanilla-PINNs with comparable accuracy. We demonstrate the efficiency and accuracy of DT-PINNs via a series of experiments. First, we explore the effect of network depth on both numerical and automatic differentiation of a neural network with random weights and show that RBF-FD approximations of third-order accuracy and above are more efficient while being sufficiently accurate. We then compare the DT-PINNs to vanilla-PINNs on both linear and nonlinear Poisson equations and show that DT-PINNs achieve similar losses with 2-4x faster training times on a consumer GPU. Finally, we also demonstrate that similar results can be obtained for the PINN solution to the heat equation (a space-time problem) by discretizing the spatial derivatives using RBF-FD and using automatic differentiation for the temporal derivative. Our results show that fp64 DT-PINNs offer a superior cost-accuracy profile to fp32 vanilla-PINNs.
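    A minimal PyTorch sketch of the core idea follows: the PDE residual in the loss is computed by applying a precomputed sparse derivative operator (here a hypothetical RBF-FD Laplacian L) via sparse matrix-vector products instead of automatic differentiation. The operator assembly itself is assumed to be done offline.

```python
import torch

def dt_pinn_loss(model, L, pts, f):
    # pts: (N, d) collocation points; L: sparse (N, N) RBF-FD Laplacian;
    # f: (N, 1) right-hand side of the Poisson problem  lap(u) = f.
    u = model(pts)                        # (N, 1) network solution values
    lap_u = torch.sparse.mm(L, u)         # spatial derivatives, no autodiff
    return torch.mean((lap_u - f) ** 2)   # discrete PDE residual
    # Per the paper, running this in fp64 on GPU can beat fp32 autodiff PINNs.
```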
    Sobolev Acceleration and Statistical Optimality for Learning Elliptic Equations via Gradient Descent. (arXiv:2205.07331v3 [math.NA] UPDATED)
    In this paper, we study the statistical limits, in terms of Sobolev norms, of gradient descent for solving inverse problems from randomly sampled noisy observations using a general class of objective functions. Our class of objective functions includes Sobolev training for kernel regression, Deep Ritz Methods (DRM), and Physics Informed Neural Networks (PINN) for solving elliptic partial differential equations (PDEs) as special cases. We consider a potentially infinite-dimensional parameterization of our model using a suitable Reproducing Kernel Hilbert Space and a continuous parameterization of problem hardness through the definition of kernel integral operators. We prove that gradient descent over this objective function can also achieve statistical optimality, and that the optimal number of passes over the data increases with the sample size. Based on our theory, we explain an implicit acceleration from using a Sobolev norm as the objective function for training, inferring that the optimal number of epochs for DRM becomes larger than that for PINN when both the data size and the hardness of the task increase, although both DRM and PINN can achieve statistical optimality.
    Optimizing Industrial HVAC Systems with Hierarchical Reinforcement Learning. (arXiv:2209.08112v1 [cs.LG])
    Reinforcement learning (RL) techniques have been developed to optimize industrial cooling systems, offering substantial energy savings compared to traditional heuristic policies. A major challenge in industrial control involves learning behaviors that are feasible in the real world due to machinery constraints. For example, certain actions can only be executed every few hours while other actions can be taken more frequently. Without extensive reward engineering and experimentation, an RL agent may not learn realistic operation of machinery. To address this, we use hierarchical reinforcement learning with multiple agents that control subsets of actions according to their operation time scales. Our hierarchical approach achieves energy savings over existing baselines while maintaining constraints such as operating chillers within safe bounds in a simulated HVAC control environment.
    Iterated Block Particle Filter for High-dimensional Parameter Learning: Beating the Curse of Dimensionality. (arXiv:2110.10745v2 [stat.ML] UPDATED)
    Parameter learning for high-dimensional, partially observed, and nonlinear stochastic processes is a methodological challenge. Spatiotemporal disease transmission systems provide examples of such processes giving rise to open inference problems. We propose the iterated block particle filter (IBPF) algorithm for learning high-dimensional parameters over graphical state space models with general state spaces, measures, transition densities and graph structure. Theoretical performance guarantees are obtained on beating the curse of dimensionality (COD), algorithm convergence, and likelihood maximization. Experiments on a highly nonlinear and non-Gaussian spatiotemporal model for measles transmission reveal that the iterated ensemble Kalman filter algorithm (Li et al. (2020)) is ineffective and the iterated filtering algorithm (Ionides et al. (2015)) suffers from the COD, while our IBPF algorithm beats COD consistently across various experiments with different metrics.
    Anomaly Detection in Automatic Generation Control Systems Based on Traffic Pattern Analysis and Deep Transfer Learning. (arXiv:2209.08099v1 [cs.LG])
    In modern highly interconnected power grids, automatic generation control (AGC) is crucial for maintaining the stability of the power grid. The dependence of the AGC system on information and communications technology (ICT) makes it vulnerable to various types of cyber-attacks. Thus, information flow (IF) analysis and anomaly detection have become paramount for preventing cyber attackers from driving the cyber-physical power system (CPPS) to instability. In this paper, the ICT network traffic rules in CPPSs are explored and the frequency-domain features of the ICT network traffic are extracted in order to develop a robust learning algorithm that can learn the normal traffic pattern based on the ResNeSt convolutional neural network (CNN). Furthermore, to overcome the problem of insufficient labeled samples of abnormal traffic, a transfer learning approach is used. In the proposed data-driven method, the deep learning model is trained on traffic frequency features, which makes the model robust against uncertainties in the AGC's parameters and modeling nonlinearities.
    A Machine Learning Framework for Event Identification via Modal Analysis of PMU Data. (arXiv:2202.06836v2 [eess.SY] UPDATED)
    Power systems are prone to a variety of events (e.g. line trips and generation loss) and real-time identification of such events is crucial in terms of situational awareness, reliability, and security. Using measurements from multiple synchrophasors, i.e., phasor measurement units (PMUs), we propose to identify events by extracting features based on modal dynamics. We combine such traditional physics-based feature extraction methods with machine learning to distinguish different event types. Including all measurement channels at each PMU allows exploiting diverse features but also requires learning classification models over a high-dimensional space. To address this issue, various feature selection methods are implemented to choose the best subset of features. Using the obtained subset of features, we investigate the performance of two well-known classification models, namely, logistic regression (LR) and support vector machines (SVM) to identify generation loss and line trip events in two datasets. The first dataset is obtained from simulated generation loss and line trip events in the Texas 2000-bus synthetic grid. The second is a proprietary dataset with labeled events obtained from a large utility in the USA involving measurements from nearly 500 PMUs. Our results indicate that the proposed framework is promising for identifying the two types of events.
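    A minimal sklearn sketch of the described pipeline is below, with precomputed modal features in X and event labels in y; SelectKBest with an F-test is one plausible stand-in for the paper's feature selection methods.

```python
# Feature selection followed by LR or SVM classification for event types.
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.linear_model import LogisticRegression
from sklearn.svm import SVC

lr_clf = make_pipeline(StandardScaler(), SelectKBest(f_classif, k=20),
                       LogisticRegression(max_iter=1000))
svm_clf = make_pipeline(StandardScaler(), SelectKBest(f_classif, k=20),
                        SVC(kernel="rbf"))
# lr_clf.fit(X_train, y_train); print(lr_clf.score(X_test, y_test))
```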
    CODA: A Real-World Road Corner Case Dataset for Object Detection in Autonomous Driving. (arXiv:2203.07724v3 [cs.CV] UPDATED)
    Contemporary deep-learning object detection methods for autonomous driving usually assume prefixed categories of common traffic participants, such as pedestrians and cars. Most existing detectors are unable to detect uncommon objects and corner cases (e.g., a dog crossing a street), which may lead to severe accidents in some situations, making the timeline for the real-world application of reliable autonomous driving uncertain. One main reason that impedes the development of truly reliably self-driving systems is the lack of public datasets for evaluating the performance of object detectors on corner cases. Hence, we introduce a challenging dataset named CODA that exposes this critical problem of vision-based detectors. The dataset consists of 1500 carefully selected real-world driving scenes, each containing four object-level corner cases (on average), spanning more than 30 object categories. On CODA, the performance of standard object detectors trained on large-scale autonomous driving datasets significantly drops to no more than 12.8% in mAR. Moreover, we experiment with the state-of-the-art open-world object detector and find that it also fails to reliably identify the novel objects in CODA, suggesting that a robust perception system for autonomous driving is probably still far from reach. We expect our CODA dataset to facilitate further research in reliable detection for real-world autonomous driving. Our dataset will be released at https://coda-dataset.github.io.
    DiPietro-Hazari Kappa: A Novel Metric for Assessing Labeling Quality via Annotation. (arXiv:2209.08243v1 [cs.LG])
    Data is a key component of modern machine learning, but statistics for assessing data label quality remain sparse in the literature. Here, we introduce the DiPietro-Hazari Kappa, a novel statistical metric for assessing the quality of suggested dataset labels in the context of human annotation. Rooted in the classical Fleiss's Kappa measure of inter-annotator agreement, the DiPietro-Hazari Kappa quantifies the empirical annotator agreement differential that was attained above random chance. We offer a thorough theoretical examination of Fleiss's Kappa before turning to our derivation of the DiPietro-Hazari Kappa. Finally, we conclude with a matrix formulation and a set of procedural instructions for easy computational implementation.
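    For concreteness, here is a minimal numpy implementation of the classical Fleiss's Kappa that the metric builds on (the DiPietro-Hazari derivation itself is in the paper). Rows are items, columns are categories, and entries count the annotators who chose that category.

```python
import numpy as np

def fleiss_kappa(counts):
    counts = np.asarray(counts, dtype=float)
    n = counts.sum(axis=1)[0]                   # annotators per item (constant)
    p_j = counts.sum(axis=0) / counts.sum()     # overall category proportions
    P_i = ((counts ** 2).sum(axis=1) - n) / (n * (n - 1))  # per-item agreement
    P_bar, P_e = P_i.mean(), (p_j ** 2).sum()   # observed vs. chance agreement
    return (P_bar - P_e) / (1 - P_e)
```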
    Directed Weight Neural Networks for Protein Structure Representation Learning. (arXiv:2201.13299v4 [q-bio.BM] UPDATED)
    A protein performs biological functions by folding to a particular 3D structure. To accurately model the protein structures, both the overall geometric topology and local fine-grained relations between amino acids (e.g. side-chain torsion angles and inter-amino-acid orientations) should be carefully considered. In this work, we propose the Directed Weight Neural Network for better capturing geometric relations among different amino acids. Extending a single weight from a scalar to a 3D directed vector, our new framework supports a rich set of geometric operations on both classical and SO(3)-representation features, on top of which we construct a perceptron unit for processing amino-acid information. In addition, we introduce an equivariant message passing paradigm on proteins for plugging the directed weight perceptrons into existing Graph Neural Networks, showing superior versatility in maintaining SO(3)-equivariance at the global scale. Experiments show that our network has remarkably better expressiveness in representing geometric relations in comparison to classical neural networks and the (globally) equivariant networks. It also achieves state-of-the-art performance on various computational biology applications related to protein 3D structures.
    Accurate ADMET Prediction with XGBoost. (arXiv:2204.07532v3 [q-bio.BM] UPDATED)
    The absorption, distribution, metabolism, excretion, and toxicity (ADMET) properties are important in drug discovery as they define efficacy and safety. In this work, we applied an ensemble of features, including fingerprints and descriptors, and a tree-based machine learning model, extreme gradient boosting, for accurate ADMET prediction. Our model performs well in the Therapeutics Data Commons ADMET benchmark group. For 22 tasks, our model is ranked first in 18 tasks and top 3 in 21 tasks. The trained machine learning models are integrated in ADMETboost, a web server that is publicly available at https://ai-druglab.smu.edu/admet.
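    A minimal sketch of the fingerprints-plus-gradient-boosting recipe follows; Morgan fingerprints via RDKit stand in for the paper's full feature ensemble, and the hyperparameters are placeholders.

```python
import numpy as np
from rdkit import Chem
from rdkit.Chem import AllChem
from xgboost import XGBRegressor

def featurize(smiles_list, n_bits=2048):
    # Morgan (ECFP-like) bit fingerprints as simple molecular features.
    fps = []
    for smi in smiles_list:
        mol = Chem.MolFromSmiles(smi)
        fp = AllChem.GetMorganFingerprintAsBitVect(mol, 2, nBits=n_bits)
        fps.append(np.array(list(fp)))
    return np.stack(fps)

# X = featurize(train_smiles)
# model = XGBRegressor(n_estimators=500).fit(X, y_train)
```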
    Fast Vision Transformers with HiLo Attention. (arXiv:2205.13213v2 [cs.CV] UPDATED)
    Vision Transformers (ViTs) have triggered the most recent and significant breakthroughs in computer vision. Their efficient designs are mostly guided by the indirect metric of computational complexity, i.e., FLOPs, which, however, has a clear gap with direct metrics such as throughput. Thus, we propose to use direct speed evaluation on the target platform as the design principle for efficient ViTs. Particularly, we introduce LITv2, a simple and effective ViT which performs favourably against the existing state-of-the-art methods across a spectrum of different model sizes with faster speed. At the core of LITv2 is a novel self-attention mechanism, which we dub HiLo. HiLo is inspired by the insight that high frequencies in an image capture local fine details and low frequencies focus on global structures, whereas a multi-head self-attention layer neglects the characteristic of different frequencies. Therefore, we propose to disentangle the high/low frequency patterns in an attention layer by separating the heads into two groups, where one group encodes high frequencies via self-attention within each local window, and another group performs the attention to model the global relationship between the average-pooled low-frequency keys from each window and each query position in the input feature map. Benefiting from the efficient design for both groups, we show that HiLo is superior to the existing attention mechanisms by comprehensively benchmarking FLOPs, speed and memory consumption on GPUs. Powered by HiLo, LITv2 serves as a strong backbone for mainstream vision tasks including image classification, dense detection and segmentation. Code is available at https://github.com/ziplab/LITv2.
    Learn the Time to Learn: Replay Scheduling in Continual Learning. (arXiv:2209.08660v1 [cs.LG])
    Replay methods have been shown to be successful in mitigating catastrophic forgetting in continual learning scenarios despite having limited access to historical data. However, storing historical data is cheap in many real-world applications, yet replaying all historical data would be prohibitive due to processing time constraints. In such settings, we propose learning the time to learn for a continual learning system, in which we learn replay schedules over which tasks to replay at different time steps. To demonstrate the importance of learning the time to learn, we first use Monte Carlo tree search to find the proper replay schedule and show that it can outperform fixed scheduling policies in terms of continual learning performance. Moreover, to improve the scheduling efficiency itself, we propose to use reinforcement learning to learn replay scheduling policies that can generalize to new continual learning scenarios without added computational cost. In our experiments, we show the advantages of learning the time to learn, which brings current continual learning research closer to real-world needs.
    Linear TreeShap. (arXiv:2209.08192v1 [cs.LG])
    Decision trees are well-known for their ease of interpretability. To improve accuracy, we need to grow deep trees or ensembles of trees, which are hard to interpret, offsetting the original benefits. Shapley values have recently become a popular way to explain the predictions of tree-based machine learning models. They provide a linear weighting of features that is independent of the tree structure. The rise in popularity is mainly due to TreeShap, which solves a problem of general exponential complexity in polynomial time. With extensive adoption in industry, more efficient algorithms are required. This paper presents a more efficient and straightforward algorithm: Linear TreeShap. Like TreeShap, Linear TreeShap is exact and requires the same amount of memory.
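    For context, standard TreeShap is available through the shap package; Linear TreeShap computes the same exact attributions with a more efficient algorithm. A minimal usage sketch:

```python
import shap
from sklearn.ensemble import RandomForestRegressor
from sklearn.datasets import make_regression

X, y = make_regression(n_samples=200, n_features=10, random_state=0)
model = RandomForestRegressor(n_estimators=50).fit(X, y)

explainer = shap.TreeExplainer(model)     # polynomial-time TreeShap
shap_values = explainer.shap_values(X)    # (n_samples, n_features) attributions
```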
    Empirical Analysis on Top-k Gradient Sparsification for Distributed Deep Learning in a Supercomputing Environment. (arXiv:2209.08497v1 [cs.LG])
    To train deep learning models faster, distributed training on multiple GPUs has become a very popular scheme in recent years. However, communication bandwidth is still a major bottleneck of training performance. To improve overall training performance, recent works have proposed gradient sparsification methods that significantly reduce communication traffic. Most of them require gradient sorting to select meaningful gradients, as in Top-k gradient sparsification (Top-k SGD). However, Top-k SGD is limited in how much it can speed up overall training performance because gradient sorting is significantly inefficient on GPUs. In this paper, we conduct experiments that show the inefficiency of Top-k SGD and provide insight into its low performance. Based on observations from our empirical analysis, we plan to develop a high-performance gradient sparsification method as future work.
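    A minimal PyTorch sketch of the Top-k selection step whose GPU cost the paper analyzes: only the k largest-magnitude gradient entries are kept for communication.

```python
import torch

def topk_sparsify(grad, k):
    flat = grad.flatten()
    _, idx = torch.topk(flat.abs(), k)   # the expensive selection step
    values = flat[idx]
    return values, idx                   # communicate (values, idx) only

g = torch.randn(1024, 1024)
values, idx = topk_sparsify(g, k=g.numel() // 100)   # keep the top 1%
```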
    Sequencer: Deep LSTM for Image Classification. (arXiv:2205.01972v3 [cs.CV] UPDATED)
    In recent computer vision research, the advent of the Vision Transformer (ViT) has rapidly revolutionized various architectural design efforts: ViT achieved state-of-the-art image classification performance using self-attention found in natural language processing, and MLP-Mixer achieved competitive performance using simple multi-layer perceptrons. In contrast, several studies have also suggested that carefully redesigned convolutional neural networks (CNNs) can achieve advanced performance comparable to ViT without resorting to these new ideas. Against this background, there is growing interest in what inductive bias is suitable for computer vision. Here we propose Sequencer, a novel and competitive architecture alternative to ViT that provides a new perspective on these issues. Unlike ViTs, Sequencer models long-range dependencies using LSTMs rather than self-attention layers. We also propose a two-dimensional version of the Sequencer module, where an LSTM is decomposed into vertical and horizontal LSTMs to enhance performance. Despite its simplicity, several experiments demonstrate that Sequencer performs impressively well: Sequencer2D-L, with 54M parameters, achieves 84.6% top-1 accuracy using only ImageNet-1K. Not only that, we show that it has good transferability and robust resolution adaptability when the input resolution is doubled.
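    A minimal PyTorch sketch of the two-dimensional idea: one bidirectional LSTM scans each row and another scans each column of the feature map, and their outputs are fused. The dimensions and the concat-then-project fusion are illustrative choices, not the paper's exact module.

```python
import torch
import torch.nn as nn

class BiLSTM2D(nn.Module):
    def __init__(self, dim, hidden):
        super().__init__()
        self.h_lstm = nn.LSTM(dim, hidden, bidirectional=True, batch_first=True)
        self.v_lstm = nn.LSTM(dim, hidden, bidirectional=True, batch_first=True)
        self.proj = nn.Linear(4 * hidden, dim)

    def forward(self, x):                                  # x: (B, H, W, C)
        B, H, W, C = x.shape
        h, _ = self.h_lstm(x.reshape(B * H, W, C))         # scan each row
        v, _ = self.v_lstm(x.transpose(1, 2).reshape(B * W, H, C))  # each column
        h = h.reshape(B, H, W, -1)
        v = v.reshape(B, W, H, -1).transpose(1, 2)
        return self.proj(torch.cat([h, v], dim=-1))        # back to (B, H, W, C)
```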
    Uncertainty categories in medical image segmentation: a study of source-related diversity. (arXiv:2203.00238v2 [cs.LG] UPDATED)
    Measuring uncertainties in the output of a deep learning method is useful in several ways, such as assisting with interpretation of the outputs, helping build confidence with end users, and improving the training and performance of the networks. Several different methods have been proposed to estimate uncertainties, including those from epistemic (relating to the model used) and aleatoric (relating to the data) sources, using test-time dropout and augmentation, respectively. Not only are these uncertainty sources different, but they are governed by parameter settings (e.g., dropout rate or type and level of augmentation) that establish even more distinct uncertainty categories. This work investigates how different the uncertainties from these categories are, in magnitude and spatial pattern, to empirically address the question of whether they provide usefully distinct information that should be captured whenever uncertainties are used. We take the well characterised BraTS challenge dataset to demonstrate that there are substantial differences in both the magnitude and spatial pattern of uncertainties from the different categories, and discuss the implications of these in various use cases.
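    As a concrete example of one uncertainty category, here is a minimal Monte Carlo dropout sketch for epistemic uncertainty: dropout stays active at test time and the spread over repeated stochastic passes serves as the uncertainty estimate (aleatoric uncertainty would analogously use test-time augmentation of the input).

```python
import torch

def mc_dropout_predict(model, x, n_samples=20):
    model.eval()
    for m in model.modules():                # re-enable dropout layers only
        if isinstance(m, torch.nn.Dropout):
            m.train()
    with torch.no_grad():
        preds = torch.stack([model(x) for _ in range(n_samples)])
    return preds.mean(0), preds.std(0)       # prediction and uncertainty map
```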
    ATD: Augmenting CP Tensor Decomposition by Self Supervision. (arXiv:2106.07900v4 [math.NA] UPDATED)
    Tensor decompositions are powerful tools for dimensionality reduction and feature interpretation of multidimensional data such as signals. Existing tensor decomposition objectives (e.g., Frobenius norm) are designed for fitting raw data under statistical assumptions, which may not align with downstream classification tasks. In practice, raw input tensors can contain irrelevant information, while data augmentation techniques may be used to smooth out class-irrelevant noise in samples. This paper addresses the above challenges by proposing augmented tensor decomposition (ATD), which effectively incorporates data augmentation and self-supervised learning (SSL) to boost downstream classification. To address the non-convexity of the new augmented objective, we develop an iterative method that enables the optimization to follow an alternating least squares (ALS) fashion. We evaluate our proposed ATD on multiple datasets. It achieves a 0.8% - 2.5% accuracy gain over tensor-based baselines. Also, our ATD model shows comparable or better performance (e.g., up to 15% in accuracy) over self-supervised and autoencoder baselines while using less than 5% of the learnable parameters of these baseline models.
    Imbalanced Nodes Classification for Graph Neural Networks Based on Valuable Sample Mining. (arXiv:2209.08514v1 [cs.LG])
    Node classification is an important task for graph neural networks, but most existing studies assume that samples from different classes are balanced. However, the class imbalance problem is widespread and can seriously affect a model's performance. Reducing the adverse effects of imbalanced datasets on model training is crucial to improving performance. Therefore, a new loss function, FD-Loss, is constructed following the traditional algorithm-level approach to the imbalance problem. Firstly, we propose a sample mismeasurement distance to filter edge-hard samples and simple samples based on the distribution. Then, weight coefficients are defined based on the mismeasurement distance and used in the loss function weighting term, so that the loss function focuses only on valuable samples. Experiments on several benchmarks demonstrate that our loss function can effectively mitigate the sample node imbalance problem and improve classification accuracy by 4% compared to existing methods on the node classification task.
    BolT: Fused Window Transformers for fMRI Time Series Analysis. (arXiv:2205.11578v2 [eess.SP] UPDATED)
    Deep-learning models have enabled performance leaps in analysis of high-dimensional functional MRI (fMRI) data. Yet, many previous methods are suboptimally sensitive for contextual representations across diverse time scales. Here, we present BolT, a blood-oxygen-level-dependent transformer model, for analyzing multi-variate fMRI time series. BolT leverages a cascade of transformer encoders equipped with a novel fused window attention mechanism. Encoding is performed on temporally-overlapped windows within the time series to capture local representations. To integrate information temporally, cross-window attention is computed between base tokens in each window and fringe tokens from neighboring windows. To gradually transition from local to global representations, the extent of window overlap and thereby number of fringe tokens are progressively increased across the cascade. Finally, a novel cross-window regularization is employed to align high-level classification features across the time series. Comprehensive experiments on large-scale public datasets demonstrate the superior performance of BolT against state-of-the-art methods. Furthermore, explanatory analyses to identify landmark time points and regions that contribute most significantly to model decisions corroborate prominent neuroscientific findings in the literature.
    AdaInject: Injection Based Adaptive Gradient Descent Optimizers for Convolutional Neural Networks. (arXiv:2109.12504v2 [cs.LG] UPDATED)
    Convolutional neural networks (CNNs) are generally trained using stochastic gradient descent (SGD) based optimization techniques. Existing SGD optimizers generally suffer from overshooting the minimum and oscillating near it. In this paper, we propose a new approach, hereafter referred to as AdaInject, for gradient descent optimizers that injects the second order moment into the first order moment. Specifically, the short-term change in a parameter is used as a weight to inject the second order moment into the update rule. The AdaInject optimizer controls the parameter update, avoids overshooting the minimum and reduces oscillation near it. The proposed approach is generic in nature and can be integrated with any existing SGD optimizer. The effectiveness of the AdaInject optimizer is explained intuitively as well as through toy examples. We also show the convergence property of the proposed injection-based optimizer. Further, we depict the efficacy of the AdaInject approach through extensive experiments in conjunction with state-of-the-art optimizers, namely AdamInject, diffGradInject, RadamInject, and AdaBeliefInject, on four benchmark datasets. Different CNN models are used in the experiments. The highest improvement in top-1 classification error rate, of $16.54\%$, is observed using the diffGradInject optimizer with the ResNeXt29 model on the CIFAR10 dataset. Overall, we observe very promising performance improvements of existing optimizers with the proposed AdaInject approach. The code is available at: \url{https://github.com/shivram1987/AdaInject}.
    Optimal Sublinear Sampling of Spanning Trees and Determinantal Point Processes via Average-Case Entropic Independence. (arXiv:2204.02570v2 [cs.DS] UPDATED)
    We design fast algorithms for repeatedly sampling from strongly Rayleigh distributions, which include random spanning tree distributions and determinantal point processes. For a graph $G=(V, E)$, we show how to approximately sample uniformly random spanning trees from $G$ in $\widetilde{O}(\lvert V\rvert)$ time per sample after an initial $\widetilde{O}(\lvert E\rvert)$ time preprocessing. For a determinantal point process on subsets of size $k$ of a ground set of $n$ elements, we show how to approximately sample in $\widetilde{O}(k^\omega)$ time after an initial $\widetilde{O}(nk^{\omega-1})$ time preprocessing, where $\omega<2.372864$ is the matrix multiplication exponent. We even improve the state of the art for obtaining a single sample from determinantal point processes, from the prior runtime of $\widetilde{O}(\min\{nk^2, n^\omega\})$ to $\widetilde{O}(nk^{\omega-1})$. In our main technical result, we achieve the optimal limit on domain sparsification for strongly Rayleigh distributions. In domain sparsification, sampling from a distribution $\mu$ on $\binom{[n]}{k}$ is reduced to sampling from related distributions on $\binom{[t]}{k}$ for $t\ll n$. We show that for strongly Rayleigh distributions, we can achieve the optimal $t=\widetilde{O}(k)$. Our reduction involves sampling from $\widetilde{O}(1)$ domain-sparsified distributions, all of which can be produced efficiently assuming convenient access to approximate overestimates for marginals of $\mu$. Having access to marginals is analogous to having access to the mean and covariance of a continuous distribution, or knowing "isotropy" for the distribution, the key assumption behind the Kannan-Lov\'asz-Simonovits (KLS) conjecture and optimal samplers based on it. We view our result as a moral analog of the KLS conjecture and its consequences for sampling, for discrete strongly Rayleigh measures.
    Improved Generalization Bound and Learning of Sparsity Patterns for Data-Driven Low-Rank Approximation. (arXiv:2209.08281v1 [cs.LG])
    Learning sketching matrices for fast and accurate low-rank approximation (LRA) has gained increasing attention. Recently, Bartlett, Indyk, and Wagner (COLT 2022) presented a generalization bound for the learning-based LRA. Specifically, for rank-$k$ approximation using an $m \times n$ learned sketching matrix with $s$ non-zeros in each column, they proved an $\tilde{\mathrm{O}}(nsm)$ bound on the \emph{fat shattering dimension} ($\tilde{\mathrm{O}}$ hides logarithmic factors). We build on their work and make two contributions. 1. We present a better $\tilde{\mathrm{O}}(nsk)$ bound ($k \le m$). En route to obtaining the bound, we give a low-complexity \emph{Goldberg--Jerrum algorithm} for computing pseudo-inverse matrices, which would be of independent interest. 2. We alleviate an assumption of the previous study that the sparsity pattern of sketching matrices is fixed. We prove that learning positions of non-zeros increases the fat shattering dimension only by ${\mathrm{O}}(ns\log n)$. Also, experiments confirm the practical benefit of learning sparsity patterns.
    VMAS: A Vectorized Multi-Agent Simulator for Collective Robot Learning. (arXiv:2207.03530v2 [cs.RO] UPDATED)
    While many multi-robot coordination problems can be solved optimally by exact algorithms, solutions are often not scalable in the number of robots. Multi-Agent Reinforcement Learning (MARL) is gaining increasing attention in the robotics community as a promising solution to tackle such problems. Nevertheless, we still lack the tools that allow us to quickly and efficiently find solutions to large-scale collective learning tasks. In this work, we introduce the Vectorized Multi-Agent Simulator (VMAS). VMAS is an open-source framework designed for efficient MARL benchmarking. It comprises a vectorized 2D physics engine written in PyTorch and a set of twelve challenging multi-robot scenarios. Additional scenarios can be implemented through a simple and modular interface. We demonstrate how vectorization enables parallel simulation on accelerated hardware without added complexity. When comparing VMAS to OpenAI MPE, we show how MPE's execution time increases linearly in the number of simulations while VMAS is able to execute 30,000 parallel simulations in under 10s, proving to be more than 100x faster. Using VMAS's RLlib interface, we benchmark our multi-robot scenarios using various Proximal Policy Optimization (PPO)-based MARL algorithms. VMAS's scenarios prove challenging in orthogonal ways for state-of-the-art MARL algorithms. The VMAS framework is available at https://github.com/proroklab/VectorizedMultiAgentSimulator. A video of VMAS scenarios and experiments is available at https://youtu.be/aaDRYfiesAY.
    Towards Intercultural Affect Recognition: Audio-Visual Affect Recognition in the Wild Across Six Cultures. (arXiv:2208.00344v2 [cs.CV] UPDATED)
    In our multicultural world, affect-aware AI systems that support humans need the ability to perceive affect across variations in emotion expression patterns across cultures. These systems must perform well in cultural contexts without annotated affect datasets available for training models. A standard assumption in affective computing is that affect recognition models trained and used within the same culture (intracultural) will perform better than models trained on one culture and used on different cultures (intercultural). We test this assumption and present the first systematic study of intercultural affect recognition models using videos of real-world dyadic interactions from six cultures. We develop an attention-based feature selection approach under temporal causal discovery to identify behavioral cues that can be leveraged in intercultural affect recognition models. Across all six cultures, our findings demonstrate that intercultural affect recognition models were as effective or more effective than intracultural models. We identify and contribute useful behavioral features for intercultural affect recognition; facial features from the visual modality were more useful than the audio modality in this study's context. Our paper presents a proof-of-concept and motivation for the future development of intercultural affect recognition systems, especially those deployed in low-resource situations without annotated data.
    Offline Evaluation of Reward-Optimizing Recommender Systems: The Case of Simulation. (arXiv:2209.08642v1 [cs.IR])
    In both academic and industry-based research, online evaluation methods are seen as the gold standard for interactive applications like recommendation systems. Naturally, the reason for this is that we can directly measure utility metrics that rely on interventions, namely the recommendations that are shown to users. Nevertheless, online evaluation methods are costly for a number of reasons, and a clear need remains for reliable offline evaluation procedures. In industry, offline metrics are often used as a first-line evaluation to generate promising candidate models to evaluate online. In academic work, limited access to online systems makes offline metrics the de facto approach to validating novel methods. Two classes of offline metrics exist: proxy-based methods and counterfactual methods. The first class is often poorly correlated with the online metrics we care about, and the latter class only provides theoretical guarantees under assumptions that cannot be fulfilled in real-world environments. Here, we make the case that simulation-based comparisons provide ways forward beyond offline metrics, and argue that they are a preferable means of evaluation.
    Deep Reinforcement Learning Approach for Trading Automation in The Stock Market. (arXiv:2208.07165v1 [q-fin.TR] CROSS LISTED)
    Deep Reinforcement Learning (DRL) algorithms can scale to previously intractable problems. The automation of profit generation in the stock market is possible using DRL, by combining the financial asset price "prediction" step and the portfolio "allocation" step in one unified process to produce fully autonomous systems capable of interacting with their environment to make optimal decisions through trial and error. This work presents a DRL model that generates profitable trades in the stock market, effectively overcoming the limitations of supervised learning approaches. We formulate the trading problem as a Partially Observed Markov Decision Process (POMDP) model, considering the constraints imposed by the stock market, such as liquidity and transaction costs. We then solve the formulated POMDP problem using the Twin Delayed Deep Deterministic Policy Gradient (TD3) algorithm, reporting a 2.68 Sharpe ratio on the unseen test data set. From the point of view of stock market forecasting and the intelligent decision-making mechanism, this paper demonstrates the superiority of DRL in financial markets over other types of machine learning and proves its credibility and advantages for strategic decision-making.
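    A minimal sketch of training a TD3 agent with the stable-baselines3 implementation on a hypothetical gym-compatible trading environment TradingEnv; the paper's POMDP formulation, features, and transaction-cost modeling are not reproduced here.

```python
from stable_baselines3 import TD3

env = TradingEnv()                 # hypothetical gymnasium-style trading env
model = TD3("MlpPolicy", env, verbose=0)
model.learn(total_timesteps=100_000)

obs, _ = env.reset()               # assumes gymnasium reset -> (obs, info)
action, _ = model.predict(obs, deterministic=True)
```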
    Sample-based Uncertainty Quantification with a Single Deterministic Neural Network. (arXiv:2209.08418v1 [cs.LG])
    The development of an accurate, flexible, and numerically efficient uncertainty quantification (UQ) method is one of the fundamental challenges in machine learning. Previously, a UQ method called DISCO Nets was proposed (Bouchacourt et al., 2016) that trains a neural network by minimizing the so-called energy score on training data. This method has shown superior performance on a hand pose estimation task in computer vision, but it remained unclear whether this method works as nicely for regression on tabular data, and how it competes with more recent advanced UQ methods such as NGBoost. In this paper, we propose an improved neural architecture for DISCO Nets that admits more stable and smooth training. We benchmark this approach on miscellaneous real-world tabular datasets and confirm that it is competitive with or even superior to standard UQ baselines. We also provide a new elementary proof for the validity of using the energy score to learn predictive distributions. Further, we point out that DISCO Nets in its original form ignores epistemic uncertainty and only captures aleatoric uncertainty. We propose a simple fix for this problem.
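    A minimal PyTorch sketch of the sample-based energy score loss that DISCO Nets minimize: for a target $y$ and model samples $x_1, \ldots, x_m$, the loss is $\frac{1}{m}\sum_i \lVert y - x_i \rVert - \frac{1}{2m^2}\sum_{i,j} \lVert x_i - x_j \rVert$ (this simple form includes the $i = j$ terms; an unbiased variant would rescale the second term by $m/(m-1)$).

```python
import torch

def energy_score_loss(samples, y):
    # samples: (m, B, d) draws from the network; y: (B, d) targets
    fidelity = (samples - y.unsqueeze(0)).norm(dim=-1).mean()
    diversity = (samples.unsqueeze(0) - samples.unsqueeze(1)).norm(dim=-1).mean()
    return fidelity - 0.5 * diversity   # small when samples cover the target
```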
    Meta-Reinforcement Learning for the Tuning of PI Controllers: An Offline Approach. (arXiv:2203.09661v2 [eess.SY] UPDATED)
    Meta-learning is a branch of machine learning which trains neural network models to synthesize a wide variety of data in order to rapidly solve new problems. In process control, many systems have similar and well-understood dynamics, which suggests it is feasible to create a generalizable controller through meta-learning. In this work, we formulate a meta reinforcement learning (meta-RL) control strategy that can be used to tune proportional--integral controllers. Our meta-RL agent has a recurrent structure that accumulates "context" to learn a system's dynamics through a hidden state variable in closed-loop. This architecture enables the agent to automatically adapt to changes in the process dynamics. In tests reported here, the meta-RL agent was trained entirely offline on first order plus time delay systems, and produced excellent results on novel systems drawn from the same distribution of process dynamics used for training. A key design element is the ability to leverage model-based information offline during training in simulated environments while maintaining a model-free policy structure for interacting with novel processes where there is uncertainty regarding the true process dynamics. Meta-learning is a promising approach for constructing sample-efficient intelligent controllers.
    Risk and optimal policies in bandit experiments. (arXiv:2112.06363v12 [econ.EM] UPDATED)
    We provide a decision theoretic analysis of bandit experiments. Working within the framework of diffusion asymptotics, we define suitable notions of asymptotic Bayes and minimax risk for these experiments. For normally distributed rewards, the minimal Bayes risk can be characterized as the solution to a second-order partial differential equation (PDE). Using a limit of experiments approach, we show that this PDE characterization also holds asymptotically under both parametric and non-parametric distributions of the rewards. The approach further describes the state variables it is asymptotically sufficient to restrict attention to, and thereby suggests a practical strategy for dimension reduction. The PDEs characterizing minimal Bayes risk can be solved efficiently using sparse matrix routines. We derive the optimal Bayes and minimax policies from their numerical solutions. These optimal policies substantially dominate existing methods such as Thompson sampling and UCB, often by a factor of two. The framework also covers time discounting and pure exploration.
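    For orientation, a minimal version of one baseline the optimal policies are compared against, Thompson sampling for Gaussian-reward bandits, is sketched below (known unit noise and standard-normal priors assumed); this is the generic textbook algorithm, not the paper's PDE-derived policy.

```python
import numpy as np

def thompson_gaussian(true_means, horizon=1000, sigma=1.0, seed=0):
    """Thompson sampling for Gaussian bandits with N(0, sigma^2) priors."""
    rng = np.random.default_rng(seed)
    k = len(true_means)
    n = np.zeros(k)            # pull counts per arm
    s = np.zeros(k)            # reward sums per arm
    regret = 0.0
    for _ in range(horizon):
        # Posterior of each arm's mean: N(s/(n+1), sigma^2/(n+1))
        draws = rng.normal(s / (n + 1), sigma / np.sqrt(n + 1))
        a = int(np.argmax(draws))
        r = rng.normal(true_means[a], sigma)
        n[a] += 1; s[a] += r
        regret += max(true_means) - true_means[a]
    return regret

print(thompson_gaussian([0.0, 0.3, 0.5]))
```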
    Adversarial Robustness through Bias Variance Decomposition: A New Perspective for Federated Learning. (arXiv:2009.09026v3 [cs.LG] UPDATED)
    Federated learning learns a neural network model by aggregating the knowledge from a group of distributed clients under the privacy-preserving constraint. In this work, we show that this paradigm might inherit the adversarial vulnerability of the centralized neural network, i.e., deteriorated performance on adversarial examples when the model is deployed. This is even more alarming when the federated learning paradigm is designed to approximate the updating behavior of a centralized neural network. To solve this problem, we propose an adversarially robust federated learning framework, named Fed_BVA, with improved server and client update mechanisms. This is motivated by our observation that the generalization error in federated learning can be naturally decomposed into the bias and variance triggered by multiple clients' predictions. Thus, we propose to generate adversarial examples by maximizing the bias and variance during the server update, and to learn adversarially robust model updates with those examples during the client update. As a result, an adversarially robust neural network can be aggregated from these improved local clients' model updates. The experiments are conducted on multiple benchmark data sets using several prevalent neural network models, and the empirical results show that our framework is robust against white-box and black-box adversarial corruptions under both IID and non-IID settings.
    Implicit Regularization in Hierarchical Tensor Factorization and Deep Convolutional Neural Networks. (arXiv:2201.11729v5 [cs.LG] UPDATED)
    In the pursuit of explaining implicit regularization in deep learning, prominent focus was given to matrix and tensor factorizations, which correspond to simplified neural networks. It was shown that these models exhibit an implicit tendency towards low matrix and tensor ranks, respectively. Drawing closer to practical deep learning, the current paper theoretically analyzes the implicit regularization in hierarchical tensor factorization, a model equivalent to certain deep convolutional neural networks. Through a dynamical systems lens, we overcome challenges associated with hierarchy, and establish implicit regularization towards low hierarchical tensor rank. This translates to an implicit regularization towards locality for the associated convolutional networks. Inspired by our theory, we design explicit regularization discouraging locality, and demonstrate its ability to improve the performance of modern convolutional networks on non-local tasks, in defiance of conventional wisdom by which architectural changes are needed. Our work highlights the potential of enhancing neural networks via theoretical analysis of their implicit regularization.
    Improving Robustness of Jet Tagging Algorithms with Adversarial Training. (arXiv:2203.13890v2 [physics.data-an] UPDATED)
    Deep learning is a standard tool in the field of high-energy physics, facilitating considerable sensitivity enhancements for numerous analysis strategies. In particular, in the identification of physics objects, such as jet flavor tagging, complex neural network architectures play a major role. However, these methods are reliant on accurate simulations, and mismodeling can lead to non-negligible differences in performance on data that need to be measured and calibrated against. We investigate the classifier response to input data with injected mismodelings and probe the vulnerability of flavor tagging algorithms by applying adversarial attacks. Subsequently, we present an adversarial training strategy that mitigates the impact of such simulated attacks and improves the classifier robustness. We examine the relationship between performance and vulnerability and show that this method constitutes a promising approach to reduce the vulnerability to poor modeling.
    What are People Talking about in #BlackLivesMatter and #StopAsianHate? Exploring and Categorizing Twitter Topics Emerging in Online Social Movements through the Latent Dirichlet Allocation Model. (arXiv:2205.14725v2 [cs.IR] UPDATED)
    Minority groups have been using social media to organize social movements that create profound social impacts. Black Lives Matter (BLM) and Stop Asian Hate (SAH) are two successful social movements that have spread on Twitter, promoting protests and activities against racism and increasing the public's awareness of other social challenges that minority groups face. However, previous studies have mostly conducted qualitative analyses of tweets or interviews with users, which may not comprehensively and validly represent all tweets. Very few studies have explored the Twitter topics within BLM and SAH dialogs through a rigorous, quantitative, data-centered approach. Therefore, in this research, we adopted a mixed-methods approach to comprehensively analyze BLM and SAH Twitter topics. We implemented (1) the latent Dirichlet allocation model to understand the top high-level words and topics and (2) open-coding analysis to identify specific themes across the tweets. We collected more than one million tweets with the #blacklivesmatter and #stopasianhate hashtags and compared their topics. Our findings revealed that the tweets discussed a variety of influential topics in depth; social justice, social movements, and emotional sentiments were common topics in both movements, though each movement had its own unique subtopics. Our study contributes to the topic analysis of social movements on social media platforms in particular and to the literature on the interplay of AI, ethics, and society in general.
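    A minimal version of step (1) can be reproduced with scikit-learn's LDA implementation; the toy tweets below are invented stand-ins for the million-tweet corpus.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation

tweets = [
    "justice for victims, end systemic racism now",
    "stop asian hate, protect our elders and communities",
    "protest downtown today, march for civil rights",
    "emotional support for families affected by violence",
]
vec = CountVectorizer(stop_words="english")
X = vec.fit_transform(tweets)                 # document-term count matrix
lda = LatentDirichletAllocation(n_components=2, random_state=0).fit(X)

terms = vec.get_feature_names_out()
for t, topic in enumerate(lda.components_):
    top = [terms[i] for i in topic.argsort()[-4:][::-1]]  # highest-weight words
    print(f"topic {t}: {top}")
```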
    Task-Agnostic Continual Reinforcement Learning: In Praise of a Simple Baseline. (arXiv:2205.14495v2 [cs.LG] UPDATED)
    We study methods for task-agnostic continual reinforcement learning (TACRL). TACRL is a setting that combines the difficulties of partially-observable RL (a consequence of task agnosticism) and the difficulties of continual learning (CL), i.e., learning on a non-stationary sequence of tasks. We compare TACRL methods with their soft upper bounds prescribed by previous literature: multi-task learning (MTL) methods which do not have to deal with non-stationary data distributions, as well as task-aware methods, which are allowed to operate under full observability. We consider a previously unexplored and straightforward baseline for TACRL, replay-based recurrent RL (3RL), in which we augment an RL algorithm with recurrent mechanisms to mitigate partial observability and experience replay mechanisms for catastrophic forgetting in CL. By studying empirical performance in a sequence of RL tasks, we find surprising occurrences of 3RL matching and overcoming the MTL and task-aware soft upper bounds. We lay out hypotheses that could explain this inflection point of continual and task-agnostic learning research. Our hypotheses are empirically tested in continuous control tasks via a large-scale study of the popular multi-task and continual learning benchmark Meta-World. By analyzing different training statistics including gradient conflict, we find evidence that 3RL's outperformance stems from its ability to quickly infer how new tasks relate with the previous ones, enabling forward transfer.
    A Simple Guard for Learned Optimizers. (arXiv:2201.12426v3 [cs.LG] UPDATED)
    If the trend of learned components eventually outperforming their hand-crafted versions continues, learned optimizers will eventually outperform hand-crafted optimizers like SGD or Adam. But even if learned optimizers (L2Os) eventually outpace hand-crafted ones in practice, they are still not provably convergent and might fail out of distribution; these are the issues addressed here. Currently, learned optimizers frequently outperform generic hand-crafted optimizers (such as gradient descent) at the beginning of learning, but they generally plateau after some time while the generic algorithms continue to make progress and often overtake the learned algorithm, like the tortoise overtaking the hare in Aesop's fable. L2Os also still have a difficult time generalizing out of distribution. Heaton et al. proposed Safeguarded L2O (GL2O), which takes a learned optimizer and safeguards it with a generic learning algorithm, so that by conditionally switching between the two, the resulting algorithm is provably convergent. We propose a new class of safeguarded L2O, called Loss-Guarded L2O (LGL2O), which is both conceptually simpler and computationally less expensive. The guarding mechanism decides solely based on the expected future loss value of both optimizers. Furthermore, we give a theoretical proof of LGL2O's convergence guarantee and empirical results comparing it to GL2O and other baselines, showing that it combines the best of both L2O and SGD and in practice converges much better than GL2O.
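    A drastically simplified sketch of the loss-guarded idea: propose an update from each optimizer and keep whichever yields the lower loss. Note this compares current losses, a simplification of LGL2O's expected-future-loss criterion, and the "learned" rule here is a deliberately unreliable stand-in.

```python
import numpy as np

def lgl2o_step(params, loss_fn, grad_fn, learned_update, lr=0.1):
    """One loss-guarded step: accept the learned optimizer's proposal only if
    it achieves a lower loss than a plain SGD step."""
    cand_l2o = params + learned_update(params)   # learned proposal
    cand_sgd = params - lr * grad_fn(params)     # generic SGD fallback
    return cand_l2o if loss_fn(cand_l2o) < loss_fn(cand_sgd) else cand_sgd

# Toy quadratic; a "learned" update that is deliberately unreliable
loss = lambda w: float(np.sum(w ** 2))
grad = lambda w: 2 * w
bad_l2o = lambda w: np.ones_like(w) * 0.5        # hypothetical learned rule
w = np.array([3.0, -2.0])
for _ in range(20):
    w = lgl2o_step(w, loss, grad, bad_l2o)       # the guard falls back to SGD
print(loss(w))
```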
    Better Uncertainty Calibration via Proper Scores for Classification and Beyond. (arXiv:2203.07835v2 [cs.LG] UPDATED)
    With model trustworthiness being crucial for sensitive real-world applications, practitioners are putting more and more focus on improving the uncertainty calibration of deep neural networks. Calibration errors are designed to quantify the reliability of probabilistic predictions but their estimators are usually biased and inconsistent. In this work, we introduce the framework of proper calibration errors, which relates every calibration error to a proper score and provides a respective upper bound with optimal estimation properties. This relationship can be used to reliably quantify the model calibration improvement. We theoretically and empirically demonstrate the shortcomings of commonly used estimators compared to our approach. Due to the wide applicability of proper scores, this gives a natural extension of recalibration beyond classification.
    Characterizing and overcoming the greedy nature of learning in multi-modal deep neural networks. (arXiv:2202.05306v3 [cs.LG] UPDATED)
    We hypothesize that due to the greedy nature of learning in multi-modal deep neural networks, these models tend to rely on just one modality while under-fitting the other modalities. Such behavior is counter-intuitive and hurts the models' generalization, as we observe empirically. To estimate the model's dependence on each modality, we compute the gain on the accuracy when the model has access to it in addition to another modality. We refer to this gain as the conditional utilization rate. In the experiments, we consistently observe an imbalance in conditional utilization rates between modalities, across multiple tasks and architectures. Since conditional utilization rate cannot be computed efficiently during training, we introduce a proxy for it based on the pace at which the model learns from each modality, which we refer to as the conditional learning speed. We propose an algorithm to balance the conditional learning speeds between modalities during training and demonstrate that it indeed addresses the issue of greedy learning. The proposed algorithm improves the model's generalization on three datasets: Colored MNIST, ModelNet40, and NVIDIA Dynamic Hand Gesture.
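    The conditional utilization rate reduces to a difference of accuracies, as the toy computation below illustrates; the modality names and accuracy figures are invented for illustration.

```python
def conditional_utilization_rate(acc_both, acc_single_other):
    """Gain in accuracy when one modality is added on top of the other."""
    return acc_both - acc_single_other

# Hypothetical accuracies from three evaluations of one multi-modal model
acc_rgb_depth = 0.91   # both modalities available
acc_depth_only = 0.74  # RGB removed
acc_rgb_only = 0.89    # depth removed

u_rgb = conditional_utilization_rate(acc_rgb_depth, acc_depth_only)
u_depth = conditional_utilization_rate(acc_rgb_depth, acc_rgb_only)
print(f"u(RGB)={u_rgb:.2f}, u(depth)={u_depth:.2f}")  # imbalance signals greedy learning
```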
    A Non-parametric Skill Representation with Soft Null Space Projectors for Fast Generalization. (arXiv:2209.08522v1 [cs.RO])
    Over the last two decades, the robotics community witnessed the emergence of various motion representations that have been used extensively, particularly in behavioral cloning, to compactly encode and generalize skills. Among these, probabilistic approaches have earned a relevant place, owing to their encoding of variations, correlations and adaptability to new task conditions. Modulating such primitives, however, is often cumbersome due to the need for parameter re-optimization, which frequently entails computationally costly operations. In this paper we derive a non-parametric movement primitive formulation that contains a null space projector. We show that this formulation allows for fast and efficient motion generation with computational complexity $O(n^2)$, without involving matrix inversions, whose complexity is $O(n^3)$. This is achieved by using the null space to track secondary targets, with a precision determined by the training dataset. Using a 2D example associated with time input we show that our non-parametric solution compares favourably with a state-of-the-art parametric approach. For demonstrated skills with high-dimensional inputs we show that it permits on-the-fly adaptation as well.
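    The null space projector at the core of this formulation is the classical $N = I - J^{+}J$ construction; the sketch below shows its standard resolved-rate use (with a hypothetical Jacobian, not the paper's non-parametric primitive), where secondary motion does not disturb the primary task.

```python
import numpy as np

# Redundant 3-joint planar arm: the primary task is a 2-D end-effector velocity.
J = np.array([[1.0, 0.8, 0.3],
              [0.0, 0.6, 0.9]])               # hypothetical task Jacobian (2x3)
J_pinv = np.linalg.pinv(J)
N = np.eye(3) - J_pinv @ J                    # null space projector

x_dot = np.array([0.1, 0.0])                  # primary end-effector velocity
q_dot_secondary = np.array([0.0, 0.0, 0.5])   # secondary joint-space objective
q_dot = J_pinv @ x_dot + N @ q_dot_secondary  # combined joint command

print(J @ q_dot)                 # ~= x_dot: primary task preserved
print(J @ N @ q_dot_secondary)   # ~= 0: secondary motion lives in the null space
```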
    I'm Me, We're Us, and I'm Us: Tri-directional Contrastive Learning on Hypergraphs. (arXiv:2206.04739v3 [cs.LG] UPDATED)
    Although machine learning on hypergraphs has attracted considerable attention, most of the works have focused on (semi-)supervised learning, which may cause heavy labeling costs and poor generalization. Recently, contrastive learning has emerged as a successful unsupervised representation learning method. Despite the prosperous development of contrastive learning in other domains, contrastive learning on hypergraphs remains little explored. In this paper, we propose TriCL (Tri-directional Contrastive Learning), a general framework for contrastive learning on hypergraphs. Its main idea is tri-directional contrast, and specifically, it aims to maximize in two augmented views the agreement (a) between the same node, (b) between the same group of nodes, and (c) between each group and its members. Together with simple but surprisingly effective data augmentation and negative sampling schemes, these three forms of contrast enable TriCL to capture both microscopic and mesoscopic structural information in node embeddings. Our extensive experiments using 13 baseline approaches, five datasets, and two tasks demonstrate the effectiveness of TriCL, and most noticeably, TriCL consistently outperforms not just unsupervised competitors but also (semi-)supervised competitors mostly by significant margins for node classification.
    Representation Alignment in Neural Networks. (arXiv:2112.07806v2 [cs.LG] UPDATED)
    It is now standard for neural network representations to be trained on large, publicly available datasets and reused for new problems. The reasons why neural network representations have been so successful for transfer, however, are still not fully understood. In this paper we show that, after training, neural network representations align their top singular vectors to the targets. We investigate this representation alignment phenomenon in a variety of neural network architectures and find that (a) alignment emerges across a variety of different architectures and optimizers, with more alignment arising with depth; (b) alignment increases for layers closer to the output; and (c) existing high-performance deep CNNs exhibit high levels of alignment. We then highlight why alignment between the top singular vectors and the targets can speed up learning, and show in a classic synthetic transfer problem that representation alignment correlates with positive and negative transfer to similar and dissimilar tasks.
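    One simple way to probe this alignment numerically is to project the (centered, normalized) target vector onto the left singular vectors of the representation matrix; the sketch below uses this proxy, which may differ from the paper's exact measure.

```python
import numpy as np

def alignment_spectrum(features, targets):
    """Projection of the unit-norm, centered target vector onto each left
    singular vector of the representation matrix (n_samples x n_features)."""
    y = targets - targets.mean()
    y = y / np.linalg.norm(y)
    U, s, _ = np.linalg.svd(features - features.mean(0), full_matrices=False)
    return np.abs(U.T @ y)   # large leading entries = aligned top directions

rng = np.random.default_rng(0)
Phi = rng.normal(size=(200, 16))
y = Phi[:, 0] + 0.1 * rng.normal(size=200)   # target along one feature direction
print(alignment_spectrum(Phi, y)[:4])
```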
    RDD2022: A multi-national image dataset for automatic Road Damage Detection. (arXiv:2209.08538v1 [cs.CV])
    This data article describes the Road Damage Dataset, RDD2022, which comprises 47,420 road images from six countries: Japan, India, the Czech Republic, Norway, the United States, and China. The images have been annotated with more than 55,000 instances of road damage. Four types of road damage, namely longitudinal cracks, transverse cracks, alligator cracks, and potholes, are captured in the dataset. The annotated dataset is envisioned for developing deep learning-based methods to detect and classify road damage automatically. The dataset has been released as a part of the Crowd sensing-based Road Damage Detection Challenge (CRDDC2022), which invites researchers from across the globe to propose solutions for automatic road damage detection in multiple countries. Municipalities and road agencies may utilize the RDD2022 dataset, and the models trained using RDD2022, for low-cost automatic monitoring of road conditions. Further, computer vision and machine learning researchers may use the dataset to benchmark the performance of different algorithms for other image-based applications of the same type (classification, object detection, etc.).
    Protein Representation Learning by Geometric Structure Pretraining. (arXiv:2203.06125v4 [cs.LG] UPDATED)
    Learning effective protein representations is critical in a variety of tasks in biology such as predicting protein function or structure. Existing approaches usually pretrain protein language models on a large number of unlabeled amino acid sequences and then finetune the models with some labeled data in downstream tasks. Despite the effectiveness of sequence-based approaches, the power of pretraining on known protein structures, which are available in smaller numbers only, has not been explored for protein property prediction, though protein structures are known to be determinants of protein function. In this paper, we propose to pretrain protein representations according to their 3D structures. We first present a simple yet effective encoder to learn the geometric features of a protein. We pretrain the protein graph encoder by leveraging multiview contrastive learning and different self-prediction tasks. Experimental results on both function prediction and fold classification tasks show that our proposed pretraining methods outperform or are on par with the state-of-the-art sequence-based methods, while using much less data. Our implementation is available at https://github.com/DeepGraphLearning/GearNet.
    Low-Rank Tensor Completion Based on Bivariate Equivalent Minimax-Concave Penalty. (arXiv:2201.12709v3 [cs.CV] UPDATED)
    Low-rank tensor completion (LRTC) is an important problem in computer vision and machine learning. The minimax-concave penalty (MCP) function, as a non-convex relaxation, has achieved good results in the LRTC problem. To turn all the constant parameters of the MCP function into variables, thereby further improving its adaptability to changing singular values in the LRTC problem, we propose the bivariate equivalent minimax-concave penalty (BEMCP) theorem. Applying the BEMCP theorem to tensor singular values leads to the bivariate equivalent weighted tensor $\Gamma$-norm (BEWTGN) theorem, and we analyze and discuss its corresponding properties. Moreover, to facilitate the solution of the LRTC problem, we give the proximal operators of the BEMCP theorem and the BEWTGN. Meanwhile, we propose a BEMCP model for the LRTC problem, which is optimally solved using the alternating direction method of multipliers (ADMM). Finally, the proposed method is applied to real-world data restoration of multispectral images (MSI), magnetic resonance imaging (MRI) and color video (CV), and the experimental results demonstrate that it outperforms state-of-the-art methods.
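    For reference, the fixed-parameter MCP that the BEMCP generalizes (by promoting the constants $\lambda$ and $\gamma$ to variables) has the standard closed form sketched below.

```python
import numpy as np

def mcp(x, lam=1.0, gamma=3.0):
    """Minimax-concave penalty applied elementwise (e.g., to singular values):
    lam*|x| - x^2/(2*gamma) for |x| <= gamma*lam, and gamma*lam^2/2 beyond."""
    x = np.abs(np.asarray(x, dtype=float))
    inside = x <= gamma * lam
    out = np.empty_like(x)
    out[inside] = lam * x[inside] - x[inside] ** 2 / (2 * gamma)
    out[~inside] = gamma * lam ** 2 / 2
    return out

sv = np.array([0.5, 2.0, 5.0, 10.0])
print(mcp(sv))   # the penalty saturates for large values, unlike the nuclear norm
```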
    AdaCC: Cumulative Cost-Sensitive Boosting for Imbalanced Classification. (arXiv:2209.08309v1 [cs.LG])
    Class imbalance poses a major challenge for machine learning, as most supervised learning models may exhibit bias towards the majority class and under-perform on the minority class. Cost-sensitive learning tackles this problem by treating the classes differently, typically via a user-defined fixed misclassification cost matrix provided as input to the learner. Such parameter tuning is a challenging task that requires domain knowledge, and wrong adjustments might lead to overall deterioration of predictive performance. In this work, we propose a novel cost-sensitive boosting approach for imbalanced data that dynamically adjusts the misclassification costs over the boosting rounds in response to the model's performance, instead of using a fixed misclassification cost matrix. Our method, called AdaCC, is parameter-free as it relies on the cumulative behavior of the boosting model in order to adjust the misclassification costs for the next boosting round, and it comes with theoretical guarantees regarding the training error. Experiments on 27 real-world datasets from different domains with high class imbalance demonstrate the superiority of our method over 12 state-of-the-art cost-sensitive boosting approaches, with consistent improvements in different measures, for instance, in the range of [0.3%-28.56%] for AUC, [3.4%-21.4%] for balanced accuracy, [4.8%-45%] for gmean and [7.4%-85.5%] for recall.
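    For contrast with AdaCC's dynamic costs, the sketch below shows the conventional alternative it replaces: an AdaBoost-style reweighting step with a fixed, user-chosen cost per class (all numbers illustrative; this is not the AdaCC rule itself).

```python
import numpy as np

def cost_sensitive_weight_update(w, y, pred, alpha, cost_pos=2.0, cost_neg=1.0):
    """One boosting-round reweighting step with fixed class-dependent costs:
    misclassified samples are upweighted in proportion to their class cost."""
    cost = np.where(y == 1, cost_pos, cost_neg)   # minority class costs more
    w = w * np.exp(alpha * cost * (pred != y))    # upweight costly mistakes
    return w / w.sum()                            # renormalize the distribution

y = np.array([1, 1, 0, 0, 0, 0, 0, 0])
pred = np.array([0, 1, 0, 0, 1, 0, 0, 0])         # one minority, one majority error
w = np.full(len(y), 1 / len(y))
print(cost_sensitive_weight_update(w, y, pred, alpha=0.5))
```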
    Quantum Computing Methods for Supply Chain Management. (arXiv:2209.08246v1 [quant-ph])
    Quantum computing is expected to have transformative influences on many domains, but its practical deployment on industry problems is underexplored. We focus on applying quantum computing to operations management problems in industry, in particular supply chain management. Many problems in supply chain management involve large state and action spaces and pose computational challenges on classical computers. We develop a quantized policy iteration algorithm to solve an inventory control problem and demonstrate its effectiveness. We also discuss in depth the hardware requirements and potential challenges of implementing this quantum algorithm in the near term. Our simulations and experiments are powered by IBM Qiskit and the qBraid system.
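    The underlying problem can be made concrete with classical (non-quantum) policy iteration on a toy inventory control MDP; all costs, capacities and demand probabilities below are invented for illustration, and excess orders are simply capped at the warehouse size.

```python
import numpy as np

# Toy inventory MDP: state = stock level 0..max_s, action = order quantity.
max_s, max_order = 5, 3
demand_p = np.array([0.3, 0.4, 0.3])   # P(demand = 0, 1, 2)
hold_cost, lost_sale_cost, order_cost = 1.0, 4.0, 2.0
gamma = 0.95
states = range(max_s + 1)

def step_value(s, a, V):
    """Expected one-step reward plus discounted value, over demand outcomes."""
    total = 0.0
    for d, p in enumerate(demand_p):
        stock = min(s + a, max_s)      # overflow beyond capacity is discarded
        sold = min(stock, d)
        s_next = stock - sold
        reward = -(hold_cost * s_next + lost_sale_cost * (d - sold) + order_cost * a)
        total += p * (reward + gamma * V[s_next])
    return total

policy, V = np.zeros(max_s + 1, dtype=int), np.zeros(max_s + 1)
for _ in range(50):                    # policy iteration
    for _ in range(100):               # approximate policy evaluation by sweeps
        V = np.array([step_value(s, policy[s], V) for s in states])
    policy = np.array([np.argmax([step_value(s, a, V) for a in range(max_order + 1)])
                       for s in states])
print("order quantity by stock level:", policy)
```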
    Indicators of Attack Failure: Debugging and Improving Optimization of Adversarial Examples. (arXiv:2106.09947v2 [cs.LG] UPDATED)
    Evaluating robustness of machine-learning models to adversarial examples is a challenging problem. Many defenses have been shown to provide a false sense of robustness by causing gradient-based attacks to fail, and they have been broken under more rigorous evaluations. Although guidelines and best practices have been suggested to improve current adversarial robustness evaluations, the lack of automatic testing and debugging tools makes it difficult to apply these recommendations in a systematic manner. In this work, we overcome these limitations by: (i) categorizing attack failures based on how they affect the optimization of gradient-based attacks, while also unveiling two novel failures affecting many popular attack implementations and past evaluations; (ii) proposing six novel indicators of failure, to automatically detect the presence of such failures in the attack optimization process; and (iii) suggesting a systematic protocol to apply the corresponding fixes. Our extensive experimental analysis, involving more than 15 models in 3 distinct application domains, shows that our indicators of failure can be used to debug and improve current adversarial robustness evaluations, thereby providing a first concrete step towards automatizing and systematizing them. Our open-source code is available at: https://github.com/pralab/IndicatorsOfAttackFailure.
    RankFeat: Rank-1 Feature Removal for Out-of-distribution Detection. (arXiv:2209.08590v1 [cs.LG])
    The task of out-of-distribution (OOD) detection is crucial for deploying machine learning models in real-world settings. In this paper, we observe that the singular value distributions of the in-distribution (ID) and OOD features are quite different: the OOD feature matrix tends to have a larger dominant singular value than the ID feature, and the class predictions of OOD samples are largely determined by it. This observation motivates us to propose \texttt{RankFeat}, a simple yet effective \texttt{post hoc} approach for OOD detection by removing the rank-1 matrix composed of the largest singular value and the associated singular vectors from the high-level feature (\emph{i.e.,} $\mathbf{X}{-} \mathbf{s}_{1}\mathbf{u}_{1}\mathbf{v}_{1}^{T}$). \texttt{RankFeat} achieves the \emph{state-of-the-art} performance and reduces the average false positive rate (FPR95) by 17.90\% compared with the previous best method. Extensive ablation studies and comprehensive theoretical analyses are presented to support the empirical results.
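    The post hoc operation at the heart of RankFeat is expressible in a few lines: subtract the leading rank-1 component $\mathbf{s}_{1}\mathbf{u}_{1}\mathbf{v}_{1}^{T}$ from the feature matrix. A minimal numpy sketch, with matrix shape and values chosen purely for illustration:

```python
import numpy as np

def rankfeat(X):
    """Remove the rank-1 component of a feature matrix: X - s1 * u1 @ v1^T."""
    U, S, Vt = np.linalg.svd(X, full_matrices=False)
    return X - S[0] * np.outer(U[:, 0], Vt[0])

feat = np.random.default_rng(0).normal(size=(49, 512))  # e.g. spatial x channel
out = rankfeat(feat)
# The top singular value of the result equals the original second-largest one.
print(np.linalg.svd(out, compute_uv=False)[0])
```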
    Allocation Schemes in Analytic Evaluation: Applicant-Centric Holistic or Attribute-Centric Segmented?. (arXiv:2209.08665v1 [cs.HC])
    Many applications such as hiring and university admissions involve evaluation and selection of applicants. These tasks are fundamentally difficult, and require combining evidence from multiple different aspects (what we term "attributes"). In these applications, the number of applicants is often large, and a common practice is to assign the task to multiple evaluators in a distributed fashion. Specifically, in the often-used holistic allocation, each evaluator is assigned a subset of the applicants, and is asked to assess all relevant information for their assigned applicants. However, such an evaluation process is subject to issues such as miscalibration (evaluators see only a small fraction of the applicants and may not get a good sense of relative quality), and discrimination (evaluators are influenced by irrelevant information about the applicants). We identify that such attribute-based evaluation allows alternative allocation schemes. Specifically, we consider assigning each evaluator more applicants but fewer attributes per applicant, termed segmented allocation. We compare segmented allocation to holistic allocation on several dimensions via theoretical and experimental methods. We establish various tradeoffs between these two approaches, and identify conditions under which one approach results in more accurate evaluation than the other.
    Detecting Generated Scientific Papers using an Ensemble of Transformer Models. (arXiv:2209.08283v1 [cs.CL])
    The paper describes neural models developed for the DAGPap22 shared task hosted at the Third Workshop on Scholarly Document Processing. This shared task targets the automatic detection of generated scientific papers. Our work focuses on comparing different transformer-based models as well as using additional datasets and techniques to deal with imbalanced classes. As a final submission, we utilized an ensemble of SciBERT, RoBERTa, and DeBERTa fine-tuned with a random oversampling technique. Our model achieved an F1-score of 99.24%. The official evaluation results put our system in third place.
    NeuCEPT: Locally Discover Neural Networks' Mechanism via Critical Neurons Identification with Precision Guarantee. (arXiv:2209.08448v1 [cs.LG])
    Despite recent studies on understanding deep neural networks (DNNs), numerous questions remain on how DNNs generate their predictions. In particular, given similar predictions on different input samples, are the underlying mechanisms generating those predictions the same? In this work, we propose NeuCEPT, a method to locally discover critical neurons that play a major role in the model's predictions and to identify the model's mechanisms in generating those predictions. We first formulate the critical-neuron identification problem as maximizing a sequence of mutual-information objectives and provide a theoretical framework to efficiently solve for critical neurons while keeping the precision under control. NeuCEPT then heuristically learns the model's different mechanisms in an unsupervised manner. Our experimental results show that neurons identified by NeuCEPT not only have strong influence on the model's predictions but also hold meaningful information about the model's mechanisms.
    TOD: GPU-accelerated Outlier Detection via Tensor Operations. (arXiv:2110.14007v3 [cs.LG] UPDATED)
    Outlier detection (OD) is a key learning task for finding rare and deviant data samples, with many time-critical applications such as fraud detection and intrusion detection. In this work, we propose TOD, the first tensor-based system for efficient and scalable outlier detection on distributed multi-GPU machines. A key idea behind TOD is decomposing complex OD applications into a small collection of basic tensor algebra operators. This decomposition enables TOD to accelerate OD computations by leveraging recent advances in deep learning infrastructure in both hardware and software. Moreover, to deploy memory-intensive OD applications on modern GPUs with limited on-device memory, we introduce two key techniques. First, provable quantization speeds up OD computations and reduces their memory footprint by automatically performing specific floating-point operations in lower precision while provably guaranteeing no accuracy loss. Second, to exploit the aggregated compute resources and memory capacity of multiple GPUs, we introduce automatic batching, which decomposes OD computations into small batches for both sequential execution on a single GPU and parallel execution on multiple GPUs. TOD supports a diverse set of OD algorithms. Extensive evaluation on 11 real and 3 synthetic OD datasets shows that TOD is on average 10.9x faster than the leading CPU-based OD system PyOD (with a maximum speedup of 38.9x), and can handle much larger datasets than existing GPU-based OD systems. In addition, TOD allows easy integration of new OD operators, enabling fast prototyping of emerging and yet-to-be-discovered OD algorithms.
    Deep Plug-and-Play Prior for Hyperspectral Image Restoration. (arXiv:2209.08240v1 [eess.IV])
    Deep-learning-based hyperspectral image (HSI) restoration methods have gained great popularity for their remarkable performance, but they often demand expensive network retraining whenever the specifics of the task change. In this paper, we propose to restore HSIs in a unified approach with an effective plug-and-play method, which can jointly retain the flexibility of optimization-based methods and utilize the powerful representation capability of deep neural networks. Specifically, we first develop a new deep HSI denoiser leveraging gated recurrent convolution units, short- and long-term skip connections, and an augmented noise level map to better exploit the abundant spatio-spectral information within HSIs. It therefore achieves state-of-the-art performance on HSI denoising under both Gaussian and complex noise settings. The proposed denoiser is then inserted into the plug-and-play framework as a powerful implicit HSI prior to tackle various HSI restoration tasks. Through extensive experiments on HSI super-resolution, compressed sensing, and inpainting, we demonstrate that our approach often achieves superior performance, competitive with or even better than the state-of-the-art on each task, via a single model without any task-specific training.
    Comparative study of machine learning and deep learning methods on ASD classification. (arXiv:2209.08601v1 [eess.IV])
    The autism dataset is studied to identify the differences between autistic and healthy groups. For this, the resting-state Functional Magnetic Resonance Imaging (rs-fMRI) data of the two groups are analyzed, and networks of connections between brain regions are created. Several classification frameworks are developed to distinguish the connectivity patterns between the groups. The best models for statistical inference and precision are compared, and the tradeoff between precision and model interpretability is analyzed. Finally, the classification accuracy measures are reported to justify the performance of our framework. Our best model can classify autistic and healthy patients on the multisite ABIDE I data with 71% accuracy.
    Make an Omelette with Breaking Eggs: Zero-Shot Learning for Novel Attribute Synthesis. (arXiv:2111.14182v2 [cs.CV] UPDATED)
    Most of the existing algorithms for zero-shot classification problems typically rely on the attribute-based semantic relations among categories to realize the classification of novel categories without observing any of their instances. However, training the zero-shot classification models still requires attribute labeling for each class (or even instance) in the training dataset, which is also expensive. To this end, in this paper, we bring up a new problem scenario: "Can we derive zero-shot learning for novel attribute detectors/classifiers and use them to automatically annotate the dataset for labeling efficiency?". Basically, given only a small set of detectors that are learned to recognize some manually annotated attributes (i.e., the seen attributes), we aim to synthesize the detectors of novel attributes in a zero-shot learning manner. Our proposed method, Zero-Shot Learning for Attributes (ZSLA), which is the first of its kind to the best of our knowledge, tackles this new research problem by applying the set operations to first decompose the seen attributes into their basic attributes and then recombine these basic attributes into the novel ones. Extensive experiments are conducted to verify the capacity of our synthesized detectors for accurately capturing the semantics of the novel attributes and show their superior performance in terms of detection and localization compared to other baseline approaches. Moreover, we demonstrate the application of automatic annotation using our synthesized detectors on Caltech-UCSD Birds-200-2011 dataset. Various generalized zero-shot classification algorithms trained upon the dataset re-annotated by ZSLA show comparable performance with those trained with the manual ground-truth annotations. Please refer to our project page for source code: https://yuhsuanli.github.io/ZSLA/
    Mitigating Filter Bubbles within Deep Recommender Systems. (arXiv:2209.08180v1 [cs.LG])
    Recommender systems, which offer personalized suggestions to users, power many of today's social media, e-commerce and entertainment platforms. However, these systems have been known to intellectually isolate users from a variety of perspectives, i.e., to cause filter bubbles. In our work, we characterize and mitigate this filter bubble effect. We do so by classifying various datapoints based on their user-item interaction history and calculating the influences of the classified categories on each other using the well-known TracIn method. Finally, we mitigate the filter bubble effect without compromising accuracy by carefully retraining our recommender system.
    EMaP: Explainable AI with Manifold-based Perturbations. (arXiv:2209.08453v1 [cs.LG])
    In the last few years, many explanation methods based on the perturbations of input data have been introduced to improve our understanding of decisions made by black-box models. The goal of this work is to introduce a novel perturbation scheme so that more faithful and robust explanations can be obtained. Our study focuses on the impact of perturbing directions on the data topology. We show that perturbing along the orthogonal directions of the input manifold better preserves the data topology, both in the worst-case analysis of the discrete Gromov-Hausdorff distance and in the average-case analysis via persistent homology. From those results, we introduce EMaP algorithm, realizing the orthogonal perturbation scheme. Our experiments show that EMaP not only improves the explainers' performance but also helps them overcome a recently-developed attack against perturbation-based methods.
    FR: Folded Rationalization with a Unified Encoder. (arXiv:2209.08285v1 [cs.LG])
    Conventional works on rationalization generally employ a two-phase model in which a generator selects the most important pieces of the input, followed by a predictor that makes predictions based on the selected pieces. However, such a two-phase model may incur the degeneration problem, where the predictor overfits to the noise generated by a not-yet-well-trained generator and, in turn, leads the generator to converge to a sub-optimal model that tends to select senseless pieces. To tackle this challenge, we propose Folded Rationalization (FR), which folds the two phases of the rationale model into one from the perspective of text semantic extraction. The key idea of FR is to employ a unified encoder between the generator and predictor, through which FR can facilitate a better predictor by granting access to valuable information blocked by the generator in the traditional two-phase model, and thus bring about a better generator. Empirically, we show that FR improves the F1 score by up to 10.3% compared to state-of-the-art methods.
    PocketNet: A Smaller Neural Network for Medical Image Analysis. (arXiv:2104.10745v4 [eess.IV] UPDATED)
    Medical imaging deep learning models are often large and complex, requiring specialized hardware to train and evaluate these models. To address such issues, we propose the PocketNet paradigm to reduce the size of deep learning models by throttling the growth of the number of channels in convolutional neural networks. We demonstrate that, for a range of segmentation and classification tasks, PocketNet architectures produce results comparable to that of conventional neural networks while reducing the number of parameters by multiple orders of magnitude, using up to 90% less GPU memory, and speeding up training times by up to 40%, thereby allowing such models to be trained and deployed in resource-constrained settings.
    Towards Robust Off-Policy Evaluation via Human Inputs. (arXiv:2209.08682v1 [cs.LG])
    Off-policy Evaluation (OPE) methods are crucial tools for evaluating policies in high-stakes domains such as healthcare, where direct deployment is often infeasible, unethical, or expensive. When deployment environments are expected to undergo changes (that is, dataset shifts), it is important for OPE methods to perform robust evaluation of the policies amidst such changes. Existing approaches consider robustness against a large class of shifts that can arbitrarily change any observable property of the environment. This often results in highly pessimistic estimates of the utilities, thereby invalidating policies that might have been useful in deployment. In this work, we address the aforementioned problem by investigating how domain knowledge can help provide more realistic estimates of the utilities of policies. We leverage human inputs on which aspects of the environments may plausibly change, and adapt the OPE methods to only consider shifts on these aspects. Specifically, we propose a novel framework, Robust OPE (ROPE), which considers shifts on a subset of covariates in the data based on user inputs, and estimates worst-case utility under these shifts. We then develop computationally efficient algorithms for OPE that are robust to the aforementioned shifts for contextual bandits and Markov decision processes. We also theoretically analyze the sample complexity of these algorithms. Extensive experimentation with synthetic and real world datasets from the healthcare domain demonstrates that our approach not only captures realistic dataset shifts accurately, but also results in less pessimistic policy evaluations.
    StackVAE-G: An efficient and interpretable model for time series anomaly detection. (arXiv:2105.08397v2 [cs.LG] UPDATED)
    Recent studies have shown that autoencoder-based models can achieve superior performance on anomaly detection tasks due to their excellent ability to fit complex data in an unsupervised manner. In this work, we propose a novel autoencoder-based model, named StackVAE-G, that brings significant efficiency and interpretability to multivariate time series anomaly detection. Specifically, we exploit the similarities across time series channels through stacking block-wise reconstruction with a weight-sharing scheme, which reduces the size of the learned models and also relieves overfitting to unknown noise in the training data. We also leverage a graph learning module to learn a sparse adjacency matrix that explicitly captures the stable interrelation structure among multiple time series channels, enabling interpretable pattern reconstruction of interrelated channels. Combining these two modules, we introduce the stacking block-wise VAE (variational autoencoder) with GNN (graph neural network) model for multivariate time series anomaly detection. We conduct extensive experiments on three commonly used public datasets, showing that our model achieves comparable (even better) performance with the state-of-the-art models and meanwhile requires much less computation and memory cost. Furthermore, we demonstrate that the adjacency matrix learned by our model accurately captures the interrelation among multiple channels and can provide valuable information for failure diagnosis applications.
    Self-Organized Polynomial-Time Coordination Graphs. (arXiv:2112.03547v4 [cs.LG] UPDATED)
    Coordination graphs are a promising approach to model agent collaboration in multi-agent reinforcement learning. They conduct a graph-based value factorization and induce explicit coordination among agents to complete complicated tasks. However, one critical challenge in this paradigm is the complexity of greedy action selection with respect to the factorized values. This corresponds to the decentralized constraint optimization problem (DCOP), which is NP-hard, as is its constant-ratio approximation. To bypass this systematic hardness, this paper proposes a novel method, named Self-Organized Polynomial-time Coordination Graphs (SOP-CG), which uses structured graph classes to guarantee the accuracy and the computational efficiency of collaborated action selection. SOP-CG employs dynamic graph topology to ensure sufficient value function expressiveness. The graph selection is unified into an end-to-end learning paradigm. In experiments, we show that our approach learns succinct and well-adapted graph topologies, induces effective coordination, and improves performance across a variety of cooperative multi-agent tasks.
    Learned Sorted Table Search and Static Indexes in Small Model Space. (arXiv:2107.09480v6 [cs.IR] UPDATED)
    Machine Learning techniques, properly combined with Data Structures, have resulted in Learned Static Indexes: innovative and powerful tools that speed up Binary Search at the cost of additional space with respect to the table being searched, this space being devoted to the Machine Learning model. Although in their infancy, they are methodologically and practically important, due to the pervasiveness of Sorted Table Search procedures. In modern applications, model space is a key factor, and a major open question concerning this area is to assess to what extent one can enjoy the speed-up of Binary Search achieved by Learned Indexes while using constant or nearly constant space models. In this paper, we investigate this question by (a) introducing two new models, namely the Learned k-ary Search Model and the Synoptic Recursive Model Index; and (b) systematically exploring the time-space trade-offs of a hierarchy of existing models, i.e., the ones in the reference software platform Searching on Sorted Data, together with the new ones proposed here. By adhering to and extending the current benchmarking methodology, we experimentally show that the Learned k-ary Search Model can speed up Binary Search in constant additional space. Our second model, together with the bi-criteria Piece-wise Geometric Model index, can achieve a speed-up of Binary Search with a model space only 0.05% larger than that taken by the table, being competitive in terms of time-space trade-off with existing proposals. The Synoptic Recursive Model Index and the bi-criteria Piece-wise Geometric Model complement each other quite well across the various levels of the internal memory hierarchy. Finally, our findings stimulate research in this area, since they highlight the need for further studies regarding the time-space relation in Learned Indexes.
    A study on the deviations in performance of FNNs and CNNs in the realm of grayscale adversarial images. (arXiv:2209.08262v1 [cs.CV])
    Neural networks are prone to reduced accuracy when classifying images with noise perturbation. Convolutional Neural Networks (CNNs) are known for their unparalleled accuracy in the classification of benign images. Our study shows, however, that they are extremely vulnerable to noise addition, while Feed-forward Neural Networks (FNNs) show little sensitivity to noise perturbation, maintaining their accuracy almost undisturbed. FNNs are observed to be better at classifying noise-intensive, single-channeled images that are sheer noise to human vision. In our study, we used the hand-written digits dataset MNIST with the following architectures: FNNs with 1 and 2 hidden layers and CNNs with 3, 4, 6 and 8 convolutions, and analyzed their accuracies. FNNs stand out in that, irrespective of the intensity of noise, they maintain a classification accuracy of more than 85%. In our analysis of CNNs with this data, the decline in classification accuracy of the CNN with 8 convolutions was half that of the other CNNs. Correlation analysis and mathematical modelling of the accuracy trends serve as roadmaps to these conclusions.
    DIGRAC: Digraph Clustering Based on Flow Imbalance. (arXiv:2106.05194v7 [stat.ML] UPDATED)
    Node clustering is a powerful tool in the analysis of networks. We introduce a graph neural network framework to obtain node embeddings for directed networks in a self-supervised manner, including a novel probabilistic imbalance loss, which can be used for network clustering. Here, we propose directed flow imbalance measures, which are tightly related to directionality, to reveal clusters in the network even when there is no density difference between clusters. In contrast to standard approaches in the literature, in this paper, directionality is not treated as a nuisance, but rather contains the main signal. DIGRAC optimizes directed flow imbalance for clustering without requiring label supervision, unlike existing graph neural network methods, and can naturally incorporate node features, unlike existing spectral methods. Extensive experimental results on synthetic data, in the form of directed stochastic block models, and real-world data at different scales, demonstrate that our method, based on flow imbalance, attains state-of-the-art results on directed graph clustering when compared against 10 state-of-the-art methods from the literature, for a wide range of noise and sparsity levels, graph structures and topologies, and even outperforms supervised methods.
    Integrated Sensing and Communication from Learning Perspective: An SDP3 Approach. (arXiv:2107.09621v2 [cs.IT] UPDATED)
    Characterizing the sensing and communication performance tradeoff in integrated sensing and communication (ISAC) systems is challenging in the applications of learning-based human motion recognition. This is because of the large experimental datasets and the black-box nature of deep neural networks. This paper presents SDP3, a Simulation-Driven Performance Predictor and oPtimizer, which consists of SDP3 data simulator, SDP3 performance predictor and SDP3 performance optimizer. Specifically, the SDP3 data simulator generates vivid wireless sensing datasets in a virtual environment, the SDP3 performance predictor predicts the sensing performance based on the function regression method, and the SDP3 performance optimizer investigates the sensing and communication performance tradeoff analytically. It is shown that the simulated sensing dataset matches the experimental dataset very well in the motion recognition accuracy. By leveraging SDP3, it is found that the achievable region of recognition accuracy and communication throughput consists of a communication saturation zone, a sensing saturation zone, and a communication-sensing adversarial zone, of which the desired balanced performance for ISAC systems lies in the third one.
    Deep learning for reconstructing protein structures from cryo-EM density maps: recent advances and future directions. (arXiv:2209.08171v1 [q-bio.BM])
    Cryo-Electron Microscopy (cryo-EM) has emerged as a key technology to determine the structure of proteins, particularly large protein complexes and assemblies in recent years. A key challenge in cryo-EM data analysis is to automatically reconstruct accurate protein structures from cryo-EM density maps. In this review, we briefly overview various deep learning methods for building protein structures from cryo-EM density maps, analyze their impact, and discuss the challenges of preparing high-quality data sets for training deep learning models. Looking into the future, more advanced deep learning models of effectively integrating cryo-EM data with other sources of complementary data such as protein sequences and AlphaFold-predicted structures need to be developed to further advance the field.
    A Splicing Approach to Best Subset of Groups Selection. (arXiv:2104.12576v3 [cs.LG] UPDATED)
    Best subset of groups selection (BSGS) is the process of selecting a small number of non-overlapping groups to achieve the best interpretability on the response variable. It has attracted increasing attention and has far-reaching applications in practice. However, due to the computational intractability of BSGS in high-dimensional settings, developing efficient algorithms for solving BSGS remains a research hotspot. In this paper, we propose a group-splicing algorithm that iteratively detects the relevant groups and excludes the irrelevant ones. Moreover, coupled with a novel group information criterion, we develop an adaptive algorithm to determine the optimal model size. Under mild conditions, we prove that our algorithm can identify the optimal subset of groups in polynomial time with high probability. Finally, we demonstrate the efficiency and accuracy of our methods by comparing them with several state-of-the-art algorithms on both synthetic and real-world datasets.
    On the combination of graph data for assessing thin-file borrowers' creditworthiness. (arXiv:2111.13666v2 [cs.SI] UPDATED)
    Thin-file borrowers are customers for whom a creditworthiness assessment is uncertain due to their lack of credit history; many researchers have used borrowers' relationship and interaction networks, in the form of graphs, as an alternative data source to address this. Incorporating network data is traditionally done through hand-crafted feature engineering; lately, graph neural networks have emerged as an alternative, but they still do not improve over the traditional method's performance. Here we introduce a framework to improve credit scoring models by blending several graph representation learning methods: feature engineering, graph embeddings, and graph neural networks, stacking their outputs to produce a single score. We validated this framework using a unique multi-source dataset that characterizes the relationships and credit history of the entire population of a Latin American country, applying it to credit risk models, both application and behavior scoring, and targeting both individuals and companies. Our results show that graph representation learning methods should be used as complements rather than being treated as self-sufficient methods, as is currently done. In terms of AUC and KS, we enhance the statistical performance, outperforming traditional methods. In corporate lending, where the gain is much higher, this confirms that evaluating an unbanked company cannot rely solely on its own features. The business ecosystem in which these firms interact with their owners, suppliers, customers, and other companies provides novel knowledge that enables financial institutions to enhance their creditworthiness assessment. Our results show when and for which groups graph data should be used, and what effects on performance to expect. They also show the enormous value of graph data for the unbanked credit scoring problem, principally in helping to bank companies.
    Community detection for directed weighted networks. (arXiv:2109.10319v3 [stat.ML] UPDATED)
    Rohe et al. (2016) proposed the Stochastic co-Blockmodel (ScBM) as a tool for detecting the community structure of binary directed graph data in network studies. However, ScBM completely ignores node weights and is unable to explain the block structure of directed weighted networks, which appear in various areas such as biology, sociology, physiology and computer science. Here, to model directed weighted networks, we introduce a Directed Distribution-Free model by relaxing ScBM's distributional restriction. We also build an extension of the proposed model that accounts for variation in node degree. Our models do not require a specific distribution for the entries of the adjacency matrix, only a block structure in the expected adjacency matrix. Spectral algorithms with theoretical guarantees on consistent estimation of node labels are presented to identify communities. Our proposed methods are illustrated on simulated and empirical examples.
    Multi-objective Optimization by Learning Space Partitions. (arXiv:2110.03173v4 [cs.LG] UPDATED)
    In contrast to single-objective optimization (SOO), multi-objective optimization (MOO) requires an optimizer to find the Pareto frontier, a subset of feasible solutions that are not dominated by other feasible solutions. In this paper, we propose LaMOO, a novel multi-objective optimizer that learns a model from observed samples to partition the search space and then focus on promising regions that are likely to contain a subset of the Pareto frontier. The partitioning is based on the dominance number, which measures "how close" a data point is to the Pareto frontier among existing samples. To account for possible partition errors due to limited samples and model mismatch, we leverage Monte Carlo Tree Search (MCTS) to exploit promising regions while exploring suboptimal regions that may turn out to contain good solutions later. Theoretically, we prove the efficacy of learning space partitioning via LaMOO under certain assumptions. Empirically, on the HyperVolume (HV) benchmark, a popular MOO metric, LaMOO substantially outperforms strong baselines on multiple real-world MOO tasks, by up to 225% in sample efficiency for neural architecture search on Nasbench201, and up to 10% for molecular design.
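    The dominance number can be realized as a count of dominating points, assuming all objectives are minimized (an assumption of this sketch; the paper's exact definition may differ). Points with count 0 form the current Pareto frontier among the observed samples.

```python
import numpy as np

def dominance_numbers(Y):
    """For each point (rows of Y, all objectives minimized), count how many
    other points dominate it; 0 marks the current Pareto frontier."""
    n = len(Y)
    dom = np.zeros(n, dtype=int)
    for i in range(n):
        better_eq = np.all(Y <= Y[i], axis=1)   # no worse in every objective
        strictly = np.any(Y < Y[i], axis=1)     # strictly better in some objective
        dom[i] = int(np.sum(better_eq & strictly))
    return dom

pts = np.array([[1.0, 4.0], [2.0, 2.0], [3.0, 3.0], [4.0, 1.0]])
print(dominance_numbers(pts))   # [0 0 1 0]: only the third point is dominated
```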
    PHNNs: Lightweight Neural Networks via Parameterized Hypercomplex Convolutions. (arXiv:2110.04176v2 [cs.LG] UPDATED)
    Hypercomplex neural networks have proven to reduce the overall number of parameters while ensuring valuable performance by leveraging the properties of Clifford algebras. Recently, hypercomplex linear layers have been further improved by involving efficient parameterized Kronecker products. In this paper, we define the parameterization of hypercomplex convolutional layers and introduce the family of parameterized hypercomplex neural networks (PHNNs) that are lightweight and efficient large-scale models. Our method grasps the convolution rules and the filter organization directly from data without requiring a rigidly predefined domain structure to follow. PHNNs are flexible to operate in any user-defined or tuned domain, from 1D to $n$D regardless of whether the algebra rules are preset. Such a malleability allows processing multidimensional inputs in their natural domain without annexing further dimensions, as done, instead, in quaternion neural networks for 3D inputs like color images. As a result, the proposed family of PHNNs operates with $1/n$ free parameters as regards its analog in the real domain. We demonstrate the versatility of this approach to multiple domains of application by performing experiments on various image datasets as well as audio datasets in which our method outperforms real and quaternion-valued counterparts. Full code is available at: https://github.com/eleGAN23/HyperNets.
    Tensor Principal Component Analysis in High Dimensional CP Models. (arXiv:2108.04428v4 [stat.ML] UPDATED)
    The CP decomposition for high dimensional non-orthogonal spiked tensors is an important problem with broad applications across many disciplines. However, previous works with theoretical guarantees typically assume restrictive incoherence conditions on the basis vectors for the CP components. In this paper, we propose new computationally efficient composite PCA and concurrent orthogonalization algorithms for tensor CP decomposition with theoretical guarantees under mild incoherence conditions. The composite PCA applies the principal component or singular value decompositions twice, first to a matrix unfolding of the tensor data to obtain singular vectors and then to the matrix folding of the singular vectors obtained in the first step. It can be used as an initialization for any iterative optimization scheme for the tensor CP decomposition. The concurrent orthogonalization algorithm iteratively estimates the basis vector in each mode of the tensor by simultaneously applying projections to the orthogonal complements of the spaces generated by other CP components in other modes. It is designed to improve the alternating least squares estimator and other forms of the high order orthogonal iteration for tensors with low or moderately high CP ranks, and it is guaranteed to converge rapidly when the error of any given initial estimator is bounded by a small constant. Our theoretical investigation provides estimation accuracy and convergence rates for the two proposed algorithms. Both proposed algorithms are applicable to deterministic tensors, their noisy versions, and the order-$2K$ covariance tensor of order-$K$ tensor data in a factor model with uncorrelated factors. Our implementations on synthetic data demonstrate significant practical superiority of our approach over existing methods.
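    For intuition, the "SVD twice" recipe can be written down explicitly in the rank-1, order-3 case (a simplified sketch; the paper handles general CP ranks and couples this initialization with concurrent orthogonalization, neither of which is shown):

        import numpy as np

        def composite_pca_rank1(T: np.ndarray):
            """Composite PCA for a rank-1 order-3 tensor (sketch).

            Step 1: SVD of the mode-1 unfolding gives u1 and a combined
            right singular vector. Step 2: fold that vector into a
            d2 x d3 matrix and apply an SVD again to separate u2 from u3.
            """
            d1, d2, d3 = T.shape
            U, s, Vt = np.linalg.svd(T.reshape(d1, d2 * d3), full_matrices=False)
            M = Vt[0].reshape(d2, d3)          # fold the singular vector
            U2, s2, V2t = np.linalg.svd(M, full_matrices=False)
            return U[:, 0], U2[:, 0], V2t[0]

        # Noisy rank-1 tensor: recovered u1, u2, u3 should align with a, b, c.
        rng = np.random.default_rng(0)
        a, b, c = (rng.standard_normal(d) for d in (10, 12, 14))
        a, b, c = (v / np.linalg.norm(v) for v in (a, b, c))
        T = 5.0 * np.einsum('i,j,k->ijk', a, b, c)
        T += 0.1 * rng.standard_normal(T.shape)
        u1, u2, u3 = composite_pca_rank1(T)
        print(abs(u1 @ a), abs(u2 @ b), abs(u3 @ c))  # each close to 1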
    Which Samples Should be Learned First: Easy or Hard?. (arXiv:2110.05481v4 [cs.LG] UPDATED)
    An effective weighting scheme for training samples is essential for learning tasks. Numerous weighting schemes have been proposed. Some schemes take the easy-first mode, whereas some others take the hard-first one. Naturally, an interesting yet realistic question is raised. Which samples should be learned first given a new learning task, easy or hard? To answer this question, both theoretical analyses and experimental verification are conducted. First, a general optimized objective function is proposed, revealing the relationship between the difficulty distribution and the difficulty-based sample weights. Second, on the basis of the optimized objective function, theoretical answers are obtained. Besides the easy-first and hard-first modes, there are two other priority modes, namely, medium-first and two-ends-first. The priority mode does not necessarily remain unchanged during the training process. Third, an effective and universal solution is proposed to select the optimal priority mode when there is no prior knowledge or theoretical clues. The four modes, namely, easy/medium/hard/two-ends-first, can be flexibly switched in the proposed solution. Fourth, a wide range of experiments is conducted under various scenarios to further compare the weighting schemes in different modes. On the basis of these works, reasonable and comprehensive answers are obtained. Factors including the distribution of samples' learning difficulties and the validation data determine which samples should be learned first in a learning task.
    Is Stochastic Gradient Descent Near Optimal?. (arXiv:2209.08627v1 [cs.LG])
    The success of neural networks over the past decade has established them as effective models for many relevant data generating processes. Statistical theory on neural networks indicates graceful scaling of sample complexity. For example, Jeon & Van Roy (arXiv:2203.00246) demonstrate that, when data is generated by a ReLU teacher network with $W$ parameters, an optimal learner needs only $\tilde{O}(W/\epsilon)$ samples to attain expected error $\epsilon$. However, existing computational theory suggests that, even for single-hidden-layer teacher networks, to attain small error for all such teacher networks, the computation required to achieve this sample complexity is intractable. In this work, we fit single-hidden-layer neural networks to data generated by single-hidden-layer ReLU teacher networks with parameters drawn from a natural distribution. We demonstrate that stochastic gradient descent (SGD) with automated width selection attains small expected error with a number of samples and total number of queries both nearly linear in the input dimension and width. This suggests that SGD nearly achieves the information-theoretic sample complexity bounds of Jeon & Van Roy (arXiv:2203.00246) in a computationally efficient manner. An important difference between our positive empirical results and the negative theoretical results is that the latter address worst-case error of deterministic algorithms, while our analysis centers on expected error of a stochastic algorithm.
    Cell Attention Networks. (arXiv:2209.08179v1 [cs.LG])
    Since their introduction, graph attention networks have achieved outstanding results in graph representation learning tasks. However, these networks consider only pairwise relationships among nodes and thus are unable to fully exploit higher-order interactions present in many real-world datasets. In this paper, we introduce Cell Attention Networks (CANs), a neural architecture operating on data defined over the vertices of a graph, representing the graph as the 1-skeleton of a cell complex introduced to capture higher-order interactions. In particular, we exploit the lower and upper neighborhoods, as encoded in the cell complex, to design two independent masked self-attention mechanisms, thus generalizing the conventional graph attention strategy. The approach used in CANs is hierarchical and incorporates the following steps: i) a lifting algorithm that learns {\it edge features} from {\it node features}; ii) a cell attention mechanism to find the optimal combination of edge features over both lower and upper neighbors; iii) a hierarchical {\it edge pooling} mechanism to extract a compact, meaningful set of features. The experimental results show that CAN is a low-complexity strategy that compares favorably with state-of-the-art results on graph-based learning tasks.
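    The two neighborhood-restricted attention mechanisms can be schematized with a single masked attention pass over edge features (a simplified single-head sketch; CAN's exact attention coefficients, lifting, and pooling stages are not shown, and the weight shapes are illustrative). The same function would be called once with the lower-neighborhood mask and once with the upper one.

        import numpy as np

        def masked_attention(X, mask, Wq, Wk, Wv):
            """Self-attention restricted to a neighborhood mask (sketch).

            X:    (E, d) edge features.
            mask: (E, E) boolean; True where edge j is a neighbor of edge i
            (lower or upper, as encoded by the cell complex).
            """
            Q, K, V = X @ Wq, X @ Wk, X @ Wv
            scores = np.where(mask, Q @ K.T / np.sqrt(K.shape[1]), -np.inf)
            attn = np.zeros_like(scores)
            valid = mask.any(axis=1)            # rows with at least one neighbor
            z = scores[valid]
            attn[valid] = np.exp(z - z.max(axis=1, keepdims=True))
            attn[valid] /= attn[valid].sum(axis=1, keepdims=True)
            return attn @ V

        rng = np.random.default_rng(1)
        E, d = 6, 8
        X = rng.standard_normal((E, d))
        mask = rng.random((E, E)) < 0.4
        Wq, Wk, Wv = (rng.standard_normal((d, d)) / np.sqrt(d) for _ in range(3))
        print(masked_attention(X, mask, Wq, Wk, Wv).shape)  # (6, 8)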
    Homomorphic Sensing of Subspace Arrangements. (arXiv:2006.05158v4 [cs.LG] UPDATED)
    Homomorphic sensing is a recent algebraic-geometric framework that studies the unique recovery of points in a linear subspace from their images under a given collection of linear maps. It has been successful in interpreting such a recovery in the case of permutations composed with coordinate projections, an important instance in applications known as unlabeled sensing, which models data that are out of order and have missing values. In this paper, we provide tighter and simpler conditions that guarantee the unique recovery for the single-subspace case, extend the result to the case of a subspace arrangement, and show that the unique recovery in a single subspace is locally stable under noise. We specialize our results to several examples of homomorphic sensing such as real phase retrieval and unlabeled sensing. In so doing, in a unified way, we obtain conditions that guarantee the unique recovery for those examples, typically known via diverse techniques in the literature, as well as novel conditions for sparse and unsigned versions of unlabeled sensing. Similarly, our noise result also implies that the unique recovery in unlabeled sensing is locally stable.
    Exploring the Training Robustness of Distributional Reinforcement Learning against Noisy State Observations. (arXiv:2109.08776v4 [cs.LG] UPDATED)
    In real scenarios, state observations that an agent observes may contain measurement errors or adversarial noise, misleading the agent to take suboptimal actions or even collapse while training. In this paper, we study the training robustness of distributional Reinforcement Learning~(RL), a class of state-of-the-art methods that estimate the whole distribution, as opposed to only the expectation, of the total return. Firstly, we validate the contraction of distributional Bellman operators in the State-Noisy Markov Decision Process~(SN-MDP), a typical tabular case that incorporates both random and adversarial state observation noise. In the noisy setting with function approximation, we then analyze the vulnerability of the least squares loss in expectation-based RL with either linear or nonlinear function approximation. By contrast, we theoretically characterize the bounded gradient norm of the distributional RL loss based on the categorical parameterization equipped with the Kullback-Leibler~(KL) divergence. The resulting stable gradients during optimization account for distributional RL's better training robustness against state observation noise. Finally, extensive experiments on a suite of environments verify that distributional RL is less vulnerable to both random and adversarial noisy state observations than its expectation-based counterpart.
    Offline Reinforcement Learning with Instrumental Variables in Confounded Markov Decision Processes. (arXiv:2209.08666v1 [cs.LG])
    We study offline reinforcement learning (RL) in the face of unmeasured confounders. Due to the lack of online interaction with the environment, offline RL faces the following two significant challenges: (i) the agent may be confounded by unobserved state variables; (ii) the offline data collected a priori do not provide sufficient coverage of the environment. To tackle the above challenges, we study policy learning in confounded MDPs with the aid of instrumental variables. Specifically, we first establish value function (VF)-based and marginalized importance sampling (MIS)-based identification results for the expected total reward in confounded MDPs. Then, by leveraging pessimism and our identification results, we propose various policy learning methods with finite-sample suboptimality guarantees for finding the optimal in-class policy under minimal data coverage and modeling assumptions. Lastly, our extensive theoretical investigations and a numerical study motivated by kidney transplantation demonstrate the promising performance of the proposed methods.
    Learning Visual Robotic Control Efficiently with Contrastive Pre-training and Data Augmentation. (arXiv:2012.07975v2 [cs.RO] UPDATED)
    Recent advances in unsupervised representation learning significantly improved the sample efficiency of training Reinforcement Learning policies in simulated environments. However, similar gains have not yet been seen for real-robot reinforcement learning. In this work, we focus on enabling data-efficient real-robot learning from pixels. We present Contrastive Pre-training and Data Augmentation for Efficient Robotic Learning (CoDER), a method that utilizes data augmentation and unsupervised learning to achieve sample-efficient training of real-robot arm policies from sparse rewards. While contrastive pre-training, data augmentation, demonstrations, and reinforcement learning are alone insufficient for efficient learning, our main contribution is showing that the combination of these disparate techniques results in a simple yet data-efficient method. We show that, given only 10 demonstrations, a single robotic arm can learn sparse-reward manipulation policies from pixels, such as reaching, picking, moving, pulling a large object, flipping a switch, and opening a drawer in just 30 minutes of mean real-world training time. We include videos and code on the project website: https://sites.google.com/view/efficient-robotic-manipulation/home
    Probabilistic Autoencoder. (arXiv:2006.05479v4 [cs.LG] UPDATED)
    Principal Component Analysis (PCA) minimizes the reconstruction error given a class of linear models of fixed component dimensionality. Probabilistic PCA adds a probabilistic structure by learning the probability distribution of the PCA latent space weights, thus creating a generative model. Autoencoders (AE) minimize the reconstruction error in a class of nonlinear models of fixed latent space dimensionality and outperform PCA at fixed dimensionality. Here, we introduce the Probabilistic Autoencoder (PAE) that learns the probability distribution of the AE latent space weights using a normalizing flow (NF). The PAE is fast and easy to train and achieves small reconstruction errors, high sample quality, and good performance in downstream tasks. We compare the PAE to Variational AE (VAE), showing that the PAE trains faster, reaches a lower reconstruction error, and produces good sample quality without requiring special tuning parameters or training procedures. We further demonstrate that the PAE is a powerful model for performing the downstream tasks of probabilistic image reconstruction in the context of Bayesian inference of inverse problems for inpainting and denoising applications. Finally, we identify latent space density from NF as a promising outlier detection metric.
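    The two-stage construction lends itself to a compact sketch: train an autoencoder for reconstruction, then fit a normalizing flow to its latent codes so that the latent density becomes tractable. This is a minimal sketch assuming an MLP autoencoder on flattened 784-dimensional inputs, an even latent dimension, and a RealNVP-style flow; the paper's architectures and training details differ. The flow's log-density is also the outlier score mentioned at the end of the abstract.

        import math
        import torch
        import torch.nn as nn

        class Coupling(nn.Module):
            """One RealNVP-style affine coupling block (sketch)."""
            def __init__(self, dim: int, flip: bool):
                super().__init__()
                self.flip, self.half = flip, dim // 2   # assumes even dim
                self.net = nn.Sequential(nn.Linear(self.half, 64), nn.Tanh(),
                                         nn.Linear(64, 2 * self.half))

            def forward(self, z):
                a, b = z[:, :self.half], z[:, self.half:]
                if self.flip:
                    a, b = b, a
                s, t = self.net(a).chunk(2, dim=1)
                s = torch.tanh(s)                  # bounded log-scales, for stability
                b = b * torch.exp(s) + t
                out = torch.cat([b, a] if self.flip else [a, b], dim=1)
                return out, s.sum(dim=1)           # log|det Jacobian|

        class LatentFlow(nn.Module):
            """Normalizing flow giving a tractable density over AE latents."""
            def __init__(self, dim: int, n_blocks: int = 4):
                super().__init__()
                self.dim = dim
                self.blocks = nn.ModuleList(
                    [Coupling(dim, flip=i % 2 == 1) for i in range(n_blocks)])

            def log_prob(self, z):
                logdet = torch.zeros(z.shape[0])
                for blk in self.blocks:
                    z, ld = blk(z)
                    logdet = logdet + ld
                base = (-0.5 * (z ** 2).sum(dim=1)
                        - 0.5 * self.dim * math.log(2 * math.pi))
                return base + logdet

        # Stage 1: train the AE on reconstruction alone; stage 2: freeze it
        # and fit the flow to the latent codes by maximum likelihood.
        enc = nn.Sequential(nn.Linear(784, 128), nn.ReLU(), nn.Linear(128, 8))
        dec = nn.Sequential(nn.Linear(8, 128), nn.ReLU(), nn.Linear(128, 784))
        flow = LatentFlow(dim=8)

        x = torch.randn(32, 784)                        # stand-in data batch
        recon_loss = ((dec(enc(x)) - x) ** 2).mean()    # stage-1 objective
        nll = -flow.log_prob(enc(x).detach()).mean()    # stage-2 objective
        print(float(recon_loss), float(nll))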
    Evons: A Dataset for Fake and Real News Virality Analysis and Prediction. (arXiv:2209.08129v1 [cs.CV])
    We present a novel collection of news articles originating from fake and real news media sources for the analysis and prediction of news virality. Unlike existing fake news datasets, which contain either claims or the news article headline and body, in this collection each article is supported with a Facebook engagement count, which we consider as an indicator of the article's virality. In addition, we also provide the article description and thumbnail image with which the article was shared on Facebook. These images were automatically annotated with object tags and color attributes. Using cloud-based vision analysis tools, thumbnail images were also analyzed for faces, and detected faces were annotated with facial attributes. We empirically investigate the use of this collection on an example task of article virality prediction.
    You Only Hear Once: A YOLO-like Algorithm for Audio Segmentation and Sound Event Detection. (arXiv:2109.00962v3 [eess.AS] UPDATED)
    Audio segmentation and sound event detection are crucial topics in machine listening that aim to detect acoustic classes and their respective boundaries. It is useful for audio-content analysis, speech recognition, audio-indexing, and music information retrieval. In recent years, most research articles adopt segmentation-by-classification. This technique divides audio into small frames and individually performs classification on these frames. In this paper, we present a novel approach called You Only Hear Once (YOHO), which is inspired by the YOLO algorithm popularly adopted in Computer Vision. We convert the detection of acoustic boundaries into a regression problem instead of frame-based classification. This is done by having separate output neurons to detect the presence of an audio class and predict its start and end points. The relative improvement in F-measure of YOHO, compared to the state-of-the-art Convolutional Recurrent Neural Network, ranged from 1% to 6% across multiple datasets for audio segmentation and sound event detection. As the output of YOHO is more end-to-end and involves fewer neurons to predict, inference is at least 6 times faster than segmentation-by-classification. In addition, as this approach predicts acoustic boundaries directly, post-processing and smoothing are about 7 times faster.
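    The regression formulation can be made concrete with a small output head (a schematic sketch; YOHO's actual backbone, output parameterization, and loss weighting differ, and all sizes below are illustrative). Each coarse time bin carries, per class, a presence logit plus normalized start and end offsets, and boundary regression is only supervised where an event is actually present.

        import torch
        import torch.nn as nn
        import torch.nn.functional as F

        class YOHOHead(nn.Module):
            """YOLO-like output head for audio events (sketch)."""
            def __init__(self, feat_dim: int, n_classes: int):
                super().__init__()
                self.proj = nn.Linear(feat_dim, n_classes * 3)
                self.n_classes = n_classes

            def forward(self, feats):                 # feats: (B, T_bins, feat_dim)
                out = self.proj(feats)
                out = out.view(*feats.shape[:2], self.n_classes, 3)
                presence = out[..., 0]                # logits
                bounds = torch.sigmoid(out[..., 1:])  # (start, end) in [0, 1]
                return presence, bounds

        def yoho_loss(presence, bounds, tgt_presence, tgt_bounds):
            cls = F.binary_cross_entropy_with_logits(presence, tgt_presence)
            # Regress boundaries only where the event is actually present.
            mask = tgt_presence.unsqueeze(-1)
            reg = ((bounds - tgt_bounds) ** 2 * mask).sum() / mask.sum().clamp(min=1)
            return cls + reg

        head = YOHOHead(feat_dim=64, n_classes=5)
        presence, bounds = head(torch.randn(2, 10, 64))
        tgt_p = torch.zeros(2, 10, 5); tgt_p[0, 3, 1] = 1.0
        tgt_b = torch.zeros(2, 10, 5, 2)
        print(float(yoho_loss(presence, bounds, tgt_p, tgt_b)))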
    Emission-Aware Optimization of Gas Networks: Input-Convex Neural Network Approach. (arXiv:2209.08645v1 [cs.LG])
    Gas network planning optimization under emission constraints prioritizes gas supply with the least CO$_2$ intensity. As this problem includes complex physical laws of gas flow, standard optimization solvers cannot guarantee convergence to a feasible solution. To address this issue, we develop an input-convex neural network (ICNN) aided optimization routine which incorporates a set of trained ICNNs approximating the gas flow equations with high precision. Numerical tests on the Belgium gas network demonstrate that the ICNN-aided optimization dominates non-convex and relaxation-based solvers, with larger optimality gains pertaining to stricter emission targets. Moreover, whenever the non-convex solver fails, the ICNN-aided optimization provides a feasible solution to network planning.
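    The property the routine relies on, convexity of the learned surrogate in its inputs, comes from the standard input-convex network construction (a sketch following Amos et al.'s ICNN recipe, which the abstract does not spell out; the gas-flow training data and the surrounding optimization model are not shown). Non-negative weights on the hidden path plus convex, non-decreasing activations make the output convex in x, so the trained surrogate can be embedded in a convex program.

        import torch
        import torch.nn as nn
        import torch.nn.functional as F

        class ICNN(nn.Module):
            """Fully input-convex neural network (sketch)."""
            def __init__(self, in_dim: int, hidden: int, n_layers: int):
                super().__init__()
                # Unconstrained skip connections from the input x...
                self.Wx = nn.ModuleList(
                    [nn.Linear(in_dim, hidden) for _ in range(n_layers)] +
                    [nn.Linear(in_dim, 1)])
                # ...and non-negative weights on the hidden z-path.
                self.Wz = nn.ParameterList(
                    [nn.Parameter(torch.rand(hidden, hidden) * 0.1)
                     for _ in range(n_layers - 1)] +
                    [nn.Parameter(torch.rand(1, hidden) * 0.1)])

            def forward(self, x):
                z = F.relu(self.Wx[0](x))
                for Wx, Wz in zip(self.Wx[1:-1], self.Wz[:-1]):
                    z = F.relu(Wx(x) + z @ Wz.clamp(min=0).T)
                return self.Wx[-1](x) + z @ self.Wz[-1].clamp(min=0).T

        f = ICNN(in_dim=3, hidden=32, n_layers=2)
        print(f(torch.randn(5, 3)).shape)  # torch.Size([5, 1])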
    PIM-QAT: Neural Network Quantization for Processing-In-Memory (PIM) Systems. (arXiv:2209.08617v1 [cs.LG])
    Processing-in-memory (PIM), an increasingly studied form of neuromorphic hardware, promises orders-of-magnitude energy and throughput improvements for deep learning inference. Leveraging the massively parallel and efficient analog computing inside memories, PIM circumvents the bottlenecks of data movements in conventional digital hardware. However, an extra quantization step (i.e., PIM quantization), typically with limited resolution due to hardware constraints, is required to convert the analog computing results into the digital domain. Meanwhile, non-ideal effects extensively exist in PIM quantization because of the imperfect analog-to-digital interface, which further compromises the inference accuracy. In this paper, we propose a method for training quantized networks to incorporate PIM quantization, which is present in all PIM systems. Specifically, we propose a PIM quantization aware training (PIM-QAT) algorithm, and introduce rescaling techniques during backward and forward propagation by analyzing the training dynamics to facilitate training convergence. We also propose two techniques, namely batch normalization (BN) calibration and adjusted precision training, to suppress the adverse effects of non-ideal linearity and stochastic thermal noise involved in real PIM chips. Our method is validated on three mainstream PIM decomposition schemes, and physically on a prototype chip. Compared with directly deploying a conventionally trained quantized model on PIM systems, which does not take into account this extra quantization step and thus fails, our method provides significant improvement. It also achieves inference accuracy on PIM systems comparable to that of conventionally quantized models on digital hardware, across CIFAR10 and CIFAR100 datasets using various network depths for the most popular network topology.
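    The "extra quantization step" can be modeled during training with a fake-quantization operator and a straight-through estimator (a minimal sketch; the paper's rescaling techniques, BN calibration, and noise modeling are not shown, and the bit-width and clipping range below are illustrative).

        import torch

        class PIMQuant(torch.autograd.Function):
            """Fake-quantize analog partial sums with a straight-through estimator.

            Forward: snap values to 2^bits uniform levels over [lo, hi],
            mimicking a limited-resolution ADC readout. Backward: pass
            gradients through unchanged (the simplest STE variant), keeping
            quantization-aware training differentiable.
            """
            @staticmethod
            def forward(ctx, x, bits, lo, hi):
                levels = 2 ** bits - 1
                x = x.clamp(lo, hi)
                x = torch.round((x - lo) / (hi - lo) * levels) / levels
                return x * (hi - lo) + lo

            @staticmethod
            def backward(ctx, grad_out):
                return grad_out, None, None, None

        # A PIM crossbar computes partial dot products in analog; only the
        # quantized readout is visible to later layers during training.
        x = torch.randn(4, requires_grad=True)
        y = PIMQuant.apply(x, 4, -1.0, 1.0).sum()
        y.backward()
        print(x.grad)  # all ones: straight-through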
    MA2QL: A Minimalist Approach to Fully Decentralized Multi-Agent Reinforcement Learning. (arXiv:2209.08244v1 [cs.LG])
    Decentralized learning has shown great promise for cooperative multi-agent reinforcement learning (MARL). However, non-stationarity remains a significant challenge in decentralized learning. In this paper, we tackle the non-stationarity problem in the simplest and most fundamental way and propose \textit{multi-agent alternate Q-learning} (MA2QL), where agents take turns updating their Q-functions by Q-learning. MA2QL is a \textit{minimalist} approach to fully decentralized cooperative MARL but is theoretically grounded. We prove that when each agent guarantees $\varepsilon$-convergence at each turn, their joint policy converges to a Nash equilibrium. In practice, MA2QL only requires minimal changes to independent Q-learning (IQL). We empirically evaluate MA2QL on a variety of cooperative multi-agent tasks. Results show MA2QL consistently outperforms IQL, which verifies the effectiveness of MA2QL despite such minimal changes.
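    In tabular form the alternating scheme is only a few lines (a sketch for two agents; network-based MA2QL and its turn schedule follow the same pattern). Only the agent whose turn it is updates its Q-table; the other agent's policy stays fixed, which is what removes the non-stationarity of simultaneous independent Q-learning.

        import numpy as np

        def ma2ql_step(Q, agent, s, a, r, s2, alpha=0.1, gamma=0.99):
            """One alternate Q-learning update (tabular sketch).

            Q: list of per-agent tables of shape (n_states, n_actions).
            a: joint action tuple; only Q[agent] is updated this turn.
            """
            q = Q[agent]
            td_target = r + gamma * q[s2].max()
            q[s, a[agent]] += alpha * (td_target - q[s, a[agent]])

        # Turn-taking schedule: agent 0 trains for K environment steps while
        # agent 1 acts greedily w.r.t. its frozen table, then they swap.
        Q = [np.zeros((5, 2)), np.zeros((5, 2))]
        ma2ql_step(Q, agent=0, s=1, a=(1, 0), r=1.0, s2=2)
        print(Q[0][1])  # [0.  0.1]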
    pFedDef: Defending Grey-Box Attacks for Personalized Federated Learning. (arXiv:2209.08412v1 [cs.LG])
    Personalized federated learning allows for clients in a distributed system to train a neural network tailored to their unique local data while leveraging information at other clients. However, clients' models are vulnerable to attacks during both the training and testing phases. In this paper we address the issue of adversarial clients crafting evasion attacks at test time to deceive other clients. For example, adversaries may aim to deceive spam filters and recommendation systems trained with personalized federated learning for monetary gain. The adversarial clients have varying degrees of personalization based on the method of distributed learning, leading to a "grey-box" situation. We are the first to characterize the transferability of such internal evasion attacks for different learning methods and analyze the trade-off between model accuracy and robustness depending on the degree of personalization and similarities in client data. We introduce a defense mechanism, pFedDef, that performs personalized federated adversarial training while respecting resource limitations at clients that inhibit adversarial training. Overall, pFedDef increases relative grey-box adversarial robustness by 62% compared to federated adversarial training and performs well even under limited system resources.
    Inducing Early Neural Collapse in Deep Neural Networks for Improved Out-of-Distribution Detection. (arXiv:2209.08378v1 [cs.LG])
    We propose a simple modification to standard ResNet architectures--L2 regularization over feature space--that substantially improves out-of-distribution (OoD) performance on the previously proposed Deep Deterministic Uncertainty (DDU) benchmark. This change also induces early Neural Collapse (NC), which we show is an effect under which better OoD performance is more probable. Our method achieves comparable or superior OoD detection scores and classification accuracy in a small fraction of the training time of the benchmark. Additionally, it substantially improves worst case OoD performance over multiple, randomly initialized models. Though we do not suggest that NC is the sole mechanism or comprehensive explanation for OoD behaviour in deep neural networks (DNN), we believe NC's simple mathematical and geometric structure can provide a framework for analysis of this complex phenomenon in future work.
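    The proposed modification amounts to one extra term in the training loss (a minimal sketch of one plausible reading of "L2 regularization over feature space"; the penalty weight and feature dimension below are illustrative assumptions):

        import torch
        import torch.nn.functional as F

        def loss_with_feature_l2(logits, feats, labels, lam=0.01):
            """Cross-entropy plus an L2 penalty on penultimate features.

            feats: penultimate-layer activations of the ResNet; penalizing
            their squared norm pushes within-class features toward tight
            class means earlier in training (early Neural Collapse).
            """
            penalty = feats.pow(2).sum(dim=1).mean()
            return F.cross_entropy(logits, labels) + lam * penalty

        logits, feats = torch.randn(8, 10), torch.randn(8, 512)
        labels = torch.randint(0, 10, (8,))
        print(float(loss_with_feature_l2(logits, feats, labels)))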
    Efficient Climate Simulation via Machine Learning Method. (arXiv:2209.08151v1 [physics.ao-ph])
    Hybrid modeling combining data-driven techniques and numerical methods is an emerging and promising research direction for efficient climate simulation. However, previous works lack practical platforms, making the development of hybrid models a challenging programming problem. Furthermore, the lack of standard datasets and evaluation metrics may hamper researchers from comprehensively comparing various algorithms under a uniform condition. To address these problems, we propose a framework called NeuroClim for hybrid modeling under the real-world scenario, a basic setting to simulate the real climate that we live in. NeuroClim consists of three parts: (1) Platform. We develop a user-friendly platform, NeuroGCM, for efficiently developing hybrid modeling in climate simulation. (2) Dataset. We provide an open-source dataset for data-driven methods in hybrid modeling. We investigate the characteristics of the data, i.e., heterogeneity and stiffness, which reveal the difficulty of regressing climate simulation data. (3) Metrics. We propose a methodology for quantitatively evaluating hybrid modeling, including the approximation ability of machine learning models and the stability during simulation. We believe that NeuroClim allows researchers to work without a high level of climate-related expertise and to focus only on machine learning algorithm design, which will accelerate hybrid modeling research in the AI-Climate intersection. The codes and data are released at https://github.com/x-w19/NeuroClim.
    On the Whitney near extension problem, BMO, alignment of data, best approximation in algebraic geometry, manifold learning and their beautiful connections: A modern treatment. (arXiv:2103.09748v6 [math.CA] UPDATED)
    This paper provides fascinating connections between several mathematical problems which lie at the intersection of several mathematics subjects, namely algebraic geometry, approximation theory, complex-harmonic analysis and high-dimensional data science. Modern techniques in algebraic geometry, approximation theory, computational harmonic analysis and extensions are used to develop a unified framework, the first of its kind, which allows for a simultaneous study of labeled and unlabeled near-alignment data problems in $\mathbb R^D$ together with the near-isometry extension problem for discrete and non-discrete subsets of $\mathbb R^D$ with certain geometries. In addition, the paper surveys related work on clustering, dimension reduction, manifold learning, vision as well as minimal energy partitions, discrepancy and min-max optimization. Numerous open problems are given.
    Approximation results for Gradient Descent trained Shallow Neural Networks in $1d$. (arXiv:2209.08399v1 [cs.LG])
    Two aspects of neural networks that have been extensively studied in the recent literature are their function approximation properties and their training by gradient descent methods. The approximation problem seeks accurate approximations with a minimal number of weights. In most of the current literature these weights are fully or partially hand-crafted, showing the capabilities of neural networks but not necessarily their practical performance. In contrast, optimization theory for neural networks heavily relies on an abundance of weights in over-parametrized regimes. This paper balances these two demands and provides an approximation result for shallow networks in $1d$ with non-convex weight optimization by gradient descent. We consider finite width networks and infinite sample limits, which is the typical setup in approximation theory. Technically, this problem is not over-parametrized; however, some form of redundancy reappears as a loss in approximation rate compared to the best possible rates.
    Efficient Deep Clustering of Human Activities and How to Improve Evaluation. (arXiv:2209.08335v1 [cs.LG])
    There has been much recent research on human activity recognition (HAR), due to the proliferation of wearable sensors in watches and phones, and the advances of deep learning methods, which avoid the need to manually extract features from raw sensor signals. A significant disadvantage of deep learning applied to HAR is the need for manually labelled training data, which is especially difficult to obtain for HAR datasets. Progress is starting to be made in the unsupervised setting, in the form of deep HAR clustering models, which can assign labels to data without having been given any labels to train on. However, there are problems with evaluating deep HAR clustering models, which makes assessing the field and devising new methods difficult. In this paper, we highlight several distinct problems with how deep HAR clustering models are evaluated, describing these problems in detail and conducting careful experiments to explicate the effect that they can have on results. We then discuss solutions to these problems and suggest standard evaluation settings for future deep HAR clustering models. Additionally, we present a new deep clustering model for HAR. When tested under our proposed settings, our model performs better than (or on par with) existing models, while also being more efficient and better able to scale to more complex datasets by avoiding the need for an autoencoder.
    Causal Feature Selection via Orthogonal Search. (arXiv:2007.02938v3 [stat.ML] UPDATED)
    The problem of inferring the direct causal parents of a response variable among a large set of explanatory variables is of high practical importance in many disciplines. However, established approaches often scale at least exponentially with the number of explanatory variables and are difficult to extend to nonlinear relationships and to cyclic data. Inspired by {\em Debiased} machine learning methods, we study a one-vs.-the-rest feature selection approach to discover the direct causal parents of the response. We propose an algorithm that works for purely observational data while also offering theoretical guarantees, including the case of partially nonlinear relationships, possibly under the presence of cycles. As it requires only one estimation for each variable, our approach is applicable even to large graphs. We demonstrate significant improvements compared to established approaches.
    Rethinking Personalized Ranking at Pinterest: An End-to-End Approach. (arXiv:2209.08435v1 [cs.IR])
    In this work, we present our journey to revolutionize the personalized recommendation engine through end-to-end learning from raw user actions. We encode user's long-term interest in PinnerFormer, a user embedding optimized for long-term future actions via a new dense all-action loss, and capture user's short-term intention by directly learning from the real-time action sequences. We conducted both offline and online experiments to validate the performance of the new model architecture, and also address the challenge of serving such a complex model using mixed CPU/GPU setup in production. The proposed system has been deployed in production at Pinterest and has delivered significant online gains across organic and Ads applications.
    DynaConF: Dynamic Forecasting of Non-Stationary Time-Series. (arXiv:2209.08411v1 [cs.LG])
    Deep learning models have shown impressive results in a variety of time series forecasting tasks, where modeling the conditional distribution of the future given the past is the essence. However, when this conditional distribution is non-stationary, it poses challenges for these models to learn consistently and to predict accurately. In this work, we propose a new method to model non-stationary conditional distributions over time by clearly decoupling stationary conditional distribution modeling from non-stationary dynamics modeling. Our method is based on a Bayesian dynamic model that can adapt to conditional distribution changes and a deep conditional distribution model that can handle large multivariate time series using a factorized output space. Our experimental results on synthetic and popular public datasets show that our model can adapt to non-stationary time series better than state-of-the-art deep learning solutions.
    Simplifying Model-based RL: Learning Representations, Latent-space Models, and Policies with One Objective. (arXiv:2209.08466v1 [cs.LG])
    While reinforcement learning (RL) methods that learn an internal model of the environment have the potential to be more sample efficient than their model-free counterparts, learning to model raw observations from high dimensional sensors can be challenging. Prior work has addressed this challenge by learning low-dimensional representations of observations through auxiliary objectives, such as reconstruction or value prediction. However, the alignment between these auxiliary objectives and the RL objective is often unclear. In this work, we propose a single objective which jointly optimizes a latent-space model and policy to achieve high returns while remaining self-consistent. This objective is a lower bound on expected returns. Unlike prior bounds for model-based RL on policy exploration or model guarantees, our bound is directly on the overall RL objective. We demonstrate that the resulting algorithm matches or improves the sample-efficiency of the best prior model-based and model-free RL methods. While such sample-efficient methods are typically computationally demanding, our method attains the performance of SAC in about 50% less wall-clock time.
    A Robust and Constrained Multi-Agent Reinforcement Learning Framework for Electric Vehicle AMoD Systems. (arXiv:2209.08230v1 [cs.MA])
    Electric vehicles (EVs) play critical roles in autonomous mobility-on-demand (AMoD) systems, but their unique charging patterns increase the model uncertainties in AMoD systems (e.g., state transition probability). Since there usually exists a mismatch between the training and test (true) environments, incorporating model uncertainty into system design is of critical importance in real-world applications. However, model uncertainties have not yet been considered explicitly in EV AMoD system rebalancing in the existing literature, and they remain an urgent and challenging task. In this work, we design a robust and constrained multi-agent reinforcement learning (MARL) framework with transition kernel uncertainty for the EV rebalancing and charging problem. We then propose a robust and constrained MARL algorithm (ROCOMA) that trains a robust EV rebalancing policy to balance the supply-demand ratio and the charging utilization rate across the whole city under state transition uncertainty. Experiments show that ROCOMA can learn an effective and robust rebalancing policy. It outperforms non-robust MARL methods when there are model uncertainties. It increases system fairness by 19.6% and decreases rebalancing costs by 75.8%.
    VisTaNet: Attention Guided Deep Fusion for Surface Roughness Classification. (arXiv:2209.08516v1 [cs.CV])
    Human texture perception is a weighted average of multi-sensory inputs: visual and tactile. While the visual sensing mechanism extracts global features, the tactile mechanism complements it by extracting local features. The lack of coupled visuotactile datasets in the literature is a challenge for studying multimodal fusion strategies analogous to human texture perception. This paper presents a visual dataset that augments an existing tactile dataset. We propose a novel deep fusion architecture that fuses visual and tactile data using four types of fusion strategies: summation, concatenation, max-pooling, and attention. Our model shows significant performance improvements (97.22%) in surface roughness classification accuracy over tactile only (SVM - 92.60%) and visual only (FENet-50 - 85.01%) architectures. Among the several fusion techniques, attention-guided architecture results in better classification accuracy. Our study shows that analogous to human texture perception, the proposed model chooses a weighted combination of the two modalities (visual and tactile), thus resulting in higher surface roughness classification accuracy; and it chooses to maximize the weightage of the tactile modality where the visual modality fails and vice-versa.
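    The attention-guided variant can be sketched as a per-sample gate over the two modality embeddings (an illustrative guess at the fusion mechanism, since the abstract does not specify the attention design, backbones, or dimensions; all names below are assumptions):

        import torch
        import torch.nn as nn

        class AttentionFusion(nn.Module):
            """Attention-guided fusion of visual and tactile embeddings (sketch).

            A small gating network scores each modality per sample; the fused
            representation is the softmax-weighted sum, so the model can lean
            on touch where vision is uninformative and vice versa.
            """
            def __init__(self, dim: int):
                super().__init__()
                self.gate = nn.Linear(2 * dim, 2)

            def forward(self, visual, tactile):       # each: (B, dim)
                joint = torch.cat([visual, tactile], dim=1)
                w = torch.softmax(self.gate(joint), dim=1)
                return w[:, :1] * visual + w[:, 1:] * tactile

        fuse = AttentionFusion(dim=128)
        print(fuse(torch.randn(4, 128), torch.randn(4, 128)).shape)  # (4, 128)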
    Information-Theoretic Characterization of the Generalization Error for Iterative Semi-Supervised Learning. (arXiv:2110.00926v4 [cs.LG] UPDATED)
    Using information-theoretic principles, we consider the generalization error (gen-error) of iterative semi-supervised learning (SSL) algorithms that iteratively generate pseudo-labels for a large amount of unlabelled data to progressively refine the model parameters. In contrast to most previous works that {\em bound} the gen-error, we provide an {\em exact} expression for the gen-error and particularize it to the binary Gaussian mixture model. Our theoretical results suggest that when the class conditional variances are not too large, the gen-error decreases with the number of iterations, but quickly saturates. On the flip side, if the class conditional variances (and so amount of overlap between the classes) are large, the gen-error increases with the number of iterations. To mitigate this undesirable effect, we show that regularization can reduce the gen-error. The theoretical results are corroborated by extensive experiments on the MNIST and CIFAR datasets in which we notice that for easy-to-distinguish classes, the gen-error improves after several pseudo-labelling iterations, but saturates afterwards, and for more difficult-to-distinguish classes, regularization improves the generalization performance.
    DeepTOP: Deep Threshold-Optimal Policy for MDPs and RMABs. (arXiv:2209.08646v1 [cs.LG])
    We consider the problem of learning the optimal threshold policy for control problems. Threshold policies make control decisions by evaluating whether an element of the system state exceeds a certain threshold, whose value is determined by other elements of the system state. By leveraging the monotone property of threshold policies, we prove that their policy gradients have a surprisingly simple expression. We use this simple expression to build an off-policy actor-critic algorithm for learning the optimal threshold policy. Simulation results show that our policy significantly outperforms other reinforcement learning algorithms due to its ability to exploit the monotone property. In addition, we show that the Whittle index, a powerful tool for restless multi-armed bandit problems, is equivalent to the optimal threshold policy for an alternative problem. This observation leads to a simple algorithm that finds the Whittle index by learning the optimal threshold policy in the alternative problem. Simulation results show that our algorithm learns the Whittle index much faster than several recent studies that learn the Whittle index through indirect means.
    Sample-Efficient Multi-Agent Reinforcement Learning with Demonstrations for Flocking Control. (arXiv:2209.08351v1 [cs.LG])
    Flocking control is a significant problem in multi-agent systems such as multi-agent unmanned aerial vehicles and multi-agent autonomous underwater vehicles, which enhances the cooperativity and safety of agents. In contrast to traditional methods, multi-agent reinforcement learning (MARL) solves the problem of flocking control more flexibly. However, methods based on MARL suffer from sample inefficiency, since they require a huge number of experiences to be collected from interactions between agents and the environment. We propose a novel method, Pretraining with Demonstrations for MARL (PwD-MARL), which can utilize non-expert demonstrations collected in advance with traditional methods to pretrain agents. During pretraining, agents learn policies from demonstrations by MARL and behavior cloning simultaneously, and are prevented from overfitting the demonstrations. By pretraining with non-expert demonstrations, PwD-MARL improves sample efficiency in the process of online MARL with a warm start. Experiments show that PwD-MARL improves sample efficiency and policy performance in the problem of flocking control, even with poor or few demonstrations.
    Real-time Outdoor Localization Using Radio Maps: A Deep Learning Approach. (arXiv:2106.12556v3 [cs.LG] UPDATED)
    Global Navigation Satellite Systems typically perform poorly in urban environments, where the likelihood of line-of-sight conditions between the devices and the satellites is low, and thus alternative localization methods are required for good accuracy. We present LocUNet: A convolutional, end-to-end trained neural network for the localization task, able to estimate the position of a user from the received signal strength (RSS) from a small number of Base Stations (BSs). In the proposed method, the user to be localized simply reports the measured RSS to a central processing unit, which may be located in the cloud. Using estimations of pathloss radio maps of the BSs and the RSS measurements, LocUNet can localize users with state-of-the-art accuracy and enjoys high robustness to inaccuracies in the estimations of radio maps. The proposed method does not require pre-sampling of new environments and is suitable for real-time applications. Moreover, two novel datasets that allow for numerical evaluations of RSS and ToA methods in realistic urban environments are presented and made publicly available for the research community. By using these datasets, we also provide a fair comparison of state-of-the-art RSS and ToA-based methods in the dense urban scenario and show numerically that LocUNet outperforms all the compared methods.
    Distribution inference risks: Identifying and mitigating sources of leakage. (arXiv:2209.08541v1 [cs.CR])
    A large body of work shows that machine learning (ML) models can leak sensitive or confidential information about their training data. Recently, leakage due to distribution inference (or property inference) attacks is gaining attention. In this attack, the goal of an adversary is to infer distributional information about the training data. So far, research on distribution inference has focused on demonstrating successful attacks, with little attention given to identifying the potential causes of the leakage and to proposing mitigations. To bridge this gap, as our main contribution, we theoretically and empirically analyze the sources of information leakage that allows an adversary to perpetrate distribution inference attacks. We identify three sources of leakage: (1) memorizing specific information about $\mathbb{E}[Y|X]$ (the expected label given the feature values) of interest to the adversary, (2) wrong inductive bias of the model, and (3) finiteness of the training data. Next, based on our analysis, we propose principled mitigation techniques against distribution inference attacks. Specifically, we demonstrate that causal learning techniques are more resilient to a particular type of distribution inference risk termed distributional membership inference than associative learning methods. Lastly, we present a formalization of distribution inference that allows for reasoning about more general adversaries than was previously possible.
    Example-Driven Model-Based Reinforcement Learning for Solving Long-Horizon Visuomotor Tasks. (arXiv:2109.10312v2 [cs.RO] UPDATED)
    In this paper, we study the problem of learning a repertoire of low-level skills from raw images that can be sequenced to complete long-horizon visuomotor tasks. Reinforcement learning (RL) is a promising approach for acquiring short-horizon skills autonomously. However, the focus of RL algorithms has largely been on the success of those individual skills, more so than learning and grounding a large repertoire of skills that can be sequenced to complete extended multi-stage tasks. The latter demands robustness and persistence, as errors in skills can compound over time, and may require the robot to have a number of primitive skills in its repertoire, rather than just one. To this end, we introduce EMBER, a model-based RL method for learning primitive skills that are suitable for completing long-horizon visuomotor tasks. EMBER learns and plans using a learned model, critic, and success classifier, where the success classifier serves both as a reward function for RL and as a grounding mechanism to continuously detect if the robot should retry a skill when unsuccessful or under perturbations. Further, the learned model is task-agnostic and trained using data from all skills, enabling the robot to efficiently learn a number of distinct primitives. These visuomotor primitive skills and their associated pre- and post-conditions can then be directly combined with off-the-shelf symbolic planners to complete long-horizon tasks. On a Franka Emika robot arm, we find that EMBER enables the robot to complete three long-horizon visuomotor tasks at 85% success rate, such as organizing an office desk, a file cabinet, and drawers, which require sequencing up to 12 skills, involve 14 unique learned primitives, and demand generalization to novel objects.
    Deep Labeling of fMRI Brain Networks Using Cloud Based Processing. (arXiv:2209.08200v1 [cs.LG])
    Resting state fMRI is an imaging modality which reveals brain activity localization through signal changes, in what are known as Resting State Networks (RSNs). This technique is gaining popularity in neurosurgical pre-planning to visualize functional regions and assess regional activity. Labeling of rs-fMRI networks requires subject-matter expertise and is time consuming, creating a need for an automated classification algorithm. While the impact of AI in medical diagnosis has shown great progress, deploying and maintaining these systems in a clinical setting is an unmet need. We propose an end-to-end reproducible pipeline which incorporates image processing of rs-fMRI in a cloud-based workflow while using deep learning to automate the classification of RSNs. We have architected a reproducible Azure Machine Learning cloud-based medical imaging concept pipeline for fMRI analysis integrating the popular FMRIB Software Library (FSL) toolkit. To demonstrate a clinical application using a large dataset, we compare three neural network architectures for classification of deeper RSNs derived from processed rs-fMRI. The three algorithms are: an MLP, a 2D projection-based CNN, and a fully 3D CNN classification network. Each of the networks was trained on the rs-fMRI back-projected independent components, giving >98% accuracy for each classification method.
    Parameterized Temperature Scaling for Boosting the Expressive Power in Post-Hoc Uncertainty Calibration. (arXiv:2102.12182v2 [cs.LG] UPDATED)
    We address the problem of uncertainty calibration and introduce a novel calibration method, Parametrized Temperature Scaling (PTS). Standard deep neural networks typically yield uncalibrated predictions, which can be transformed into calibrated confidence scores using post-hoc calibration methods. In this contribution, we demonstrate that the performance of accuracy-preserving state-of-the-art post-hoc calibrators is limited by their intrinsic expressive power. We generalize temperature scaling by computing prediction-specific temperatures, parameterized by a neural network. We show with extensive experiments that our novel accuracy-preserving approach consistently outperforms existing algorithms across a large number of model architectures, datasets and metrics.
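    The core mechanism fits in a few lines (a minimal sketch consistent with the abstract; the hidden size, the softplus positivity trick, and the epsilon floor are assumptions). Because each logit vector is divided by its own positive scalar temperature, the argmax, and hence accuracy, is untouched, while confidence is recalibrated by minimizing NLL on a held-out set with the classifier frozen.

        import torch
        import torch.nn as nn

        class ParameterizedTemperature(nn.Module):
            """Prediction-specific temperature scaling (sketch).

            A small network maps each logit vector to its own temperature;
            dividing by it rescales confidence without changing the argmax.
            """
            def __init__(self, n_classes: int, hidden: int = 16):
                super().__init__()
                self.net = nn.Sequential(nn.Linear(n_classes, hidden), nn.ReLU(),
                                         nn.Linear(hidden, 1), nn.Softplus())

            def forward(self, logits):
                t = self.net(logits) + 1e-3    # strictly positive temperature
                return logits / t

        pts = ParameterizedTemperature(n_classes=10)
        logits = torch.randn(4, 10)
        assert (pts(logits).argmax(1) == logits.argmax(1)).all()  # accuracy preserved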
    Algorithmic Challenges in Ensuring Fairness at the Time of Decision. (arXiv:2103.09287v2 [cs.LG] UPDATED)
    Algorithmic decision-making in societal contexts, such as retail pricing, loan administration, recommendations on online platforms, etc., often involves experimentation with decisions for the sake of learning, which results in perceptions of unfairness among people impacted by these decisions. It is hence necessary to embed appropriate notions of fairness in such decision-making processes. The goal of this paper is to highlight the rich interface between temporal notions of fairness and online decision-making through a novel meta-objective of ensuring fairness at the time of decision. Given some arbitrary comparative fairness notion for static decision-making (e.g., students should pay at most 90% of the general adult price), a corresponding online decision-making algorithm satisfies fairness at the time of decision if the said notion of fairness is satisfied for any entity receiving a decision in comparison to all the past decisions. We show that this basic requirement introduces new methodological challenges in online decision-making. We illustrate the novel approaches necessary to address these challenges in the context of stochastic convex optimization with bandit feedback under a comparative fairness constraint that imposes lower bounds on the decisions received by entities depending on the decisions received by everyone in the past. The paper showcases novel research opportunities in online decision-making stemming from temporal fairness concerns.
    Interpreting Distributional Reinforcement Learning: A Regularization Perspective. (arXiv:2110.03155v4 [cs.LG] UPDATED)
    Distributional reinforcement learning~(RL) is a class of state-of-the-art algorithms that estimate the whole distribution of the total return rather than only its expectation. Despite the remarkable performance of distributional RL, a theoretical understanding of its advantages over expectation-based RL remains elusive. In this paper, we attribute the superiority of distributional RL to its regularization effect in terms of the value distribution information beyond the expectation. Firstly, by leveraging a variant of the gross error model in robust statistics, we decompose the value distribution into its expectation and the remaining distribution part. As such, the extra benefit of distributional RL compared with expectation-based RL is mainly interpreted as the impact of a \textit{risk-sensitive entropy regularization} within the Neural Fitted Z-Iteration framework. Meanwhile, we establish a bridge between the risk-sensitive entropy regularization of distributional RL and the vanilla entropy in maximum entropy RL, focusing specifically on actor-critic algorithms. It reveals that distributional RL induces a corrected reward function and thus promotes risk-sensitive exploration against the intrinsic uncertainty of the environment. Finally, extensive experiments corroborate the role of the regularization effect of distributional RL and uncover the mutual impacts of different entropy regularizations. Our research paves the way towards better interpreting the efficacy of distributional RL algorithms, especially through the lens of regularization.
    KNOT: Knowledge Distillation using Optimal Transport for Solving NLP Tasks. (arXiv:2110.02432v2 [cs.CL] UPDATED)
    We propose a new approach, Knowledge Distillation using Optimal Transport (KNOT), to distill the natural language semantic knowledge from multiple teacher networks to a student network. KNOT aims to train a (global) student model by learning to minimize the optimal transport cost of its assigned probability distribution over the labels to the weighted sum of probabilities predicted by the (local) teacher models, under the constraints, that the student model does not have access to teacher models' parameters or training data. To evaluate the quality of knowledge transfer, we introduce a new metric, Semantic Distance (SD), that measures semantic closeness between the predicted and ground truth label distributions. The proposed method shows improvements in the global model's SD performance over the baseline across three NLP tasks while performing on par with Entropy-based distillation on standard accuracy and F1 metrics. The implementation pertaining to this work is publicly available at: https://github.com/declare-lab/KNOT.
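    The transport objective can be sketched with a standard entropic (Sinkhorn) solver. This is a sketch only: the ground cost below is built from label index distances as a stand-in, whereas the paper defines its own semantic cost, and the exact OT formulation may differ.

        import torch

        def sinkhorn_cost(p, q, C, eps=0.1, iters=50):
            """Entropic OT cost between label distributions p and q (sketch).

            C: (L, L) label-to-label ground cost matrix (assumed given).
            Runs standard Sinkhorn fixed-point iterations on the scalings.
            """
            K = torch.exp(-C / eps)
            u = torch.ones_like(p)
            for _ in range(iters):
                v = q / (K.T @ u)
                u = p / (K @ v)
            T = u.unsqueeze(1) * K * v.unsqueeze(0)    # transport plan
            return (T * C).sum()

        # KNOT-style objective: move the student's predicted distribution
        # toward the weighted sum of the teachers' predictions.
        L = 5
        C = (torch.arange(L)[:, None] - torch.arange(L)[None, :]).abs().float()
        student = torch.softmax(torch.randn(L), dim=0)
        teachers = torch.softmax(torch.randn(3, L), dim=1)
        weights = torch.tensor([0.5, 0.3, 0.2])
        target = (weights[:, None] * teachers).sum(dim=0)
        print(float(sinkhorn_cost(student, target, C)))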
    Unveil the unseen: Exploit information hidden in noise. (arXiv:2209.08376v1 [cs.LG])
    Noise and uncertainty are usually the enemy of machine learning: noise in training data leads to uncertainty and inaccuracy in the predictions. However, we develop a machine learning architecture that extracts crucial information out of the noise itself to improve the predictions. The phenomenology computes and then utilizes uncertainty in one target variable to predict a second target variable. We apply this formalism to the PbZr$_{0.7}$Sn$_{0.3}$O$_{3}$ crystal, using the uncertainty in dielectric constant to extrapolate heat capacity, correctly predicting a phase transition that otherwise cannot be extrapolated. For the second example -- single-particle diffraction of droplets -- we utilize the particle count together with its uncertainty to extrapolate the ground truth diffraction amplitude, delivering better predictions than when we utilize only the particle count. Our generic formalism enables the exploitation of uncertainty in machine learning, which has a broad range of applications in the physical sciences and beyond.
    Improving the Performance of DNN-based Software Services using Automated Layer Caching. (arXiv:2209.08625v1 [cs.LG])
    Deep Neural Networks (DNNs) have become an essential component in many application domains including web-based services. A variety of these services require high throughput and (close to) real-time features, for instance, to respond or react to users' requests or to process a stream of incoming data on time. However, the trend in DNN design is toward larger models with many layers and parameters to achieve more accurate results. Although these models are often pre-trained, the computational complexity in such large models can still be relatively significant, hindering low inference latency. Implementing a caching mechanism is a typical systems engineering solution for speeding up a service response time. However, traditional caching is often not suitable for DNN-based services. In this paper, we propose an end-to-end automated solution to improve the performance of DNN-based services in terms of their computational complexity and inference latency. Our caching method adopts the ideas of self-distillation of DNN models and early exits. The proposed solution is an automated online layer caching mechanism that allows early exiting of a large model during inference time if the cache model in one of the early exits is confident enough for final prediction. One of the main contributions of this paper is that we have implemented the idea as online caching, meaning that the cache models do not need access to training data and perform solely based on the incoming data at run-time, making it suitable for applications using pre-trained models. Our experimental results on two downstream tasks (face and object classification) show that, on average, caching can reduce the computational complexity of those services by up to 58% (in terms of FLOPs count) and improve their inference latency by up to 46% with little to no reduction in accuracy.
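    The early-exit mechanism at inference time can be sketched as follows (an illustrative sketch with a batch size of one; the paper's cache models are trained online by self-distillation from the full model, which is only indicated in comments, and the threshold and layer sizes are assumptions):

        import torch
        import torch.nn as nn

        class EarlyExitWrapper(nn.Module):
            """Inference-time layer caching via early exits (sketch).

            After each backbone block, a small exit head predicts the label;
            if its softmax confidence clears a threshold, inference stops
            there and the later, more expensive layers are skipped. Exit
            heads can be fit online by matching the full model's outputs,
            without access to the original training data.
            """
            def __init__(self, blocks, exits, threshold=0.9):
                super().__init__()
                self.blocks, self.exits = nn.ModuleList(blocks), nn.ModuleList(exits)
                self.threshold = threshold

            @torch.no_grad()
            def forward(self, x):                       # batch size 1 assumed
                for block, exit_head in zip(self.blocks, self.exits):
                    x = block(x)
                    probs = torch.softmax(exit_head(x), dim=-1)
                    if probs.max() >= self.threshold:   # confident: stop early
                        return probs
                return probs                            # fall through: last exit

        blocks = [nn.Sequential(nn.Linear(32, 32), nn.ReLU()) for _ in range(3)]
        exits = [nn.Linear(32, 10) for _ in range(3)]
        model = EarlyExitWrapper(blocks, exits)
        print(model(torch.randn(1, 32)).shape)  # torch.Size([1, 10])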
    Automated Segmentation and Recurrence Risk Prediction of Surgically Resected Lung Tumors with Adaptive Convolutional Neural Networks. (arXiv:2209.08423v1 [cs.CV])
    Lung cancer is the leading cause of cancer-related mortality by a significant margin. While new technologies, such as image segmentation, have been paramount to improved detection and earlier diagnoses, there are still significant challenges in treating the disease. In particular, despite an increased number of curative resections, many postoperative patients still develop recurrent lesions. Consequently, there is a significant need for prognostic tools that can more accurately predict a patient's risk for recurrence. In this paper, we explore the use of convolutional neural networks (CNNs) for the segmentation and recurrence risk prediction of lung tumors that are present in preoperative computed tomography (CT) images. First, expanding upon recent progress in medical image segmentation, a residual U-Net is used to localize and characterize each nodule. Then, the identified tumors are passed to a second CNN for recurrence risk prediction. The system's final results are produced with a random forest classifier that synthesizes the predictions of the second network with clinical attributes. The segmentation stage uses the LIDC-IDRI dataset and achieves a Dice score of 70.3%. The recurrence risk stage uses the NLST dataset from the National Cancer Institute and achieves an AUC of 73.0%. Our proposed framework demonstrates that first, automated nodule segmentation methods can generalize to enable pipelines for a wide range of multitask systems and second, that deep learning and image processing have the potential to improve current prognostic tools. To the best of our knowledge, it is the first fully automated segmentation and recurrence risk prediction system.
    The Geometry of Self-supervised Learning Models and its Impact on Transfer Learning. (arXiv:2209.08622v1 [cs.LG])
    Self-supervised learning (SSL) has emerged as a desirable paradigm in computer vision due to the inability of supervised models to learn representations that can generalize in domains with limited labels. The recent popularity of SSL has led to the development of several models that make use of diverse training strategies, architectures, and data augmentation policies with no existing unified framework to study or assess their effectiveness in transfer learning. We propose a data-driven geometric strategy to analyze different SSL models using local neighborhoods in the feature space induced by each. Unlike existing approaches that consider mathematical approximations of the parameters, individual components, or optimization landscape, our work aims to explore the geometric properties of the representation manifolds learned by SSL models. Our proposed manifold graph metrics (MGMs) provide insights into the geometric similarities and differences between available SSL models, their invariances with respect to specific augmentations, and their performances on transfer learning tasks. Our key findings are twofold: (i) contrary to popular belief, the geometry of an SSL model is not tied to its training paradigm (contrastive, non-contrastive, and cluster-based); (ii) we can predict the transfer learning capability for a specific model based on the geometric properties of its semantic and augmentation manifolds.
    Advertising Media and Target Audience Optimization via High-dimensional Bandits. (arXiv:2209.08403v1 [cs.LG])
    We present a data-driven algorithm that advertisers can use to automate their digital ad-campaigns at online publishers. The algorithm enables the advertiser to search across available target audiences and ad-media to find the best possible combination for its campaign via online experimentation. The problem of finding the best audience-ad combination is complicated by a number of distinctive challenges, including (a) a need for active exploration to resolve prior uncertainty and to speed the search for profitable combinations, (b) many combinations to choose from, giving rise to high-dimensional search formulations, and (c) very low success probabilities, typically just a fraction of one percent. Our algorithm (designated LRDL, an acronym for Logistic Regression with Debiased Lasso) addresses these challenges by combining four elements: a multiarmed bandit framework for active exploration; a Lasso penalty function to handle high dimensionality; an inbuilt debiasing kernel that handles the regularization bias induced by the Lasso; and a semi-parametric regression model for outcomes that promotes cross-learning across arms. The algorithm is implemented as a Thompson Sampler, and to the best of our knowledge, it is the first that can practically address all of the challenges above. Simulations with real and synthetic data show the method is effective and document its superior performance against several benchmarks from the recent high-dimensional bandit literature.
    Pruning Neural Networks via Coresets and Convex Geometry: Towards No Assumptions. (arXiv:2209.08554v1 [cs.LG])
    Pruning is one of the predominant approaches for compressing deep neural networks (DNNs). Lately, coresets (provable data summarizations) were leveraged for pruning DNNs, adding the advantage of theoretical guarantees on the trade-off between the compression rate and the approximation error. However, coresets in this domain were either data-dependent or generated under restrictive assumptions on both the model's weights and inputs. In real-world scenarios, such assumptions are rarely satisfied, limiting the applicability of coresets. To address this, we suggest a novel and robust framework for computing such coresets under mild assumptions on the model's weights and without any assumption on the training data. The idea is to compute the importance of each neuron in each layer with respect to the output of the following layer. This is achieved by a combination of the L\"{o}wner ellipsoid and the Carath\'eodory theorem. Our method is simultaneously data-independent, applicable to various networks and datasets (due to the simplified assumptions), and theoretically supported. Experimental results show that our method outperforms existing coreset-based neural pruning approaches across a wide range of networks and datasets. For example, our method achieved a $62\%$ compression rate on ResNet50 on ImageNet with a $1.09\%$ drop in accuracy.
    Low-skilled Occupations Face the Highest Re-skilling Pressure. (arXiv:2101.11505v3 [cs.CY] UPDATED)
    Substantial scholarship has estimated the susceptibility of jobs to automation, but little has examined how job contents evolve in the information age as new technologies substitute for tasks, shifting required skills rather than eliminating entire jobs. Here we explore the patterns and consequences of changes in occupational skill contents and characterize occupations and workers subject to the greatest re-skilling pressure. Recent research suggests that high-skilled STEM and technology-intensive occupations have experienced the highest rates of skill content change. Analyzing 727 occupations across 167 million job posts covering the near-universe of the U.S. online labor market between 2010 and 2018, we find that when skill distance is accounted for, re-skilling pressure is much higher for low-skilled occupations, no matter how "low-skill" is defined, whether by skill number, pay level, or education degree. We investigate the implications of uneven occupational skill change on workers and find that those from large labor markets and large employers experienced less change, while non-white males in low-skill jobs are the most demographically vulnerable. We conclude by discussing the broad potential of our skill embedding model, which learns skill proximity from skill co-presence across job posts and represents it as distance in the high-dimensional space of complex human capital that corresponds with skilling costs for workers. This model offers a fine-grained measure of the extent to which jobs evolve, and also indicates in what direction jobs are evolving, as illustrated by the decline in demand for human-interface skills and the rise in demand for those at the machine interface.
    Membership Inference Attacks and Generalization: A Causal Perspective. (arXiv:2209.08615v1 [cs.LG])
    Membership inference (MI) attacks highlight a privacy weakness in present stochastic training methods for neural networks. It is not well understood, however, why they arise. Are they a natural consequence of imperfect generalization only? Which underlying causes should we address during training to mitigate these attacks? Towards answering such questions, we propose the first approach to explain MI attacks and their connection to generalization based on principled causal reasoning. We offer causal graphs that quantitatively explain the observed MI attack performance achieved for $6$ attack variants. We refute several prior non-quantitative hypotheses that over-simplify or over-estimate the influence of underlying causes, thereby failing to capture the complex interplay between several factors. Our causal models also show a new connection between generalization and MI attacks via their shared causal factors. Our causal models have high predictive power ($0.90$), i.e., their analytical predictions often match the observations in unseen experiments, which makes analysis via them a pragmatic alternative.
    Mapping the Structure and Evolution of Software Testing Research Over the Past Three Decades. (arXiv:2109.04086v4 [cs.DL] UPDATED)
    Background: The field of software testing is growing and rapidly evolving. Aims: Based on keywords assigned to publications, we seek to identify predominant research topics and understand how they are connected and have evolved. Method: We apply co-word analysis to map the topology of testing research as a network where author-assigned keywords are connected by edges indicating co-occurrence in publications. Keywords are clustered based on edge density and frequency of connection. We examine the most popular keywords, summarize clusters into high-level research topics, examine how topics connect, and examine how the field is changing. Results: Testing research can be divided into 16 high-level topics and 18 subtopics. Creation guidance, automated test generation, evolution and maintenance, and test oracles have particularly strong connections to other topics, highlighting their multidisciplinary nature. Emerging keywords relate to web and mobile apps, machine learning, energy consumption, automated program repair and test generation, while emerging connections have formed between web apps, test oracles, and machine learning with many topics. Random and requirements-based testing show potential decline. Conclusions: Our observations, advice, and map data offer a deeper understanding of the field and inspiration regarding challenges and connections to explore.
    An Empathetic AI Coach for Self-Attachment Therapy. (arXiv:2209.08316v1 [cs.AI])
    In this work, we present a new dataset and a computational strategy for a digital coach that aims to guide users in practicing the protocols of self-attachment therapy. Our framework augments a rule-based conversational agent with a deep-learning classifier for identifying the underlying emotion in a user's text response, as well as a deep-learning assisted retrieval method for producing novel, fluent and empathetic utterances. We also craft a set of human-like personas that users can choose to interact with. Our goal is to achieve a high level of engagement during virtual therapy sessions. We evaluate the effectiveness of our framework in a non-clinical trial with N=16 participants, all of whom have had at least four interactions with the agent over the course of five days. We find that our platform is consistently rated higher for empathy, user engagement and usefulness than the simple rule-based framework. Finally, we provide guidelines to further improve the design and performance of the application, in accordance with the feedback received.
    Introspective Learning: A Two-Stage Approach for Inference in Neural Networks. (arXiv:2209.08425v1 [cs.LG])
    In this paper, we advocate for two stages in a neural network's decision making process. The first is the existing feed-forward inference framework where patterns in given data are sensed and associated with previously learned patterns. The second stage is a slower reflection stage where we ask the network to reflect on its feed-forward decision by considering and evaluating all available choices. Together, we term the two stages introspective learning. We use gradients of trained neural networks as a measurement of this reflection. A simple three-layered Multi-Layer Perceptron is used as the second stage, predicting based on all extracted gradient features. We perceptually visualize the post-hoc explanations from both stages to provide a visual grounding to introspection. For the application of recognition, we show that an introspective network is 4% more robust and 42% less prone to calibration errors when generalizing to noisy data. We also illustrate the value of introspective networks in downstream tasks that require generalizability and calibration, including active learning, out-of-distribution detection, and uncertainty estimation. Finally, we ground the proposed machine introspection to human introspection for the application of image quality assessment.
    On a generalization of the Jensen-Shannon divergence and the JS-symmetrization of distances relying on abstract means. (arXiv:1904.04017v4 [cs.IT] UPDATED)
    The Jensen-Shannon divergence is a renowned bounded symmetrization of the unbounded Kullback-Leibler divergence which measures the total Kullback-Leibler divergence to the average mixture distribution. However, the Jensen-Shannon divergence between Gaussian distributions is not available in closed form. To bypass this problem, we present a generalization of the Jensen-Shannon (JS) divergence using abstract means which yields closed-form expressions when the mean is chosen according to the parametric family of distributions. More generally, we define the JS-symmetrizations of any distance using generalized statistical mixtures derived from abstract means. In particular, we first show that the geometric mean is well-suited for exponential families, and report two closed-form formulas for (i) the geometric Jensen-Shannon divergence between probability densities of the same exponential family, and (ii) the geometric JS-symmetrization of the reverse Kullback-Leibler divergence. As a second illustrating example, we show that the harmonic mean is well-suited for scale Cauchy distributions, and report a closed-form formula for the harmonic Jensen-Shannon divergence between scale Cauchy distributions. We also define generalized Jensen-Shannon divergences between matrices (e.g., quantum Jensen-Shannon divergences) and consider clustering with respect to these novel Jensen-Shannon divergences.
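    For orientation, one schematic way to write the abstract-mean construction described here (our notation, not necessarily the paper's) is

        $$\mathrm{JS}^{M}(p:q) \;=\; \tfrac{1}{2}\,\mathrm{KL}\big(p : (pq)^{M}\big) \;+\; \tfrac{1}{2}\,\mathrm{KL}\big(q : (pq)^{M}\big), \qquad (pq)^{M}(x) \;=\; \frac{M\big(p(x),\,q(x)\big)}{\int M\big(p(t),\,q(t)\big)\,\mathrm{d}t},$$

    where the arithmetic mean $M(a,b)=\frac{a+b}{2}$ recovers the ordinary Jensen-Shannon divergence, while the geometric mean $M(a,b)=\sqrt{ab}$ keeps the normalized mixture inside an exponential family, which is what yields the closed forms mentioned above.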
    Selective Token Generation for Few-shot Natural Language Generation. (arXiv:2209.08206v1 [cs.CL])
    Natural language modeling with limited training data is a challenging problem, and many algorithms make use of large-scale pretrained language models (PLMs) for this due to their great generalization ability. Among them, additive learning that incorporates a task-specific adapter on top of a fixed large-scale PLM has been popular in the few-shot setting. However, this added adapter can still easily disregard the knowledge of the PLM, especially for few-shot natural language generation (NLG), since an entire sequence is usually generated by the newly trained adapter alone. Therefore, in this work, we develop a novel additive learning algorithm based on reinforcement learning (RL) that selectively outputs language tokens between the task-general PLM and the task-specific adapter during both training and inference. This output token selection over the two generators allows the adapter to take into account only the task-relevant parts of sequence generation, and therefore makes it more robust to overfitting as well as more stable in RL training. In addition, to obtain the adapter complementary to the PLM for each few-shot task, we exploit a separate selecting module that is also simultaneously trained using RL. Experimental results on various few-shot NLG tasks including question answering, data-to-text generation and text summarization demonstrate that the proposed selective token generation significantly outperforms the previous additive learning algorithms based on PLMs.
    A provably stable neural network Turing Machine. (arXiv:2006.03651v4 [cs.LG] UPDATED)
    We introduce a neural stack architecture, including a differentiable parametrized stack operator that, for suitable choices of parameters, approximates stack push and pop operations while explicitly representing a stack. We prove the stability of this stack architecture: after arbitrarily many stack operations, the state of the neural stack still closely resembles the state of the discrete stack. Using the neural stack with a recurrent neural network, we introduce a neural network Pushdown Automaton (nnPDA) and prove that an nnPDA with finite/bounded neurons and time can simulate any PDA. Furthermore, we extend our construction and propose a new architecture, the neural state Turing Machine (nnTM). We prove that a differentiable nnTM with bounded neurons can simulate a Turing Machine (TM) in real time. Just like the neural stack, these architectures are also stable. Finally, we extend our construction to show that the differentiable nnTM is equivalent to a Universal Turing Machine (UTM) and can simulate any TM with only \textbf{seven finite/bounded precision} neurons. This work provides a new theoretical bound for the computational capability of bounded precision RNNs augmented with memory.
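    A common way to make such stack operations differentiable (a generic relaxation stated here as an assumption, not necessarily the paper's exact operator) is to blend the discrete actions with continuous weights on the probability simplex:

        $$\mathbf{S}_t \;=\; a_{\mathrm{push}}\,\mathrm{push}(\mathbf{S}_{t-1},\mathbf{v}_t) \;+\; a_{\mathrm{pop}}\,\mathrm{pop}(\mathbf{S}_{t-1}) \;+\; a_{\mathrm{noop}}\,\mathbf{S}_{t-1}, \qquad a_{\mathrm{push}}+a_{\mathrm{pop}}+a_{\mathrm{noop}}=1,\ a_{\bullet}\ge 0.$$

    Under this reading, the stability claim above says that $\|\mathbf{S}_t-\mathbf{S}_t^{\mathrm{discrete}}\|$ remains small even as $t$ grows without bound.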
    Follow-the-Perturbed-Leader for Adversarial Markov Decision Processes with Bandit Feedback. (arXiv:2205.13451v2 [cs.LG] UPDATED)
    We consider regret minimization for Adversarial Markov Decision Processes (AMDPs), where the loss functions are changing over time and adversarially chosen, and the learner only observes the losses for the visited state-action pairs (i.e., bandit feedback). While there has been a surge of studies on this problem using Online-Mirror-Descent (OMD) methods, very little is known about the Follow-the-Perturbed-Leader (FTPL) methods, which are usually computationally more efficient and also easier to implement since they only require solving an offline planning problem. Motivated by this, we take a closer look at FTPL for learning AMDPs, starting from the standard episodic finite-horizon setting. We find some unique and intriguing difficulties in the analysis and propose a workaround to eventually show that FTPL is also able to achieve near-optimal regret bounds in this case. More importantly, we then find two significant applications: First, the analysis of FTPL turns out to be readily generalizable to delayed bandit feedback with order-optimal regret, while OMD methods exhibit extra difficulties (Jin et al., 2022). Second, using FTPL, we also develop the first no-regret algorithm for learning communicating AMDPs in the infinite-horizon setting with bandit feedback and stochastic transitions. Our algorithm is efficient assuming access to an offline planning oracle, while even for the easier full-information setting, the only existing algorithm (Chandrasekaran and Tewari, 2021) is computationally inefficient.
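    For context, the generic FTPL template (the standard form, not anything specific to this paper) selects at each episode $t$

        $$\pi_t \;\in\; \operatorname*{arg\,min}_{\pi}\ \Big\langle \pi,\ \sum_{s=1}^{t-1}\widehat{\ell}_s \;-\; \tfrac{1}{\eta}\,\mathbf{z} \Big\rangle,$$

    where the $\widehat{\ell}_s$ are loss estimates built from the bandit feedback and $\mathbf{z}$ is a fresh random perturbation; the arg-min is exactly an offline planning problem, which is why FTPL only needs a planning oracle.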
    MMSR: Multiple-Model Learned Image Super-Resolution Benefiting From Class-Specific Image Priors. (arXiv:2209.08568v1 [cs.CV])
    Assuming a known degradation model, the performance of a learned image super-resolution (SR) model depends on how well the variety of image characteristics within the training set matches those in the test set. As a result, the performance of an SR model varies noticeably from image to image over a test set depending on whether characteristics of specific images are similar to those in the training set or not. Hence, in general, a single SR model cannot generalize well enough for all types of image content. In this work, we show that training multiple SR models for different classes of images (e.g., text or texture) to exploit class-specific image priors and employing a post-processing network that learns how to best fuse the outputs produced by these multiple SR models surpasses the performance of state-of-the-art generic SR models. Experimental results clearly demonstrate that the proposed multiple-model SR (MMSR) approach significantly outperforms a single pre-trained state-of-the-art SR model both quantitatively and visually. It even exceeds the performance of the best single class-specific SR model trained on similar text or texture images.
    Hierarchical fuzzy neural networks with privacy preservation for heterogeneous big data. (arXiv:2209.08467v1 [cs.LG])
    Heterogeneous big data poses many challenges in machine learning. Its enormous scale, high dimensionality, and inherent uncertainty make almost every aspect of machine learning difficult, from providing enough processing power to maintaining model accuracy to protecting privacy. However, perhaps the most imposing problem is that big data is often interspersed with sensitive personal data. Hence, we propose a privacy-preserving hierarchical fuzzy neural network (PP-HFNN) to address these technical challenges while also alleviating privacy concerns. The network is trained with a two-stage optimization algorithm, and the parameters at low levels of the hierarchy are learned with a scheme based on the well-known alternating direction method of multipliers, which does not reveal local data to other agents. Coordination at high levels of the hierarchy is handled by the alternating optimization method, which converges very quickly. The entire training procedure is scalable and fast, and does not suffer from the vanishing-gradient problems that affect back-propagation-based methods. Comprehensive simulations conducted on both regression and classification tasks demonstrate the effectiveness of the proposed model.
    Corpus for Automatic Structuring of Legal Documents. (arXiv:2201.13125v2 [cs.CL] UPDATED)
    In populous countries, pending legal cases have been growing exponentially. There is a need for developing techniques for processing and organizing legal documents. In this paper, we introduce a new corpus for structuring legal documents. In particular, we introduce a corpus of legal judgment documents in English that are segmented into topical and coherent parts. Each of these parts is annotated with a label coming from a list of pre-defined Rhetorical Roles. We develop baseline models for automatically predicting rhetorical roles in a legal document based on the annotated corpus. Further, we show the application of rhetorical roles to improve performance on the tasks of summarization and legal judgment prediction. We release the corpus and baseline model code along with the paper.
    Online Regenerative Learning. (arXiv:2209.08657v1 [math.OC])
    We study a type of Online Linear Programming (OLP) problem that maximizes the objective function with stochastic inputs. The performance of various algorithms that analyze this type of OLP is well studied when the stochastic inputs follow an i.i.d. distribution. The two central questions to ask are: (i) can the algorithms achieve the same efficiency if the stochastic inputs are not i.i.d. but still stationary, and (ii) how can we modify our algorithms if we know the stochastic inputs are trendy, hence not stationary. We answer the first question by analyzing a regenerative type of input and show that the regrets of two popular algorithms are bounded by the same order as their i.i.d. counterparts. We discuss the second question in the context of linearly growing inputs and propose two trend-adaptive algorithms. We provide numerical simulations to illustrate the performance of our algorithms under both regenerative and trendy inputs.
    Interrelation of equivariant Gaussian processes and convolutional neural networks. (arXiv:2209.08371v1 [cs.LG])
    Currently, there exists a rather promising new trend in machine learning (ML) based on the relationship between neural networks (NNs) and Gaussian processes (GPs), including many related subtopics, e.g., signal propagation in NNs, theoretical derivation of learning curves for NNs, QFT methods in ML, etc. An important feature of convolutional neural networks (CNNs) is their equivariance (consistency) with respect to the symmetry transformations of the input data. In this work we establish a relationship between the many-channel limit for CNNs equivariant with respect to the two-dimensional Euclidean group with vector-valued neuron activations and the corresponding independently introduced equivariant Gaussian processes (GPs).
    Quantum Vision Transformers. (arXiv:2209.08167v1 [quant-ph])
    We design and analyse quantum transformers, extending the state-of-the-art classical transformer neural network architectures known to be very performant in natural language processing and image analysis. Building upon the previous work of parametrised quantum circuits for data loading and orthogonal neural layers, we introduce three quantum attention mechanisms, including a quantum transformer based on compound matrices. These quantum architectures can be built using shallow quantum circuits and can provide qualitatively different classification models. We performed extensive simulations of the quantum transformers on standard medical image datasets that showed competitive, and at times better, performance compared with the best classical transformers and other classical benchmarks. The computational complexity of our quantum attention layer proves to be advantageous compared with the classical algorithm with respect to the size of the classified images. Our quantum architectures have thousands of parameters compared with the best classical methods with millions of parameters. Finally, we have implemented our quantum transformers on superconducting quantum computers and obtained encouraging results for experiments with up to six qubits.
    Honor of Kings Arena: an Environment for Generalization in Competitive Reinforcement Learning. (arXiv:2209.08483v1 [cs.LG])
    This paper introduces Honor of Kings Arena, a reinforcement learning (RL) environment based on Honor of Kings, one of the world's most popular games at present. Compared to other environments studied in most previous work, ours presents new generalization challenges for competitive reinforcement learning. It is a multi-agent problem with one agent competing against its opponent; and it requires the generalization ability as it has diverse targets to control and diverse opponents to compete with. We describe the observation, action, and reward specifications for the Honor of Kings domain and provide an open-source Python-based interface for communicating with the game engine. We provide twenty target heroes with a variety of tasks in Honor of Kings Arena and present initial baseline results for RL-based methods with feasible computing resources. Finally, we showcase the generalization challenges imposed by Honor of Kings Arena and possible remedies to the challenges. All of the software, including the environment class, is publicly available at https://github.com/tencent-ailab/hok_env . The documentation is available at https://aiarena.tencent.com/hok/doc/ .
    Psychologically-informed chain-of-thought prompts for metaphor understanding in large language models. (arXiv:2209.08141v1 [cs.CL])
    Probabilistic models of language understanding are interpretable and structured, for instance models of metaphor understanding describe inference about latent topics and features. However, these models are manually designed for a specific task. Large language models (LLMs) can perform many tasks through in-context learning, but they lack the clear structure of probabilistic models. In this paper, we use chain-of-thought prompts to introduce structures from probabilistic models into LLMs. These prompts lead the model to infer latent variables and reason about their relationships to choose appropriate paraphrases for metaphors. The latent variables and relationships chosen are informed by theories of metaphor understanding from cognitive psychology. We apply these prompts to the two largest versions of GPT-3 and show that they can improve paraphrase selection.
    Multi-channel Nuclear Norm Minus Frobenius Norm Minimization for Color Image Denoising. (arXiv:2209.08094v1 [cs.CV])
    Color image denoising is frequently encountered in various image processing and computer vision tasks. One traditional strategy is to convert the RGB image to a less correlated color space and denoise each channel of the new space separately. However, such a strategy cannot fully exploit the correlated information between channels and is inadequate to obtain satisfactory results. To address this issue, this paper proposes a new multi-channel optimization model for color image denoising under the nuclear norm minus Frobenius norm minimization framework. Specifically, based on block-matching, the color image is decomposed into overlapping RGB patches. For each patch, we stack its similar neighbors to form the corresponding patch matrix. The proposed model is performed on the patch matrix to recover its noise-free version. During the recovery process, a) a weight matrix is introduced to fully utilize the noise difference between channels; b) the singular values are shrunk adaptively without additionally assigning weights. With them, the proposed model can achieve promising results while keeping simplicity. To solve the proposed model, an accurate and effective algorithm is built based on the alternating direction method of multipliers framework. The solution of each updating step can be analytically expressed in closed form. Rigorous theoretical analysis proves that the solution sequences generated by the proposed algorithm converge to their respective stationary points. Experimental results on both synthetic and real noise datasets demonstrate that the proposed model outperforms state-of-the-art models.
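    Schematically (our notation; the paper's exact weighting and constraints may differ), the per-patch-matrix recovery step has the form

        $$\widehat{\mathbf{X}} \;=\; \operatorname*{arg\,min}_{\mathbf{X}}\ \tfrac{1}{2}\,\|\mathbf{Y}-\mathbf{X}\|_F^2 \;+\; \lambda\,\big(\|\mathbf{X}\|_* - \alpha\,\|\mathbf{X}\|_F\big),$$

    where $\mathbf{Y}$ stacks similar noisy RGB patches, $\|\cdot\|_*$ is the nuclear norm, and $\alpha>0$ sets the strength of the Frobenius correction; in the ADMM splitting, the subproblem in $\mathbf{X}$ reduces to shrinking singular values, which is where the adaptive shrinkage mentioned above enters.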
    Koopman-theoretic Approach for Identification of Exogenous Anomalies in Nonstationary Time-series Data. (arXiv:2209.08618v1 [cs.LG])
    In many scenarios, it is necessary to monitor a complex system via a time-series of observations and determine when anomalous exogenous events have occurred so that relevant actions can be taken. Determining whether current observations are abnormal is challenging. It requires learning an extrapolative probabilistic model of the dynamics from historical data, and using a limited number of current observations to make a classification. We leverage recent advances in long-term probabilistic forecasting, namely {\em Deep Probabilistic Koopman}, to build a general method for classifying anomalies in multi-dimensional time-series data. We also show how to utilize models with domain knowledge of the dynamics to reduce type I and type II errors. We demonstrate our proposed method on the important real-world task of global atmospheric pollution monitoring, integrating it with NASA's Global Earth System Model. The system successfully detects localized anomalies in air quality due to events such as COVID-19 lockdowns and wildfires.
    An $l_1$-oracle inequality for the Lasso in high-dimensional mixtures of experts models. (arXiv:2009.10622v4 [math.ST] UPDATED)
    Mixtures of experts (MoE) models are a popular framework for modeling heterogeneity in data, for both regression and classification problems in statistics and machine learning, due to their flexibility and the abundance of available statistical estimation and model choice tools. Such flexibility comes from allowing the mixture weights (or gating functions) in the MoE model to depend on the explanatory variables, along with the experts (or component densities). This permits the modeling of data arising from more complex data generating processes when compared to the classical finite mixtures and finite mixtures of regression models, whose mixing parameters are independent of the covariates. The use of MoE models in a high-dimensional setting, when the number of explanatory variables can be much larger than the sample size, is challenging from a computational point of view, and in particular from a theoretical point of view, where the literature is still lacking results for dealing with the curse of dimensionality, for both the statistical estimation and feature selection problems. We consider the finite MoE model with soft-max gating functions and Gaussian experts for high-dimensional regression on heterogeneous data, and its $l_1$-regularized estimation via the Lasso. We focus on the Lasso estimation properties rather than its feature selection properties. We provide a lower bound on the regularization parameter of the Lasso function that ensures an $l_1$-oracle inequality satisfied by the Lasso estimator according to the Kullback--Leibler loss.
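    In symbols (a schematic form consistent with the abstract, with $\psi$ collecting the gating and expert parameters), the estimator under study is

        $$\widehat{s}^{\,\mathrm{Lasso}}(\lambda) \;=\; \operatorname*{arg\,min}_{s_\psi}\ \Big\{ -\frac{1}{n}\sum_{i=1}^{n}\log s_\psi(y_i \mid x_i) \;+\; \lambda\,\|\psi\|_1 \Big\},$$

    and the $l_1$-oracle inequality bounds its Kullback--Leibler risk by the best penalized approximation error over the model class, provided the regularization parameter $\lambda$ exceeds the stated lower bound.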
    Multi-modal Attention Network for Stock Movements Prediction. (arXiv:2112.13593v4 [cs.LG] UPDATED)
    Stock prices move as piecewise trending fluctuations rather than as a purely random walk. Traditionally, the prediction of future stock movements is based on the historical trading record. Nowadays, with the development of social media, many active participants in the market choose to publicize their strategies, which provides a window onto the whole market's attitude towards future movements by extracting the semantics behind social media. However, social media contains conflicting information and cannot replace historical records completely. In this work, we propose a multi-modality attention network to reduce conflicts and integrate semantic and numeric features to predict future stock movements comprehensively. Specifically, we first extract semantic information from social media and estimate its credibility based on posters' identity and public reputation. Then we incorporate the semantics from online posts and numeric features from historical records to form the trading strategy. Experimental results show that our approach outperforms previous methods by a significant margin in both prediction accuracy (61.20\%) and trading profits (9.13\%). It demonstrates that our method improves the performance of stock movement prediction and informs future research on multi-modality fusion towards stock prediction.
    Block Policy Mirror Descent. (arXiv:2201.05756v3 [cs.LG] UPDATED)
    In this paper, we present a new policy gradient (PG) method, namely the block policy mirror descent (BPMD) method, for solving a class of regularized reinforcement learning (RL) problems with (strongly-)convex regularizers. Compared to the traditional PG methods with a batch update rule, which visit and update the policy for every state, the BPMD method has cheap per-iteration computation via a partial update rule that performs the policy update on a sampled state. Despite the nonconvex nature of the problem and a partial update rule, we provide a unified analysis for several sampling schemes, and show that BPMD achieves fast linear convergence to the global optimality. In particular, uniform sampling leads to comparable worst-case total computational complexity as batch PG methods. A necessary and sufficient condition for convergence with on-policy sampling is also identified. With a hybrid sampling scheme, we further show that BPMD enjoys potential instance-dependent acceleration, leading to improved dependence on the state space and consequently outperforming batch PG methods. We then extend BPMD methods to the stochastic setting, by utilizing stochastic first-order information constructed from samples. With a generative model, $\tilde{\mathcal{O}}(\left\lvert \mathcal{S}\right\rvert \left\lvert \mathcal{A}\right\rvert /\epsilon)$ (resp. $\tilde{\mathcal{O}}(\left\lvert \mathcal{S}\right\rvert \left\lvert \mathcal{A} \right\rvert /\epsilon^2)$) sample complexities are established for the strongly-convex (resp. non-strongly-convex) regularizers, where $\epsilon$ denotes the target accuracy. To the best of our knowledge, this is the first time that block coordinate descent methods have been developed and analyzed for policy optimization in reinforcement learning, which provides a new perspective on solving large-scale RL problems.
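    Concretely, a mirror-descent policy update restricted to a single sampled state $s_k$ (a schematic form; the paper's full update also folds in the convex regularizer) reads

        $$\pi_{k+1}(\cdot\mid s_k) \;=\; \operatorname*{arg\,min}_{p\in\Delta(\mathcal{A})}\ \Big\{ \eta_k\,\big\langle Q^{\pi_k}(s_k,\cdot),\,p\big\rangle \;+\; D\big(p,\ \pi_k(\cdot\mid s_k)\big) \Big\}, \qquad \pi_{k+1}(\cdot\mid s)=\pi_k(\cdot\mid s)\ \text{for } s\neq s_k,$$

    where $D$ is a Bregman divergence; with the KL divergence this is a multiplicative-weights update at the one visited state, which is what makes each iteration cheap relative to a batch update over all states.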
    Bivariate Causal Discovery for Categorical Data via Classification with Optimal Label Permutation. (arXiv:2209.08579v1 [stat.ML])
    Causal discovery for quantitative data has been extensively studied but less is known for categorical data. We propose a novel causal model for categorical data based on a new classification model, termed classification with optimal label permutation (COLP). By design, COLP is a parsimonious classifier, which gives rise to a provably identifiable causal model. A simple learning algorithm via comparing likelihood functions of causal and anti-causal models suffices to learn the causal direction. Through experiments with synthetic and real data, we demonstrate the favorable performance of the proposed COLP-based causal model compared to state-of-the-art methods. We also make available an accompanying R package COLP, which contains the proposed causal discovery algorithm and a benchmark dataset of categorical cause-effect pairs.
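    The decision rule itself is easy to illustrate. Below is a hedged sketch of likelihood-based direction selection for a categorical pair using a deliberately parsimonious "classifier plus uniform noise" conditional model; this is not COLP's optimal-label-permutation classifier, only the same compare-the-two-factorizations principle (an unrestricted conditional table would make both directions fit equally well, which is exactly why a parsimonious classifier is needed).

        import numpy as np

        def fit_log_lik(cause, effect):
            """Log-likelihood of P(cause) * P(effect | cause) under a parsimonious
            model: a deterministic classifier f (majority effect label per cause
            class) corrupted by uniform label noise with rate eps."""
            n = len(cause)
            cats_c, cats_e = np.unique(cause), np.unique(effect)
            K = len(cats_e)
            f = {c: cats_e[np.argmax([(effect[cause == c] == e).sum() for e in cats_e])]
                 for c in cats_c}
            hits = np.array([effect[i] == f[cause[i]] for i in range(n)])
            eps = float(np.clip(1.0 - hits.mean(), 1e-6, 1.0 - 1e-6))
            ll = sum((cause == c).sum() * np.log((cause == c).mean()) for c in cats_c)
            ll += hits.sum() * np.log(1.0 - eps)
            if K > 1:
                ll += (~hits).sum() * np.log(eps / (K - 1))
            return ll

        rng = np.random.default_rng(0)
        x = rng.integers(0, 3, size=500)                                  # cause
        y = np.where(rng.random(500) < 0.8, (2 * x) % 4,
                     rng.integers(0, 4, size=500))                       # noisy effect of x
        print("X -> Y" if fit_log_lik(x, y) > fit_log_lik(y, x) else "Y -> X")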
    Mitigating Both Covariate and Conditional Shift for Domain Generalization. (arXiv:2209.08253v1 [cs.CV])
    Domain generalization (DG) aims to learn a model on several source domains, hoping that the model can generalize well to unseen target domains. The distribution shift between domains contains the covariate shift and conditional shift, both of which the model must be able to handle for better generalizability. In this paper, a novel DG method is proposed to deal with the distribution shift via Visual Alignment and Uncertainty-guided belief Ensemble (VAUE). Specifically, for the covariate shift, a visual alignment module is designed to align the distribution of image style to a common empirical Gaussian distribution so that the covariate shift can be eliminated in the visual space. For the conditional shift, we adopt an uncertainty-guided belief ensemble strategy based on the subjective logic and Dempster-Shafer theory. The conditional distribution given a test sample is estimated by the dynamic combination of that of source domains. Comprehensive experiments are conducted to demonstrate the superior performance of the proposed method on four widely used datasets, i.e., Office-Home, VLCS, TerraIncognita, and PACS.
    Constrained Policy Optimization for Controlled Self-Learning in Conversational AI Systems. (arXiv:2209.08429v1 [cs.LG])
    Recently, self-learning methods based on user satisfaction metrics and contextual bandits have shown promising results to enable consistent improvements in conversational AI systems. However, directly targeting such metrics by off-policy bandit learning objectives often increases the risk of making abrupt policy changes that break the current user experience. In this study, we introduce a scalable framework for supporting fine-grained exploration targets for individual domains via user-defined constraints. For example, we may want to ensure fewer policy deviations in business-critical domains such as shopping, while allocating more exploration budget to domains such as music. Furthermore, we present a novel meta-gradient learning approach that is scalable and practical to address this problem. The proposed method adjusts constraint violation penalty terms adaptively through a meta objective that encourages balanced constraint satisfaction across domains. We conduct extensive experiments using data from a real-world conversational AI on a set of realistic constraint benchmarks. Based on the experimental results, we demonstrate that the proposed approach is capable of achieving the best balance between the policy value and constraint satisfaction rate.
    ANet: Autoencoder-Based Local Field Potential Feature Extractor for Evaluating An Antidepressant Effect in Mice after Administering Kratom Leaf Extracts. (arXiv:2209.08210v1 [q-bio.QM])
    Kratom (KT) typically exerts antidepressant (AD) effects. However, evaluating which form of KT extract possesses AD properties similar to the standard AD fluoxetine (flu) remains challenging. Here, we adopted an autoencoder (AE)-based anomaly detector called ANet to measure the similarity of mice's local field potential (LFP) features in response to KT leaf extracts and the AD flu. The features that responded to KT syrup had the highest similarity to those that responded to the AD flu, at 85.62 $\pm$ 0.29%. This finding suggests the higher feasibility of using KT syrup as an alternative substance for depressant therapy compared with KT alkaloids and KT aqueous, the other candidates in this study. Apart from the similarity measurement, we utilized ANet as a multi-task AE and evaluated its performance in simultaneously discriminating multi-class LFP responses corresponding to the effects of the different KT extracts and the AD flu. Furthermore, we visualized the learned latent features among LFP responses qualitatively and quantitatively as t-SNE projections and maximum mean discrepancy distances, respectively. The classification results reported an accuracy and F1-score of 79.78 $\pm$ 0.39% and 79.53 $\pm$ 0.00%. In summary, the outcomes of this research might help in designing devices for evaluating the profiles of alternative substances, such as Kratom-based forms, in real-world applications.
    FedRN: Exploiting k-Reliable Neighbors Towards Robust Federated Learning. (arXiv:2205.01310v2 [cs.LG] UPDATED)
    Robustness is becoming another important challenge of federated learning in that the data collection process in each client is naturally accompanied by noisy labels. However, it is far more complex and challenging owing to varying levels of data heterogeneity and noise over clients, which exacerbates the client-to-client performance discrepancy. In this work, we propose a robust federated learning method called FedRN, which exploits k-reliable neighbors with high data expertise or similarity. Our method helps mitigate the gap between low- and high-performance clients by training only with a selected set of clean examples, identified by their ensembled mixture models. We demonstrate the superiority of FedRN via extensive evaluations on three real-world or synthetic benchmark datasets. Compared with existing robust training methods, the results show that FedRN significantly improves the test accuracy in the presence of noisy labels.
    Perception-Distortion Trade-off in the SR Space Spanned by Flow Models. (arXiv:2209.08564v1 [cs.CV])
    Flow-based generative super-resolution (SR) models learn to produce a diverse set of feasible SR solutions, called the SR space. Diversity of SR solutions increases with the temperature ($\tau$) of latent variables, which introduces random variations of texture among sample solutions, resulting in visual artifacts and low fidelity. In this paper, we present a simple but effective image ensembling/fusion approach to obtain a single SR image eliminating random artifacts and improving fidelity without significantly compromising perceptual quality. We achieve this by benefiting from a diverse set of feasible photo-realistic solutions in the SR space spanned by flow models. We propose different image ensembling and fusion strategies which offer multiple paths to move sample solutions in the SR space to more desired destinations in the perception-distortion plane in a controllable manner depending on the fidelity vs. perceptual quality requirements of the task at hand. Experimental results demonstrate that our image ensembling/fusion strategy achieves more promising perception-distortion trade-off compared to sample SR images produced by flow models and adversarially trained models in terms of both quantitative metrics and visual quality.
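    As a hedged illustration of one such strategy (our simplification, not necessarily one of the paper's concrete schemes), blending a single sharp flow sample with the mean of many samples traces a controllable path between the perception end and the distortion end of the trade-off.

        import numpy as np

        def blend(samples, beta):
            """samples: array (n, H, W, C) of SR solutions drawn at some temperature tau.
            beta = 0 keeps one sharp sample (perception end); beta = 1 returns the
            sample mean, which suppresses random texture and raises fidelity."""
            return (1.0 - beta) * samples[0] + beta * samples.mean(axis=0)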
    Deep Adaptation of Adult-Child Facial Expressions by Fusing Landmark Features. (arXiv:2209.08614v1 [cs.CV])
    Imaging of facial affects may be used to measure psychophysiological attributes of children through their adulthood, especially for monitoring lifelong conditions like Autism Spectrum Disorder. Deep convolutional neural networks have shown promising results in classifying facial expressions of adults. However, classifier models trained with adult benchmark data are unsuitable for learning child expressions due to discrepancies in psychophysical development. Similarly, models trained with child data perform poorly in adult expression classification. We propose domain adaptation to concurrently align distributions of adult and child expressions in a shared latent space to ensure robust classification of either domain. Furthermore, age variations in facial images are studied in age-invariant face recognition yet remain unleveraged in adult-child expression classification. We take inspiration from multiple fields and propose deep adaptive FACial Expressions fusing BEtaMix SElected Landmark Features (FACE-BE-SELF) for adult-child facial expression classification. For the first time in the literature, a mixture of Beta distributions is used to decompose and select facial features based on correlations with expression, domain, and identity factors. We evaluate FACE-BE-SELF on two pairs of adult-child data sets. Our proposed FACE-BE-SELF approach outperforms adult-child transfer learning and other baseline domain adaptation methods in aligning latent representations of adult and child expressions.
    A Map-matching Algorithm with Extraction of Multi-group Information for Low-frequency Data. (arXiv:2209.08500v1 [eess.SY])
    The growing use of probe vehicles generates a huge amount of GNSS data. Limited by satellite positioning technology, further improving the accuracy of map-matching is challenging work, especially for low-frequency trajectories. When matching a trajectory, the ego vehicle's spatial-temporal information from the present trip is the most useful and requires the least amount of data. In addition, there is a large amount of other data, e.g., other vehicles' states and past prediction results, but it is hard to extract information from them that is useful for matching maps and inferring paths. Most map-matching studies use only the ego vehicle's data and ignore other vehicles' data. Motivated by this, this paper designs a new map-matching method to make full use of "big data". We first sort all data into four groups according to their spatial and temporal distance from the present matching probe, which allows us to rank their usefulness. Then we design three different methods to extract valuable information (scores) from them: a score for speed and bearing, a score for historical usage, and a score for traffic state using a spectral graph Markov neural network. Finally, we use a modified top-K shortest-path method to search the candidate paths within an ellipse region and then use the fused score to infer the path (projected location). We test the proposed method against baseline algorithms using a real-world dataset in China. The results show that all scoring methods can enhance map-matching accuracy. Furthermore, our method outperforms the others, especially when the GNSS probing frequency is less than 0.01 Hz.
    Random Fourier Features for Asymmetric Kernels. (arXiv:2209.08461v1 [cs.LG])
    The random Fourier features (RFF) method is a powerful and popular technique in kernel approximation for the scalability of kernel methods. The theoretical foundation of RFFs is the Bochner theorem, which relates symmetric, positive definite (PD) functions to probability measures. This condition naturally excludes asymmetric functions, which have a wide range of applications in practice, e.g., directed graphs, conditional probability, and asymmetric kernels. Nevertheless, the understanding of asymmetric functions (kernels) and their scalability via RFFs is unclear, both theoretically and empirically. In this paper, we introduce a complex measure with real and imaginary parts corresponding to four finite positive measures, which expands the application scope of the Bochner theorem. By doing so, this framework allows for handling classical symmetric, PD kernels via one positive measure; symmetric, non-positive-definite kernels via signed measures; and asymmetric kernels via complex measures, thereby unifying them into a general framework by RFFs, named AsK-RFFs. Such an approximation scheme via complex measures enjoys theoretical guarantees from the perspective of uniform convergence. In algorithmic implementation, to speed up the kernel approximation process, which is expensive due to the calculation of total mass, we employ a subset-based fast estimation method that optimizes the total masses on a sub-training set and is computationally efficient in high dimensions. Our AsK-RFFs method is empirically validated on several typical large-scale datasets and achieves promising kernel approximation performance, demonstrating the effectiveness of AsK-RFFs.
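    For background, the classical RFF construction for the symmetric, PD Gaussian kernel, the special case that AsK-RFFs generalizes, takes a few lines; the asymmetric case would replace the single Gaussian spectral measure below with the four positive measures making up the complex measure, machinery this sketch omits.

        import numpy as np

        def rff_features(X, n_features=256, sigma=1.0, seed=0):
            """Random Fourier features approximating the Gaussian kernel
            k(x, y) = exp(-||x - y||^2 / (2 sigma^2)) via Bochner's theorem."""
            rng = np.random.default_rng(seed)
            W = rng.normal(scale=1.0 / sigma, size=(X.shape[1], n_features))
            b = rng.uniform(0.0, 2.0 * np.pi, size=n_features)
            return np.sqrt(2.0 / n_features) * np.cos(X @ W + b)

        X = np.random.default_rng(1).normal(size=(5, 3))
        Z = rff_features(X)
        K_approx = Z @ Z.T   # entrywise approximation of the Gaussian kernel matrix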
    Sub-optimal Policy Aided Multi-Agent Reinforcement Learning for Flocking Control. (arXiv:2209.08347v1 [cs.LG])
    Flocking control is a challenging problem, where multiple agents, such as drones or vehicles, need to reach a target position while maintaining the flock and avoiding collisions with obstacles and collisions among agents in the environment. Multi-agent reinforcement learning has achieved promising performance in flocking control. However, methods based on traditional reinforcement learning require a considerable number of interactions between agents and the environment. This paper proposes a sub-optimal policy aided multi-agent reinforcement learning algorithm (SPA-MARL) to boost sample efficiency. SPA-MARL directly leverages a prior policy, which can be manually designed or solved with a non-learning method, to aid agents in learning, where the performance of the policy can be sub-optimal. SPA-MARL recognizes the difference in performance between the sub-optimal policy and itself, and imitates the sub-optimal policy when it performs better. We leverage SPA-MARL to solve the flocking control problem. A traditional control method based on artificial potential fields is used to generate the sub-optimal policy. Experiments demonstrate that SPA-MARL can speed up the training process and outperform both the MARL baseline and the sub-optimal policy it uses.
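    One plausible way to formalize this "imitate only while better" rule (our schematic, not the paper's exact objective) is a gated imitation term added to the RL loss:

        $$\mathcal{L}(\theta) \;=\; \mathcal{L}_{\mathrm{RL}}(\theta) \;+\; \lambda\,\mathbb{1}\!\big[ J(\pi_{\mathrm{sub}}) > J(\pi_\theta) \big]\, \mathbb{E}_{s}\!\left[ D_{\mathrm{KL}}\!\big( \pi_{\mathrm{sub}}(\cdot \mid s) \,\big\|\, \pi_\theta(\cdot \mid s) \big) \right],$$

    so the agent is pulled toward the sub-optimal policy only while that policy still outperforms it, and learns unconstrained afterwards.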
    Make Heterophily Graphs Better Fit GNN: A Graph Rewiring Approach. (arXiv:2209.08264v1 [cs.LG])
    Graph Neural Networks (GNNs) are popular machine learning methods for modeling graph data. Many GNNs perform well on homophily graphs while having unsatisfactory performance on heterophily graphs. Recently, some researchers have turned their attention to designing GNNs for heterophily graphs by adjusting the message passing mechanism or enlarging its receptive field. Different from existing works that mitigate the issues of heterophily from the model design perspective, we propose to study heterophily graphs from an orthogonal perspective by rewiring the graph structure to reduce heterophily, making traditional GNNs perform better. Through comprehensive empirical studies and analysis, we verify the potential of the rewiring methods. To fully exploit this potential, we propose a method named Deep Heterophily Graph Rewiring (DHGR), which rewires graphs by adding homophilic edges and pruning heterophilic edges. The way of rewiring is determined by comparing the similarity of the label/feature distributions of node neighbors. Besides, we design a scalable implementation of DHGR to guarantee high efficiency. DHGR can easily be used as a plug-in module, i.e., a graph pre-processing step, for any GNN, including both those designed for homophily and those for heterophily, to boost performance on the node classification task. To the best of our knowledge, this is the first work to study graph rewiring for heterophily graphs. Extensive experiments on 11 public graph datasets demonstrate the superiority of our proposed methods.
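    The rewiring rule can be sketched in a few lines. The code below makes illustrative assumptions throughout (dense adjacency, known training labels, cosine similarity, arbitrary thresholds): it scores node pairs by the similarity of their neighbors' label distributions, adds high-similarity edges, and prunes low-similarity ones; the paper's scalable implementation is not reproduced.

        import numpy as np

        def neighbor_label_hist(adj, labels, n_classes):
            # Row-normalized histogram of neighbor labels for every node.
            H = np.zeros((adj.shape[0], n_classes))
            for i in range(adj.shape[0]):
                nbrs = np.nonzero(adj[i])[0]
                for j in nbrs:
                    H[i, labels[j]] += 1.0
                if len(nbrs) > 0:
                    H[i] /= len(nbrs)
            return H

        def rewire(adj, labels, n_classes, add_thr=0.95, prune_thr=0.2):
            H = neighbor_label_hist(adj, labels, n_classes)
            Hn = H / (np.linalg.norm(H, axis=1, keepdims=True) + 1e-12)
            S = Hn @ Hn.T                               # cosine similarity of neighbor distributions
            new_adj = adj.copy()
            new_adj[S > add_thr] = 1                    # add candidate homophilic edges
            new_adj[(adj == 1) & (S < prune_thr)] = 0   # prune likely heterophilic edges
            np.fill_diagonal(new_adj, 0)
            return new_adj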
    Computed Decision Weights and a New Learning Algorithm for Neural Classifiers. (arXiv:2209.08422v1 [cs.LG])
    In this paper we consider the possibility of computing, rather than training, the decision layer weights of a neural classifier. Such a possibility arises in two ways: by making an appropriate choice of loss function and by solving a problem of constrained optimization. The latter formulation leads to a promising new learning process for pre-decision weights that offers both simplicity and efficacy.
    Joint Network Topology Inference via a Shared Graphon Model. (arXiv:2209.08223v1 [stat.ML])
    We consider the problem of estimating the topology of multiple networks from nodal observations, where these networks are assumed to be drawn from the same (unknown) random graph model. We adopt a graphon as our random graph model, which is a nonparametric model from which graphs of potentially different sizes can be drawn. The versatility of graphons allows us to tackle the joint inference problem even for the cases where the graphs to be recovered contain different numbers of nodes and lack precise alignment across the graphs. Our solution is based on combining a maximum likelihood penalty with graphon estimation schemes and can be used to augment existing network inference methods. The proposed joint network and graphon estimation is further enhanced with the introduction of a robust method for noisy graph sampling information. We validate our proposed approach by comparing its performance against competing methods in synthetic and real-world datasets.
    Robot Skill Adaptation via Soft Actor-Critic Gaussian Mixture Models. (arXiv:2111.13129v2 [cs.RO] UPDATED)
    A core challenge for an autonomous agent acting in the real world is to adapt its repertoire of skills to cope with its noisy perception and dynamics. To scale learning of skills to long-horizon tasks, robots should be able to learn and later refine their skills in a structured manner through trajectories rather than making instantaneous decisions individually at each time step. To this end, we propose the Soft Actor-Critic Gaussian Mixture Model (SAC-GMM), a novel hybrid approach that learns robot skills through a dynamical system and adapts the learned skills in their own trajectory distribution space through interactions with the environment. Our approach combines classical robotics techniques of learning from demonstration with the deep reinforcement learning framework and exploits their complementary nature. We show that our method utilizes sensors solely available during the execution of preliminarily learned skills to extract relevant features that lead to faster skill refinement. Extensive evaluations in both simulation and real-world environments demonstrate the effectiveness of our method in refining robot skills by leveraging physical interactions, high-dimensional sensory data, and sparse task completion rewards. Videos, code, and pre-trained models are available at this http URL
    Estimating and Explaining Model Performance When Both Covariates and Labels Shift. (arXiv:2209.08436v1 [stat.ML])
    Deployed machine learning (ML) models often encounter new user data that differs from their training data. Therefore, estimating how well a given model might perform on the new data is an important step toward reliable ML applications. This is very challenging, however, as the data distribution can change in flexible ways, and we may not have any labels on the new data, which is often the case in monitoring settings. In this paper, we propose a new distribution shift model, Sparse Joint Shift (SJS), which considers the joint shift of both labels and a few features. This unifies and generalizes several existing shift models including label shift and sparse covariate shift, where only marginal feature or label distribution shifts are considered. We describe mathematical conditions under which SJS is identifiable. We further propose SEES, an algorithmic framework to characterize the distribution shift under SJS and to estimate a model's performance on new data without any labels. We conduct extensive experiments on several real-world datasets with various ML models. Across different datasets and distribution shifts, SEES achieves significant (up to an order of magnitude) shift estimation error improvements over existing approaches.
    Optimal Scaling for Locally Balanced Proposals in Discrete Spaces. (arXiv:2209.08183v1 [cs.LG])
    Optimal scaling has been well studied for Metropolis-Hastings (M-H) algorithms in continuous spaces, but a similar understanding has been lacking in discrete spaces. Recently, a family of locally balanced proposals (LBP) for discrete spaces has been proved to be asymptotically optimal, but the question of optimal scaling has remained open. In this paper, we establish, for the first time, that the efficiency of M-H in discrete spaces can also be characterized by an asymptotic acceptance rate that is independent of the target distribution. Moreover, we verify, both theoretically and empirically, that the optimal acceptance rates for LBP and random walk Metropolis (RWM) are $0.574$ and $0.234$ respectively. These results also help establish that LBP is asymptotically $O(N^\frac{2}{3})$ more efficient than RWM with respect to model dimension $N$. Knowledge of the optimal acceptance rate allows one to automatically tune the neighborhood size of a proposal distribution in a discrete space, directly analogous to step-size control in continuous spaces. We demonstrate empirically that such adaptive M-H sampling can robustly improve sampling in a variety of target distributions in discrete spaces, including training deep energy-based models.  ( 2 min )
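    A minimal sketch of the tuning this enables, assuming a generic stochastic-approximation update rather than anything prescribed by the paper: nudge the proposal's neighborhood size until the empirical acceptance rate matches the theoretical target (0.574 for LBP, 0.234 for RWM).

        import math

        def adapt_radius(radius, accepted, t, target=0.574, gamma0=0.5):
            """Robbins-Monro style update: grow the neighborhood when accepting
            more often than the target rate, shrink it when accepting less often."""
            gamma = gamma0 / math.sqrt(t)   # decaying adaptation rate
            return radius * math.exp(gamma * ((1.0 if accepted else 0.0) - target))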
    Enhanced Fairness Testing via Generating Effective Initial Individual Discriminatory Instances. (arXiv:2209.08321v1 [cs.SE])
    Fairness testing aims at mitigating unintended discrimination in the decision-making process of data-driven AI systems. Individual discrimination may occur when an AI model makes different decisions for two distinct individuals who are distinguishable solely according to protected attributes, such as age and race. Such instances reveal biased AI behaviour, and are called Individual Discriminatory Instances (IDIs). In this paper, we propose an approach for the selection of the initial seeds to generate IDIs for fairness testing. Previous studies mainly used random initial seeds to this end. However, this phase is crucial, as these seeds are the basis of the follow-up IDI generation. We dub our proposed seed selection approach I&D. It generates a large number of initial IDIs exhibiting great diversity, aiming at improving the overall performance of fairness testing. Our empirical study reveals that I&D is able to produce a larger number of IDIs with respect to four state-of-the-art seed generation approaches, generating 1.68X more IDIs on average. Moreover, we compare the use of I&D to train machine learning models and find that using I&D reduces the number of remaining IDIs by 29% when compared to the state-of-the-art, thus indicating that I&D is effective for improving model fairness.  ( 2 min )
    Comprehensive identification of Long Covid articles with human-in-the-loop machine learning. (arXiv:2209.08124v1 [cs.LG])
    A significant percentage of COVID-19 survivors experience ongoing multisystemic symptoms that often affect daily living, a condition known as Long Covid or post-acute sequelae of SARS-CoV-2 infection. However, identifying Long Covid articles is challenging since articles refer to the condition using a variety of less common terms or refrain from naming it at all. We developed an iterative human-in-the-loop machine learning framework designed to effectively leverage the available data and make the most efficient use of human labels. Specifically, our approach combines data programming with active learning into a robust ensemble model. Evaluating our model on a holdout set demonstrates over three times the sensitivity of other methods. We apply our model to PubMed to create the Long Covid collection, and demonstrate that (1) most Long Covid articles do not refer to Long Covid by any name, (2) when the condition is named, the name used most frequently in the biomedical literature is Long Covid, and (3) Long Covid is associated with disorders in a wide variety of body systems. The Long Covid collection is updated weekly and is searchable online at the LitCovid portal: https://www.ncbi.nlm.nih.gov/research/coronavirus/docsum?filters=e_condition.LongCovid  ( 3 min )
    A review of probabilistic forecasting and prediction with machine learning. (arXiv:2209.08307v1 [stat.ML])
    Predictions and forecasts of machine learning models should take the form of probability distributions, aiming to increase the quantity of information communicated to end users. Although applications of probabilistic prediction and forecasting with machine learning models in academia and industry are becoming more frequent, related concepts and methods have not been formalized and structured under a holistic view of the entire field. Here, we review the topic of predictive uncertainty estimation with machine learning algorithms, as well as the related metrics (consistent scoring functions and proper scoring rules) for assessing probabilistic predictions. The review covers a time period spanning from the introduction of early statistical methods (linear regression and time series models, based on Bayesian statistics or quantile regression) to recent machine learning algorithms (including generalized additive models for location, scale and shape, random forests, boosting and deep learning algorithms) that are more flexible by nature. The review of the progress in the field expedites our understanding of how to develop new algorithms tailored to users' needs, since the latest advancements are based on some fundamental concepts applied to more complex algorithms. We conclude by classifying the material and discussing challenges that are becoming a hot topic of research.  ( 2 min )
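    As a small concrete example of the consistent scoring functions such a review covers, the pinball (quantile) loss scores a predictive quantile at level q; in expectation it is minimized by the true q-quantile of the outcome distribution.

        import numpy as np

        def pinball_loss(y_true, y_pred, q):
            """Average pinball loss for quantile forecasts y_pred at level q."""
            diff = y_true - y_pred
            return float(np.mean(np.maximum(q * diff, (q - 1.0) * diff)))

        y = np.array([2.0, 3.5, 1.0])
        print(pinball_loss(y, np.array([1.8, 3.0, 1.5]), q=0.9))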
    LEARNEST: LEARNing Enhanced Model-based State ESTimation for Robots using Knowledge-based Neural Ordinary Differential Equations. (arXiv:2209.08185v1 [cs.RO])
    State estimation is an important aspect of many robotics applications. In this work, we consider the task of obtaining accurate state estimates for robotic systems by enhancing the dynamics model used in state estimation algorithms. Existing frameworks such as moving horizon estimation (MHE) and the unscented Kalman filter (UKF) provide the flexibility to incorporate nonlinear dynamics and measurement models. However, this implies that the dynamics model within these algorithms has to be sufficiently accurate in order to warrant the accuracy of the state estimates. To enhance the dynamics models and improve the estimation accuracy, we utilize a deep learning framework known as knowledge-based neural ordinary differential equations (KNODEs). The KNODE framework embeds prior knowledge into the training procedure and synthesizes an accurate hybrid model by fusing a prior first-principles model with a neural ordinary differential equation (NODE) model. In our proposed LEARNEST framework, we integrate the data-driven model into two novel model-based state estimation algorithms, denoted as KNODE-MHE and KNODE-UKF. These two algorithms are compared against their conventional counterparts across a number of robotic applications: state estimation for a cartpole system using partial measurements, localization for a ground robot, and state estimation for a quadrotor. Through simulations and tests using real-world experimental data, we demonstrate the versatility and efficacy of the proposed learning-enhanced state estimation framework.  ( 3 min )
    Confidence-Guided Data Augmentation for Deep Semi-Supervised Training. (arXiv:2209.08174v1 [cs.CV])
    We propose a new data augmentation technique for semi-supervised learning settings that emphasizes learning from the most challenging regions of the feature space. Starting with a fully supervised reference model, we first identify low-confidence predictions. These samples are then used to train a Variational AutoEncoder (VAE) that can generate an infinite number of additional images with a similar distribution. Finally, using the originally labeled data and the synthetically generated labeled and unlabeled data, we retrain a new model in a semi-supervised fashion. We perform experiments on two benchmark RGB datasets, CIFAR-100 and STL-10, and show that the proposed scheme improves classification performance in terms of accuracy and robustness, while yielding comparable or superior results with respect to existing fully supervised approaches.  ( 2 min )
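    The first step, identifying low-confidence predictions from the reference model, reduces to thresholding the top class probability; a minimal sketch with made-up probabilities (the threshold is an illustrative choice).

        import numpy as np

        def low_confidence_mask(probs, threshold=0.6):
            """Flag samples whose top predicted class probability falls below a threshold."""
            return probs.max(axis=1) < threshold

        probs = np.array([[0.90, 0.05, 0.05],   # confident
                          [0.40, 0.35, 0.25],   # ambiguous -> candidate for the VAE
                          [0.55, 0.30, 0.15],
                          [0.34, 0.33, 0.33]])
        print(low_confidence_mask(probs))        # [False  True  True  True]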
    De Bruijn goes Neural: Causality-Aware Graph Neural Networks for Time Series Data on Dynamic Graphs. (arXiv:2209.08311v1 [cs.LG])
    We introduce De Bruijn Graph Neural Networks (DBGNNs), a novel time-aware graph neural network architecture for time-resolved data on dynamic graphs. Our approach accounts for temporal-topological patterns that unfold in the causal topology of dynamic graphs, which is determined by causal walks, i.e. temporally ordered sequences of links by which nodes can influence each other over time. Our architecture builds on multiple layers of higher-order De Bruijn graphs, an iterative line graph construction where nodes in a De Bruijn graph of order k represent walks of length k-1, while edges represent walks of length k. We develop a graph neural network architecture that utilizes De Bruijn graphs to implement a message passing scheme that follows non-Markovian dynamics, which enables us to learn patterns in the causal topology of a dynamic graph. Addressing the issue that De Bruijn graphs with different orders k can be used to model the same data set, we further apply statistical model selection to determine the optimal graph topology to be used for message passing. An evaluation on synthetic and empirical data sets suggests that DBGNNs can leverage temporal patterns in dynamic graphs, which substantially improves the performance in a supervised node classification task.  ( 2 min )
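    The iterative line-graph construction can be sketched directly: an order-k node is a tuple of k vertices (a walk of length k-1), and an edge joins two nodes whose one-step overlap forms an observed walk of length k; a minimal sketch on toy walks.

        from collections import defaultdict

        def de_bruijn_edges(walks, k):
            """Weighted edges of the order-k De Bruijn graph built from observed walks."""
            edges = defaultdict(int)
            for walk in walks:
                for i in range(len(walk) - k):
                    u = tuple(walk[i:i + k])           # walk of length k-1
                    v = tuple(walk[i + 1:i + k + 1])   # shifted by one step
                    edges[(u, v)] += 1                 # weight = number of observations
            return dict(edges)

        walks = [["a", "b", "c", "d"], ["a", "b", "c", "e"]]
        print(de_bruijn_edges(walks, k=2))
        # {(('a','b'),('b','c')): 2, (('b','c'),('c','d')): 1, (('b','c'),('c','e')): 1}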
    Adaptive Dimension Reduction and Variational Inference for Transductive Few-Shot Classification. (arXiv:2209.08527v1 [cs.LG])
    Transductive Few-Shot learning has gained increasing attention, given the cost of data annotation and the accuracy gains that unlabelled samples provide in the few-shot setting. Especially in Few-Shot Classification (FSC), recent works explore the feature distributions, aiming at maximizing likelihoods or posteriors with respect to the unknown parameters. Following this vein, and considering the parallel between FSC and clustering, we seek to better account for the uncertainty in estimation caused by the lack of data, as well as for the statistical properties of the clusters associated with each class. Therefore, in this paper we propose a new clustering method based on Variational Bayesian inference, further improved by Adaptive Dimension Reduction based on Probabilistic Linear Discriminant Analysis. Our proposed method significantly improves accuracy in the realistic unbalanced transductive setting on various Few-Shot benchmarks when applied to features used in previous studies, with a gain of up to $6\%$ in accuracy. In addition, when applied to the balanced setting, we obtain very competitive results without making use of the class-balance artefact, which is disputable for practical use cases. We also provide the performance of our method on a high-performing pretrained backbone, with the reported results further surpassing the current state-of-the-art accuracy, suggesting the genericity of the proposed method.  ( 2 min )
    HAPI: A Large-scale Longitudinal Dataset of Commercial ML API Predictions. (arXiv:2209.08443v1 [cs.SE])
    Commercial ML APIs offered by providers such as Google, Amazon and Microsoft have dramatically simplified ML adoption in many applications. Numerous companies and academics pay to use ML APIs for tasks such as object detection, OCR and sentiment analysis. Different ML APIs tackling the same task can have very heterogeneous performance. Moreover, the ML models underlying the APIs also evolve over time. As ML APIs rapidly become a valuable marketplace and a widespread way to consume machine learning, it is critical to systematically study and compare different APIs with each other and to characterize how APIs change over time. However, this topic is currently underexplored due to the lack of data. In this paper, we present HAPI (History of APIs), a longitudinal dataset of 1,761,417 instances of commercial ML API applications (involving APIs from Amazon, Google, IBM, Microsoft and other providers) across diverse tasks including image tagging, speech recognition and text mining from 2020 to 2022. Each instance consists of a query input for an API (e.g., an image or text) along with the API's output prediction/annotation and confidence scores. HAPI is the first large-scale dataset of ML API usages and is a unique resource for studying ML-as-a-service (MLaaS). As examples of the types of analyses that HAPI enables, we show that ML APIs' performance changes substantially over time -- several APIs' accuracies dropped on specific benchmark datasets. Even when the API's aggregate performance stays steady, its error modes can shift across different subtypes of data between 2020 and 2022. Such changes can substantially impact the entire analytics pipelines that use some ML API as a component. We further use HAPI to study commercial APIs' performance disparities across demographic subgroups over time. HAPI can stimulate more research in the growing field of MLaaS.  ( 3 min )
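    The kind of longitudinal analysis such a dataset enables can be sketched with pandas on a hypothetical schema (one row per API query with a correctness flag); the column names below are invented for illustration and are not the dataset's actual format.

        import pandas as pd

        # hypothetical HAPI-like records: one row per (api, year, query)
        df = pd.DataFrame({
            "api":     ["A", "A", "A", "A", "B", "B", "B", "B"],
            "year":    [2020, 2020, 2022, 2022, 2020, 2020, 2022, 2022],
            "correct": [1, 1, 1, 0, 0, 1, 1, 1],
        })
        # per-API accuracy by year surfaces drift in either direction
        print(df.groupby(["api", "year"])["correct"].mean().unstack())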
    Thompson Sampling with Virtual Helping Agents. (arXiv:2209.08197v1 [cs.LG])
    We address the problem of online sequential decision making, i.e., balancing the trade-off between exploiting the current knowledge to maximize immediate performance and exploring new information to gain long-term benefits, using the multi-armed bandit framework. Thompson sampling is one of the heuristics for choosing actions that address this exploration-exploitation dilemma. We first propose a general framework that helps heuristically tune the exploration versus exploitation trade-off in Thompson sampling using multiple samples from the posterior distribution. Utilizing this framework, we propose two algorithms for the multi-armed bandit problem and provide theoretical bounds on the cumulative regret. Next, we demonstrate the empirical improvement in the cumulative regret performance of the proposed algorithms over Thompson Sampling. We also show the effectiveness of the proposed algorithms on real-world datasets. Contrary to the existing methods, our framework provides a mechanism to vary the amount of exploration/exploitation based on the task at hand. Towards this end, we extend our framework to two additional problems, i.e., best arm identification and time-sensitive learning in bandits, and compare our algorithm with existing methods.  ( 2 min )
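    One simple way to expose an exploration knob via multiple posterior samples, in the spirit described above though not necessarily the paper's exact algorithm, is to score each arm by the maximum of m posterior draws; m = 1 recovers vanilla Thompson sampling, and larger m explores more aggressively. A hedged Beta-Bernoulli sketch:

        import numpy as np

        rng = np.random.default_rng(0)
        true_means = np.array([0.3, 0.5, 0.7])   # Bernoulli arms
        alpha, beta = np.ones(3), np.ones(3)     # Beta(1,1) posteriors per arm
        m = 3                                    # posterior samples per arm

        for _ in range(5000):
            draws = rng.beta(alpha[:, None], beta[:, None], size=(3, m))
            a = int(np.argmax(draws.max(axis=1)))   # optimistic multi-sample score
            r = rng.random() < true_means[a]
            alpha[a] += r
            beta[a] += 1 - r

        print("posterior means:", alpha / (alpha + beta))  # concentrates on arm 2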
  • Open

    Application of Neural Network in the Prediction of NOx Emissions from Degrading Gas Turbine. (arXiv:2209.09168v1 [cs.LG])
    This paper aims to apply a neural network algorithm to predict the process response (NOx emissions) of degrading natural gas turbines. Nine process variables, or predictors, are considered in the predictive modelling. It is found that the model trained by the neural network algorithm should include part of the recent data in the training and validation sets to account for the impact of system degradation. R-squared values of the training and validation sets demonstrate the validity of the model. The residual plot, showing no clear pattern, indicates that the model is appropriate. The ranking of the importance of the process variables is presented, and the prediction profile confirms their significance. The trained model also identifies the settings of the process variables that minimize NOx emissions from the degrading gas turbine system.
    Parameter-free Mirror Descent. (arXiv:2203.00444v3 [cs.LG] UPDATED)
    We develop a modified online mirror descent framework that is suitable for building adaptive and parameter-free algorithms in unbounded domains. We leverage this technique to develop the first unconstrained online linear optimization algorithm achieving an optimal dynamic regret bound, and we further demonstrate that natural strategies based on Follow-the-Regularized-Leader are unable to achieve similar results. We also apply our mirror descent framework to build new parameter-free implicit updates, as well as a simplified and improved unconstrained scale-free algorithm.
    A Splicing Approach to Best Subset of Groups Selection. (arXiv:2104.12576v3 [cs.LG] UPDATED)
    Best subset of groups selection (BSGS) is the process of selecting a small part of non-overlapping groups to achieve the best interpretability on the response variable. It has attracted increasing attention and has far-reaching applications in practice. However, due to the computational intractability of BSGS in high-dimensional settings, developing efficient algorithms for solving BSGS remains a research hotspot. In this paper, we propose a group-splicing algorithm that iteratively detects the relevant groups and excludes the irrelevant ones. Moreover, coupled with a novel group information criterion, we develop an adaptive algorithm to determine the optimal model size. Under mild conditions, it is certifiable that our algorithm can identify the optimal subset of groups in polynomial time with high probability. Finally, we demonstrate the efficiency and accuracy of our methods by comparing them with several state-of-the-art algorithms on both synthetic and real-world datasets.
    Towards Robust Off-Policy Evaluation via Human Inputs. (arXiv:2209.08682v1 [cs.LG])
    Off-policy Evaluation (OPE) methods are crucial tools for evaluating policies in high-stakes domains such as healthcare, where direct deployment is often infeasible, unethical, or expensive. When deployment environments are expected to undergo changes (that is, dataset shifts), it is important for OPE methods to perform robust evaluation of the policies amidst such changes. Existing approaches consider robustness against a large class of shifts that can arbitrarily change any observable property of the environment. This often results in highly pessimistic estimates of the utilities, thereby invalidating policies that might have been useful in deployment. In this work, we address the aforementioned problem by investigating how domain knowledge can help provide more realistic estimates of the utilities of policies. We leverage human inputs on which aspects of the environments may plausibly change, and adapt the OPE methods to only consider shifts on these aspects. Specifically, we propose a novel framework, Robust OPE (ROPE), which considers shifts on a subset of covariates in the data based on user inputs, and estimates worst-case utility under these shifts. We then develop computationally efficient algorithms for OPE that are robust to the aforementioned shifts for contextual bandits and Markov decision processes. We also theoretically analyze the sample complexity of these algorithms. Extensive experimentation with synthetic and real-world datasets from the healthcare domain demonstrates that our approach not only captures realistic dataset shifts accurately, but also results in less pessimistic policy evaluations.
    Iterated Block Particle Filter for High-dimensional Parameter Learning: Beating the Curse of Dimensionality. (arXiv:2110.10745v2 [stat.ML] UPDATED)
    Parameter learning for high-dimensional, partially observed, and nonlinear stochastic processes is a methodological challenge. Spatiotemporal disease transmission systems provide examples of such processes giving rise to open inference problems. We propose the iterated block particle filter (IBPF) algorithm for learning high-dimensional parameters over graphical state space models with general state spaces, measures, transition densities and graph structure. Theoretical performance guarantees are obtained on beating the curse of dimensionality (COD), algorithm convergence, and likelihood maximization. Experiments on a highly nonlinear and non-Gaussian spatiotemporal model for measles transmission reveal that the iterated ensemble Kalman filter algorithm (Li et al. (2020)) is ineffective and the iterated filtering algorithm (Ionides et al. (2015)) suffers from the COD, while our IBPF algorithm beats COD consistently across various experiments with different metrics.
    Importance Tempering: Group Robustness for Overparameterized Models. (arXiv:2209.08745v1 [cs.LG])
    Although overparameterized models have shown their success on many machine learning tasks, their accuracy can drop on test distributions that differ from the training one. This accuracy drop still limits the application of machine learning in the wild. At the same time, importance weighting, a traditional technique for handling distribution shifts, has been demonstrated to have little or even no effect on overparameterized models, both empirically and theoretically. In this paper, we propose importance tempering to improve the decision boundary and achieve consistently better results for overparameterized models. Theoretically, we justify that the selection of group temperature can differ under the label shift and spurious correlation settings. At the same time, we also prove that properly selected temperatures can prevent minority collapse in imbalanced classification. Empirically, we achieve state-of-the-art results on worst-group classification tasks using importance tempering.
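    One plausible reading of the tempering idea, sketched below under loudly stated assumptions (the paper's exact tempering scheme and temperature selection differ), is to exponentiate per-group importance weights by 1/T before normalizing, interpolating between plain importance weighting (T = 1) and uniform weighting (T -> infinity).

        import numpy as np

        def tempered_weights(group_ids, group_importance, T):
            """Per-sample weights from group importance weights tempered by 1/T."""
            w = np.asarray(group_importance, float) ** (1.0 / T)
            per_sample = w[group_ids]
            return per_sample / per_sample.sum()

        groups = np.array([0, 0, 0, 1])                    # group 1 is the minority
        print(tempered_weights(groups, [1.0, 9.0], T=1.0))  # plain importance weighting
        print(tempered_weights(groups, [1.0, 9.0], T=2.0))  # tempered, less extreme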
    A provably stable neural network Turing Machine. (arXiv:2006.03651v4 [cs.LG] UPDATED)
    We introduce a neural stack architecture, including a differentiable parametrized stack operator that, for suitable choices of parameters, approximates stack push and pop operations and explicitly represents a stack. We prove the stability of this stack architecture: after arbitrarily many stack operations, the state of the neural stack still closely resembles the state of the discrete stack. Using the neural stack with a recurrent neural network, we introduce a neural network Pushdown Automaton (nnPDA) and prove that an nnPDA with finite/bounded neurons and time can simulate any PDA. Furthermore, we extend our construction and propose a new architecture, the neural state Turing Machine (nnTM). We prove that a differentiable nnTM with bounded neurons can simulate a Turing Machine (TM) in real time. Just like the neural stack, these architectures are also stable. Finally, we extend our construction to show that a differentiable nnTM is equivalent to a Universal Turing Machine (UTM) and can simulate any TM with only seven finite/bounded-precision neurons. This work provides a new theoretical bound for the computational capability of bounded-precision RNNs augmented with memory.
    Better Uncertainty Calibration via Proper Scores for Classification and Beyond. (arXiv:2203.07835v2 [cs.LG] UPDATED)
    With model trustworthiness being crucial for sensitive real-world applications, practitioners are putting more and more focus on improving the uncertainty calibration of deep neural networks. Calibration errors are designed to quantify the reliability of probabilistic predictions but their estimators are usually biased and inconsistent. In this work, we introduce the framework of proper calibration errors, which relates every calibration error to a proper score and provides a respective upper bound with optimal estimation properties. This relationship can be used to reliably quantify the model calibration improvement. We theoretically and empirically demonstrate the shortcomings of commonly used estimators compared to our approach. Due to the wide applicability of proper scores, this gives a natural extension of recalibration beyond classification.  ( 2 min )
    Bayesian Importance of Features (BIF). (arXiv:2010.13872v2 [stat.ML] UPDATED)
    We introduce a simple and intuitive framework that provides quantitative explanations of statistical models through the probabilistic assessment of input feature importance. The core idea comes from utilizing the Dirichlet distribution to define the importance of input features and learning it via approximate Bayesian inference. The learned importance has probabilistic interpretation and provides the relative significance of each input feature to a model's output, additionally assessing confidence about its importance quantification. As a consequence of using the Dirichlet distribution over the explanations, we can define a closed-form divergence to gauge the similarity between learned importance under different models. We use this divergence to study the feature importance explainability tradeoffs with essential notions in modern machine learning, such as privacy and fairness. Furthermore, BIF can work on two levels: global explanation (feature importance across all data instances) and local explanation (individual feature importance for each data instance). We show the effectiveness of our method on a variety of synthetic and real datasets, taking into account both tabular and image datasets. The code is available at https://github.com/kamadforge/featimp_dp.  ( 2 min )
    Causal Feature Selection via Orthogonal Search. (arXiv:2007.02938v3 [stat.ML] UPDATED)
    The problem of inferring the direct causal parents of a response variable among a large set of explanatory variables is of high practical importance in many disciplines. However, established approaches often scale at least exponentially with the number of explanatory variables and are difficult to extend to nonlinear relationships and to cyclic data. Inspired by debiased machine learning methods, we study a one-vs.-the-rest feature selection approach to discover the direct causal parent of the response. We propose an algorithm that works for purely observational data while also offering theoretical guarantees, including the case of partially nonlinear relationships possibly under the presence of cycles. As it requires only one estimation for each variable, our approach is applicable even to large graphs. We demonstrate significant improvements compared to established approaches.  ( 2 min )
    Heterogeneous Federated Learning on a Graph. (arXiv:2209.08737v1 [stat.ML])
    Federated learning, where algorithms are trained across multiple decentralized devices without sharing local data, is increasingly popular in distributed machine learning practice. Typically, a graph structure $G$ exists behind local devices for communication. In this work, we consider parameter estimation in federated learning with data distribution and communication heterogeneity, as well as limited computational capacity of local devices. We encode the distribution heterogeneity by parametrizing distributions on local devices with a set of distinct $p$-dimensional vectors. We then propose to jointly estimate parameters of all devices under the $M$-estimation framework with the fused Lasso regularization, encouraging an equal estimate of parameters on connected devices in $G$. We provide a general result for our estimator depending on $G$, which can be further calibrated to obtain convergence rates for various specific problem setups. Surprisingly, our estimator attains the optimal rate under a certain graph fidelity condition on $G$, as if we could aggregate all samples sharing the same distribution. If the graph fidelity condition is not met, we propose an edge selection procedure via multiple testing to ensure the optimality. To ease the burden of local computation, a decentralized stochastic version of ADMM is provided, with convergence rate $O(T^{-1}\log T)$ where $T$ denotes the number of iterations. We highlight that our algorithm transmits only parameters along edges of $G$ at each iteration, without requiring a central machine, which preserves privacy. We further extend it to the case where devices are randomly inaccessible during the training process, with a similar algorithmic convergence guarantee. The computational and statistical efficiency of our method is evidenced by simulation experiments and the 2020 US presidential election data set.  ( 3 min )
    Rethinking Knowledge Graph Evaluation Under the Open-World Assumption. (arXiv:2209.08858v1 [cs.AI])
    Most knowledge graphs (KGs) are incomplete, which motivates one important research topic on automatically complementing knowledge graphs. However, evaluation of knowledge graph completion (KGC) models often ignores the incompleteness -- facts in the test set are ranked against all unknown triplets which may contain a large number of missing facts not included in the KG yet. Treating all unknown triplets as false is called the closed-world assumption. This closed-world assumption might negatively affect the fairness and consistency of the evaluation metrics. In this paper, we study KGC evaluation under a more realistic setting, namely the open-world assumption, where unknown triplets are considered to include many missing facts not included in the training or test sets. For the currently most used metrics such as mean reciprocal rank (MRR) and Hits@K, we point out that their behavior may be unexpected under the open-world assumption. Specifically, even with relatively few missing facts, their values show a logarithmic trend with respect to the true strength of the model, and thus the metric increase can be insignificant in terms of reflecting the true model improvement. Further, considering the variance, we show that the degradation in the reported numbers may result in incorrect comparisons between different models, where stronger models may have lower metric numbers. We validate the phenomenon both theoretically and experimentally. Finally, we suggest possible causes and solutions for this problem. Our code and data are available at https://github.com/GraphPKU/Open-World-KG .  ( 3 min )
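    For reference, the two metrics under discussion are straightforward to compute from the rank of each test triplet's true entity among the candidate corruptions; a minimal sketch.

        import numpy as np

        def mrr_and_hits(ranks, k=10):
            """Mean reciprocal rank and Hits@k from the ranks of the true entities."""
            ranks = np.asarray(ranks, dtype=float)
            return (1.0 / ranks).mean(), (ranks <= k).mean()

        # ranks among all candidate corruptions for five test triplets
        print(mrr_and_hits([1, 3, 12, 2, 50], k=10))   # (~0.387, 0.6)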
    Follow-the-Perturbed-Leader for Adversarial Markov Decision Processes with Bandit Feedback. (arXiv:2205.13451v2 [cs.LG] UPDATED)
    We consider regret minimization for Adversarial Markov Decision Processes (AMDPs), where the loss functions are changing over time and adversarially chosen, and the learner only observes the losses for the visited state-action pairs (i.e., bandit feedback). While there has been a surge of studies on this problem using Online-Mirror-Descent (OMD) methods, very little is known about the Follow-the-Perturbed-Leader (FTPL) methods, which are usually computationally more efficient and also easier to implement since it only requires solving an offline planning problem. Motivated by this, we take a closer look at FTPL for learning AMDPs, starting from the standard episodic finite-horizon setting. We find some unique and intriguing difficulties in the analysis and propose a workaround to eventually show that FTPL is also able to achieve near-optimal regret bounds in this case. More importantly, we then find two significant applications: First, the analysis of FTPL turns out to be readily generalizable to delayed bandit feedback with order-optimal regret, while OMD methods exhibit extra difficulties (Jin et al., 2022). Second, using FTPL, we also develop the first no-regret algorithm for learning communicating AMDPs in the infinite-horizon setting with bandit feedback and stochastic transitions. Our algorithm is efficient assuming access to an offline planning oracle, while even for the easier full-information setting, the only existing algorithm (Chandrasekaran and Tewari, 2021) is computationally inefficient.  ( 3 min )
    Tensor Principal Component Analysis in High Dimensional CP Models. (arXiv:2108.04428v4 [stat.ML] UPDATED)
    The CP decomposition for high dimensional non-orthogonal spiked tensors is an important problem with broad applications across many disciplines. However, previous works with theoretical guarantee typically assume restrictive incoherence conditions on the basis vectors for the CP components. In this paper, we propose new computationally efficient composite PCA and concurrent orthogonalization algorithms for tensor CP decomposition with theoretical guarantees under mild incoherence conditions. The composite PCA applies the principal component or singular value decompositions twice, first to a matrix unfolding of the tensor data to obtain singular vectors and then to the matrix folding of the singular vectors obtained in the first step. It can be used as an initialization for any iterative optimization scheme for the tensor CP decomposition. The concurrent orthogonalization algorithm iteratively estimates the basis vector in each mode of the tensor by simultaneously applying projections to the orthogonal complements of the spaces generated by other CP components in other modes. It is designed to improve the alternating least squares estimator and other forms of the high order orthogonal iteration for tensors with low or moderately high CP ranks, and it is guaranteed to converge rapidly when the error of any given initial estimator is bounded by a small constant. Our theoretical investigation provides estimation accuracy and convergence rates for the two proposed algorithms. Both proposed algorithms are applicable to a deterministic tensor, its noisy version, and the order-$2K$ covariance tensor of order-$K$ tensor data in a factor model with uncorrelated factors. Our implementations on synthetic data demonstrate significant practical superiority of our approach over existing methods.  ( 3 min )
    Optimal Sublinear Sampling of Spanning Trees and Determinantal Point Processes via Average-Case Entropic Independence. (arXiv:2204.02570v2 [cs.DS] UPDATED)
    We design fast algorithms for repeatedly sampling from strongly Rayleigh distributions, which include random spanning tree distributions and determinantal point processes. For a graph $G=(V, E)$, we show how to approximately sample uniformly random spanning trees from $G$ in $\widetilde{O}(\lvert V\rvert)$ time per sample after an initial $\widetilde{O}(\lvert E\rvert)$ time preprocessing. For a determinantal point process on subsets of size $k$ of a ground set of $n$ elements, we show how to approximately sample in $\widetilde{O}(k^\omega)$ time after an initial $\widetilde{O}(nk^{\omega-1})$ time preprocessing, where $\omega<2.372864$ is the matrix multiplication exponent. We even improve the state of the art for obtaining a single sample from determinantal point processes, from the prior runtime of $\widetilde{O}(\min\{nk^2, n^\omega\})$ to $\widetilde{O}(nk^{\omega-1})$. In our main technical result, we achieve the optimal limit on domain sparsification for strongly Rayleigh distributions. In domain sparsification, sampling from a distribution $\mu$ on $\binom{[n]}{k}$ is reduced to sampling from related distributions on $\binom{[t]}{k}$ for $t\ll n$. We show that for strongly Rayleigh distributions, we can achieve the optimal $t=\widetilde{O}(k)$. Our reduction involves sampling from $\widetilde{O}(1)$ domain-sparsified distributions, all of which can be produced efficiently assuming convenient access to approximate overestimates for marginals of $\mu$. Having access to marginals is analogous to having access to the mean and covariance of a continuous distribution, or knowing "isotropy" for the distribution, the key assumption behind the Kannan-Lov\'asz-Simonovits (KLS) conjecture and optimal samplers based on it. We view our result as a moral analog of the KLS conjecture and its consequences for sampling, for discrete strongly Rayleigh measures.  ( 3 min )
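    For orientation, the object being sampled can be illustrated with the classical Aldous-Broder walk, which returns a uniformly random spanning tree (the first-entrance edge into each new vertex joins the tree); this is a simple but far slower baseline than the preprocessing-plus-fast-sampling scheme above.

        import random

        def aldous_broder(adj, seed=0):
            """Uniform random spanning tree of a connected graph via Aldous-Broder."""
            random.seed(seed)
            current = random.choice(list(adj))
            visited, tree = {current}, []
            while len(visited) < len(adj):
                nxt = random.choice(adj[current])
                if nxt not in visited:              # first entrance: keep the edge
                    visited.add(nxt)
                    tree.append((current, nxt))
                current = nxt
            return tree

        adj = {0: [1, 2], 1: [0, 2], 2: [0, 1, 3], 3: [2]}
        print(aldous_broder(adj))                    # 3 edges spanning 4 nodes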
    Homomorphic Sensing of Subspace Arrangements. (arXiv:2006.05158v4 [cs.LG] UPDATED)
    Homomorphic sensing is a recent algebraic-geometric framework that studies the unique recovery of points in a linear subspace from their images under a given collection of linear maps. It has been successful in interpreting such a recovery in the case of permutations composed by coordinate projections, an important instance in applications known as unlabeled sensing, which models data that are out of order and have missing values. In this paper, we provide tighter and simpler conditions that guarantee the unique recovery for the single-subspace case, extend the result to the case of a subspace arrangement, and show that the unique recovery in a single subspace is locally stable under noise. We specialize our results to several examples of homomorphic sensing such as real phase retrieval and unlabeled sensing. In so doing, in a unified way, we obtain conditions that guarantee the unique recovery for those examples, typically known via diverse techniques in the literature, as well as novel conditions for sparse and unsigned versions of unlabeled sensing. Similarly, our noise result also implies that the unique recovery in unlabeled sensing is locally stable.  ( 3 min )
    Which Samples Should be Learned First: Easy or Hard?. (arXiv:2110.05481v4 [cs.LG] UPDATED)
    An effective weighting scheme for training samples is essential for learning tasks. Numerous weighting schemes have been proposed. Some schemes take the easy-first mode, whereas some others take the hard-first one. Naturally, an interesting yet realistic question is raised. Which samples should be learned first given a new learning task, easy or hard? To answer this question, both theoretical analyses and experimental verification are conducted. First, a general optimized objective function is proposed, revealing the relationship between the difficulty distribution and the difficulty-based sample weights. Second, on the basis of the optimized objective function, theoretical answers are obtained. Besides the easy-first and hard-first modes, there are two other priority modes, namely, medium-first and two-ends-first. The priority mode does not necessarily remain unchanged during the training process. Third, an effective and universal solution is proposed to select the optimal priority mode when there is no prior knowledge or theoretical clues. The four modes, namely, easy/medium/hard/two-ends-first, can be flexibly switched in the proposed solution. Fourth, a wide range of experiments is conducted under various scenarios to further compare the weighting schemes in different modes. On the basis of these works, reasonable and comprehensive answers are obtained. Factors including the distribution of samples' learning difficulties and the validation data determine which samples should be learned first in a learning task.  ( 3 min )
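    The four priority modes can be made concrete as weight families over a normalized difficulty score; a hedged sketch (the paper's optimized objective is more general than these particular exponential families).

        import numpy as np

        def sample_weights(losses, mode="easy", tau=1.0):
            """Map per-sample losses (difficulty proxies) to normalized training weights."""
            d = (losses - losses.mean()) / (losses.std() + 1e-8)
            if mode == "easy":       w = np.exp(-d / tau)        # easy-first
            elif mode == "hard":     w = np.exp(d / tau)         # hard-first
            elif mode == "medium":   w = np.exp(-d ** 2 / tau)   # medium-first
            elif mode == "two_ends": w = np.exp(d ** 2 / tau)    # two-ends-first
            return w / w.sum()

        losses = np.array([0.1, 0.5, 1.0, 2.5])
        for mode in ["easy", "hard", "medium", "two_ends"]:
            print(mode, np.round(sample_weights(losses, mode), 3))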
    Probabilistic Autoencoder. (arXiv:2006.05479v4 [cs.LG] UPDATED)
    Principal Component Analysis (PCA) minimizes the reconstruction error given a class of linear models of fixed component dimensionality. Probabilistic PCA adds a probabilistic structure by learning the probability distribution of the PCA latent space weights, thus creating a generative model. Autoencoders (AE) minimize the reconstruction error in a class of nonlinear models of fixed latent space dimensionality and outperform PCA at fixed dimensionality. Here, we introduce the Probabilistic Autoencoder (PAE) that learns the probability distribution of the AE latent space weights using a normalizing flow (NF). The PAE is fast and easy to train and achieves small reconstruction errors, high sample quality, and good performance in downstream tasks. We compare the PAE to Variational AE (VAE), showing that the PAE trains faster, reaches a lower reconstruction error, and produces good sample quality without requiring special tuning parameters or training procedures. We further demonstrate that the PAE is a powerful model for performing the downstream tasks of probabilistic image reconstruction in the context of Bayesian inference of inverse problems for inpainting and denoising applications. Finally, we identify latent space density from NF as a promising outlier detection metric.
    Class-Incremental Continual Learning into the eXtended DER-verse. (arXiv:2201.00766v2 [cs.LG] UPDATED)
    The staple of human intelligence is the capability of acquiring knowledge in a continuous fashion. In stark contrast, Deep Networks forget catastrophically and, for this reason, the sub-field of Class-Incremental Continual Learning fosters methods that learn a sequence of tasks incrementally, blending sequentially-gained knowledge into a comprehensive prediction. This work aims at assessing and overcoming the pitfalls of our previous proposal Dark Experience Replay (DER), a simple and effective approach that combines rehearsal and Knowledge Distillation. Inspired by the way our minds constantly rewrite past recollections and set expectations for the future, we endow our model with the abilities to i) revise its replay memory to welcome novel information regarding past data, and ii) pave the way for learning yet unseen classes. We show that the application of these strategies leads to remarkable improvements; indeed, the resulting method - termed eXtended-DER (X-DER) - outperforms the state of the art on both standard benchmarks (such as CIFAR-100 and miniImagenet) and a novel one here introduced. To gain a better understanding, we further provide extensive ablation studies that corroborate and extend the findings of our previous research (e.g. the value of Knowledge Distillation and flatter minima in continual learning setups).
    Comparative study of machine learning and deep learning methods on ASD classification. (arXiv:2209.08601v1 [eess.IV])
    The autism dataset is studied to identify the differences between autistic and healthy groups. For this, the resting-state Functional Magnetic Resonance Imaging (rs-fMRI) data of the two groups are analyzed, and networks of connections between brain regions are created. Several classification frameworks are developed to distinguish the connectivity patterns between the groups. The best models for statistical inference and precision are compared, and the trade-off between precision and model interpretability is analyzed. Finally, the classification accuracy measures are reported to justify the performance of our framework. Our best model classifies autistic and healthy patients on the multisite ABIDE I data with 71% accuracy.
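    A typical pipeline of this kind reduces each subject's rs-fMRI to a connectivity vector (the upper triangle of the region-by-region correlation matrix) and feeds it to a classifier; the sketch below uses random data in place of ABIDE I, so the shapes and sizes are illustrative.

        import numpy as np
        from sklearn.linear_model import LogisticRegression
        from sklearn.model_selection import cross_val_score

        rng = np.random.default_rng(0)
        n_subjects, n_regions, n_timepoints = 60, 20, 100

        def connectivity_features(ts):
            c = np.corrcoef(ts)                     # region-by-region correlations
            return c[np.triu_indices_from(c, k=1)]  # upper triangle as features

        X = np.stack([connectivity_features(rng.normal(size=(n_regions, n_timepoints)))
                      for _ in range(n_subjects)])
        y = rng.integers(0, 2, n_subjects)          # ASD vs. control (random here)
        print(cross_val_score(LogisticRegression(max_iter=1000), X, y, cv=5).mean())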
    Neural Collapse with Normalized Features: A Geometric Analysis over the Riemannian Manifold. (arXiv:2209.09211v1 [cs.LG])
    When training overparameterized deep networks for classification tasks, it has been widely observed that the learned features exhibit a so-called "neural collapse" phenomenon. More specifically, for the output features of the penultimate layer, for each class the within-class features converge to their means, and the means of different classes exhibit a certain tight frame structure, which is also aligned with the last layer's classifier. As feature normalization in the last layer becomes a common practice in modern representation learning, in this work we theoretically justify the neural collapse phenomenon for normalized features. Based on an unconstrained feature model, we simplify the empirical loss function in a multi-class classification task into a nonconvex optimization problem over the Riemannian manifold by constraining all features and classifiers over the sphere. In this context, we analyze the nonconvex landscape of the Riemannian optimization problem over the product of spheres, showing a benign global landscape in the sense that the only global minimizers are the neural collapse solutions while all other critical points are strict saddles with negative curvature. Experimental results on practical deep networks corroborate our theory and demonstrate that better representations can be learned faster via feature normalization.
    On minimax density estimation via measure transport. (arXiv:2207.10231v2 [math.ST] UPDATED)
    We study the convergence properties, in Hellinger and related distances, of nonparametric density estimators based on measure transport. These estimators represent the measure of interest as the pushforward of a chosen reference distribution under a transport map, where the map is chosen via a maximum likelihood objective (equivalently, minimizing an empirical Kullback-Leibler loss) or a penalized version thereof. We establish concentration inequalities for a general class of penalized measure transport estimators, by combining techniques from M-estimation with analytical properties of the transport-based density representation. We then demonstrate the implications of our theory for the case of triangular Knothe-Rosenblatt (KR) transports on the $d$-dimensional unit cube, and show that both penalized and unpenalized versions of such estimators achieve minimax optimal convergence rates over H\"older classes of densities. Specifically, we establish optimal rates for unpenalized nonparametric maximum likelihood estimation over bounded H\"older-type balls, and then for certain Sobolev-penalized estimators and sieved wavelet estimators.
    DIGRAC: Digraph Clustering Based on Flow Imbalance. (arXiv:2106.05194v7 [stat.ML] UPDATED)
    Node clustering is a powerful tool in the analysis of networks. We introduce a graph neural network framework to obtain node embeddings for directed networks in a self-supervised manner, including a novel probabilistic imbalance loss, which can be used for network clustering. Here, we propose directed flow imbalance measures, which are tightly related to directionality, to reveal clusters in the network even when there is no density difference between clusters. In contrast to standard approaches in the literature, in this paper, directionality is not treated as a nuisance, but rather contains the main signal. DIGRAC optimizes directed flow imbalance for clustering without requiring label supervision, unlike existing graph neural network methods, and can naturally incorporate node features, unlike existing spectral methods. Extensive experimental results on synthetic data, in the form of directed stochastic block models, and real-world data at different scales, demonstrate that our method, based on flow imbalance, attains state-of-the-art results on directed graph clustering when compared against 10 state-of-the-art methods from the literature, for a wide range of noise and sparsity levels, graph structures and topologies, and even outperforms supervised methods.
    HiPart: Hierarchical Divisive Clustering Toolbox. (arXiv:2209.08680v1 [stat.ML])
    This paper presents the HiPart package, an open-source native Python library that provides efficient and interpretable implementations of divisive hierarchical clustering algorithms. HiPart supports interactive visualizations for the manipulation of the execution steps, allowing direct intervention in the clustering outcome. This package is highly suited for Big Data applications, as focus has been given to the computational efficiency of the implemented clustering methodologies. The dependencies used are either Python built-in packages or highly maintained, stable external packages. The software is provided under the MIT license. The package's source code and documentation can be found at https://github.com/panagiotisanagnostou/HiPart.
    A Survey of Deep Causal Model. (arXiv:2209.08860v1 [stat.ML])
    The concept of causality plays an important role in human cognition. In the past few decades, causal inference has been well developed in many fields, such as computer science, medicine, economics, and education. With the advancement of deep learning techniques, it has been increasingly used in causal inference on counterfactual data. Typically, deep causal models map the characteristics of covariates to a representation space and then design various objective optimization functions to estimate counterfactual data unbiasedly based on the different optimization methods. This paper focuses on a survey of deep causal models, and its core contributions are as follows: 1) we provide relevant metrics under multiple treatments and continuous-dose treatment; 2) we incorporate a comprehensive overview of deep causal models from both temporal development and method classification perspectives; 3) we provide a detailed and comprehensive classification and analysis of relevant datasets and source code.
    Efficient Subgraph Isomorphism using Graph Topology. (arXiv:2209.09090v1 [stat.ML])
    Subgraph isomorphism or subgraph matching is generally considered an NP-complete problem, made more complex in practical applications where the edge weights take real values and are subject to measurement noise and possible anomalies. To the best of our knowledge, almost all subgraph matching methods utilize node labels to perform node-node matching. In the absence of such labels (in applications such as image matching and map matching, among others), these subgraph matching methods do not work. We propose a method for identifying the node correspondence between a subgraph and a full graph in the inexact case without node labels in two steps: (a) extract the minimal unique topology-preserving subset from the subgraph and find its feasible matching in the full graph, and (b) implement a consensus-based algorithm to expand the matched node set by pairing unique paths based on boundary commutativity. Going beyond the existing subgraph matching approaches, the proposed method is shown to have realistically sub-linear computational efficiency, robustness to random measurement noise, and good statistical properties. Our method is also readily applicable to the exact matching case without loss of generality. To demonstrate the effectiveness of the proposed method, a simulation and a case study are performed on Erdos-Renyi random graphs and the image-based affine covariant features dataset, respectively.
    Bivariate Causal Discovery for Categorical Data via Classification with Optimal Label Permutation. (arXiv:2209.08579v1 [stat.ML])
    Causal discovery for quantitative data has been extensively studied but less is known for categorical data. We propose a novel causal model for categorical data based on a new classification model, termed classification with optimal label permutation (COLP). By design, COLP is a parsimonious classifier, which gives rise to a provably identifiable causal model. A simple learning algorithm via comparing likelihood functions of causal and anti-causal models suffices to learn the causal direction. Through experiments with synthetic and real data, we demonstrate the favorable performance of the proposed COLP-based causal model compared to state-of-the-art methods. We also make available an accompanying R package COLP, which contains the proposed causal discovery algorithm and a benchmark dataset of categorical cause-effect pairs.
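    The permutation-search ingredient of COLP can be sketched in isolation: given a classifier's raw predictions, find the relabeling that maximizes agreement (the full method additionally compares causal and anti-causal likelihoods on top of this, which is omitted here).

        from itertools import permutations
        import numpy as np

        def best_label_permutation(y_true, y_pred, n_classes):
            """Label permutation of the predictions that maximizes agreement."""
            best, best_acc = None, -1.0
            for perm in permutations(range(n_classes)):
                acc = np.mean([perm[p] == t for p, t in zip(y_pred, y_true)])
                if acc > best_acc:
                    best, best_acc = perm, acc
            return best, best_acc

        y_true = np.array([0, 0, 1, 1, 2, 2])
        y_pred = np.array([2, 2, 0, 0, 1, 1])   # perfect up to a relabeling
        print(best_label_permutation(y_true, y_pred, 3))   # ((1, 2, 0), 1.0)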
    Data-driven and machine-learning based prediction of wave propagation behavior in dam-break flood. (arXiv:2209.08729v1 [physics.flu-dyn])
    The computational prediction of wave propagation in dam-break floods is a long-standing problem in hydrodynamics and hydrology. Until now, conventional numerical models based on Saint-Venant equations have been the dominant approach. Here we show that a machine learning model that is well-trained on a minimal amount of data can help predict the long-term dynamic behavior of a one-dimensional dam-break flood with satisfactory accuracy. For this purpose, we solve the Saint-Venant equations for a one-dimensional dam-break flood scenario using the Lax-Wendroff numerical scheme and train a reservoir computing echo state network (RC-ESN) on the resulting dataset of flow-depth time sequences. The RC-ESN model predicts wave propagation behavior 286 time-steps ahead in the dam-break flood with a root mean square error (RMSE) smaller than 0.01, outperforming the conventional long short-term memory (LSTM) model, which reaches a comparable RMSE only 81 time-steps ahead. To characterize the performance of the RC-ESN model, we also provide a sensitivity analysis of the prediction accuracy with respect to the key parameters, including training set size, reservoir size, and spectral radius. Results indicate that the RC-ESN is less dependent on the training set size and that a medium reservoir size (K = 1200-2600) is sufficient. We confirm that the spectral radius $\rho$ has a complex influence on the prediction accuracy and currently suggest a smaller spectral radius $\rho$. By changing the initial flow depth of the dam break, we also conclude that the prediction horizon of the RC-ESN is larger than that of the LSTM.
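    A minimal echo state network of this general kind fits only a linear readout on top of a fixed random reservoir whose recurrent weights are rescaled to the desired spectral radius; the sketch below trains a one-step-ahead predictor on a toy wave signal (all sizes and constants are illustrative, not the paper's settings).

        import numpy as np

        rng = np.random.default_rng(0)
        K = 300                                          # reservoir size
        W_in = rng.uniform(-0.5, 0.5, (K, 1))
        W = rng.normal(size=(K, K))
        W *= 0.9 / np.max(np.abs(np.linalg.eigvals(W)))  # spectral radius rho = 0.9

        def run_reservoir(u):
            """Reservoir states for an input sequence u of shape (T, 1)."""
            x, states = np.zeros(K), []
            for ut in u:
                x = np.tanh(W_in @ ut + W @ x)
                states.append(x)
            return np.array(states)

        t = np.linspace(0, 20 * np.pi, 2000)
        u = np.sin(t).reshape(-1, 1)                     # toy wave signal
        S = run_reservoir(u[:-1])
        # ridge-regressed linear readout predicting the next value
        W_out = np.linalg.solve(S.T @ S + 1e-6 * np.eye(K), S.T @ u[1:])
        pred = S @ W_out
        print("train RMSE:", float(np.sqrt(np.mean((pred - u[1:]) ** 2))))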
    Global Optimization for Cardinality-constrained Minimum Sum-of-Squares Clustering via Semidefinite Programming. (arXiv:2209.08901v1 [math.OC])
    The minimum sum-of-squares clustering (MSSC), or k-means type clustering, has been recently extended to exploit prior knowledge on the cardinality of each cluster. Such knowledge is used to increase performance as well as solution quality. In this paper, we propose an exact approach based on the branch-and-cut technique to solve the cardinality-constrained MSSC. For the lower bound routine, we use the semidefinite programming (SDP) relaxation recently proposed by Rujeerapaiboon et al. [SIAM J. Optim. 29(2), 1211-1239, (2019)]. However, this relaxation can be used in a branch-and-cut method only for small-size instances. Therefore, we derive a new SDP relaxation that scales better with the instance size and the number of clusters. In both cases, we strengthen the bound by adding polyhedral cuts. Benefiting from a tailored branching strategy which enforces pairwise constraints, we reduce the complexity of the problems arising in the children nodes. For the upper bound, instead, we present a local search procedure that exploits the solution of the SDP relaxation solved at each node. Computational results show that the proposed algorithm globally solves, for the first time, real-world instances of size 10 times larger than those solved by state-of-the-art exact methods.
    Community detection for directed weighted networks. (arXiv:2109.10319v3 [stat.ML] UPDATED)
    Rohe et al. (2016) proposed the Stochastic co-Blockmodel (ScBM) as a tool for detecting community structure in binary directed graph data in network studies. However, ScBM completely ignores node weights and is unable to explain the block structure of directed weighted networks, which appear in various areas such as biology, sociology, physiology and computer science. Here, to model directed weighted networks, we introduce a Directed Distribution-Free model by releasing ScBM's distribution restriction. We also build an extension of the proposed model by considering variation of node degree. Our models do not require a specific distribution on the generating elements of the adjacency matrix but only a block structure on the expected adjacency matrix. Spectral algorithms with theoretical guarantees on consistent estimation of node labels are presented to identify communities. Our proposed methods are illustrated by simulated and empirical examples.  ( 2 min )
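    A generic spectral baseline for directed weighted adjacency matrices (not the paper's exact estimator) embeds each node by its leading left and right singular vectors, capturing sending and receiving patterns separately, and then clusters the embedding; a hedged sketch on a toy two-block network.

        import numpy as np
        from sklearn.cluster import KMeans

        def directed_spectral_clusters(A, k, seed=0):
            """Cluster a directed weighted adjacency matrix via truncated SVD."""
            U, _, Vt = np.linalg.svd(A, full_matrices=False)
            emb = np.hstack([U[:, :k], Vt[:k].T])   # joint sender/receiver embedding
            return KMeans(n_clusters=k, n_init=10, random_state=seed).fit_predict(emb)

        rng = np.random.default_rng(0)
        A = rng.poisson(0.5, size=(40, 40)).astype(float)
        A[:20, 20:] += rng.poisson(3.0, size=(20, 20))  # block 1 sends heavily to block 2
        print(directed_spectral_clusters(A, k=2))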
    Graph Unlearning. (arXiv:2103.14991v2 [cs.LG] UPDATED)
    Machine unlearning is a process of removing the impact of some training data from the machine learning (ML) models upon receiving removal requests. While straightforward and legitimate, retraining the ML model from scratch incurs a high computational overhead. To address this issue, a number of approximate algorithms have been proposed in the domain of image and text data, among which SISA is the state-of-the-art solution. It randomly partitions the training set into multiple shards and trains a constituent model for each shard. However, directly applying SISA to the graph data can severely damage the graph structural information, and thereby the resulting ML model utility. In this paper, we propose GraphEraser, a novel machine unlearning framework tailored to graph data. Its contributions include two novel graph partition algorithms and a learning-based aggregation method. We conduct extensive experiments on five real-world graph datasets to illustrate the unlearning efficiency and model utility of GraphEraser. It achieves 2.06$\times$ (small dataset) to 35.94$\times$ (large dataset) unlearning time improvement. On the other hand, GraphEraser achieves up to $62.5\%$ higher F1 score and our proposed learning-based aggregation method achieves up to $112\%$ higher F1 score. Our code is available at https://github.com/MinChen00/Graph-Unlearning.  ( 3 min )
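    For context, the SISA scheme that GraphEraser builds on can be sketched in a few lines: shard the data, train one constituent model per shard, aggregate by voting, and unlearn a point by retraining only its shard. The uniform random sharding below is precisely what destroys graph structure, which GraphEraser's partition algorithms are designed to avoid; the data and model here are toy placeholders.

        import numpy as np
        from sklearn.linear_model import LogisticRegression

        rng = np.random.default_rng(0)
        X, y = rng.normal(size=(600, 10)), rng.integers(0, 2, 600)

        n_shards = 5
        shards = list(np.array_split(rng.permutation(len(X)), n_shards))
        models = [LogisticRegression().fit(X[idx], y[idx]) for idx in shards]

        def predict(batch):
            """Majority vote over the per-shard constituent models."""
            votes = np.stack([m.predict(batch) for m in models])
            return (votes.mean(axis=0) >= 0.5).astype(int)

        # unlearning sample 7: drop it and retrain only the shard that held it
        s = next(i for i, idx in enumerate(shards) if 7 in idx)
        shards[s] = shards[s][shards[s] != 7]
        models[s] = LogisticRegression().fit(X[shards[s]], y[shards[s]])
        print(predict(X[:3]))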
    Implicit Regularization in Hierarchical Tensor Factorization and Deep Convolutional Neural Networks. (arXiv:2201.11729v5 [cs.LG] UPDATED)
    In the pursuit of explaining implicit regularization in deep learning, prominent focus was given to matrix and tensor factorizations, which correspond to simplified neural networks. It was shown that these models exhibit an implicit tendency towards low matrix and tensor ranks, respectively. Drawing closer to practical deep learning, the current paper theoretically analyzes the implicit regularization in hierarchical tensor factorization, a model equivalent to certain deep convolutional neural networks. Through a dynamical systems lens, we overcome challenges associated with hierarchy, and establish implicit regularization towards low hierarchical tensor rank. This translates to an implicit regularization towards locality for the associated convolutional networks. Inspired by our theory, we design explicit regularization discouraging locality, and demonstrate its ability to improve the performance of modern convolutional networks on non-local tasks, in defiance of conventional wisdom by which architectural changes are needed. Our work highlights the potential of enhancing neural networks via theoretical analysis of their implicit regularization.  ( 3 min )
    PyTorch Geometric Signed Directed: A Software Package on Graph Neural Networks for Signed and Directed Graphs. (arXiv:2202.10793v3 [cs.LG] UPDATED)
    Networks are ubiquitous in many real-world applications (e.g., social networks encoding trust/distrust relationships, correlation networks arising from time series data). While many networks are signed or directed, or both, there is a lack of unified software packages on graph neural networks (GNNs) specially designed for signed and directed networks. In this paper, we present PyTorch Geometric Signed Directed, a software package which fills this gap. Along the way, we also provide a brief review surveying typical tasks, loss functions and evaluation metrics in the analysis of signed and directed networks, discuss data used in related experiments, provide an overview of methods proposed, and evaluate the implemented methods with experiments. The deep learning framework consists of easy-to-use GNN models, synthetic and real-world data, as well as task-specific evaluation metrics and loss functions for signed and directed networks. As an extension library for PyTorch Geometric, our proposed software is maintained with open-source releases, detailed documentation, continuous integration, unit tests and code coverage checks. Our code is publicly available at https://github.com/SherylHYX/pytorch_geometric_signed_directed.  ( 3 min )
    An $l_1$-oracle inequality for the Lasso in high-dimensional mixtures of experts models. (arXiv:2009.10622v4 [math.ST] UPDATED)
    Mixtures of experts (MoE) models are a popular framework for modeling heterogeneity in data, for both regression and classification problems in statistics and machine learning, due to their flexibility and the abundance of available statistical estimation and model choice tools. Such flexibility comes from allowing the mixture weights (or gating functions) in the MoE model to depend on the explanatory variables, along with the experts (or component densities). This permits the modeling of data arising from more complex data generating processes when compared to the classical finite mixtures and finite mixtures of regression models, whose mixing parameters are independent of the covariates. The use of MoE models in a high-dimensional setting, when the number of explanatory variables can be much larger than the sample size, is challenging from a computational point of view, and in particular from a theoretical point of view, where the literature is still lacking results for dealing with the curse of dimensionality, for both the statistical estimation and feature selection problems. We consider the finite MoE model with soft-max gating functions and Gaussian experts for high-dimensional regression on heterogeneous data, and its $l_1$-regularized estimation via the Lasso. We focus on the Lasso estimation properties rather than its feature selection properties. We provide a lower bound on the regularization parameter of the Lasso that ensures an $l_1$-oracle inequality is satisfied by the Lasso estimator with respect to the Kullback--Leibler loss.
    Relational Reasoning Network (RRN) for Anatomical Landmarking. (arXiv:1904.04354v2 [cs.LG] UPDATED)
    Purpose: We perform anatomical landmarking for craniomaxillofacial (CMF) bones without explicitly segmenting them. Towards this, we propose a new simple yet efficient deep network architecture, called the relational reasoning network (RRN), to accurately learn the local and global relations among the landmarks in CMF bones; specifically, the mandible, maxilla, and nasal bones. Approach: The proposed RRN works in an end-to-end manner, utilizing learned relations of the landmarks based on dense-block units. For a given few landmarks as input, RRN treats the landmarking process similarly to a data imputation problem where predicted landmarks are considered missing. Results: We applied RRN to cone beam computed tomography scans obtained from 250 patients. With a 4-fold cross-validation technique, we obtained an average root mean squared error of less than 2 mm per landmark. Our proposed RRN has revealed unique relationships among the landmarks that help us reason about the informativeness of the landmark points. The proposed system identifies missing landmark locations accurately even when severe pathology or deformation is present in the bones. Conclusions: Accurately identifying anatomical landmarks is a crucial step in deformation analysis and surgical planning for CMF surgeries. Achieving this goal without the need for explicit bone segmentation addresses a major limitation of segmentation-based approaches, where segmentation failure (as is often the case in bones with severe pathology or deformation) could easily lead to incorrect landmarking. To the best of our knowledge, this is the first algorithm of its kind to find anatomical relations of objects using deep learning.  ( 3 min )
    Batch Policy Learning in Average Reward Markov Decision Processes. (arXiv:2007.11771v3 [math.ST] UPDATED)
    We consider the batch (off-line) policy learning problem in the infinite-horizon Markov Decision Process. Motivated by mobile health applications, we focus on learning a policy that maximizes the long-term average reward. We propose a doubly robust estimator for the average reward and show that it achieves semiparametric efficiency. Further, we develop an optimization algorithm to compute the optimal policy in a parameterized stochastic policy class. The performance of the estimated policy is measured by the difference between the optimal average reward in the policy class and the average reward of the estimated policy, and we establish a finite-sample regret guarantee. The performance of the method is illustrated by simulation studies and an analysis of a mobile health study promoting physical activity.  ( 2 min )
    Sobolev Acceleration and Statistical Optimality for Learning Elliptic Equations via Gradient Descent. (arXiv:2205.07331v3 [math.NA] UPDATED)
    In this paper, we study the statistical limits in terms of Sobolev norms of gradient descent for solving inverse problems from randomly sampled noisy observations using a general class of objective functions. Our class of objective functions includes Sobolev training for kernel regression, Deep Ritz Methods (DRM), and Physics Informed Neural Networks (PINN) for solving elliptic partial differential equations (PDEs) as special cases. We consider a potentially infinite-dimensional parameterization of our model using a suitable Reproducing Kernel Hilbert Space and a continuous parameterization of problem hardness through the definition of kernel integral operators. We prove that gradient descent over this objective function can also achieve statistical optimality, and that the optimal number of passes over the data increases with sample size. Based on our theory, we explain an implicit acceleration from using a Sobolev norm as the objective function for training, inferring that the optimal number of epochs of DRM becomes larger than that of PINN when both the data size and the hardness of tasks increase, although both DRM and PINN can achieve statistical optimality.  ( 3 min )
    Robust leave-one-out cross-validation for high-dimensional Bayesian models. (arXiv:2209.09190v1 [stat.CO])
    Leave-one-out cross-validation (LOO-CV) is a popular method for estimating out-of-sample predictive accuracy. However, computing LOO-CV criteria can be computationally expensive due to the need to fit the model multiple times. In the Bayesian context, importance sampling provides a possible solution, but classical approaches can easily produce estimators whose variance is infinite, making them potentially unreliable. Here we propose and analyze a novel mixture estimator to compute Bayesian LOO-CV criteria. Our method retains the simplicity and computational convenience of classical approaches, while guaranteeing finite variance of the resulting estimators. Both theoretical and numerical results are provided to illustrate the improved robustness and efficiency. The computational benefits are particularly significant in high-dimensional problems, allowing one to perform Bayesian LOO-CV for a broader range of models. The proposed methodology is easily implementable in standard probabilistic programming software and has a computational cost roughly equivalent to fitting the original model once.  ( 2 min )
    A novel approach for wafer defect pattern classification based on topological data analysis. (arXiv:2209.08945v1 [cs.LG])
    In semiconductor manufacturing, the wafer map defect pattern provides critical information for facility maintenance and yield management, so the classification of defect patterns is one of the most important tasks in the manufacturing process. In this paper, we propose a novel way to represent the shape of the defect pattern as a finite-dimensional vector, which will be used as an input for a neural network algorithm for classification. The main idea is to extract the topological features of each pattern by using the theory of persistent homology from topological data analysis (TDA). Through some experiments with a simulated dataset, we show that the proposed method is faster, much more efficient in training, and achieves higher accuracy, compared with the method using convolutional neural networks (CNNs), which is the most common approach for wafer map defect pattern classification. Moreover, our method outperforms the CNN-based method when the amount of training data is insufficient or imbalanced.  ( 2 min )
    Adversarial Robustness through Bias Variance Decomposition: A New Perspective for Federated Learning. (arXiv:2009.09026v3 [cs.LG] UPDATED)
    Federated learning learns a neural network model by aggregating the knowledge from a group of distributed clients under the privacy-preserving constraint. In this work, we show that this paradigm might inherit the adversarial vulnerability of the centralized neural network, i.e., its performance deteriorates on adversarial examples once the model is deployed. This is even more alarming when the federated learning paradigm is designed to approximate the updating behavior of a centralized neural network. To solve this problem, we propose an adversarially robust federated learning framework, named Fed_BVA, with improved server and client update mechanisms. This is motivated by our observation that the generalization error in federated learning can be naturally decomposed into the bias and variance triggered by multiple clients' predictions. Thus, we propose to generate the adversarial examples via maximizing the bias and variance during server update, and learn the adversarially robust model updates with those examples during client update. As a result, an adversarially robust neural network can be aggregated from these improved local clients' model updates. The experiments are conducted on multiple benchmark data sets using several prevalent neural network models, and the empirical results show that our framework is robust against white-box and black-box adversarial corruptions under both IID and non-IID settings.  ( 3 min )
    Generalization Bounds for Stochastic Gradient Descent via Localized $\varepsilon$-Covers. (arXiv:2209.08951v1 [stat.ML])
    In this paper, we propose a new covering technique localized for the trajectories of SGD. This localization provides an algorithm-specific complexity measured by the covering number, which can have dimension-independent cardinality in contrast to standard uniform covering arguments that result in exponential dimension dependency. Based on this localized construction, we show that if the objective function is a finite perturbation of a piecewise strongly convex and smooth function with $P$ pieces, i.e. non-convex and non-smooth in general, the generalization error can be upper bounded by $O(\sqrt{(\log n\log(nP))/n})$, where $n$ is the number of data samples. In particular, this rate is independent of dimension and does not require early stopping and decaying step size. Finally, we employ these results in various contexts and derive generalization bounds for multi-index linear models, multi-class support vector machines, and $K$-means clustering for both hard and soft label setups, improving the known state-of-the-art rates.  ( 2 min )
    Adaptive Multi-stage Density Ratio Estimation for Learning Latent Space Energy-based Model. (arXiv:2209.08739v1 [cs.LG])
    This paper studies the fundamental problem of learning an energy-based model (EBM) in the latent space of the generator model. Learning such a prior model typically requires running costly Markov Chain Monte Carlo (MCMC). Instead, we propose to use noise contrastive estimation (NCE) to discriminatively learn the EBM through density ratio estimation between the latent prior density and the latent posterior density. However, NCE typically fails to accurately estimate such a density ratio given the large gap between the two densities. To effectively tackle this issue and learn more expressive prior models, we develop adaptive multi-stage density ratio estimation, which breaks the estimation into multiple stages and learns the density ratios sequentially and adaptively. The latent prior model can be gradually learned using the ratio estimated in the previous stage, so that the final latent space EBM prior can be naturally formed as the product of the ratios from the different stages. The proposed method enables an informative and much sharper prior than existing baselines, and can be trained efficiently. Our experiments demonstrate strong performance in image generation and reconstruction as well as anomaly detection.  ( 2 min )
    Approximation results for Gradient Descent trained Shallow Neural Networks in $1d$. (arXiv:2209.08399v1 [cs.LG])
    Two aspects of neural networks that have been extensively studied in the recent literature are their function approximation properties and their training by gradient descent methods. The approximation problem seeks accurate approximations with a minimal number of weights. In most of the current literature these weights are fully or partially hand-crafted, showing the capabilities of neural networks but not necessarily their practical performance. In contrast, optimization theory for neural networks heavily relies on an abundance of weights in over-parametrized regimes. This paper balances these two demands and provides an approximation result for shallow networks in $1d$ with non-convex weight optimization by gradient descent. We consider finite width networks and infinite sample limits, which is the typical setup in approximation theory. Technically, this problem is not over-parametrized; however, some form of redundancy reappears as a loss in approximation rate compared to the best possible rates.  ( 2 min )
    Estimating and Explaining Model Performance When Both Covariates and Labels Shift. (arXiv:2209.08436v1 [stat.ML])
    Deployed machine learning (ML) models often encounter new user data that differs from their training data. Therefore, estimating how well a given model might perform on the new data is an important step toward reliable ML applications. This is very challenging, however, as the data distribution can change in flexible ways, and we may not have any labels on the new data, which is often the case in monitoring settings. In this paper, we propose a new distribution shift model, Sparse Joint Shift (SJS), which considers the joint shift of both labels and a few features. This unifies and generalizes several existing shift models including label shift and sparse covariate shift, where only marginal feature or label distribution shifts are considered. We describe mathematical conditions under which SJS is identifiable. We further propose SEES, an algorithmic framework to characterize the distribution shift under SJS and to estimate a model's performance on new data without any labels. We conduct extensive experiments on several real-world datasets with various ML models. Across different datasets and distribution shifts, SEES achieves significant (up to an order of magnitude) shift estimation error improvements over existing approaches.  ( 2 min )
    DynaConF: Dynamic Forecasting of Non-Stationary Time-Series. (arXiv:2209.08411v1 [cs.LG])
    Deep learning models have shown impressive results in a variety of time series forecasting tasks, where modeling the conditional distribution of the future given the past is the essence. However, when this conditional distribution is non-stationary, it poses challenges for these models to learn consistently and to predict accurately. In this work, we propose a new method to model non-stationary conditional distributions over time by clearly decoupling stationary conditional distribution modeling from non-stationary dynamics modeling. Our method is based on a Bayesian dynamic model that can adapt to conditional distribution changes and a deep conditional distribution model that can handle large multivariate time series using a factorized output space. Our experimental results on synthetic and popular public datasets show that our model can adapt to non-stationary time series better than state-of-the-art deep learning solutions.  ( 2 min )
    Covariance regression with random forests. (arXiv:2209.08173v1 [stat.ME])
    Capturing the conditional covariances or correlations among the elements of a multivariate response vector based on covariates is important to various fields including neuroscience, epidemiology and biomedicine. We propose a new method called Covariance Regression with Random Forests (CovRegRF) to estimate the covariance matrix of a multivariate response given a set of covariates, using a random forest framework. Random forest trees are built with a splitting rule specially designed to maximize the difference between the sample covariance matrix estimates of the child nodes. We also propose a significance test for the partial effect of a subset of covariates. We evaluate the performance of the proposed method and significance test through a simulation study which shows that the proposed method provides accurate covariance matrix estimates and that the Type-1 error is well controlled. We also demonstrate an application of the proposed method with a thyroid disease data set.  ( 2 min )
    Low-Rank Covariance Completion for Graph Quilting with Applications to Functional Connectivity. (arXiv:2209.08273v1 [stat.ME])
    As a tool for estimating networks in high dimensions, graphical models are commonly applied to calcium imaging data to estimate functional neuronal connectivity, i.e. relationships between the activities of neurons. However, in many calcium imaging data sets, the full population of neurons is not recorded simultaneously, but instead in partially overlapping blocks. This leads to the Graph Quilting problem, as first introduced by Vinci et al. (2019), in which the goal is to infer the structure of the full graph when only subsets of features are jointly observed. In this paper, we study a novel two-step approach to Graph Quilting, which first imputes the complete covariance matrix using low-rank covariance completion techniques before estimating the graph structure. We introduce three approaches to solve this problem: block singular value decomposition, nuclear norm penalization, and non-convex low-rank factorization. While prior works have studied low-rank matrix completion, we address the challenges brought by the block-wise missingness and are the first to investigate the problem in the context of graph learning. We discuss theoretical properties of the two-step procedure, showing graph selection consistency of one proposed approach by proving novel $L_\infty$-norm error bounds for matrix completion with block-missingness. We then investigate the empirical performance of the proposed methods on simulations and on real-world data examples, through which we show the efficacy of these methods for estimating functional connectivity from calcium imaging data.  ( 3 min )
    Sparse high-dimensional linear regression with a partitioned empirical Bayes ECM algorithm. (arXiv:2209.08139v1 [stat.ME])
    Bayesian variable selection methods are powerful techniques for fitting and inferring on sparse high-dimensional linear regression models. However, many are computationally intensive or require restrictive prior distributions on model parameters. Likelihood based penalization methods are more computationally friendly, but resource intensive refitting techniques are needed for inference. In this paper, we propose an efficient and powerful Bayesian approach for sparse high-dimensional linear regression. Minimal prior assumptions on the parameters are required through the use of plug-in empirical Bayes estimates of hyperparameters. Efficient maximum a posteriori probability (MAP) estimation is completed through the use of a partitioned and extended expectation conditional maximization (ECM) algorithm. The result is a PaRtitiOned empirical Bayes Ecm (PROBE) algorithm applied to sparse high-dimensional linear regression. We propose methods to estimate credible and prediction intervals for predictions of future values. We compare the empirical properties of predictions and our predictive inference to comparable approaches with numerous simulation studies and an analysis of a cancer cell line drug response study. The proposed approach is implemented in the R package probe.  ( 2 min )
    A review of probabilistic forecasting and prediction with machine learning. (arXiv:2209.08307v1 [stat.ML])
    Predictions and forecasts of machine learning models should take the form of probability distributions, aiming to increase the quantity of information communicated to end users. Although applications of probabilistic prediction and forecasting with machine learning models in academia and industry are becoming more frequent, related concepts and methods have not been formalized and structured under a holistic view of the entire field. Here, we review the topic of predictive uncertainty estimation with machine learning algorithms, as well as the related metrics (consistent scoring functions and proper scoring rules) for assessing probabilistic predictions. The review covers a time period spanning from the introduction of early statistical methods (linear regression and time series models, based on Bayesian statistics or quantile regression) to recent machine learning algorithms (including generalized additive models for location, scale and shape, random forests, boosting and deep learning algorithms) that are more flexible by nature. The review of progress in the field expedites our understanding of how to develop new algorithms tailored to users' needs, since the latest advancements are based on some fundamental concepts applied to more complex algorithms. We conclude by classifying the material and discussing challenges that are becoming a hot topic of research.  ( 2 min )
    Joint Network Topology Inference via a Shared Graphon Model. (arXiv:2209.08223v1 [stat.ML])
    We consider the problem of estimating the topology of multiple networks from nodal observations, where these networks are assumed to be drawn from the same (unknown) random graph model. We adopt a graphon as our random graph model, which is a nonparametric model from which graphs of potentially different sizes can be drawn. The versatility of graphons allows us to tackle the joint inference problem even for the cases where the graphs to be recovered contain different numbers of nodes and lack precise alignment across the graphs. Our solution is based on combining a maximum likelihood penalty with graphon estimation schemes and can be used to augment existing network inference methods. The proposed joint network and graphon estimation is further enhanced with the introduction of a robust method for noisy graph sampling information. We validate our proposed approach by comparing its performance against competing methods in synthetic and real-world datasets.  ( 2 min )

  • Open

    How to Use Data Science for Search Engine Optimization
    Data science assists SEO experts in countless ways, like personalizing the customer experience, understanding client requirements, and many other things. The post How to Use Data Science for Search Engine Optimization appeared first on Data Science Central.  ( 20 min )
    Digital Twins as Building Blocks of the Metaverse
    The quintessential example of a digital twin is the wind turbine. A digital twin is a real-time virtual representation of a real-world physical system or process that serves as its digital counterpart. Like all models or abstractions, a twin is created for practical purposes, i.e. we wish to model a physical system or a phenomenon to understand it better. The post Digital Twins as Building Blocks of the Metaverse appeared first on Data Science Central.  ( 20 min )
  • Open

    [D] How do you keep up to date on Machine Learning?
    Good evening, everyone. I hope everyone is well. We all know that the IT area as a whole, especially in the area of artificial intelligence, has been having almost daily updates about new methodologies, algorithms, tools, etc. How do you keep yourselves updated? In my case, I subscribe to some newsletters in the data area: Data Engineering Podcast, Harvard Data Science Review, Made With ML, MLOps Newsletter, Papers With Code, The Batch, The Variable. Some other newsletters I subscribe to, but which are not directly connected to AI: ByteByteGo, Medium, NVIDIA. submitted by /u/barash-616 [link] [comments]  ( 89 min )
    [R] TalkToModel: Understanding Machine Learning Models With Open Ended Dialogues
    Hello! I wanted to share our recent work on understanding & explaining ML models through natural language conversations. We use dialogues as an accessible tool for model understanding, so anyone can "talk" to an ML model to understand it, like it's another colleague. We also provide a flexible implementation you can adapt to your models & datasets. Twitter thread: https://twitter.com/dylanslack20/status/1571945003676737537 Paper: https://arxiv.org/abs/2207.04154 Code: https://github.com/dylan-slack/TalkToModel submitted by /u/dylan-slack [link] [comments]  ( 88 min )
    [D] Is the current limitation of machine learning due to the limitation of models or due to the limitation of computing powers and the number of parameters?
    Would we be able to achieve a breakthrough if we made a more robust model, even without increasing the data set or the parameters of the model, or would we have to wait until there's a breakthrough in hardware so that we can play with models that have 1000 trillion parameters like our brains? Is it a problem of hardware or software? And if it is a problem of hardware, is AI really reserved only for the extremely wealthy organizations who can afford to train such a huge model? I was depressed after realizing that GPT-3 is not open source, that I would need to pay a huge amount of $$ to use their API, and that it would be impossible for me, a college student, to build and train such a huge model with only a laptop. I just wanna build a small chatbot that can have a conversation with me... submitted by /u/After_Philosopher572 [link] [comments]  ( 91 min )
    [D] Non sequitur in Andrew Ng's Machine Learning lectures
    This is somewhat random... But I recently remembered that years back, when I did Andrew Ng's Coursera course on machine learning, I was watching one of his lectures in a library, fully immersed in the technicalities, when he suddenly dropped some random statement about "mathematicians" and "love" or something that made him and the audience laugh (and me too, in the quiet library). Now I am trying to find that part in the lectures, but haven't succeeded so far, and I don't want to rewatch the entire thing. I think it's this playlist: https://www.youtube.com/watch?v=UzxYlbK2c7E&list=PLA89DCFA6ADACE599. Anyone know what I am talking about and can maybe pinpoint it for me? submitted by /u/doktorfaustus91 [link] [comments]  ( 89 min )
    [D] What cannot be missing in a Reinforcement Learning course?
    Good afternoon everyone. I hope you are all well. My advisor and I are planning a course on Reinforcement Learning to teach in our master's program at our university. This will be the first time we are teaching this course, so we have no previous experience to use as a basis. Context: It is a professional master's program, so the students are already active in the labor market. The graduate program has 3 lines of research: IT infrastructure; computational intelligence (AI); software engineering. Usually, the courses have students from all 3 research lines, so the intention would be to teach a course that both gives an overview of the area and gives directions for the AI students to go deeper. What we have already thought about: For now, considering only the coding side, we will use the following tools: Python; Numpy; JAX; PyTorch; Jupyter Notebook. With this, we intend to implement algorithms from scratch (Numpy and JAX) and compare them to the approaches present in PyTorch. We will still have a meeting to decide the topics that will be covered in each class. We will put everything in a repository on GitHub to allow other people to use the materials developed for the course. In the future, we are thinking of adapting this course for undergraduate studies. What do you think can't be missing from this course? Which topics do you think are extremely important for a good overview of the area? submitted by /u/barash-616 [link] [comments]  ( 103 min )
    [P] I Resurrected “Ugly Sonic” with Stable Diffusion Textual Inversion
    Yes, you read the title correctly. This is more of a character study/shitpost testing out Stable Diffusion textual inversion to see how to control it / expected outputs. Turns out, it works better than I thought, and emphasizing/deemphasizing specific terms when using textual inversion works out well. The post also includes a custom inference notebook for multiple inversion concepts. submitted by /u/minimaxir [link] [comments]  ( 89 min )
    [R] Do we understand the Math behind an NN?
    So I read somewhere that we as humans don't understand what exactly happens in a neural network; we just know that a neuron does something using the weights, biases, and inputs given to it and leads us to a specific output. My question here is, do we understand (mathematically speaking) how X input leads the computer to give Y output? If we don't, then why don't we know it? How does a computer go from an array of input pixels to identifying that this is an upside-down dog (mathematically)? submitted by /u/Skrrubs [link] [comments]  ( 112 min )
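    Mathematically, the forward pass itself is completely transparent: it is just a chain of affine maps and elementwise nonlinearities. Below is a minimal sketch, assuming a toy two-layer classifier with illustrative sizes and untrained random weights (none of this comes from the post itself):

        import numpy as np

        rng = np.random.default_rng(0)

        # Toy 2-layer net: 784-dim input (a flattened 28x28 image) -> 32 hidden units -> 10 classes.
        W1, b1 = rng.normal(size=(32, 784)), np.zeros(32)
        W2, b2 = rng.normal(size=(10, 32)), np.zeros(10)

        def forward(x):
            h = np.maximum(0.0, W1 @ x + b1)               # affine map, then ReLU nonlinearity
            logits = W2 @ h + b2                           # second affine map
            logits -= logits.max()                         # shift for numerical stability
            return np.exp(logits) / np.exp(logits).sum()   # softmax: scores -> probabilities

        x = rng.normal(size=784)                           # stand-in for the input pixels
        print(forward(x).argmax())                         # index of the predicted class

    So we do understand how X leads to Y in the arithmetic sense; what is hard to interpret is why a particular set of trained weights maps, say, upside-down-dog pixels to the "dog" class.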
    [N] TorchStudio 0.9.10 (Training assistant for PyTorch) brings IDE Extensions for VS Code, PyCharm, Spyder and Sublime Text
    TorchStudio 0.9.10 was just released with extensions for all the major Python IDEs (VS Code, PyCharm, Spyder and Sublime Text) by popular request, looking forward to your comments! One new tutorial and two new videos describe how to use TorchStudio from within your IDE. Download: https://www.torchstudio.ai/download/ Full changelog: https://github.com/TorchStudio/torchstudio/releases/tag/0.9.10 If you're new to TorchStudio, here's an introductory tutorial and video: https://www.torchstudio.ai/getstarted/ https://www.youtube.com/watch?v=uvA-ARpKdCA submitted by /u/divideconcept [link] [comments]  ( 89 min )
    [D] Resources to understand Diffusion Models?
    I am struggling to understand the nitty gritty of the diffusion models - what would be the right resource to understand all the maths behind it? submitted by /u/throwaway_reddevil9 [link] [comments]  ( 89 min )
    [D] Yannic Kilcher's ML News YouTube episode covering the latest Stable Diffusion developments, community efforts, and response from AI Ethics community
    Video: https://youtu.be/xbxe-x6wvRw Yannic Kilcher's summary: Stable Diffusion has been released and is riding a wave of creativity and collaboration. But not everyone is happy about this. This video takes a look at the vibrant open-source community around the model, and its critics. Watch here: https://youtu.be/xbxe-x6wvRw submitted by /u/wei_jok [link] [comments]  ( 91 min )
    [D] Feature Engineering & Model Selection workflow
    Hello everyone, I am confronted with a machine learning task where the potential for feature engineering is very large, but so is the space of possible models - which means that it's impossible to try out everything. Do you have a fixed model development workflow for this kind of situation? Do you first do feature engineering on some restricted models, and then move on to do model selection? I am looking for something similar to this blog post (which is a workflow for training neural networks): A Recipe for Training Neural Networks Thanks for your help! submitted by /u/is_it_learning_yet [link] [comments]  ( 90 min )
    [D] Question about dual submission policies for AAAI and ICLR
    Hello, I submitted my first paper to AAAI and phase 1 notification is expected on Sep 27. I'm optimistic, but I keep hearing about lots of noise in the review process. ICLR abstract submission is Sep 21 and the final paper submission is Sep 28. So, can one submit an abstract to ICLR pending AAAI phase 1 notification? That is, can you submit an abstract to ICLR, then withdraw in case of accept and submit in case of reject? Thanks submitted by /u/gideon321 [link] [comments]  ( 88 min )
    [R] Human-level Atari 200x faster
    submitted by /u/hardmaru [link] [comments]  ( 88 min )
    [D] Using special tokens for a domain-specific language in transformers
    Hi everyone, I've recently dived into ViTs, and a thought crossed my mind that I was surprised not to find many papers exploring. Special tokens are pretty common in transformer architectures, but they usually play a background role, such as structural (like [BEG], [END], [SEP]) or a placeholder of sorts ([CLS], [MASK]). But I feel like self-attention allows for far more intricate constructs, and theoretically one can create a whole "mini-language" to somehow influence the model's behaviour. Is there a particular reason it wouldn't work? Are there any papers that do something similar? The only thing I've found is the recently published DyTox, but it just uses single task tokens for task selection. Also there are the Image+Text models like CoCa that just combine the patches with actual natural language, but I'm interested in something more focused. Here's a toy example that works pretty well: Input: an image of 6 EMNIST characters in a row, a left/right direction token, and an argument token denoting one of the characters in the image, e.g. "here's an image depicting 'A 5 H Z T 4', which character is to the left of Z?" Output: the character class of the answer, e.g. "H" in this example. Here you can essentially create a domain-specific language out of learnable direction and argument tokens alongside the usual ViT patch embeddings, and it works pretty well, even generalizing to neighboring pairs that it hasn't seen in the training dataset (something that didn't work well with "dumber" models). Is there just no popular use case for something like this? submitted by /u/McAvagr [link] [comments]  ( 89 min )
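    A minimal sketch of the toy setup described in the post, assuming learnable direction and argument tokens prepended to precomputed patch embeddings; the class name, sizes, and the 47-class EMNIST head are illustrative assumptions, not from any published model:

        import torch
        import torch.nn as nn

        class TokenConditionedViT(nn.Module):
            def __init__(self, dim=256, n_dirs=2, n_args=6, n_classes=47):
                super().__init__()
                self.cls_token = nn.Parameter(torch.randn(1, dim))
                self.dir_tokens = nn.Parameter(torch.randn(n_dirs, dim))   # left / right
                self.arg_tokens = nn.Parameter(torch.randn(n_args, dim))   # which character slot
                layer = nn.TransformerEncoderLayer(d_model=dim, nhead=8, batch_first=True)
                self.encoder = nn.TransformerEncoder(layer, num_layers=4)
                self.head = nn.Linear(dim, n_classes)

            def forward(self, patches, direction, arg):
                # patches: (B, N, dim) patch embeddings; direction, arg: (B,) long tensors
                B = patches.shape[0]
                instr = torch.stack([self.cls_token.expand(B, -1),
                                     self.dir_tokens[direction],
                                     self.arg_tokens[arg]], dim=1)          # (B, 3, dim)
                x = self.encoder(torch.cat([instr, patches], dim=1))
                return self.head(x[:, 0])                                   # classify from the CLS slot

        model = TokenConditionedViT()
        patches = torch.randn(4, 36, 256)                                    # 4 images, 36 patches each
        logits = model(patches, torch.tensor([0, 1, 0, 1]), torch.tensor([2, 3, 0, 5]))

    The instruction tokens interact with the patches through ordinary self-attention, which is what lets them steer the classification.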
    [P] How to do the weight rule update with Lasso or Ridge regularization?
    I'm trying to do an exercise for my ML class, where I have to do linear regression with regularization, with either Lasso or Ridge. Doing it with an iterative method, updating the weights, I have no idea how to actually do it. Given the augmented error measures for Lasso and Ridge, I know I have to calculate the gradient, and (I think?) the update rule then becomes: w_new = w_old - (gradient of E_aug). But I'm not actually sure, and even more so, I wouldn't know how to compute the gradient itself, especially for Lasso. submitted by /u/JavoUruguayo [link] [comments]  ( 89 min )
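    A sketch of one answer, assuming the augmented error has the common form E_aug(w) = (1/n)||Xw - y||^2 + lam * penalty(w) (the exact form from the class may differ). Ridge's penalty is differentiable everywhere; Lasso's |w| is not differentiable at 0, so a subgradient (sign(w), with sign(0) = 0) is the standard choice, and the update also needs a learning rate:

        import numpy as np

        def gradient_step(w, X, y, lam, lr, penalty="ridge"):
            # Assumes E_aug(w) = (1/n) * ||X w - y||^2 + lam * penalty(w).
            n = len(y)
            grad = (2.0 / n) * X.T @ (X @ w - y)   # gradient of the squared-error term
            if penalty == "ridge":
                grad += 2.0 * lam * w              # d/dw of lam * ||w||^2
            else:                                  # lasso: subgradient of lam * ||w||_1
                grad += lam * np.sign(w)           # sign(0) = 0 is a valid subgradient
            return w - lr * grad                   # w_new = w_old - lr * grad E_aug

        # For Lasso, proximal gradient descent often works better: take the
        # unregularized step w_tmp = w - lr * grad_loss, then soft-threshold:
        #   w_new = np.sign(w_tmp) * np.maximum(np.abs(w_tmp) - lr * lam, 0.0)

    Subgradient descent on the Lasso objective converges but rarely produces exact zeros; the soft-thresholding variant does.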
  • Open

    "Quark: Controllable Text Generation with Reinforced Unlearning", Lu et al 2022
    submitted by /u/gwern [link] [comments]  ( 87 min )
    Keras DQN for CartPole
    Hello, could someone suggest a good link with DQN code for CartPole, using the Keras API? I just spent the entire day looking for code, and most of it has some sort of bug. The most common bug I find is that they confuse DQN and DDQN by implementing the target network only for the latter. submitted by /u/Academic-Rent7800 [link] [comments]  ( 108 min )
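    For reference, a minimal, hedged sketch of the core DQN update in Keras with a target network; the target network is part of standard DQN (the 2015 Nature paper), while DDQN differs only in selecting the next action with the online network. Layer sizes, hyperparameters, and the CartPole dimensions (4 states, 2 actions) are assumptions:

        import numpy as np
        import tensorflow as tf

        def build_qnet(n_states=4, n_actions=2):
            return tf.keras.Sequential([
                tf.keras.layers.Dense(64, activation="relu", input_shape=(n_states,)),
                tf.keras.layers.Dense(64, activation="relu"),
                tf.keras.layers.Dense(n_actions, activation="linear"),
            ])

        online, target = build_qnet(), build_qnet()
        target.set_weights(online.get_weights())   # target net: part of standard DQN
        online.compile(optimizer="adam", loss="mse")
        gamma = 0.99

        def train_on_batch(s, a, r, s2, done):
            # s, s2: (B, 4) state batches; a: (B,) int actions; r, done: (B,) floats.
            # DQN: the target net both selects and evaluates the next action.
            q_next = target.predict(s2, verbose=0).max(axis=1)
            # DDQN would instead select with the online net:
            #   a_star = online.predict(s2, verbose=0).argmax(axis=1)
            #   q_next = target.predict(s2, verbose=0)[np.arange(len(a_star)), a_star]
            y = online.predict(s, verbose=0)
            y[np.arange(len(a)), a] = r + gamma * (1 - done) * q_next
            online.fit(s, y, verbose=0)

        # Every few hundred steps: target.set_weights(online.get_weights())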
    Could starting with easy episodes lead to a faster DQN convergence?
    I'm trying to train a DQN agent to play a (not very) simple game. I'm still new to RL but I have some experience with ML and programming in general, so I'm trying to come up with different methods to improve the training time of my model that target not only the model itself but also the data representation and the training process. The game is simple because generally there are "obvious" optimal states that the agent should try to get to. This gave me an idea to try and start with those obvious and easy states so that the agent can quickly learn that those states can lead to high rewards. I would then gradually, over many episodes, "increase the difficulty" and let the agent figure out how to get to those states. submitted by /u/Gonumen [link] [comments]  ( 105 min )
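    A minimal sketch of such a schedule, assuming a hypothetical environment whose reset accepts a difficulty knob that places the agent nearer the "obvious" goal states at low difficulty (both the knob and the linear ramp are illustrative assumptions):

        def curriculum_difficulty(episode, n_episodes, warmup_frac=0.5):
            # Ramp linearly from 0 (trivial, near-goal starts) to 1 (the full game),
            # reaching full difficulty halfway through training.
            return min(1.0, episode / (warmup_frac * n_episodes))

        # Hypothetical usage, if the environment supports easier initial states:
        #   state = env.reset(difficulty=curriculum_difficulty(ep, n_episodes=5000))

    This is essentially curriculum learning; a related trick that requires no environment changes is to pre-fill the replay buffer with transitions from those easy, high-reward states.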
    "Human-level Atari 200x faster", Kapturowski et al 2022 {DM} (Agent57 optimization: trust-region+loss normalization+normalization-free nets+self-distillation)
    submitted by /u/gwern [link] [comments]  ( 87 min )
  • Open

    What's our current AI status with the Turing Test?
    What are some examples from the last year of the most advanced insights in the Turing Test area? submitted by /u/aladoconpapas [link] [comments]  ( 91 min )
    HD photo of a cat dressed as french emperor Napoleon, studio lighting, HDR | Made with Stable Diffusion
    submitted by /u/gvij [link] [comments]  ( 87 min )
    Is there a conversational ai assistant yet?
    All the mainstream once’s (Siri, Alexa,google home) feel much more like rule based chatbots. I was hoping there was something that felt more natural? Has anyone tried mycroft? submitted by /u/Kinghonk69 [link] [comments]  ( 87 min )
    This environmentally friendly quantum sensor runs on sunlight
    submitted by /u/FinneanCosgra [link] [comments]  ( 86 min )
    Superintelligence cannot be contained: Lessons from Computability Theory
    submitted by /u/Futures_Bot [link] [comments]  ( 90 min )
    Welcome to the Internet BUT Lyrics Are Illustrated by AI
    submitted by /u/Swisheater [link] [comments]  ( 87 min )
    AI Generated Art girls wearing hats like Michael Jackson 💃💖!
    submitted by /u/OceanicFeel [link] [comments]  ( 87 min )
    The challenges of adversarial machine learning in constrained-feature applications
    submitted by /u/bendee983 [link] [comments]  ( 87 min )
    How to make the most of Stable Diffusion
    Hello, Stable Diffusion is a great text-to-image alternative to DALL-E 2 and MidJourney. But if you are a beginner you will quickly realize that creating the right request to generate great images is not necessarily easy. In general, such requests are quite intuitive, but for the most advanced results you might need to use a couple of tricks. Which is why I wrote this quick guide: https://nlpcloud.com/effectively-using-text-to-image-with-stable-diffusion-dalle-2-alternative.html I hope it will be useful! And if you are aware of some nice techniques that are missing in this article, please let me know! Julien submitted by /u/juliensalinas [link] [comments]  ( 87 min )
    Can we not turn /r/artificial into an art forum?
    The title says it. I left the Stable Diffusion subreddit because everyone posted mildly interesting but mostly not-so-interesting AI-generated images. Seeing this subreddit start to receive lots of these as crossposts. submitted by /u/jetstros [link] [comments]  ( 90 min )
    Angela Bassett as Storm [xpost /r/dreamcasting]
    submitted by /u/dream_casting [link] [comments]  ( 87 min )
    AI that provides correct description of a human made art?
    Is there any AI model that does this? submitted by /u/Xie_Bot [link] [comments]  ( 87 min )
    Stable Diffusion Weekly AI Art Video 30FPS HD 9.18.22 Gallery of the Inf...
    submitted by /u/prfitofthesngularity [link] [comments]  ( 87 min )
  • Open

    Two-letter vs Three-letter Country Abbreviations
    The ISO 3166-1 standard defines three codes for each country: a 2-letter abbreviation, a 3-letter abbreviation, and a 3-digit code. The 2-letter abbreviations may be familiar because it is very often (but not always [1]) also the country code top-level domain (ccTLD). For example, AU is the ISO abbreviation for Australia, and .au is the […] Two-letter vs Three-letter Country Abbreviations first appeared on John D. Cook.  ( 5 min )
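    For looking the three code types up programmatically, the third-party pycountry package (an assumption on my part: it wraps the ISO 3166 database but is not mentioned in the post) exposes all three:

        import pycountry  # third-party: pip install pycountry

        au = pycountry.countries.get(alpha_2="AU")
        print(au.name, au.alpha_2, au.alpha_3, au.numeric)  # Australia AU AUS 036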
    Finding similar world flags with Mathematica
    A week ago I posted some pairs of similar flags on Twitter, and later I found that Mathematica's CountryData database contains flag descriptions. So I thought I'd use the flag descriptions to see which flags Mathematica thinks are similar. For example, the FlagDescription attribute for Chad in Mathematica is Three equal vertical bands of blue […] Finding similar world flags with Mathematica first appeared on John D. Cook.  ( 5 min )
  • Open

    Parallel data processing with RStudio on Amazon SageMaker
    Last year, we announced the general availability of RStudio on Amazon SageMaker, the industry’s first fully managed RStudio Workbench integrated development environment (IDE) in the cloud. You can quickly launch the familiar RStudio IDE, and dial up and down the underlying compute resources without interrupting your work, making it easy to build machine learning (ML) […]  ( 7 min )
  • Open

    AI Models vs. AI Systems: Understanding Units of Performance Assessment
    As AI becomes more deeply integrated into every aspect of our lives, it is essential that AI systems perform appropriately for their intended use. We know AI models can never be perfect, so how do we decide when AI performance is ‘good enough’ for use in a real life application? Is level of accuracy a […] The post AI Models vs. AI Systems: Understanding Units of Performance Assessment appeared first on Microsoft Research.  ( 12 min )
  • Open

    A Gentle Introduction to Positional Encoding In Transformer Models, Part 1
    In languages, the order of the words and their position in a sentence really matter. The meaning of the entire sentence can change if the words are re-ordered. When implementing NLP solutions, recurrent neural networks have an inbuilt mechanism that deals with the order of sequences. The transformer model, however, does not use recurrence […] The post A Gentle Introduction to Positional Encoding In Transformer Models, Part 1 appeared first on Machine Learning Mastery.
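    As a preview of the encoding the post goes on to derive, here is a minimal sketch of the standard sinusoidal positional encoding from "Attention Is All You Need", where PE[pos, 2i] = sin(pos / 10000^(2i/d)) and PE[pos, 2i+1] = cos(pos / 10000^(2i/d)); the sequence length and model width below are arbitrary:

        import numpy as np

        def sinusoidal_positional_encoding(seq_len, d_model):
            pos = np.arange(seq_len)[:, None]                 # (seq_len, 1)
            i = np.arange(0, d_model, 2)[None, :]             # (1, d_model/2) even indices
            angles = pos / np.power(10000.0, i / d_model)
            pe = np.zeros((seq_len, d_model))
            pe[:, 0::2] = np.sin(angles)                      # even dimensions
            pe[:, 1::2] = np.cos(angles)                      # odd dimensions
            return pe                                         # added to the token embeddings

        pe = sinusoidal_positional_encoding(seq_len=50, d_model=128)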
  • Open

    Why Are AI-Based Startups Considered a Failure?
    There have been startups that have survived, and there have been startups that have failed. However, what makes one startup succeed over…  ( 10 min )
  • Open

    Keeping Learning-Based Control Safe by Regulating Distributional Shift
    To regulate the distribution shift experienced by learning-based controllers, we seek a mechanism for constraining the agent to regions of high data density throughout its trajectory (left). Here, we present an approach which achieves this goal by combining features of density models (middle) and Lyapunov functions (right). In order to make use of machine learning and reinforcement learning in controlling real world systems, we must design algorithms which not only achieve good performance, but also interact with the system in a safe and reliable manner. Most prior work on safety-critical control focuses on maintaining the safety of the physical system, e.g. avoiding falling over for legged robots, or colliding with obstacles for autonomous vehicles. However, for learning-based controlle…  ( 7 min )
  • Open

    UnSplit: Data-Oblivious Model Inversion, Model Stealing, and Label Inference Attacks Against Split Learning. (arXiv:2108.09033v2 [cs.CR] UPDATED)
    Training deep neural networks often forces users to work in a distributed or outsourced setting, accompanied by privacy concerns. Split learning aims to address this concern by distributing the model between a client and a server. The scheme supposedly provides privacy, since the server cannot see the clients' models and inputs. We show that this is not true via two novel attacks. (1) We show that an honest-but-curious split learning server, equipped only with the knowledge of the client neural network architecture, can recover the input samples and obtain a functionally similar model to the client model, without being detected. (2) We show that if the client keeps hidden only the output layer of the model to "protect" the private labels, the honest-but-curious server can infer the labels with perfect accuracy. We test our attacks using various benchmark datasets and against proposed privacy-enhancing extensions to split learning. Our results show that plaintext split learning can pose serious risks, ranging from data (input) privacy to intellectual property (model parameters), and provide no more than a false sense of security.
    Properties and Performance of the ABCDe Random Graph Model with Community Structure. (arXiv:2203.14899v2 [cs.SI] UPDATED)
    In this paper, we investigate properties and performance of synthetic random graph models with a built-in community structure. Such models are important for evaluating and tuning community detection algorithms that are unsupervised by nature. We propose ABCDe, a multi-threaded implementation of the ABCD (Artificial Benchmark for Community Detection) graph generator. We discuss the implementation details of the algorithm and compare it with both the previously available sequential version of the ABCD model and with the parallel implementation of the standard and extensively used LFR (Lancichinetti--Fortunato--Radicchi) generator. We show that ABCDe is more than ten times faster and scales better than the parallel implementation of LFR provided in NetworKit. Moreover, the algorithm is not only faster but random graphs generated by ABCD have similar properties to the ones generated by the original LFR algorithm, while the parallelized NetworKit implementation of LFR produces graphs that have noticeably different characteristics.
    Broad Recommender System: An Efficient Nonlinear Collaborative Filtering Approach. (arXiv:2204.11602v2 [cs.IR] UPDATED)
    Recently, Deep Neural Networks (DNNs) have been widely introduced into Collaborative Filtering (CF) to produce more accurate recommendation results due to their capability of capturing the complex nonlinear relationships between items and users. However, DNN-based models usually suffer from high computational complexity, i.e., consuming a very long training time and storing a huge number of trainable parameters. To address these problems, we propose a new broad recommender system called Broad Collaborative Filtering (BroadCF), which is an efficient nonlinear collaborative filtering approach. Instead of DNNs, Broad Learning System (BLS) is used as a mapping function to learn the complex nonlinear relationships between users and items, which can avoid the above issues while achieving very satisfactory recommendation performance. However, it is not feasible to directly feed the original rating data into BLS. To this end, we propose a user-item rating collaborative vector preprocessing procedure to generate low-dimensional user-item input data, which is able to harness quality judgments of the most similar users/items. Extensive experiments conducted on seven benchmark datasets have confirmed the effectiveness of the proposed BroadCF algorithm.
    Extrapolation and Spectral Bias of Neural Nets with Hadamard Product: a Polynomial Net Study. (arXiv:2209.07736v1 [cs.LG])
    Neural tangent kernel (NTK) is a powerful tool to analyze training dynamics of neural networks and their generalization bounds. The study on NTK has been devoted to typical neural network architectures, but is incomplete for neural networks with Hadamard products (NNs-Hp), e.g., StyleGAN and polynomial neural networks. In this work, we derive the finite-width NTK formulation for a special class of NNs-Hp, i.e., polynomial neural networks. We prove their equivalence to the kernel regression predictor with the associated NTK, which expands the application scope of NTK. Based on our results, we elucidate the separation of PNNs over standard neural networks with respect to extrapolation and spectral bias. Our two key insights are that when compared to standard neural networks, PNNs are able to fit more complicated functions in the extrapolation regime and admit a slower eigenvalue decay of the respective NTK. Besides, our theoretical results can be extended to other types of NNs-Hp, which expand the scope of our work. Our empirical results validate the separations in broader classes of NNs-Hp, which provide a good justification for a deeper understanding of neural architectures.
    Missing Data Imputation and Acquisition with Deep Hierarchical Models and Hamiltonian Monte Carlo. (arXiv:2202.04599v3 [cs.LG] UPDATED)
    Variational Autoencoders (VAEs) have recently been highly successful at imputing and acquiring heterogeneous missing data. However, within this specific application domain, existing VAE methods are restricted by using only one layer of latent variables and strictly Gaussian posterior approximations. To address these limitations, we present HH-VAEM, a Hierarchical VAE model for mixed-type incomplete data that uses Hamiltonian Monte Carlo with automatic hyper-parameter tuning for improved approximate inference. Our experiments show that HH-VAEM outperforms existing baselines in the tasks of missing data imputation and supervised learning with missing features. Finally, we also present a sampling-based approach for efficiently computing the information gain when missing features are to be acquired with HH-VAEM. Our experiments show that this sampling-based approach is superior to alternatives based on Gaussian approximations.
    Neuromuscular Reinforcement Learning to Actuate Human Limbs through FES. (arXiv:2209.07849v1 [cs.LG])
    Functional Electrical Stimulation (FES) is a technique to evoke muscle contraction through low-energy electrical signals. FES can animate paralysed limbs. Yet, an open challenge remains on how to apply FES to achieve desired movements. This challenge is accentuated by the complexities of human bodies and the non-stationarities of the muscles' responses. The former causes difficulties in performing inverse dynamics, and the latter causes control performance to degrade over extended periods of use. Here, we engage the challenge via a data-driven approach. Specifically, we learn to control FES through Reinforcement Learning (RL) which can automatically customise the stimulation for the patients. However, RL typically has Markovian assumptions while FES control systems are non-Markovian because of the non-stationarities. To deal with this problem, we use a recurrent neural network to create Markovian state representations. We cast FES controls into RL problems and train RL agents to control FES in different settings in both simulations and the real world. The results show that our RL controllers can maintain control performances over long periods and have better stimulation characteristics than PID controllers.
    Machine Learning Decoder for 5G NR PUCCH Format 0. (arXiv:2209.07861v1 [cs.NI])
    5G cellular systems depend on the timely exchange of feedback control information between the user equipment and the base station. Proper decoding of this control information is necessary to set up and sustain high throughput radio links. This paper makes the first attempt at using Machine Learning techniques to improve the decoding performance of the Physical Uplink Control Channel Format 0. We use fully connected neural networks to classify the received samples based on the uplink control information content embedded within them. The trained neural network, tested on real-time wireless captures, shows significant improvement in accuracy over conventional DFT-based decoders, even at low SNR. The obtained accuracy results also demonstrate conformance with 3GPP requirements.
    Minibatch Stochastic Three Points Method for Unconstrained Smooth Minimization. (arXiv:2209.07883v1 [math.OC])
    In this paper, we propose a new zero order optimization method called minibatch stochastic three points (MiSTP) method to solve an unconstrained minimization problem in a setting where only an approximation of the objective function evaluation is possible. It is based on the recently proposed stochastic three points (STP) method (Bergou et al., 2020). At each iteration, MiSTP generates a random search direction in a similar manner to STP, but chooses the next iterate based solely on the approximation of the objective function rather than its exact evaluations. We also analyze our method's complexity in the nonconvex and convex cases and evaluate its performance on multiple machine learning tasks.
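    A minimal sketch of the underlying three-points step, with minibatch noise standing in for the approximate objective evaluations; the unit-direction sampling, step-size schedule, and names are assumptions for illustration, not the paper's exact algorithm:

        import numpy as np

        def mistp_step(x, f_batch, alpha, rng):
            # One (Mi)STP-style iteration: sample a random direction s, then keep the
            # best of {x, x + alpha*s, x - alpha*s} as judged by a (possibly minibatch,
            # hence approximate) objective evaluation f_batch.
            s = rng.normal(size=x.shape)
            s /= np.linalg.norm(s)                       # random unit direction
            candidates = [x, x + alpha * s, x - alpha * s]
            values = [f_batch(c) for c in candidates]
            return candidates[int(np.argmin(values))]

        # Toy usage on a quadratic with simulated minibatch noise:
        rng = np.random.default_rng(0)
        f_batch = lambda z: np.sum(z ** 2) + 0.01 * rng.normal()
        x = rng.normal(size=10)
        for t in range(1000):
            x = mistp_step(x, f_batch, alpha=0.5 / np.sqrt(1 + t), rng=rng)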
    Learning the Quality of Machine Permutations in Job Shop Scheduling. (arXiv:2207.03244v2 [cs.LG] UPDATED)
    In recent years, the power demonstrated by Machine Learning (ML) has increasingly attracted the interest of the optimization community, which is starting to leverage ML for enhancing and automating the design of algorithms. One combinatorial optimization problem recently tackled with ML is the Job Shop scheduling Problem (JSP). Most of the works on the JSP using ML focus on Deep Reinforcement Learning (DRL), and only a few of them leverage supervised learning techniques. The recurring reasons for avoiding supervised learning seem to be the difficulty in casting the right learning task, i.e., what is meaningful to predict, and how to obtain labels. Therefore, we first propose a novel supervised learning task that aims at predicting the quality of machine permutations. Then, we design an original methodology to estimate this quality, and we use these estimations to create an accurate sequential deep learning model (binary accuracy above 95%). Finally, we empirically demonstrate the value of predicting the quality of machine permutations by enhancing the performance of a simple Tabu Search algorithm inspired by the works in the literature.
    Neurons on Amoebae. (arXiv:2106.03695v2 [math.AG] UPDATED)
    We apply methods of machine learning, such as neural networks, manifold learning and image processing, in order to study 2-dimensional amoebae in algebraic geometry and string theory. With the help of embedding manifold projection, we recover complicated conditions obtained from so-called lopsidedness. For certain cases it could even reach $\sim99\%$ accuracy, in particular for the lopsided amoeba of $F_0$ with positive coefficients, on which we place primary focus. Using weights and biases, we also find good approximations to determine the genus for an amoeba at lower computational cost. In general, the models could easily predict the genus with over $90\%$ accuracy. With similar techniques, we also investigate the membership problem, and image processing of the amoebae directly.
    Two-view Graph Neural Networks for Knowledge Graph Completion. (arXiv:2112.09231v2 [cs.CL] UPDATED)
    We present an effective GNN-based knowledge graph embedding model, named WGE, to capture entity- and relation-focused graph structures. In particular, given the knowledge graph, WGE builds a single undirected entity-focused graph that views entities as nodes. In addition, WGE also constructs another single undirected graph from relation-focused constraints, which views entities and relations as nodes. WGE then proposes a GNN-based architecture to better learn vector representations of entities and relations from these two single entity- and relation-focused graphs. WGE feeds the learned entity and relation representations into a weighted score function to return the triple scores for knowledge graph completion. Experimental results show that WGE outperforms competitive baselines, obtaining state-of-the-art performances on seven benchmark datasets for knowledge graph completion.
    Algorithmic Regularization in Model-free Overparametrized Asymmetric Matrix Factorization. (arXiv:2203.02839v2 [cs.LG] UPDATED)
    We study the asymmetric matrix factorization problem under a natural nonconvex formulation with arbitrary overparametrization. The model-free setting is considered, with minimal assumption on the rank or singular values of the observed matrix, where the global optima provably overfit. We show that vanilla gradient descent with small random initialization sequentially recovers the principal components of the observed matrix. Consequently, when equipped with proper early stopping, gradient descent produces the best low-rank approximation of the observed matrix without explicit regularization. We provide a sharp characterization of the relationship between the approximation error, iteration complexity, initialization size and stepsize. Our complexity bound is almost dimension-free and depends logarithmically on the approximation error, with significantly more lenient requirements on the stepsize and initialization compared to prior work. Our theoretical results provide accurate predictions of the behavior of gradient descent, showing good agreement with numerical experiments.
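    A minimal numpy sketch of the phenomenon described above: gradient descent on ||M - UV^T||_F^2 from a small random initialization, with a fixed modest iteration budget playing the role of early stopping (the sizes, stepsize, and budget are assumptions, not the paper's settings):

        import numpy as np

        rng = np.random.default_rng(0)
        n, m, k = 50, 40, 20                    # k = 20 overparametrizes the true rank 3
        M = rng.normal(size=(n, 3)) @ rng.normal(size=(3, m))   # rank-3 observed matrix

        scale = 1e-6                            # small random initialization
        U, V = scale * rng.normal(size=(n, k)), scale * rng.normal(size=(m, k))
        eta = 0.01                              # stepsize (assumed, not tuned)

        for t in range(2000):                   # early stopping via a fixed budget
            R = U @ V.T - M                     # residual
            U, V = U - eta * R @ V, V - eta * R.T @ U   # gradients of 0.5 * ||R||_F^2
            if t % 500 == 0:
                print(t, np.linalg.norm(R))     # error drops as components are recovered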
    Sales Channel Optimization via Simulations Based on Observational Data with Delayed Rewards: A Case Study at LinkedIn. (arXiv:2209.07749v1 [cs.LG])
    Training models on data obtained from randomized experiments is ideal for making good decisions. However, randomized experiments are often time-consuming, expensive, risky, infeasible or unethical to perform, leaving decision makers little choice but to rely on observational data collected under historical policies when training models. This opens questions regarding not only which decision-making policies would perform best in practice, but also regarding the impact of different data collection protocols on the performance of various policies trained on the data, or the robustness of policy performance with respect to changes in problem characteristics such as action- or reward-specific delays in observing outcomes. We aim to answer such questions for the problem of optimizing sales channel allocations at LinkedIn, where sales accounts (leads) need to be allocated to one of three channels, with the goal of maximizing the number of successful conversions over a period of time. A key problem feature constitutes the presence of stochastic delays in observing allocation outcomes, whose distribution is both channel- and outcome-dependent. We built a discrete-time simulation that can handle our problem features and used it to evaluate: a) a historical rule-based policy; b) a supervised machine learning policy (XGBoost); and c) multi-armed bandit (MAB) policies, under different scenarios involving: i) data collection used for training (observational vs randomized); ii) lead conversion scenarios; iii) delay distributions. Our simulation results indicate that LinUCB, a simple MAB policy, consistently outperforms the other policies, achieving an 18-47% lift relative to the rule-based policy.
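    For concreteness, here is a sketch of the disjoint LinUCB policy referenced above (Li et al., 2010), with one ridge-regression model per arm (here, per sales channel); the exploration parameter alpha and the interface are assumptions:

        import numpy as np

        class LinUCB:
            def __init__(self, n_arms, d, alpha=1.0):
                self.alpha = alpha
                self.A = [np.eye(d) for _ in range(n_arms)]    # X^T X + I, per arm
                self.b = [np.zeros(d) for _ in range(n_arms)]  # X^T r, per arm

            def choose(self, x):
                # Pick the arm with the highest optimistic (mean + bonus) estimate.
                scores = []
                for A, b in zip(self.A, self.b):
                    A_inv = np.linalg.inv(A)
                    theta = A_inv @ b                          # ridge estimate
                    scores.append(theta @ x + self.alpha * np.sqrt(x @ A_inv @ x))
                return int(np.argmax(scores))

            def update(self, arm, x, reward):
                # Called whenever the (possibly delayed) conversion outcome arrives.
                self.A[arm] += np.outer(x, x)
                self.b[arm] += reward * x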
    FairDistillation: Mitigating Stereotyping in Language Models. (arXiv:2207.04546v2 [cs.CL] UPDATED)
    Large pre-trained language models are successfully being used in a variety of tasks, across many languages. With this ever-increasing usage, the risk of harmful side effects also rises, for example by reproducing and reinforcing stereotypes. However, detecting and mitigating these harms is difficult to do in general and becomes computationally expensive when tackling multiple languages or when considering different biases. To address this, we present FairDistillation: a cross-lingual method based on knowledge distillation to construct smaller language models while controlling for specific biases. We found that our distillation method does not negatively affect the downstream performance on most tasks and successfully mitigates stereotyping and representational harms. We demonstrate that FairDistillation can create fairer language models at a considerably lower cost than alternative approaches.
    AutoMTL: A Programming Framework for Automating Efficient Multi-Task Learning. (arXiv:2110.13076v2 [cs.LG] UPDATED)
    Multi-task learning (MTL) jointly learns a set of tasks by sharing parameters among tasks. It is a promising approach for reducing storage costs while improving task accuracy for many computer vision tasks. The effective adoption of MTL faces two main challenges. The first challenge is to determine what parameters to share across tasks to optimize for both memory efficiency and task accuracy. The second challenge is to automatically apply MTL algorithms to an arbitrary CNN backbone without requiring time-consuming manual re-implementation and significant domain expertise. This paper addresses the challenges by developing the first programming framework AutoMTL that automates efficient MTL model development for vision tasks. AutoMTL takes as inputs an arbitrary backbone convolutional neural network (CNN) and a set of tasks to learn, and automatically produces a multi-task model that achieves high accuracy and small memory footprint simultaneously. Experiments on three popular MTL benchmarks (CityScapes, NYUv2, Tiny-Taskonomy) demonstrate the effectiveness of AutoMTL over state-of-the-art approaches as well as the generalizability of AutoMTL across CNNs. AutoMTL is open-sourced and available at https://github.com/zhanglijun95/AutoMTL.
    Reinforcement Learning Based Cooperative P2P Energy Trading between DC Nanogrid Clusters with Wind and PV Energy Resources. (arXiv:2209.07744v1 [cs.LG])
    In order to replace fossil fuels with renewable energy resources, the unbalanced production of intermittent wind and photovoltaic (PV) power is a critical issue for peer-to-peer (P2P) power trading. To resolve this problem, a reinforcement learning (RL) technique is introduced in this paper. For RL, a graph convolutional network (GCN) and a bi-directional long short-term memory (Bi-LSTM) network are jointly applied to P2P power trading between nanogrid clusters, based on cooperative game theory. The flexible and reliable DC nanogrid is well suited to integrating renewable energy into the distribution system. Each local nanogrid cluster acts as a prosumer, focusing on power production and consumption simultaneously. For the power management of nanogrid clusters, multi-objective optimization is applied to each local nanogrid cluster with Internet of Things (IoT) technology. Charging/discharging of electric vehicles (EVs) is performed considering the intermittent characteristics of wind and PV power production. RL algorithms, such as deep Q-learning network (DQN), deep recurrent Q-learning network (DRQN), Bi-DRQN, proximal policy optimization (PPO), GCN-DQN, GCN-DRQN, GCN-Bi-DRQN, and GCN-PPO, are used for simulations. Consequently, the cooperative P2P power trading system maximizes profit by utilizing the time-of-use (ToU) tariff-based electricity cost and the system marginal price (SMP), and minimizes the amount of grid power consumption. Power management of nanogrid clusters with P2P power trading is simulated on the distribution test feeder in real time, and the proposed GCN-PPO technique reduces the electricity cost of nanogrid clusters by 36.7%.
    Less is Better: Recovering Intended-Feature Subspace to Robustify NLU Models. (arXiv:2209.07879v1 [cs.CL])
    Datasets with significant proportions of bias pose threats to training a trustworthy model on NLU tasks. Despite yielding great progress, current debiasing methods rely excessively on knowledge of bias attributes. The definition of these attributes, however, is elusive and varies across different datasets. Furthermore, leveraging these attributes at the input level for bias mitigation may leave a gap between intrinsic properties and the underlying decision rule. To narrow this gap and remove the need for bias supervision, we suggest extending bias mitigation into the feature space. Therefore, a novel model, Recovering Intended-Feature Subspace with Knowledge-Free (RISK), is developed. Assuming that shortcut features caused by various biases are unintended for prediction, RISK views them as redundant features. When delving into a lower manifold to remove redundancies, RISK reveals that an extremely low-dimensional subspace with intended features can robustly represent the highly biased dataset. Empirical results demonstrate that our model can consistently improve model generalization to out-of-distribution sets, and achieves a new state-of-the-art performance.  ( 2 min )
    Overcoming Exploration: Deep Reinforcement Learning in Complex Environments from Temporal Logic Specifications. (arXiv:2201.12231v3 [cs.RO] UPDATED)
    Exploration is a fundamental challenge in Deep Reinforcement Learning (DRL) based model-free navigation control, since typical exploration techniques for target-driven navigation tasks rely on noise or greedy policies, which are sensitive to the density of rewards. In practice, robots are always deployed in complex cluttered environments, containing dense obstacles and narrow passageways, which yield naturally sparse rewards that are hard to explore during training. Such a problem becomes even more serious when pre-defined tasks are complex and have rich expressivity. In this paper, we focus on these two aspects and present a deep policy gradient algorithm for a task-guided robot with unknown dynamics deployed in a complex cluttered environment. Linear Temporal Logic (LTL) is applied to express rich robotic specifications. To overcome the environmental challenge of exploration during training, we propose a novel path planning-guided reward scheme that is dense over the state space and, crucially, robust to the infeasibility of computed geometric paths due to the black-box dynamics. To facilitate LTL satisfaction, our approach decomposes the LTL mission into sub-tasks that are solved using distributed DRL, where the sub-tasks can be trained in parallel using deep policy gradient algorithms. Our framework is shown to significantly improve the performance (effectiveness, efficiency) and exploration of robots tasked with complex missions in large-scale complex environments. A video demo can be found on YouTube: https://youtu.be/YQRQ2-yMtIk.
    Memory Consistent Unsupervised Off-the-Shelf Model Adaptation for Source-Relaxed Medical Image Segmentation. (arXiv:2209.07910v1 [cs.CV])
    Unsupervised domain adaptation (UDA) has been a vital protocol for migrating information learned from a labeled source domain to facilitate the implementation in an unlabeled heterogeneous target domain. Although UDA is typically jointly trained on data from both domains, accessing the labeled source domain data is often restricted, due to concerns over patient data privacy or intellectual property. To sidestep this, we propose "off-the-shelf (OS)" UDA (OSUDA), aimed at image segmentation, by adapting an OS segmentor trained in a source domain to a target domain, in the absence of source domain data in adaptation. Toward this goal, we aim to develop a novel batch-wise normalization (BN) statistics adaptation framework. In particular, we gradually adapt the domain-specific low-order BN statistics, e.g., mean and variance, through an exponential momentum decay strategy, while explicitly enforcing the consistency of the domain shareable high-order BN statistics, e.g., scaling and shifting factors, via our optimization objective. We also adaptively quantify the channel-wise transferability to gauge the importance of each channel, via both low-order statistics divergence and a scaling factor. Furthermore, we incorporate unsupervised self-entropy minimization into our framework to boost performance alongside a novel queued, memory-consistent self-training strategy to utilize the reliable pseudo label for stable and efficient unsupervised adaptation. We evaluated our OSUDA-based framework on both cross-modality and cross-subtype brain tumor segmentation and cardiac MR to CT segmentation tasks. Our experimental results showed that our memory consistent OSUDA performs better than existing source-relaxed UDA methods and yields similar performance to UDA methods with source data.
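    A minimal reading of the low-order statistics adaptation: blend target-domain batch statistics into the source model's running mean and variance with a momentum that decays exponentially over adaptation steps, while the high-order scaling/shifting factors stay fixed. The decay schedule below is an illustrative assumption, not the paper's exact recipe.

```python
import torch
import torch.nn as nn

def adapt_bn_stats(model, target_batches, eta0=0.1, decay=0.94):
    """Sketch: adapt low-order BN statistics (mean/variance) on target data
    with an exponentially decaying momentum; scaling/shifting factors
    (high-order statistics) stay frozen. eta0 and decay are illustrative."""
    bn_layers = [m for m in model.modules() if isinstance(m, nn.BatchNorm2d)]
    model.train()  # BN layers only update running stats in train mode
    for step, x in enumerate(target_batches):
        eta = eta0 * (decay ** step)          # exponential momentum decay
        for bn in bn_layers:
            bn.momentum = eta
            bn.weight.requires_grad_(False)   # keep high-order stats fixed
            bn.bias.requires_grad_(False)
        with torch.no_grad():
            model(x)                          # forward pass updates running stats
```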
    GNNInterpreter: A Probabilistic Generative Model-Level Explanation for Graph Neural Networks. (arXiv:2209.07924v1 [cs.LG])
    Recently, Graph Neural Networks (GNNs) have significantly advanced the performance of machine learning tasks on graphs. However, this technological breakthrough makes people wonder: how does a GNN make such decisions, and can we trust its prediction with high confidence? When it comes to some critical fields such as biomedicine, where making wrong decisions can have severe consequences, interpreting the inner working mechanisms of GNNs before applying them is crucial. In this paper, we propose GNNInterpreter, a novel model-agnostic model-level explanation method for GNNs that follow the message passing scheme, to explain the high-level decision-making process of the GNN model. More specifically, with a continuous relaxation of graphs and the reparameterization trick, GNNInterpreter learns a probabilistic generative graph distribution which produces the most representative graph for the target prediction in the eyes of the GNN model. Compared with the only existing work, GNNInterpreter is more computationally efficient and more flexible in generating explanation graphs with different types of node features and edge features, without introducing another black box to explain the GNN and without requiring domain-specific knowledge. Additionally, experimental studies conducted on four different datasets demonstrate that the explanation graphs generated by GNNInterpreter can match the desired graph pattern when the model is ideal and reveal potential model pitfalls if any exist.
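    One way to read the core mechanism: parameterize edge probabilities with learnable logits, sample a soft adjacency via the reparameterization trick, and ascend the GNN's score for the target class. The binary-concrete relaxation below and the `gnn(node_feats, adj)` signature are illustrative assumptions, not the paper's exact formulation.

```python
import torch

def explain_class(gnn, node_feats, target_class, n_nodes, steps=500, tau=0.5):
    """Sketch: learn edge logits so a relaxed adjacency maximizes the GNN's
    score for target_class; a reparameterized continuous relaxation of the
    discrete graph. gnn's signature is a placeholder."""
    logits = torch.zeros(n_nodes, n_nodes, requires_grad=True)
    opt = torch.optim.Adam([logits], lr=0.05)
    for _ in range(steps):
        u = torch.rand(n_nodes, n_nodes).clamp(1e-6, 1 - 1e-6)
        noise = torch.log(u) - torch.log(1 - u)       # logistic noise
        adj = torch.sigmoid((logits + noise) / tau)   # reparameterized soft edges
        score = gnn(node_feats, adj)[target_class]
        (-score).backward()
        opt.step()
        opt.zero_grad()
    return (torch.sigmoid(logits) > 0.5).float()      # most representative graph
```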
    Versatile Skill Control via Self-supervised Adversarial Imitation of Unlabeled Mixed Motions. (arXiv:2209.07899v1 [cs.RO])
    Learning diverse skills is one of the main challenges in robotics. To this end, imitation learning approaches have achieved impressive results. These methods require explicitly labeled datasets or assume consistent skill execution to enable learning and active control of individual behaviors, which limits their applicability. In this work, we propose a cooperative adversarial method for obtaining single versatile policies with controllable skill sets from unlabeled datasets containing diverse state transition patterns by maximizing their discriminability. Moreover, we show that by utilizing unsupervised skill discovery in the generative adversarial imitation learning framework, novel and useful skills emerge with successful task fulfillment. Finally, the obtained versatile policies are tested on an agile quadruped robot called Solo 8 and present faithful replications of diverse skills encoded in the demonstrations.  ( 2 min )
    Non-stationary Bandits and Meta-Learning with a Small Set of Optimal Arms. (arXiv:2202.13001v5 [cs.LG] UPDATED)
    We study a sequential decision problem where the learner faces a sequence of $K$-armed stochastic bandit tasks. An adversary may design the tasks, but the adversary is constrained to choose the optimal arm of each task in a smaller (but unknown) subset of $M$ arms. The task boundaries might be known (the bandit meta-learning setting), or unknown (the non-stationary bandit setting). We design an algorithm based on a reduction to bandit submodular maximization and show that, in the regime of a large number of tasks and a small number of optimal arms, its regret in both settings is smaller than the simple baseline of $\tilde{O}(\sqrt{KNT})$ that can be obtained by using standard algorithms designed for non-stationary bandit problems. For the bandit meta-learning problem with fixed task length $\tau$, we show that the regret of the algorithm is bounded as $\tilde{O}(NM\sqrt{M \tau}+N^{2/3}M\tau)$. Under additional assumptions on the identifiability of the optimal arms in each task, we show a bandit meta-learning algorithm with an improved $\tilde{O}(N\sqrt{M \tau}+N^{1/2}\sqrt{M K \tau})$ regret.  ( 3 min )
    Omni-Dimensional Dynamic Convolution. (arXiv:2209.07947v1 [cs.CV])
    Learning a single static convolutional kernel in each convolutional layer is the common training paradigm of modern Convolutional Neural Networks (CNNs). Instead, recent research in dynamic convolution shows that learning a linear combination of $n$ convolutional kernels weighted with their input-dependent attentions can significantly improve the accuracy of light-weight CNNs, while maintaining efficient inference. However, we observe that existing works endow convolutional kernels with the dynamic property through one dimension (regarding the convolutional kernel number) of the kernel space, but the other three dimensions (regarding the spatial size, the input channel number and the output channel number for each convolutional kernel) are overlooked. Inspired by this, we present Omni-dimensional Dynamic Convolution (ODConv), a more generalized yet elegant dynamic convolution design, to advance this line of research. ODConv leverages a novel multi-dimensional attention mechanism with a parallel strategy to learn complementary attentions for convolutional kernels along all four dimensions of the kernel space at any convolutional layer. As a drop-in replacement for regular convolutions, ODConv can be plugged into many CNN architectures. Extensive experiments on the ImageNet and MS-COCO datasets show that ODConv brings solid accuracy boosts for various prevailing CNN backbones including both light-weight and large ones, e.g., absolute top-1 improvements of 3.77%-5.71% for the MobileNetV2 family and 1.86%-3.72% for the ResNet family on the ImageNet dataset. Intriguingly, thanks to its improved feature learning ability, ODConv with even one single kernel can compete with or outperform existing dynamic convolution counterparts with multiple kernels, substantially reducing extra parameters. Furthermore, ODConv is also superior to other attention modules for modulating the output features or the convolutional weights.
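    To make the baseline concrete: conventional dynamic convolution mixes $n$ kernels with input-dependent attention over the kernel-number dimension only; ODConv additionally attends over the spatial, input-channel, and output-channel dimensions. Below is a sketch of the single-dimension baseline the abstract contrasts against (shapes and the small attention head are illustrative); ODConv would multiply in three further attention factors.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class DynamicConv2d(nn.Module):
    """Sketch of kernel-number-only dynamic convolution: n kernels mixed by
    input-dependent attention. Illustrative, not ODConv itself."""
    def __init__(self, in_ch, out_ch, k=3, n_kernels=4):
        super().__init__()
        self.weight = nn.Parameter(torch.randn(n_kernels, out_ch, in_ch, k, k) * 0.02)
        self.attn = nn.Sequential(
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(in_ch, n_kernels), nn.Softmax(dim=1))

    def forward(self, x):
        pi = self.attn(x)                            # (B, n) attention over kernels
        # Mix the n kernels per sample, then run one grouped conv over the batch.
        w = torch.einsum('bn,noikl->boikl', pi, self.weight)
        B, _, H, W = x.shape
        out = F.conv2d(x.reshape(1, -1, H, W), w.reshape(-1, *w.shape[2:]),
                       groups=B, padding=self.weight.shape[-1] // 2)
        return out.reshape(B, -1, H, W)
```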
    Adversarial Cross-View Disentangled Graph Contrastive Learning. (arXiv:2209.07699v1 [cs.LG])
    Graph contrastive learning (GCL) is prevalent for tackling the supervision shortage issue in graph learning tasks. Many recent GCL methods have been proposed with various manually designed augmentation techniques, aiming to implement challenging augmentations on the original graph to yield robust representations. Although many of them achieve remarkable performance, existing GCL methods still struggle to improve model robustness without risking the loss of task-relevant information, because they ignore the fact that the augmentation-induced latent factors can be highly entangled with the original graph, making it more difficult to discriminate task-relevant information from irrelevant information. Consequently, the learned representation is either brittle or unilluminating. In light of this, we introduce Adversarial Cross-View Disentangled Graph Contrastive Learning (ACDGCL), which follows the information bottleneck principle to learn minimal yet sufficient representations from graph data. To be specific, our proposed model elicits the augmentation-invariant and augmentation-dependent factors separately. Besides the conventional contrastive loss, which guarantees the consistency and sufficiency of the representations across different contrastive views, we introduce a cross-view reconstruction mechanism to pursue representation disentanglement. In addition, an adversarial view is added as the third view in the contrastive loss to enhance model robustness. We empirically demonstrate that our proposed model outperforms the state-of-the-art methods on the graph classification task over multiple benchmark datasets.
    Quantization for decentralized learning under subspace constraints. (arXiv:2209.07821v1 [math.OC])
    In this paper, we consider decentralized optimization problems where agents have individual cost functions to minimize subject to subspace constraints that require the minimizers across the network to lie in low-dimensional subspaces. This constrained formulation includes consensus or single-task optimization as special cases, and allows for more general task relatedness models such as multitask smoothness and coupled optimization. In order to cope with communication constraints, we propose and study an adaptive decentralized strategy where the agents employ differential randomized quantizers to compress their estimates before communicating with their neighbors. The analysis shows that, under some general conditions on the quantization noise, and for sufficiently small step-sizes $\mu$, the strategy is stable both in terms of mean-square error and average bit rate: by reducing $\mu$, it is possible to keep the estimation errors small (on the order of $\mu$) without increasing indefinitely the bit rate as $\mu\rightarrow 0$. Simulations illustrate the theoretical findings and the effectiveness of the proposed approach, revealing that decentralized learning is achievable at the expense of only a few bits.  ( 2 min )
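    The quantizer can be read concretely: each agent quantizes only the innovation (the difference from the last transmitted value) with an unbiased randomized rounding rule, so the quantization noise is zero-mean and the bit rate stays bounded. A minimal sketch; the fixed grid spacing is an illustrative assumption, not the paper's scheme.

```python
import numpy as np

def stochastic_round(v, delta):
    """Unbiased randomized quantizer: round each entry of v to a grid of
    spacing delta, up or down with probabilities chosen so E[q] = v."""
    low = np.floor(v / delta) * delta
    p_up = (v - low) / delta                  # probability of rounding up
    return low + delta * (np.random.rand(*v.shape) < p_up)

def transmit(estimate, last_sent, delta=0.05):
    """Differential quantization sketch: quantize only the innovation; the
    receiver reconstructs last_sent + q using the shared grid."""
    q = stochastic_round(estimate - last_sent, delta)
    return last_sent + q                      # new shared reference on both sides
```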
    IoT Data Analytics in Dynamic Environments: From An Automated Machine Learning Perspective. (arXiv:2209.08018v1 [cs.LG])
    With the widespread adoption of sensors and smart devices in recent years, the data generation speed of Internet of Things (IoT) systems has increased dramatically. In IoT systems, massive volumes of data must be processed, transformed, and analyzed on a frequent basis to enable various IoT services and functionalities. Machine Learning (ML) approaches have shown their capacity for IoT data analytics. However, applying ML models to IoT data analytics tasks still faces many difficulties and challenges, specifically effective model selection, design/tuning, and updating, which have created massive demand for experienced data scientists. Additionally, the dynamic nature of IoT data may introduce concept drift issues, causing model performance degradation. To reduce human effort, Automated Machine Learning (AutoML) has become a popular field that aims to automatically select, construct, tune, and update machine learning models to achieve the best performance on specified tasks. In this paper, we conduct a review of existing methods for the model selection, tuning, and updating procedures in the area of AutoML in order to identify and summarize the optimal solutions for every step of applying ML algorithms to IoT data analytics. To justify our findings and help industrial users and researchers better implement AutoML approaches, a case study of applying AutoML to IoT anomaly detection problems is conducted in this work. Lastly, we discuss and classify the challenges and research directions for this domain.  ( 3 min )
    A Comprehensive Benchmark for COVID-19 Predictive Modeling Using Electronic Health Records in Intensive Care: Choosing the Best Model for COVID-19 Prognosis. (arXiv:2209.07805v1 [cs.LG])
    The COVID-19 pandemic has posed a heavy burden on healthcare systems worldwide and caused huge social disruption and economic loss. Many deep learning models have been proposed to conduct clinical predictive tasks such as mortality prediction for COVID-19 patients in intensive care units using Electronic Health Record (EHR) data. Despite their initial success in certain clinical applications, there is currently a lack of benchmarking results to enable a fair comparison so that we can select the optimal model for clinical use. Furthermore, there is a discrepancy between the formulation of traditional prediction tasks and real-world clinical practice in intensive care. To fill these gaps, we propose two clinical prediction tasks, outcome-specific length-of-stay prediction and early mortality prediction, for COVID-19 patients in intensive care units. The two tasks are adapted from the naive length-of-stay and mortality prediction tasks to accommodate the clinical practice for COVID-19 patients. We propose fair, detailed, open-source data-preprocessing pipelines and evaluate 17 state-of-the-art predictive models on the two tasks, including 5 machine learning models, 6 basic deep learning models, and 6 deep learning predictive models specifically designed for EHR data. We provide fair, reproducible benchmarking results using data from two real-world COVID-19 EHR datasets; one dataset is publicly available without needing any inquiry, and the other can be accessed on request. We deploy all experiment results and models on an online platform. We also allow clinicians and researchers to upload their data to the platform and get quick prediction results using our trained models. We hope our efforts can further facilitate deep learning and machine learning research for COVID-19 predictive modeling.  ( 3 min )
    Model Inversion Attacks against Graph Neural Networks. (arXiv:2209.07807v1 [cs.LG])
    Many data mining tasks rely on graphs to model relational structures among individuals (nodes). Since relational data are often sensitive, there is an urgent need to evaluate the privacy risks in graph data. One famous privacy attack against data analysis models is the model inversion attack, which aims to infer sensitive data in the training dataset and leads to great privacy concerns. Despite its success in grid-like domains, directly applying model inversion attacks on non-grid domains such as graphs leads to poor attack performance. This is mainly due to the failure to consider the unique properties of graphs. To bridge this gap, in this paper we conduct a systematic study of model inversion attacks against Graph Neural Networks (GNNs), one of the state-of-the-art graph analysis tools. Firstly, in the white-box setting where the attacker has full access to the target GNN model, we present GraphMI to infer the private training graph data. Specifically, in GraphMI, a projected gradient module is proposed to tackle the discreteness of graph edges and preserve the sparsity and smoothness of graph features; a graph auto-encoder module is used to efficiently exploit graph topology, node attributes, and target model parameters for edge inference; a random sampling module can finally sample discrete edges. Furthermore, in the hard-label black-box setting where the attacker can only query the GNN API and receive the classification results, we propose two methods based on gradient estimation and reinforcement learning (RL-GraphMI). Our experimental results show that existing defenses are not sufficiently effective against such attacks and call for more advanced defenses against privacy attacks.  ( 3 min )
    A Biologically-Inspired Dual Stream World Model. (arXiv:2209.08035v1 [cs.LG])
    The medial temporal lobe (MTL), a brain region containing the hippocampus and nearby areas, is hypothesized to be an experience-construction system in mammals, supporting both recall and imagination of temporally-extended sequences of events. Such capabilities are also core to many recently proposed "world models" in the field of AI research. Taking inspiration from this connection, we propose a novel variant, the Dual Stream World Model (DSWM), which learns from high-dimensional observations and dissociates them into context and content streams. DSWM can reliably generate imagined trajectories in novel 2D environments after only a single exposure, outperforming a standard world model. DSWM also learns latent representations which bear a strong resemblance to place cells found in the hippocampus. We show that this representation is useful as a reinforcement learning basis function, and that the generative model can be used to aid the policy learning process using Dyna-like updates.  ( 2 min )
    Studying the explanations for the automated prediction of bug and non-bug issues using LIME and SHAP. (arXiv:2209.07623v1 [cs.SE])
    Context: The identification of bugs within the reported issues in an issue tracker is crucial for the triage of issues. Machine learning models have shown promising results regarding the performance of automated issue type prediction. However, we have only limited knowledge, beyond our assumptions, of how such models identify bugs. LIME and SHAP are popular techniques for explaining the predictions of classifiers. Objective: We want to understand if machine learning models provide explanations for the classification that are reasonable to us as humans and align with our assumptions of what the models should learn. We also want to know if the prediction quality is correlated with the quality of explanations. Method: We conduct a study in which we rate LIME and SHAP explanations based on how well they explain the outcome of an issue type prediction model. For this, we rate the quality of the explanations themselves, i.e., whether they align with our expectations and whether they help us to understand the underlying machine learning model.  ( 2 min )
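    For readers unfamiliar with the two techniques: both attribute a single prediction to input features. A minimal usage sketch with scikit-learn-style inputs; `clf`, the data matrices, and `feature_names` are placeholders, not artifacts from the study.

```python
import shap
from lime.lime_tabular import LimeTabularExplainer

# Placeholders: clf is a fitted tree-based classifier, X_train/X_test are
# numpy feature matrices, feature_names is a list of column names.
lime_explainer = LimeTabularExplainer(
    X_train, feature_names=feature_names,
    class_names=["non-bug", "bug"], mode="classification")
lime_exp = lime_explainer.explain_instance(
    X_test[0], clf.predict_proba, num_features=10)
print(lime_exp.as_list())                  # top feature contributions (LIME)

shap_explainer = shap.TreeExplainer(clf)   # fast exact path for tree ensembles
shap_values = shap_explainer.shap_values(X_test)  # per-feature attributions (SHAP)
```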
    BayesBeat: Reliable Atrial Fibrillation Detection from Noisy Photoplethysmography Data. (arXiv:2011.00753v2 [cs.LG] UPDATED)
    Smartwatches or fitness trackers have garnered a lot of popularity as potential health tracking devices due to their affordable and longitudinal monitoring capabilities. To further widen their health tracking capabilities, in recent years researchers have started to look into the possibility of detecting Atrial Fibrillation (AF) in real time leveraging photoplethysmography (PPG) data, an inexpensive sensor widely available in almost all smartwatches. A significant challenge in AF detection from PPG signals comes from the inherent noise in the smartwatch PPG signals. In this paper, we propose a novel deep learning based approach, BayesBeat, which leverages the power of Bayesian deep learning to accurately infer AF risks from noisy PPG signals while also providing an uncertainty estimate of the prediction. Extensive experiments on two publicly available datasets reveal that our proposed method BayesBeat outperforms the existing state-of-the-art methods. Moreover, BayesBeat is substantially more efficient, having 40-200X fewer parameters than state-of-the-art baseline approaches, making it suitable for deployment in resource constrained wearable devices.  ( 3 min )
    SPGP: Structure Prototype Guided Graph Pooling. (arXiv:2209.07817v1 [cs.LG])
    While graph neural networks (GNNs) have been successful for node classification and link prediction tasks in graphs, learning graph-level representations still remains a challenge. For a graph-level representation, it is important to learn both the representation of neighboring nodes, i.e., aggregation, and graph structural information. A number of graph pooling methods have been developed for this goal. However, most of the existing pooling methods utilize the k-hop neighborhood without considering explicit structural information in a graph. In this paper, we propose Structure Prototype Guided Pooling (SPGP), which utilizes prior graph structures to overcome this limitation. SPGP formulates graph structures as learnable prototype vectors and computes the affinity between nodes and prototype vectors. This leads to a novel node scoring scheme that prioritizes informative nodes while encapsulating the useful structures of the graph. Our experimental results show that SPGP outperforms state-of-the-art graph pooling methods on graph classification benchmark datasets in both accuracy and scalability.  ( 2 min )
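    One way to read the prototype-based scoring: compute each node embedding's affinity to a set of learnable structure prototypes and keep the top-scoring nodes. The cosine affinity and max-over-prototypes rule below are illustrative assumptions, not SPGP's exact definitions.

```python
import torch
import torch.nn as nn

class PrototypeScorer(nn.Module):
    """Sketch: score nodes by affinity to learnable structure prototypes,
    then keep the top-k nodes. Cosine affinity and the top-k rule are
    illustrative assumptions."""
    def __init__(self, dim, n_prototypes=8):
        super().__init__()
        self.prototypes = nn.Parameter(torch.randn(n_prototypes, dim))

    def forward(self, h, ratio=0.5):
        # h: (n_nodes, dim) node embeddings from a GNN layer
        affinity = torch.cosine_similarity(
            h.unsqueeze(1), self.prototypes.unsqueeze(0), dim=-1)  # (n, P)
        scores = affinity.max(dim=1).values       # best-matching prototype
        k = max(1, int(ratio * h.size(0)))
        idx = scores.topk(k).indices
        return h[idx] * scores[idx].unsqueeze(-1), idx  # gated pooled nodes
```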
    Self-Optimizing Feature Transformation. (arXiv:2209.08044v1 [cs.LG])
    Feature transformation aims to extract a good representation (feature) space by mathematically transforming existing features. It is crucial for addressing the curse of dimensionality, enhancing model generalization, overcoming data sparsity, and expanding the availability of classic models. Current research focuses on domain knowledge-based feature engineering or learning latent representations; nevertheless, these methods are not entirely automated and cannot produce a traceable and optimal representation space. When rebuilding a feature space for a machine learning task, can these limitations be addressed concurrently? In this extension study, we present a self-optimizing framework for feature transformation. To achieve better performance, we improve on the preliminary work by (1) obtaining an advanced state representation that enables reinforced agents to comprehend the current feature set better; and (2) resolving Q-value overestimation in reinforced agents to learn unbiased and effective policies. Finally, to make the experiments more convincing than in the preliminary work, we add an outlier detection task with five datasets, evaluate various state representation approaches, and compare different training strategies. Extensive experiments and case studies show that our method is more effective than and superior to the preliminary version.  ( 2 min )
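    The abstract does not spell out how the overestimation is resolved; a standard remedy is the double-DQN target, shown below purely as a hedged illustration (the network names are placeholders, and this may not be the paper's exact mechanism).

```python
import torch

def double_dqn_target(online_q, target_q, next_state, reward, gamma=0.99):
    """Double-DQN target: the online network selects the action, the target
    network evaluates it, reducing the max-operator overestimation bias.
    online_q/target_q are placeholder networks mapping states to Q-vectors."""
    with torch.no_grad():
        best_action = online_q(next_state).argmax(dim=1, keepdim=True)
        next_value = target_q(next_state).gather(1, best_action).squeeze(1)
    return reward + gamma * next_value
```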
    LogGD: Detecting Anomalies from System Logs by Graph Neural Networks. (arXiv:2209.07869v1 [cs.SE])
    Log analysis is one of the main techniques engineers use to troubleshoot faults of large-scale software systems. During the past decades, many log analysis approaches have been proposed to detect system anomalies reflected by logs. They usually take log event counts or sequential log events as inputs and utilize machine learning algorithms including deep learning models to detect system anomalies. These anomalies are often identified as violations of quantitative relational patterns or sequential patterns of log events in log sequences. However, existing methods fail to leverage the spatial structural relationships among log events, resulting in potential false alarms and unstable performance. In this study, we propose a novel graph-based log anomaly detection method, LogGD, to effectively address the issue by transforming log sequences into graphs. We exploit the powerful capability of Graph Transformer Neural Network, which combines graph structure and node semantics for log-based anomaly detection. We evaluate the proposed method on four widely-used public log datasets. Experimental results show that LogGD can outperform state-of-the-art quantitative-based and sequence-based methods and achieve stable performance under different window size settings. The results confirm that LogGD is effective in log-based anomaly detection.
    A Spectral Method for Joint Community Detection and Orthogonal Group Synchronization. (arXiv:2112.13199v2 [stat.ML] UPDATED)
    Community detection and orthogonal group synchronization are both fundamental problems with a variety of important applications in science and engineering. In this work, we consider the joint problem of community detection and orthogonal group synchronization which aims to recover the communities and perform synchronization simultaneously. To this end, we propose a simple algorithm that consists of a spectral decomposition step followed by a blockwise column pivoted QR factorization (CPQR). The proposed algorithm is efficient and scales linearly with the number of edges in the graph. We also leverage the recently developed 'leave-one-out' technique to establish a near-optimal guarantee for exact recovery of the cluster memberships and stable recovery of the orthogonal transforms. Numerical experiments demonstrate the efficiency and efficacy of our algorithm and confirm our theoretical characterization of it.
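    A sketch of the community-detection half of the pipeline: take the top-k eigenvectors of the matrix, then run a column-pivoted QR on the transposed eigenvector block to pick k anchor nodes and assign memberships. The synchronization step and the blockwise details are omitted; this is an illustrative reduction, not the paper's full algorithm.

```python
import numpy as np
from scipy.linalg import qr

def spectral_cpqr_communities(A, k):
    """Sketch: spectral decomposition + column-pivoted QR (CPQR) for
    community detection; the orthogonal-group synchronization step of
    the paper is omitted here."""
    vals, vecs = np.linalg.eigh(A)
    V = vecs[:, np.argsort(vals)[-k:]]        # top-k eigenvectors, shape (n, k)
    _, _, piv = qr(V.T, pivoting=True)        # CPQR picks k well-spread anchors
    anchors = V[piv[:k]]                      # (k, k) basis from anchor rows
    coeffs = np.abs(V @ np.linalg.inv(anchors))
    return coeffs.argmax(axis=1)              # community label per node
```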
    Self-Attentive Pooling for Efficient Deep Learning. (arXiv:2209.07659v1 [cs.CV])
    Efficient custom pooling techniques that can aggressively trim the dimensions of a feature map and thereby reduce inference compute and memory footprint for resource-constrained computer vision applications have recently gained significant traction. However, prior pooling works extract only the local context of the activation maps, limiting their effectiveness. In contrast, we propose a novel non-local self-attentive pooling method that can be used as a drop-in replacement to the standard pooling layers, such as max/average pooling or strided convolution. The proposed self-attention module uses patch embedding, multi-head self-attention, and spatial-channel restoration, followed by sigmoid activation and exponential soft-max. This self-attention mechanism efficiently aggregates dependencies between non-local activation patches during down-sampling. Extensive experiments on standard object classification and detection tasks with various convolutional neural network (CNN) architectures demonstrate the superiority of our proposed mechanism over the state-of-the-art (SOTA) pooling techniques. In particular, we surpass the test accuracy of existing pooling techniques on different variants of MobileNet-V2 on ImageNet by an average of 1.2%. With the aggressive down-sampling of the activation maps in the initial layers (providing up to 22x reduction in memory consumption), our approach achieves 1.43% higher test accuracy compared to SOTA techniques with iso-memory footprints. This enables the deployment of our models in memory-constrained devices, such as micro-controllers (without losing significant accuracy), because the initial activation maps consume a significant amount of on-chip memory for high-resolution images required for complex vision tasks. Our proposed pooling method also leverages the idea of channel pruning to further reduce memory footprints.  ( 3 min )
    Continual Few Shot Learning with Hippocampal-Inspired Replay. (arXiv:2209.07863v1 [cs.NE])
    Continual learning and few-shot learning are important frontiers in the quest to improve Machine Learning. There is a growing body of work in each frontier, but very little combining the two. Recently however, Antoniou et al. (arXiv:2004.11967) introduced a Continual Few-shot Learning framework, CFSL, that combines both. In this study, we extended CFSL to make it more comparable to standard continual learning experiments, where usually a much larger number of classes are presented. We also introduced an 'instance test' to classify very similar specific instances - a capability of animal cognition that is usually neglected in ML. We selected representative baseline models from the original CFSL work and compared them to a model with Hippocampal-inspired replay, as the Hippocampus is considered to be vital to this type of learning in animals. As expected, learning more classes is more difficult than in the original CFSL experiments, and interestingly, the way in which they are presented makes a difference to performance. Accuracy in the instance test is comparable to the classification tasks. The use of replay for consolidation improves performance substantially for both types of tasks, particularly the instance test.
    Universal Speech Enhancement with Score-based Diffusion. (arXiv:2206.03065v2 [cs.SD] UPDATED)
    Removing background noise from speech audio has been the subject of considerable effort, especially in recent years due to the rise of virtual communication and amateur recordings. Yet background noise is not the only unpleasant disturbance that can prevent intelligibility: reverb, clipping, codec artifacts, problematic equalization, limited bandwidth, or inconsistent loudness are equally disturbing and ubiquitous. In this work, we propose to consider the task of speech enhancement as a holistic endeavor, and present a universal speech enhancement system that tackles 55 different distortions at the same time. Our approach consists of a generative model that employs score-based diffusion, together with a multi-resolution conditioning network that performs enhancement with mixture density networks. We show that this approach significantly outperforms the state of the art in a subjective test performed by expert listeners. We also show that it achieves competitive objective scores with just 4-8 diffusion steps, despite not considering any particular strategy for fast sampling. We hope that both our methodology and technical contributions encourage researchers and practitioners to adopt a universal approach to speech enhancement, possibly framing it as a generative task.
    SplitGuard: Detecting and Mitigating Training-Hijacking Attacks in Split Learning. (arXiv:2108.09052v3 [cs.CR] UPDATED)
    Distributed deep learning frameworks such as split learning provide great benefits with regard to the computational cost of training deep neural networks and the privacy-aware utilization of the collective data of a group of data-holders. Split learning, in particular, achieves this goal by dividing a neural network between a client and a server so that the client computes the initial set of layers, and the server computes the rest. However, this method introduces a unique attack vector for a malicious server attempting to steal the client's private data: the server can direct the client model towards learning any task of its choice, e.g., towards outputting easily invertible values. With a concrete example already proposed (Pasquini et al., CCS '21), such training-hijacking attacks present a significant risk for the data privacy of split learning clients. In this paper, we propose SplitGuard, a method by which a split learning client can detect whether it is being targeted by a training-hijacking attack or not. We experimentally evaluate our method's effectiveness, compare it with potential alternatives, and discuss in detail various points related to its use. We conclude that SplitGuard can effectively detect training-hijacking attacks while minimizing the amount of information recovered by the adversaries.
    Lethal Dose Conjecture on Data Poisoning. (arXiv:2208.03309v2 [cs.LG] UPDATED)
    Data poisoning considers an adversary that distorts the training set of machine learning algorithms for malicious purposes. In this work, we bring to light one conjecture regarding the fundamentals of data poisoning, which we call the Lethal Dose Conjecture. The conjecture states: If $n$ clean training samples are needed for accurate predictions, then in a size-$N$ training set, only $\Theta(N/n)$ poisoned samples can be tolerated while ensuring accuracy. Theoretically, we verify this conjecture in multiple cases. We also offer a more general perspective of this conjecture through distribution discrimination. Deep Partition Aggregation (DPA) and its extension, Finite Aggregation (FA) are recent approaches for provable defenses against data poisoning, where they predict through the majority vote of many base models trained from different subsets of training set using a given learner. The conjecture implies that both DPA and FA are (asymptotically) optimal -- if we have the most data-efficient learner, they can turn it into one of the most robust defenses against data poisoning. This outlines a practical approach to developing stronger defenses against poisoning via finding data-efficient learners. Empirically, as a proof of concept, we show that by simply using different data augmentations for base learners, we can respectively double and triple the certified robustness of DPA on CIFAR-10 and GTSRB without sacrificing accuracy.
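    To make the scaling concrete: under the conjecture, if $n = 500$ clean samples suffice for accurate prediction, then a training set of size $N = 50{,}000$ can tolerate only $\Theta(N/n) = \Theta(100)$ poisoned samples; halving $n$ (i.e., a more data-efficient learner) doubles the tolerable dose, which is exactly the route to stronger defenses that the abstract outlines. (The specific numbers here are illustrative, not from the paper.)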
    Learning with Local Gradients at the Edge. (arXiv:2208.08503v2 [cs.LG] UPDATED)
    To enable learning on edge devices with fast convergence and low memory, we present a novel backpropagation-free optimization algorithm dubbed Target Projection Stochastic Gradient Descent (tpSGD). tpSGD generalizes direct random target projection to work with arbitrary loss functions and extends target projection to training recurrent neural networks (RNNs) in addition to feedforward networks. tpSGD uses layer-wise stochastic gradient descent (SGD) and local targets generated via random projections of the labels to train the network layer-by-layer with only forward passes. tpSGD does not require retaining gradients during optimization, greatly reducing memory allocation compared to SGD backpropagation (BP) methods that require multiple instances of the entire neural network weights, input/output, and intermediate results. Our method performs within 5% of the accuracy of BP gradient descent on relatively shallow networks of fully connected layers, convolutional layers, and recurrent layers. tpSGD also outperforms other state-of-the-art gradient-free algorithms in shallow models consisting of multi-layer perceptrons, convolutional neural networks (CNNs), and RNNs, with competitive accuracy and less memory and time. We evaluate the performance of tpSGD in training deep neural networks (e.g. VGG) and extend the approach to multi-layer RNNs. These experiments highlight new research directions related to optimized layer-based adaptor training for domain-shift using tpSGD at the edge.
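    A sketch of the local-target idea: each layer receives its own target from a fixed random projection of the labels and is trained with a purely local loss, so no backward pass through the rest of the network is needed. The two-layer MLP, MSE local loss, and dimensions below are illustrative assumptions.

```python
import torch
import torch.nn as nn

# Sketch of layer-wise training with random label projections (tpSGD-style).
torch.manual_seed(0)
layers = nn.ModuleList([nn.Linear(784, 256), nn.Linear(256, 10)])
# Fixed random projections mapping 10-dim one-hot labels to each layer's width;
# the last layer's target is the label itself (identity projection).
projections = [torch.randn(10, 256), torch.eye(10)]
opts = [torch.optim.SGD(l.parameters(), lr=0.01) for l in layers]

def train_step(x, y_onehot):
    h = x
    for layer, P, opt in zip(layers, projections, opts):
        target = y_onehot @ P                 # local target for this layer
        out = torch.relu(layer(h))
        loss = ((out - target) ** 2).mean()   # purely local loss
        opt.zero_grad()
        loss.backward()                       # gradients stop at this layer
        opt.step()
        h = out.detach()                      # forward-only propagation
    return loss.item()
```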
    DAGMA: Learning DAGs via M-matrices and a Log-Determinant Acyclicity Characterization. (arXiv:2209.08037v1 [cs.LG])
    The combinatorial problem of learning directed acyclic graphs (DAGs) from data was recently framed as a purely continuous optimization problem by leveraging a differentiable acyclicity characterization of DAGs based on the trace of a matrix exponential function. Existing acyclicity characterizations are based on the idea that powers of an adjacency matrix contain information about walks and cycles. In this work, we propose a $\textit{fundamentally different}$ acyclicity characterization based on the log-determinant (log-det) function, which leverages the nilpotency property of DAGs. To deal with the inherent asymmetries of a DAG, we relate the domain of our log-det characterization to the set of $\textit{M-matrices}$, which is a key difference to the classical log-det function defined over the cone of positive definite matrices. Similar to acyclicity functions previously proposed, our characterization is also exact and differentiable. However, when compared to existing characterizations, our log-det function: (1) Is better at detecting large cycles; (2) Has better-behaved gradients; and (3) Its runtime is in practice about an order of magnitude faster. From the optimization side, we drop the typically used augmented Lagrangian scheme, and propose DAGMA ($\textit{Directed Acyclic Graphs via M-matrices for Acyclicity}$), a method that resembles the central path for barrier methods. Each point in the central path of DAGMA is a solution to an unconstrained problem regularized by our log-det function, then we show that at the limit of the central path the solution is guaranteed to be a DAG. Finally, we provide extensive experiments for $\textit{linear}$ and $\textit{nonlinear}$ SEMs, and show that our approach can reach large speed-ups and smaller structural Hamming distances against state-of-the-art methods.
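    For reference, the log-det characterization studied here takes the form $h^{s}(W) = -\log\det(sI - W\circ W) + d\log s$, defined over matrices $W$ for which $sI - W\circ W$ is an M-matrix, and it vanishes exactly on DAGs. A direct PyTorch transcription follows; the choice of $s$ and the domain check are usage assumptions.

```python
import torch

def h_logdet(W, s=1.0):
    """DAGMA-style acyclicity function: h^s(W) = -logdet(sI - W*W) + d*log(s).
    Zero iff W is a DAG, provided sI - W*W stays an M-matrix (s large enough)."""
    d = W.shape[0]
    M = s * torch.eye(d) - W * W              # Hadamard square removes edge signs
    sign, logabsdet = torch.linalg.slogdet(M)
    assert sign > 0, "sI - W*W left the M-matrix domain; increase s"
    return -logabsdet + d * torch.log(torch.tensor(s))
```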
    Computing Abductive Explanations for Boosted Trees. (arXiv:2209.07740v1 [cs.AI])
    Boosted trees are a dominant ML model, exhibiting high accuracy. However, boosted trees are hardly intelligible, and this is a problem whenever they are used in safety-critical applications. Indeed, in such a context, rigorous explanations of the predictions made are expected. Recent work has shown how subset-minimal abductive explanations can be derived for boosted trees, using automated reasoning techniques. However, the generation of such well-founded explanations is intractable in the general case. To improve the scalability of their generation, we introduce the notion of tree-specific explanation for a boosted tree. We show that tree-specific explanations are abductive explanations that can be computed in polynomial time. We also explain how to derive a subset-minimal abductive explanation from a tree-specific explanation. Experiments on various datasets show the computational benefits of leveraging tree-specific explanations for deriving subset-minimal abductive explanations.
    A benchmark study on methods to ensure fair algorithmic decisions for credit scoring. (arXiv:2209.07912v1 [cs.LG])
    The utility of machine learning in evaluating the creditworthiness of loan applicants has been demonstrated for decades. However, automatic decisions may lead to different treatments over groups or individuals, potentially causing discrimination. This paper benchmarks 12 top bias mitigation methods, discussing their performance based on 5 different fairness metrics, the accuracy achieved, and the potential profits for the financial institutions. Our findings show the difficulties in achieving fairness while preserving accuracy and profits. Additionally, the paper highlights some of the best and worst performers and helps bridge the gap between experimental machine learning and its industrial application.  ( 2 min )
    Mondrian Forest for Data Stream Classification Under Memory Constraints. (arXiv:2205.07871v2 [cs.LG] UPDATED)
    Supervised learning algorithms generally assume the availability of enough memory to store their data model during the training and test phases. However, in the Internet of Things, this assumption is unrealistic when data comes in the form of infinite data streams, or when learning algorithms are deployed on devices with reduced amounts of memory. In this paper, we adapt the online Mondrian forest classification algorithm to work with memory constraints on data streams. In particular, we design five out-of-memory strategies to update Mondrian trees with new data points when the memory limit is reached. Moreover, we design trimming mechanisms to make Mondrian trees more robust to concept drifts under memory constraints. We evaluate our algorithms on a variety of real and simulated datasets, and we conclude with recommendations on their use in different situations: the Extend Node strategy appears as the best out-of-memory strategy in all configurations, whereas different trimming mechanisms should be adopted depending on whether a concept drift is expected. All our methods are implemented in the OrpailleCC open-source library and are ready to be used on embedded systems and connected objects.
    Trustworthy Reinforcement Learning Against Intrinsic Vulnerabilities: Robustness, Safety, and Generalizability. (arXiv:2209.08025v1 [cs.LG])
    A trustworthy reinforcement learning algorithm should be competent in solving challenging real-world problems, including robustly handling uncertainties, satisfying safety constraints to avoid catastrophic failures, and generalizing to unseen scenarios during deployments. This study aims to overview these main perspectives of trustworthy reinforcement learning considering its intrinsic vulnerabilities in robustness, safety, and generalizability. In particular, we give rigorous formulations, categorize corresponding methodologies, and discuss benchmarks for each perspective. Moreover, we provide an outlook section to spur promising future directions, with a brief discussion of extrinsic vulnerabilities considering human feedback. We hope this survey can bring separate threads of study together in a unified framework and promote the trustworthiness of reinforcement learning.
    A Mosquito is Worth 16x16 Larvae: Evaluation of Deep Learning Architectures for Mosquito Larvae Classification. (arXiv:2209.07718v1 [cs.CV])
    Mosquito-borne diseases (MBDs), such as dengue virus, chikungunya virus, and West Nile virus, cause over one million deaths globally every year. Because many such diseases are spread by the Aedes and Culex mosquitoes, tracking these larvae becomes critical in mitigating the spread of MBDs. Even as citizen science grows and obtains larger mosquito image datasets, the manual annotation of mosquito images becomes ever more time-consuming and inefficient. Previous research has used computer vision to identify mosquito species, and the Convolutional Neural Network (CNN) has become the de facto standard for image classification. However, these models typically require substantial computational resources. This research introduces the application of the Vision Transformer (ViT) in a comparative study to improve image classification on Aedes and Culex larvae. Two ViT models, ViT-Base and CvT-13, and two CNN models, ResNet-18 and ConvNeXT, were trained on mosquito larvae image data and compared to determine the most effective model to distinguish mosquito larvae as Aedes or Culex. Testing revealed that ConvNeXT obtained the greatest values across all classification metrics, demonstrating its viability for mosquito larvae classification. Based on these results, future research includes creating a model specifically designed for mosquito larvae classification by combining elements of CNN and transformer architecture.
    Interactions in Information Spread. (arXiv:2209.08026v1 [cs.SI])
    Since the development of writing 5000 years ago, human-generated data has been produced at an ever-increasing pace. Classical archival methods aimed at easing information retrieval. Nowadays, archiving is not enough anymore. The amount of data generated daily is beyond human comprehension and calls for new information retrieval strategies. Instead of referencing every single data piece as in traditional archival techniques, a more relevant approach consists in understanding the overall ideas conveyed in data flows. To spot such general tendencies, a precise comprehension of the underlying data generation mechanisms is required. In the rich literature tackling this problem, the question of information interaction remains nearly unexplored. First, we investigate the frequency of such interactions. Building on recent advances in Stochastic Block Modelling, we explore the role of interactions in several social networks. We find that interactions are rare in these datasets. Then, we ask how interactions evolve over time: earlier data pieces should not have an everlasting influence on later data generation mechanisms. We model this using advances in dynamic network inference. We conclude that interactions are brief. Finally, we design a framework that jointly models rare and brief interactions based on Dirichlet-Hawkes Processes. We argue that this new class of models fits brief and sparse interaction modelling. We conduct a large-scale application on Reddit and find that interactions play a minor role in this dataset. From a broader perspective, our work results in a collection of highly flexible models and in a rethinking of core concepts of machine learning. Consequently, we open a range of novel perspectives both in terms of real-world applications and in terms of technical contributions to machine learning.
    Systematically and efficiently improving existing $k$-means initialization algorithms by pairwise-nearest-neighbor smoothing. (arXiv:2202.03949v3 [cs.LG] UPDATED)
    We present a meta-method for initializing (seeding) the $k$-means clustering algorithm called PNN-smoothing. It consists in splitting a given dataset into $J$ random subsets, clustering each of them individually, and merging the resulting clusterings with the pairwise-nearest-neighbor (PNN) method. It is a meta-method in the sense that when clustering the individual subsets any seeding algorithm can be used. If the computational complexity of that seeding algorithm is linear in the size of the data $N$ and the number of clusters $k$, PNN-smoothing is also almost linear with an appropriate choice of $J$, and quite competitive in practice. We show empirically, using several existing seeding methods and testing on several synthetic and real datasets, that this procedure results in systematically better costs. Our implementation is publicly available at https://github.com/carlobaldassi/KMeansPNNSmoothing.jl.
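    The meta-method in a few lines: cluster J random subsets independently (any seeding works inside), pool the resulting centroids, merge them down to k, and use the merged centers as seeds for the full run. In the simplified sketch below, Ward-linkage agglomeration stands in for the paper's weighted pairwise-nearest-neighbor merge.

```python
import numpy as np
from sklearn.cluster import KMeans, AgglomerativeClustering

def pnn_smoothing_seeds(X, k, J=8, seed=0):
    """Sketch of PNN-smoothing: cluster J random subsets, pool centroids,
    merge them down to k seeds. Ward agglomeration is a stand-in for the
    weighted pairwise-nearest-neighbor merge of the paper."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(X))
    centroids = []
    for part in np.array_split(idx, J):
        km = KMeans(n_clusters=k, n_init=1).fit(X[part])  # any seeding works here
        centroids.append(km.cluster_centers_)
    C = np.vstack(centroids)                              # (J*k, dim) pooled centroids
    labels = AgglomerativeClustering(n_clusters=k, linkage="ward").fit_predict(C)
    return np.stack([C[labels == j].mean(axis=0) for j in range(k)])

# Usage: seeds = pnn_smoothing_seeds(X, k)
#        KMeans(n_clusters=k, init=seeds, n_init=1).fit(X)
```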
    Prediction of $\textrm{CO}_2$ Adsorption in Nano-Pores with Graph Neural Networks. (arXiv:2209.07567v1 [cond-mat.mtrl-sci])
    We investigate the graph-based convolutional neural network approach for predicting and ranking gas adsorption properties of crystalline Metal-Organic Framework (MOF) adsorbents for application in post-combustion capture of $\textrm{CO}_2$. Our model is based solely on standard structural input files containing atomistic descriptions of the adsorbent material candidates. We construct novel methodological extensions to match the prediction accuracy of classical machine learning models that were built with hundreds of features at much higher computational cost. Our approach can be more broadly applied to optimize gas capture processes at industrial scale.
    FairGBM: Gradient Boosting with Fairness Constraints. (arXiv:2209.07850v1 [cs.LG])
    Machine Learning (ML) algorithms based on gradient boosted decision trees (GBDT) are still favored on many tabular data tasks across various mission-critical applications, from healthcare to finance. However, GBDT algorithms are not free of the risk of bias and discriminatory decision-making. Despite GBDT's popularity and the rapid pace of research in fair ML, existing in-processing fair ML methods are either inapplicable to GBDT, incur significant training-time overhead, or are inadequate for problems with high class imbalance. We present FairGBM, a learning framework for training GBDT under fairness constraints, with little to no impact on predictive performance when compared to unconstrained LightGBM. Since common fairness metrics are non-differentiable, we employ a "proxy-Lagrangian" formulation using smooth convex error rate proxies to enable gradient-based optimization. Additionally, our open-source implementation shows an order of magnitude speedup in training time when compared with related work, a pivotal aspect to foster the widespread adoption of FairGBM by real-world practitioners.
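    The abstract does not give the exact proxy; one common smooth convex surrogate for a group's false-positive rate, in the spirit of the cross-entropy proxies the formulation describes, is sketched below. The function names and Lagrangian assembly are illustrative, not FairGBM's code.

```python
import numpy as np

def proxy_fpr(scores, labels, in_group):
    """Smooth convex proxy for a group's false-positive rate: average the
    softplus upper bound on the 0/1 step over that group's negatives.
    An illustrative reading of the proxy-Lagrangian idea."""
    neg = (labels == 0) & in_group
    return np.log1p(np.exp(scores[neg])).mean()

def proxy_lagrangian(base_loss, scores, labels, groups, lambdas, target=0.0):
    """Penalize each group's proxy FPR above a target; lambdas are
    (hypothetical) Lagrange multipliers updated by gradient ascent."""
    penalty = sum(l * max(proxy_fpr(scores, labels, g) - target, 0.0)
                  for l, g in zip(lambdas, groups))
    return base_loss + penalty
```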
    Modeling and estimating mixed memberships in weighted networks. (arXiv:2112.04389v2 [cs.SI] UPDATED)
    We consider the problem of detecting latent community information in mixed membership weighted networks, in which nodes have mixed memberships and the edges connecting nodes can take finite real values. We propose a general mixed membership distribution-free model for this problem. The model places no distributional constraints on edges beyond their expected values, and can be viewed as a generalization of several previous models. We use an efficient spectral algorithm to estimate community memberships under the model. We also derive the convergence rate of the proposed algorithm under the model using spectral analysis. We demonstrate the advantages of the mixed membership distribution-free model and the algorithm with applications to small-scale simulated networks in which edges follow different distributions. We have also applied the algorithm to five real-world weighted network datasets with encouraging results.
    Dynamics-informed deconvolutional neural networks for super-resolution identification of regime changes in epidemiological time series. (arXiv:2209.07802v1 [cs.LG])
    Inferring the timing and amplitude of perturbations in epidemiological systems from their stochastically spread low-resolution outcomes is as relevant as it is challenging. Current approaches require knowing the details of the perturbations to proceed with the analyses, a requirement that needs to be overcome. Moreover, the general problem of connecting epidemiological curves with the underlying incidence lacks the highly effective methodology present in other inverse problems, such as super-resolution and dehazing in computer vision. Here, we develop an unsupervised physics-informed convolutional neural network approach, applied in reverse, to connect death records with incidence, allowing the identification of regime changes at single-day resolution. Applied to COVID-19 data with proper regularization and model-selection criteria, the approach can identify the implementation and removal of lockdowns and other nonpharmaceutical interventions with 0.93-day accuracy over the time span of a year.
    Symphony Generation with Permutation Invariant Language Model. (arXiv:2205.05448v2 [cs.SD] UPDATED)
    In this work, we propose a permutation invariant language model, SymphonyNet, as a solution for symbolic symphony music generation. We propose a novel Multi-track Multi-instrument Repeatable (MMR) representation for symphonic music and model the music sequence using a Transformer-based auto-regressive language model with a specific 3-D positional embedding. To overcome length overflow when modeling extra-long symphony tokens, we also propose a modified Byte Pair Encoding algorithm (Music BPE) for music tokens and introduce a novel linear transformer decoder architecture as a backbone. Meanwhile, we train the decoder to learn automatic orchestration as a joint task by masking instrument information from the input. We also introduce a large-scale symbolic symphony dataset to advance symphony generation research. Empirical results show that the proposed approach can generate coherent, novel, complex and harmonious symphonies as a pioneering solution for multi-track multi-instrument symbolic music generation.
    Continual Learning with Dependency Preserving Hypernetworks. (arXiv:2209.07712v1 [cs.LG])
    Humans learn continually throughout their lifespan by accumulating diverse knowledge and fine-tuning it for future tasks. When presented with a similar goal, neural networks suffer from catastrophic forgetting if data distributions across sequential tasks are not stationary over the course of learning. An effective approach to address such continual learning (CL) problems is to use hypernetworks, which generate task-dependent weights for a target network. However, the continual learning performance of existing hypernetwork-based approaches is affected by the assumption of independence of the weights across layers, made in order to maintain parameter efficiency. To address this limitation, we propose a novel approach that uses a dependency preserving hypernetwork to generate weights for the target network while also maintaining parameter efficiency. We propose to use a recurrent neural network (RNN) based hypernetwork that can generate layer weights efficiently while allowing for dependencies across them. In addition, we propose novel regularisation and network growth techniques for the RNN based hypernetwork to further improve continual learning performance. To demonstrate the effectiveness of the proposed methods, we conducted experiments on several image classification continual learning tasks and settings. We found that the proposed methods based on RNN hypernetworks outperformed the baselines in all these CL settings and tasks.  ( 3 min )
    Fine-tuning or top-tuning? Transfer learning with pretrained features and fast kernel methods. (arXiv:2209.07932v1 [cs.LG])
    The impressive performance of deep learning architectures is associated with a massive increase in model complexity. Millions of parameters need to be tuned, with training and inference time scaling accordingly. But is massive fine-tuning necessary? In this paper, focusing on image classification, we consider a simple transfer learning approach that exploits pretrained convolutional features as input for a fast kernel method. We refer to this approach as top-tuning, since only the kernel classifier is trained. By performing more than 2500 training processes, we show that top-tuning provides comparable accuracy to fine-tuning, with a training time that is between one and two orders of magnitude smaller. These results suggest that top-tuning is a useful alternative to fine-tuning on small/medium datasets, especially when training efficiency is crucial.  ( 2 min )
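    A minimal sketch of the top-tuning recipe, assuming a frozen torchvision backbone as the feature extractor and an sklearn kernel SVM standing in for the fast kernel solver used in the paper:

```python
import torch
import torchvision.models as models
from sklearn.svm import SVC

backbone = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)
backbone.fc = torch.nn.Identity()   # expose the 512-d pooled features
backbone.eval()

@torch.no_grad()
def extract_features(images):       # images: (N, 3, 224, 224) preprocessed tensor
    return backbone(images).numpy()

# Features are computed once; only the kernel classifier is trained.
# X_train, y_train, X_test are assumed to be available image tensors/labels:
# clf = SVC(kernel="rbf").fit(extract_features(X_train), y_train)
# preds = clf.predict(extract_features(X_test))
```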
    D-GCCA: Decomposition-based Generalized Canonical Correlation Analysis for Multi-view High-dimensional Data. (arXiv:2001.02856v3 [stat.ML] UPDATED)
    Modern biomedical studies often collect multi-view data, that is, multiple types of data measured on the same set of objects. A popular model in high-dimensional multi-view data analysis is to decompose each view's data matrix into a low-rank common-source matrix generated by latent factors common across all data views, a low-rank distinctive-source matrix corresponding to each view, and an additive noise matrix. We propose a novel decomposition method for this model, called decomposition-based generalized canonical correlation analysis (D-GCCA). D-GCCA rigorously defines the decomposition on the L2 space of random variables, in contrast to the Euclidean dot product space used by most existing methods, thereby providing estimation consistency for low-rank matrix recovery. Moreover, to well calibrate common latent factors, we impose a desirable orthogonality constraint on distinctive latent factors. Existing methods, however, inadequately consider such orthogonality and may thus suffer from substantial loss of undetected common-source variation. Our D-GCCA takes one step further than generalized canonical correlation analysis by separating common and distinctive components among canonical variables, while enjoying an appealing interpretation from the perspective of principal component analysis. Furthermore, we propose to use the variable-level proportion of signal variance explained by common or distinctive latent factors for selecting the most influenced variables. Consistent estimators of our D-GCCA method are established with good finite-sample numerical performance, and have closed-form expressions leading to efficient computation, especially for large-scale data. The superiority of D-GCCA over state-of-the-art methods is also corroborated in simulations and real-world data examples.  ( 3 min )
    DBT-DMAE: An Effective Multivariate Time Series Pre-Train Model under Missing Data. (arXiv:2209.07798v1 [cs.LG])
    Multivariate time series (MTS) is a universal data type relevant to many practical applications. However, MTS suffers from missing-data problems, which lead to degradation or even collapse of downstream tasks such as prediction and classification. Current missing-data handling procedures inevitably introduce biased estimation and redundant training when multiple downstream tasks are involved. This paper presents a universally applicable MTS pre-train model, DBT-DMAE, to overcome these obstacles. First, a missing-representation module is designed by introducing dynamic positional embedding and random masking to characterize the missing patterns. Second, we propose an auto-encoder structure to obtain a generalized MTS encoded representation, using an improved TCN structure called dynamic-bidirectional-TCN as the basic unit, which integrates a dynamic kernel and a time-flipping trick to extract temporal features effectively. Finally, an overall feed-in and loss strategy is established to ensure adequate training of the whole model. Comparative experimental results show that DBT-DMAE outperforms other state-of-the-art methods on six real-world datasets and two different downstream tasks. Moreover, ablation and interpretability experiments verify the validity of DBT-DMAE's substructures.  ( 2 min )
    Topological Structure Learning for Weakly-Supervised Out-of-Distribution Detection. (arXiv:2209.07837v1 [cs.CV])
    Out-of-distribution (OOD) detection is the key to deploying models safely in the open world. For OOD detection, collecting sufficient in-distribution (ID) labeled data is usually more time-consuming and costly than unlabeled data. When ID labeled data is limited, the previous OOD detection methods are no longer superior due to their high dependence on the amount of ID labeled data. Based on limited ID labeled data and sufficient unlabeled data, we define a new setting called Weakly-Supervised Out-of-Distribution Detection (WSOOD). To solve the new problem, we propose an effective method called Topological Structure Learning (TSL). Firstly, TSL uses a contrastive learning method to build the initial topological structure space for ID and OOD data. Secondly, TSL mines effective topological connections in the initial topological space. Finally, based on limited ID labeled data and mined topological connections, TSL reconstructs the topological structure in a new topological space to increase the separability of ID and OOD instances. Extensive studies on several representative datasets show that TSL remarkably outperforms the state-of-the-art, verifying the validity and robustness of our method in the new setting of WSOOD.  ( 2 min )
    Towards a Better Microcredit Decision. (arXiv:2209.07574v1 [q-fin.RM])
    Reject inference comprises techniques to infer the possible repayment behavior of rejected cases. In this paper, we model credit from a new perspective by capturing the sequential pattern of interactions among multiple stages of the loan business, making better use of the underlying causal relationships. Specifically, we first define three stages with sequential dependence throughout the loan process, namely credit granting (AR), withdrawal application (WS) and repayment commitment (GB), and integrate them into a multi-task architecture. Within each stage, an intra-stage multi-task classification is built to meet different business goals. We then design an Information Corridor to express the sequential dependence, leveraging the interaction information between customer and platform from earlier stages via a hierarchical attention module that controls the content and size of the information channel. In addition, a semi-supervised loss is introduced to deal with unobserved instances. The proposed multi-stage interaction sequence (MSIS) method is simple yet effective, and experimental results on a real data set from a top loan platform in China show its ability to remedy population bias and improve model generalization.  ( 2 min )
    Mitigating the Effects of Non-Identifiability on Inference for Bayesian Neural Networks with Latent Variables. (arXiv:1911.00569v4 [cs.LG] UPDATED)
    Bayesian Neural Networks with Latent Variables (BNN+LVs) capture predictive uncertainty by explicitly modeling model uncertainty (via priors on network weights) and environmental stochasticity (via a latent input noise variable). In this work, we first show that BNN+LV suffers from a serious form of non-identifiability: explanatory power can be transferred between the model parameters and latent variables while fitting the data equally well. We demonstrate that as a result, in the limit of infinite data, the posterior mode over the network weights and latent variables is asymptotically biased away from the ground-truth. Due to this asymptotic bias, traditional inference methods may in practice yield parameters that generalize poorly and misestimate uncertainty. Next, we develop a novel inference procedure that explicitly mitigates the effects of likelihood non-identifiability during training and yields high-quality predictions as well as uncertainty estimates. We demonstrate that our inference method improves upon benchmark methods across a range of synthetic and real data-sets.
    Properties of Reddit News Topical Interactions. (arXiv:2209.07816v1 [cs.SI])
    Most models of information diffusion online rely on the assumption that pieces of information spread independently of each other. However, several works have pointed out the necessity of investigating the role of interactions in real-world processes and have highlighted possible difficulties in doing so: interactions are sparse and brief. In response, recent advances have developed models that account for interactions in the underlying publication dynamics. In this article, we propose to extend and apply one such model to determine whether interactions between news headlines on Reddit play a significant role in their underlying publication mechanisms. After conducting an in-depth case study on 100,000 news headlines from 2019, we retrieve state-of-the-art conclusions about interactions and conclude that they play a minor role in this dataset.
    Robust Inference of Manifold Density and Geometry by Doubly Stochastic Scaling. (arXiv:2209.08004v1 [math.ST])
    The Gaussian kernel and its traditional normalizations (e.g., row-stochastic) are popular approaches for assessing similarities between data points, commonly used for manifold learning and clustering, as well as supervised and semi-supervised learning on graphs. In many practical situations, the data can be corrupted by noise that prohibits traditional affinity matrices from correctly assessing similarities, especially if the noise magnitudes vary considerably across the data, e.g., under heteroskedasticity or outliers. An alternative approach that provides a more stable behavior under noise is the doubly stochastic normalization of the Gaussian kernel. In this work, we investigate this normalization in a setting where points are sampled from an unknown density on a low-dimensional manifold embedded in high-dimensional space and corrupted by possibly strong, non-identically distributed, sub-Gaussian noise. We establish the pointwise concentration of the doubly stochastic affinity matrix and its scaling factors around certain population forms. We then utilize these results to develop several tools for robust inference. First, we derive a robust density estimator that can substantially outperform the standard kernel density estimator under high-dimensional noise. Second, we provide estimators for the pointwise noise magnitudes, the pointwise signal magnitudes, and the pairwise Euclidean distances between clean data points. Lastly, we derive robust graph Laplacian normalizations that approximate popular manifold Laplacians, including the Laplace Beltrami operator, showing that the local geometry of the manifold can be recovered under high-dimensional noise. We exemplify our results in simulations and on real single-cell RNA-sequencing data. In the latter, we show that our proposed normalizations are robust to technical variability associated with different cell types.
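    The doubly stochastic normalization referenced above can be computed by symmetric Sinkhorn scaling: find a positive vector $d$ such that $\mathrm{diag}(d)\,K\,\mathrm{diag}(d)$ has unit row and column sums. A minimal sketch using a common damped fixed-point iteration (the paper's estimators build on the resulting affinity matrix and scaling factors):

```python
import numpy as np

def doubly_stochastic(K, n_iter=500, tol=1e-10):
    """Symmetric Sinkhorn scaling of a positive kernel matrix K."""
    d = np.ones(K.shape[0])
    for _ in range(n_iter):
        d_new = np.sqrt(d / (K @ d))   # damped update toward d_i = 1 / (K d)_i
        if np.max(np.abs(d_new - d)) < tol:
            d = d_new
            break
        d = d_new
    return d[:, None] * K * d[None, :]

# Example: Gaussian kernel on random points
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))
sq = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
W = doubly_stochastic(np.exp(-sq / 2.0))
print(W.sum(axis=1)[:5])   # rows (and, by symmetry, columns) sum to ~1
```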
    DSCA: A Dual-Stream Network with Cross-Attention on Whole-Slide Image Pyramids for Cancer Prognosis. (arXiv:2206.05782v3 [eess.IV] UPDATED)
    The cancer prognosis on gigapixel Whole-Slide Images (WSIs) has always been a challenging task. To further enhance WSI visual representations, existing methods have explored image pyramids, instead of single-resolution images, in WSIs. In spite of this, they still face two major problems: high computational cost and an often-overlooked semantic gap in multi-resolution feature fusion. To tackle these problems, this paper proposes to efficiently exploit WSI pyramids from a new perspective, the dual-stream network with cross-attention (DSCA). Our key idea is to utilize two sub-streams to process WSI patches at two resolutions, where a square pooling is devised in the high-resolution stream to significantly reduce computational costs, and a cross-attention-based method is proposed to properly handle the fusion of dual-stream features. We validate our DSCA on three publicly-available datasets with a total of 3,101 WSIs from 1,911 patients. Our experiments and ablation studies verify that (i) the proposed DSCA outperforms existing state-of-the-art methods in cancer prognosis, with an average C-Index improvement of around 4.6%; (ii) our DSCA network is more efficient in computation -- it has more learnable parameters (6.31M vs. 860.18K) but lower computational cost (2.51G vs. 4.94G), compared to a typical existing multi-resolution network; (iii) the key components of DSCA, dual-stream and cross-attention, indeed contribute to our model's performance, gaining an average C-Index rise of around 2.0% while maintaining a relatively small computational load. Our DSCA could serve as an alternative and effective tool for WSI-based cancer prognosis.
    Detection of Interacting Variables for Generalized Linear Models via Neural Networks. (arXiv:2209.08030v1 [stat.ML])
    The quality of generalized linear models (GLMs), frequently used by insurance companies, depends on the choice of interacting variables. The search for interactions is time-consuming, especially for data sets with a large number of variables, depends heavily on the expert judgement of actuaries, and often relies on visual performance indicators. We therefore present an approach to automating the process of finding interactions that should be added to GLMs to improve their predictive power. Our approach relies on neural networks and a model-specific interaction detection method, which is computationally faster than traditionally used methods like the Friedman H-Statistic or SHAP values. In numerical studies, we report the results of our approach on different data sets: open-source data, artificial data, and proprietary data.
    Learning to Constrain Policy Optimization with Virtual Trust Region. (arXiv:2204.09315v2 [cs.LG] UPDATED)
    We introduce a constrained optimization method for policy gradient reinforcement learning, which uses a virtual trust region to regulate each policy update. In addition to using the proximity of one single old policy as the normal trust region, we propose forming a second trust region through a virtual policy representing a wide range of past policies. We then constrain the new policy to stay close to the virtual policy, which is beneficial when the old policy performs poorly. More importantly, we propose a mechanism to automatically build the virtual policy from a memory of past policies, providing a new capability for dynamically learning appropriate virtual trust regions during the optimization process. Our proposed method, dubbed Memory-Constrained Policy Optimization (MCPO), is examined in diverse environments, including robotic locomotion control, navigation with sparse rewards and Atari games, consistently demonstrating competitive performance against recent on-policy constrained policy gradient methods.  ( 2 min )
    Smoothed Embeddings for Certified Few-Shot Learning. (arXiv:2202.01186v2 [cs.LG] UPDATED)
    Randomized smoothing is considered the state-of-the-art provable defense against adversarial perturbations. However, it heavily exploits the fact that classifiers map input objects to class probabilities, and does not address models that learn a metric space in which classification is performed by computing distances to embeddings of class prototypes. In this work, we extend randomized smoothing to few-shot learning models that map inputs to normalized embeddings. We analyze the Lipschitz continuity of such models and derive a robustness certificate against $\ell_2$-bounded perturbations that may be useful in few-shot learning scenarios. Our theoretical results are confirmed by experiments on different datasets.
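    A minimal sketch of randomized smoothing applied to an embedding model: the smoothed embedding is a Monte Carlo average over Gaussian-perturbed inputs, renormalized, and classification is by nearest prototype. The embedding network `f` and the prototype tensor are hypothetical stand-ins:

```python
import torch

def smoothed_embedding(f, x, sigma=0.25, n_samples=100):
    """Monte Carlo estimate of the noise-averaged embedding of a single input x."""
    noise = sigma * torch.randn(n_samples, *x.shape)
    z = f(x.unsqueeze(0) + noise)        # (n_samples, d) embeddings of noisy copies
    z_bar = z.mean(dim=0)
    return z_bar / z_bar.norm()          # renormalize the averaged embedding

def classify(f, x, prototypes):
    """Nearest-prototype classification with the smoothed embedding."""
    z = smoothed_embedding(f, x)
    return torch.cdist(z.unsqueeze(0), prototypes).argmin().item()
```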
    Towards A Unified Policy Abstraction Theory and Representation Learning Approach in Markov Decision Processes. (arXiv:2209.07696v1 [cs.LG])
    Lying at the heart of intelligent decision-making systems, how a policy is represented and optimized is a fundamental problem. The root challenge is the large scale and high complexity of the policy space, which exacerbates the difficulty of policy learning, especially in real-world scenarios. Towards a desirable surrogate policy space, policy representation in a low-dimensional latent space has recently shown its potential in improving both the evaluation and optimization of policies. The key question in these studies is by what criterion we should abstract the policy space for the desired compression and generalization. However, both the theory of policy abstraction and the methodology of policy representation learning are under-studied in the literature. In this work, we make a first effort to fill this gap. First, we propose a unified policy abstraction theory containing three types of policy abstraction associated with policy features at different levels. We then generalize them to three policy metrics that quantify the distance (i.e., similarity) of policies, for more convenient use in learning policy representations. Further, we propose a policy representation learning approach based on deep metric learning. In our empirical study, we investigate the efficacy of the proposed policy metrics and representations in characterizing policy difference and conveying policy generalization, respectively. Our experiments cover both policy optimization and evaluation problems, including trust-region policy optimization (TRPO), diversity-guided evolution strategy (DGES) and off-policy evaluation (OPE). Somewhat naturally, the experimental results indicate that there is no universally optimal abstraction for all downstream learning problems, while the influence-irrelevance policy abstraction can be a generally preferred choice.  ( 3 min )
    Satellite galaxy abundance dependency on cosmology in Magneticum simulations. (arXiv:2110.05498v2 [astro-ph.CO] UPDATED)
    Context: Modelling satellite galaxy abundance $N_s$ in Galaxy Clusters (GCs) is a key element in modelling the Halo Occupation Distribution (HOD), which itself is a powerful tool to connect observational studies with numerical simulations. Aims: To study the impact of cosmological parameters on satellite abundance both in cosmological simulations and in mock observations. Methods: We build an emulator (HODEmu, \url{https://github.com/aragagnin/HODEmu/}) of satellite abundance based on the cosmological parameters $\Omega_m, \Omega_b, \sigma_8, h_0$ and redshift $z$. We train our emulator using Magneticum hydrodynamic simulations spanning 15 different cosmologies, each over $4$ redshift slices between $0<z<0.5$, and for each setup we fit the normalisation $A$, log-slope $\beta$ and Gaussian fractional scatter $\sigma$ of the $N_s-M$ relation. The emulator is based on multi-variate output Gaussian Process Regression (GPR). Results: We find that $A$ and $\beta$ depend, if weakly, on the cosmological parameters, especially on $\Omega_m$ and $\Omega_b$. This dependency can explain some discrepancies found in the literature between the satellite HOD of different cosmological simulations (Magneticum, Illustris, BAHAMAS). We also show that the cosmology dependency of satellite abundance differs between full-physics (FP), dark-matter-only (DMO), and non-radiative simulations. Conclusions: This work provides a preliminary calibration of the cosmological dependency of the satellite abundance of high-mass halos; we show that modelling the HOD with cosmological parameters is necessary to interpret satellite abundance, and we highlight the importance of using FP simulations in modelling this dependency.
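    A minimal sketch of the kind of multi-output GPR emulator described above, with random arrays standing in for the simulated cosmology grid and the fitted $(A, \beta, \sigma)$ values (exact behavior of multi-output uncertainty may vary across sklearn versions):

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, ConstantKernel

X = np.random.rand(60, 5)   # stand-in for (Omega_m, Omega_b, sigma_8, h_0, z) grid
Y = np.random.rand(60, 3)   # stand-in for fitted (A, beta, sigma) per setup

gpr = GaussianProcessRegressor(
    kernel=ConstantKernel() * RBF(length_scale=np.ones(5)),  # one length scale per input
    normalize_y=True,
).fit(X, Y)

# Predict (A, beta, sigma) at a new cosmology/redshift point
pred, std = gpr.predict(np.array([[0.3, 0.05, 0.8, 0.7, 0.2]]), return_std=True)
```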
    Truthful Generalized Linear Models. (arXiv:2209.07815v1 [cs.LG])
    In this paper we study the estimation of Generalized Linear Models (GLMs) in the case where the agents (individuals) are strategic or self-interested and are concerned about their privacy when reporting data. Compared with the classical setting, here we aim to design mechanisms that both incentivize most agents to truthfully report their data and preserve the privacy of individuals' reports, while their outputs should also be close to the underlying parameter. In the first part of the paper, we consider the case where the covariates are sub-Gaussian and the responses are heavy-tailed, having only finite fourth moments. First, motivated by the stationary condition of the maximizer of the likelihood function, we derive a novel private and closed-form estimator. Based on this estimator, we propose a mechanism with the following properties, via appropriate design of the computation and payment scheme for several canonical models such as linear regression, logistic regression and Poisson regression: (1) the mechanism is $o(1)$-jointly differentially private (with probability at least $1-o(1)$); (2) it is an $o(\frac{1}{n})$-approximate Bayes Nash equilibrium for a $(1-o(1))$-fraction of agents to truthfully report their data, where $n$ is the number of agents; (3) the output achieves an error of $o(1)$ with respect to the underlying parameter; (4) it is individually rational for a $(1-o(1))$-fraction of agents in the mechanism; (5) the payment budget required from the analyst to run the mechanism is $o(1)$. In the second part, we consider the linear regression model under a more general setting where both covariates and responses are heavy-tailed with only finite fourth moments. By using an $\ell_4$-norm shrinkage operator, we propose a private estimator and payment scheme with similar properties to those in the sub-Gaussian case.
    Learning Pair Potentials using Differentiable Simulations. (arXiv:2209.07679v1 [physics.chem-ph])
    Learning pair interactions from experimental or simulation data is of great interest for molecular simulations. We propose a general stochastic method for learning pair interactions from data using differentiable simulations (DiffSim). DiffSim defines a loss function based on structural observables, such as the radial distribution function, through molecular dynamics (MD) simulations. The interaction potentials are then learned directly by stochastic gradient descent, using backpropagation to calculate the gradient of the structural loss metric with respect to the interaction potential through the MD simulation. This gradient-based method is flexible and can be configured to simulate and optimize multiple systems simultaneously. For example, it is possible to simultaneously learn potentials for different temperatures or for different compositions. We demonstrate the approach by recovering simple pair potentials, such as Lennard-Jones systems, from radial distribution functions. We find that DiffSim can be used to probe a wider functional space of pair potentials compared to traditional methods like Iterative Boltzmann Inversion. We show that our methods can be used to simultaneously fit potentials for simulations at different compositions and temperatures to improve the transferability of the learned potentials.  ( 2 min )
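    A heavily simplified sketch of the DiffSim idea: a few differentiable relaxation steps stand in for full molecular dynamics, and a kernel-smoothed radial distribution function gives a structural loss through which gradients flow back to the Lennard-Jones parameters. All choices here are illustrative, not the paper's implementation:

```python
import torch

def lj_energy(pos, eps, sigma):
    d = torch.cdist(pos, pos).clamp(min=0.5)          # clamp avoids overlap blow-ups
    mask = ~torch.eye(len(pos), dtype=torch.bool)     # drop self-distances
    inv6 = (sigma / d[mask]) ** 6
    return 0.5 * (4.0 * eps * (inv6 ** 2 - inv6)).sum()

def simulate(pos, eps, sigma, steps=10, lr=1e-3):
    for _ in range(steps):                            # differentiable relaxation steps
        g, = torch.autograd.grad(lj_energy(pos, eps, sigma), pos, create_graph=True)
        pos = pos - lr * g
    return pos

def soft_rdf(pos, bins, width=0.1):
    d = torch.pdist(pos)                              # unique pairwise distances
    return torch.exp(-(d[:, None] - bins[None, :]) ** 2 / (2 * width ** 2)).mean(0)

eps = torch.tensor(1.0, requires_grad=True)
sigma = torch.tensor(1.0, requires_grad=True)
pos0 = (torch.rand(32, 3) * 4.0).requires_grad_(True)
rdf = soft_rdf(simulate(pos0, eps, sigma), torch.linspace(0.8, 3.0, 30))
# With a reference target_rdf from experiment or simulation, one would run
# ((rdf - target_rdf) ** 2).sum().backward() and gradient-step eps and sigma.
```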
    Model Predictive Robustness of Signal Temporal Logic Predicates. (arXiv:2209.07881v1 [cs.RO])
    The robustness of signal temporal logic not only assesses whether a signal adheres to a specification but also provides a measure of how much a formula is fulfilled or violated. The calculation of robustness is based on evaluating the robustness of underlying predicates. However, the robustness of predicates is usually defined in a model-free way, i.e., without including the system dynamics. Moreover, it is often nontrivial to define the robustness of complicated predicates precisely. To address these issues, we propose a notion of model predictive robustness, which provides a more systematic way of evaluating robustness compared to previous approaches by considering model-based predictions. In particular, we use Gaussian process regression to learn the robustness based on precomputed predictions so that robustness values can be efficiently computed online. We evaluate our approach for the use case of autonomous driving with predicates used in formalized traffic rules on a recorded dataset, which highlights the advantage of our approach compared to traditional approaches in terms of expressiveness. By incorporating our robustness definitions into a trajectory planner, autonomous vehicles obey traffic rules more robustly than human drivers in the dataset.  ( 2 min )
    Privacy-preserving Federated Learning for Residential Short Term Load Forecasting. (arXiv:2111.09248v3 [cs.LG] UPDATED)
    The inclusion of intermittent and renewable energy sources has increased the importance of demand forecasting in power systems. Smart meters can play a critical role in demand forecasting due to the measurement granularity they provide. Despite their virtues, smart meters used for forecasting face constraints such as consumers' privacy concerns, the reluctance of utilities and vendors to share data with competitors or third parties, and regulatory constraints. This paper examines a collaborative machine learning method -- federated learning extended with privacy-preserving techniques -- for short-term demand forecasting using smart meter data as a solution to these constraints. The combination of privacy-preserving techniques and federated learning makes it possible to ensure consumers' confidentiality concerning their data, the models generated from it (Differential Privacy), and the communication medium (Secure Aggregation). To evaluate this collaborative, secure federated learning setting, we survey the current literature to select the baseline for our simulations and evaluation. We simulate and evaluate several scenarios that explore how traditional centralized approaches could be projected towards a decentralized, collaborative and private system. The evaluations show decent performance and, in the differential-privacy setting, near-perfect privacy budgets of $(1.39, 10^{-5})$ and $(2.01, 10^{-5})$ with a negligible performance compromise.  ( 3 min )
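    A minimal sketch of the two privacy ingredients named above, under simplifying assumptions: per-client clipping with Gaussian noise approximates the differential-privacy mechanism, and a plain sum stands in for secure aggregation (which would hide the individual summands from the server):

```python
import numpy as np

def dp_aggregate(client_updates, clip=1.0, noise_multiplier=1.0,
                 rng=np.random.default_rng(0)):
    """Clip each client's flat update vector, sum, add Gaussian noise, average."""
    clipped = []
    for u in client_updates:
        norm = np.linalg.norm(u)
        clipped.append(u * min(1.0, clip / max(norm, 1e-12)))  # bound each contribution
    total = np.sum(clipped, axis=0)          # secure aggregation would hide the summands
    total += rng.normal(0.0, noise_multiplier * clip, size=total.shape)
    return total / len(client_updates)
```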
    CrypTen: Secure Multi-Party Computation Meets Machine Learning. (arXiv:2109.00984v2 [cs.LG] UPDATED)
    Secure multi-party computation (MPC) allows parties to perform computations on data while keeping that data private. This capability has great potential for machine-learning applications: it facilitates training of machine-learning models on private data sets owned by different parties, evaluation of one party's private model using another party's private data, etc. Although a range of studies implement machine-learning models via secure MPC, such implementations are not yet mainstream. Adoption of secure MPC is hampered by the absence of flexible software frameworks that "speak the language" of machine-learning researchers and engineers. To foster adoption of secure MPC in machine learning, we present CrypTen: a software framework that exposes popular secure MPC primitives via abstractions that are common in modern machine-learning frameworks, such as tensor computations, automatic differentiation, and modular neural networks. This paper describes the design of CrypTen and measures its performance on state-of-the-art models for text classification, speech recognition, and image classification. Our benchmarks show that CrypTen's GPU support and high-performance communication between (an arbitrary number of) parties allows it to perform efficient private evaluation of modern machine-learning models under a semi-honest threat model. For example, two parties using CrypTen can securely predict phonemes in speech recordings using Wav2Letter faster than real-time. We hope that CrypTen will spur adoption of secure MPC in the machine-learning community.
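    A short usage sketch of the tensor abstraction CrypTen exposes (single-process illustration; actual deployments launch one process per party, and exact APIs may vary across versions):

```python
import torch
import crypten

crypten.init()                                            # set up the MPC backend
x = crypten.cryptensor(torch.tensor([1.0, 2.0, 3.0]))     # secret-share a tensor
y = crypten.cryptensor(torch.tensor([4.0, 5.0, 6.0]))
z = (x * y).sum()                                         # arithmetic on secret shares
print(z.get_plain_text())                                 # reveals the result (~32.)
```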
    Privacy-Preserving Distributed Expectation Maximization for Gaussian Mixture Model using Subspace Perturbation. (arXiv:2209.07833v1 [cs.LG])
    Privacy has become a major concern in machine learning. In fact, federated learning is motivated by privacy concerns, as it does not transmit private data but only intermediate updates. However, federated learning does not always guarantee privacy preservation, since the intermediate updates may also reveal sensitive information. In this paper, we give an explicit information-theoretical analysis of a federated expectation maximization algorithm for the Gaussian mixture model and prove that the intermediate updates can cause severe privacy leakage. To address the privacy issue, we propose a fully decentralized privacy-preserving solution that is able to securely compute the updates in each maximization step. Additionally, we consider two different types of security attacks: the honest-but-curious and eavesdropping adversary models. Numerical validation shows that the proposed approach has superior performance compared to the existing approach in terms of both accuracy and privacy level.
    A Systematic Evaluation of Node Embedding Robustness. (arXiv:2209.08064v1 [cs.LG])
    Node embedding methods map network nodes to low-dimensional vectors that can be subsequently used in a variety of downstream prediction tasks. The popularity of these methods has significantly increased in recent years, yet their robustness to perturbations of the input data is still poorly understood. In this paper, we assess the empirical robustness of node embedding models to random and adversarial poisoning attacks. Our systematic evaluation covers representative embedding methods based on Skip-Gram, matrix factorization, and deep neural networks. We compare edge addition, deletion and rewiring strategies computed using network properties as well as node labels. We also investigate the effect of label homophily and heterophily on robustness. We report qualitative results via embedding visualization and quantitative results in terms of downstream node classification and network reconstruction performances. We found that node classification suffers from higher performance degradation than network reconstruction, and that degree-based and label-based attacks are on average the most damaging.  ( 2 min )
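    A minimal sketch of the random poisoning setting evaluated here: flip a budgeted number of edges (delete existing ones, add absent ones) before the embeddings are recomputed. Details such as the 50/50 split between additions and deletions are illustrative:

```python
import random
import networkx as nx

def random_edge_flips(G, budget, seed=0):
    """Return a copy of G with `budget` random edge deletions/additions."""
    rng = random.Random(seed)
    H = G.copy()
    nodes = list(H.nodes)
    for _ in range(budget):
        if rng.random() < 0.5 and H.number_of_edges() > 0:
            H.remove_edge(*rng.choice(list(H.edges)))   # delete an existing edge
        else:
            u, v = rng.sample(nodes, 2)
            if not H.has_edge(u, v):
                H.add_edge(u, v)                        # add an absent edge
    return H

G_poisoned = random_edge_flips(nx.karate_club_graph(), budget=10)
```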
    Multi-Modal Pre-Training for Automated Speech Recognition. (arXiv:2110.09890v2 [eess.AS] UPDATED)
    Traditionally, research in automated speech recognition has focused on local-first encoding of audio representations to predict the spoken phonemes in an utterance. Unfortunately, approaches relying on such hyper-local information tend to be vulnerable to both local-level corruption (such as audio-frame drops, or loud noises) and global-level noise (such as environmental noise, or background noise) that has not been seen during training. In this work, we introduce a novel approach which leverages a self-supervised learning technique based on masked language modeling to compute a global, multi-modal encoding of the environment in which the utterance occurs. We then use a new deep-fusion framework to integrate this global context into a traditional ASR method, and demonstrate that the resulting method can outperform baseline methods by up to 7% on Librispeech; gains on internal datasets range from 6% (on larger models) to 45% (on smaller models).  ( 2 min )
    MetaMask: Revisiting Dimensional Confounder for Self-Supervised Learning. (arXiv:2209.07902v1 [cs.LG])
    As a successful approach to self-supervised learning, contrastive learning aims to learn invariant information shared among distortions of the input sample. While contrastive learning has yielded continuous advancements in sampling strategy and architecture design, two persistent defects remain: the interference of task-irrelevant information and sample inefficiency, both related to the recurring existence of trivial constant solutions. From the perspective of dimensional analysis, we find that dimensional redundancy and dimensional confounders are the intrinsic issues behind these phenomena, and provide experimental evidence to support our viewpoint. We further propose a simple yet effective approach, MetaMask, short for the dimensional Mask learned by Meta-learning, to learn representations against dimensional redundancy and confounders. MetaMask adopts the redundancy-reduction technique to tackle the dimensional redundancy issue and innovatively introduces a dimensional mask to reduce the gradient effects of specific dimensions containing the confounder, trained by employing a meta-learning paradigm with the objective of improving the performance of masked representations on a typical self-supervised task. We provide solid theoretical analyses proving that MetaMask can obtain tighter risk bounds for downstream classification compared to typical contrastive methods. Empirically, our method achieves state-of-the-art performance on various benchmarks.  ( 2 min )
    Knowledge-Grounded Self-Rationalization via Extractive and Natural Language Explanations. (arXiv:2106.13876v4 [cs.CL] UPDATED)
    Models that generate extractive rationales (i.e., subsets of features) or natural language explanations (NLEs) for their predictions are important for explainable AI. While an extractive rationale provides a quick view of the features most responsible for a prediction, an NLE allows for a comprehensive description of the decision-making process behind a prediction. However, current models that generate the best extractive rationales or NLEs often fall behind the state-of-the-art (SOTA) in terms of task performance. In this work, we bridge this gap by introducing RExC, a self-rationalizing framework that grounds its predictions and two complementary types of explanations (NLEs and extractive rationales) in background knowledge. Our framework improves over previous methods by: (i) reaching SOTA task performance while also providing explanations, (ii) providing two types of explanations, while existing models usually provide only one type, and (iii) beating the previous SOTA by a large margin in terms of the quality of both types of explanations. Furthermore, a perturbation analysis in RExC shows a high degree of association between explanations and predictions, a necessary property of faithful explanations.  ( 3 min )
    Multimodal Audio-Visual Information Fusion using Canonical-Correlated Graph Neural Network for Energy-Efficient Speech Enhancement. (arXiv:2202.04528v3 [cs.SD] UPDATED)
    This paper proposes a novel multimodal self-supervised architecture for energy-efficient audio-visual (AV) speech enhancement that integrates Graph Neural Networks with canonical correlation analysis (CCA-GNN). The approach builds on a state-of-the-art CCA-GNN that learns representative embeddings by maximizing the correlation between pairs of augmented views of the same input while decorrelating disconnected features. The key idea of the conventional CCA-GNN is to discard augmentation-variant information and preserve augmentation-invariant information while preventing the capture of redundant information. Our proposed AV CCA-GNN model extends this to the multimodal representation-learning context. Specifically, our model improves contextual AV speech processing by maximizing the canonical correlation between augmented views of the same channel and between audio and visual embeddings. In addition, it proposes a positional node encoding that considers prior-frame sequence distance instead of a feature-space representation when computing the node's nearest neighbors, introducing temporal information into the embeddings through the neighborhood's connectivity. Experiments conducted on the benchmark ChiME3 dataset show that our proposed prior-frame-based AV CCA-GNN ensures better feature learning in the temporal context, leading to more energy-efficient speech reconstruction than state-of-the-art CCA-GNN and multilayer perceptron baselines.  ( 3 min )
    PINEAPPLE: Personifying INanimate Entities by Acquiring Parallel Personification data for Learning Enhanced generation. (arXiv:2209.07752v1 [cs.CL])
    A personification is a figure of speech that endows inanimate entities with properties and actions typically seen as requiring animacy. In this paper, we explore the task of personification generation. To this end, we propose PINEAPPLE: Personifying INanimate Entities by Acquiring Parallel Personification data for Learning Enhanced generation. We curate a corpus of personifications called PersonifCorp, together with automatically generated de-personified literalizations of these personifications. We demonstrate the usefulness of this parallel corpus by training a seq2seq model to personify a given literal input. Both automatic and human evaluations show that fine-tuning with PersonifCorp leads to significant gains in personification-related qualities such as animacy and interestingness. A detailed qualitative analysis also highlights key strengths and imperfections of PINEAPPLE over baselines, demonstrating a strong ability to generate diverse and creative personifications that enhance the overall appeal of a sentence.  ( 2 min )
    Training Recipe for N:M Structured Sparsity with Decaying Pruning Mask. (arXiv:2209.07617v1 [cs.LG])
    Sparsity has become one of the promising methods to compress and accelerate Deep Neural Networks (DNNs). Among different categories of sparsity, structured sparsity has gained more attention due to its efficient execution on modern accelerators. Particularly, N:M sparsity is attractive because there are already hardware accelerator architectures that can leverage certain forms of N:M structured sparsity to yield higher compute efficiency. In this work, we focus on N:M sparsity and extensively study and evaluate various training recipes for N:M sparsity in terms of the trade-off between model accuracy and compute cost (FLOPs). Building upon this study, we propose two new decay-based pruning methods, namely "pruning mask decay" and "sparse structure decay". Our evaluations indicate that these proposed methods consistently deliver state-of-the-art (SOTA) model accuracy, comparable to unstructured sparsity, on a Transformer-based model for a translation task. The increase in the accuracy of the sparse model using the new training recipes comes at the cost of a marginal increase in total training compute (FLOPs).  ( 2 min )
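    For concreteness, a minimal sketch of the N:M constraint itself (here 2:4: within every group of four consecutive weights, the two largest-magnitude entries are kept); the paper's decay recipes anneal how strictly such a mask is applied during training:

```python
import torch

def nm_mask(weight, n=2, m=4):
    """Binary mask keeping the top-n magnitudes in every group of m weights."""
    w = weight.reshape(-1, m)                        # requires numel divisible by m
    idx = w.abs().topk(n, dim=1).indices             # top-n positions per group
    mask = torch.zeros_like(w).scatter_(1, idx, 1.0)
    return mask.reshape(weight.shape)

w = torch.randn(8, 8)
w_sparse = w * nm_mask(w)   # a decaying recipe would soften this hard mask early in training
```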
    Reducing Variance in Temporal-Difference Value Estimation via Ensemble of Deep Networks. (arXiv:2209.07670v1 [cs.LG])
    In temporal-difference reinforcement learning algorithms, variance in value estimation can cause instability and overestimation of the maximal target value. Many algorithms have been proposed to reduce overestimation, including several recent ensemble methods, however none have shown success in sample-efficient learning through addressing estimation variance as the root cause of overestimation. In this paper, we propose MeanQ, a simple ensemble method that estimates target values as ensemble means. Despite its simplicity, MeanQ shows remarkable sample efficiency in experiments on the Atari Learning Environment benchmark. Importantly, we find that an ensemble of size 5 sufficiently reduces estimation variance to obviate the lagging target network, eliminating it as a source of bias and further gaining sample efficiency. We justify intuitively and empirically the design choices in MeanQ, including the necessity of independent experience sampling. On a set of 26 benchmark Atari environments, MeanQ outperforms all tested baselines, including the best available baseline, SUNRISE, at 100K interaction steps in 16/26 environments, and by 68% on average. MeanQ also outperforms Rainbow DQN at 500K steps in 21/26 environments, and by 49% on average, and achieves average human-level performance using 200K ($\pm$100K) interaction steps. Our implementation is available at https://github.com/indylab/MeanQ.  ( 2 min )
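    A minimal sketch of the MeanQ target computation, with hypothetical network and batch conventions: the bootstrap target uses the mean over the ensemble's Q-estimates, with no lagging target network:

```python
import torch

def meanq_target(q_ensemble, reward, next_state, done, gamma=0.99):
    """reward, done: (B,) tensors; each q in q_ensemble maps states to (B, |A|)."""
    with torch.no_grad():
        q_next = torch.stack([q(next_state) for q in q_ensemble])  # (K, B, |A|)
        q_mean = q_next.mean(dim=0)                                # ensemble mean
        return reward + gamma * (1.0 - done) * q_mean.max(dim=1).values
```

Each ensemble member is then regressed toward this shared target on independently sampled experience, which the paper argues is necessary to keep the estimates de-correlated.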
    Self-Supervised Learning with an Information Maximization Criterion. (arXiv:2209.07999v1 [cs.LG])
    Self-supervised learning allows AI systems to learn effective representations from large amounts of data using tasks that do not require costly labeling. Mode collapse, i.e., the model producing identical representations for all inputs, is a central problem to many self-supervised learning approaches, making self-supervised tasks, such as matching distorted variants of the inputs, ineffective. In this article, we argue that a straightforward application of information maximization among alternative latent representations of the same input naturally solves the collapse problem and achieves competitive empirical results. We propose a self-supervised learning method, CorInfoMax, that uses a second-order statistics-based mutual information measure that reflects the level of correlation among its arguments. Maximizing this correlative information measure between alternative representations of the same input serves two purposes: (1) it avoids the collapse problem by generating feature vectors with non-degenerate covariances; (2) it establishes relevance among alternative representations by increasing the linear dependence among them. An approximation of the proposed information maximization objective simplifies to a Euclidean distance-based objective function regularized by the log-determinant of the feature covariance matrix. The regularization term acts as a natural barrier against feature space degeneracy. Consequently, beyond avoiding complete output collapse to a single point, the proposed approach also prevents dimensional collapse by encouraging the spread of information across the whole feature space. Numerical experiments demonstrate that CorInfoMax achieves better or competitive performance results relative to the state-of-the-art SSL approaches.  ( 3 min )
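    A minimal sketch of the simplified objective described above: a Euclidean invariance term between two views' embeddings minus a log-determinant barrier on the feature covariance. Pooling both views into one covariance and the hyperparameters are simplifying assumptions of this sketch:

```python
import torch

def corinfomax_loss(z1, z2, alpha=1.0, eps=1e-3):
    """z1, z2: (B, d) embeddings of two augmented views of the same batch."""
    dist = ((z1 - z2) ** 2).sum(dim=1).mean()          # invariance term
    z = torch.cat([z1, z2], dim=0)
    z = z - z.mean(dim=0)
    cov = (z.T @ z) / (z.shape[0] - 1)                 # feature covariance
    logdet = torch.logdet(cov + eps * torch.eye(cov.shape[0]))
    return dist - alpha * logdet                       # log-det barrier against collapse
```

A degenerate (collapsed) covariance drives the log-determinant toward minus infinity, so maximizing it spreads information across all feature dimensions, as the abstract describes.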
    Joint estimation of posterior probability and propensity score function for positive and unlabelled data. (arXiv:2209.07787v1 [stat.ML])
    Positive and unlabelled learning is an important problem which arises naturally in many applications. A significant limitation of almost all existing methods lies in assuming that the propensity score function is constant (the SCAR assumption), which is unrealistic in many practical situations. Avoiding this assumption, we consider a parametric approach to the problem of jointly estimating the posterior probability and propensity score functions. We show that, under mild assumptions, when both functions have the same parametric form (e.g. logistic with different parameters), the corresponding parameters are identifiable. Motivated by this, we propose two approaches to their estimation: a joint maximum likelihood method and a second approach based on alternating maximization of two Fisher-consistent expressions. Our experimental results show that the proposed methods are comparable to or better than existing methods based on the Expectation-Maximisation scheme.  ( 2 min )
    More Interpretable Graph Similarity Computation via Maximum Common Subgraph Inference. (arXiv:2208.04580v3 [cs.LG] UPDATED)
    Graph similarity measurement, which computes the distance/similarity between two graphs, arises in various graph-related tasks. Recent learning-based methods lack interpretability, as they directly transform interaction information between two graphs into one hidden vector and then map it to similarity. To cope with this problem, this study proposes a more interpretable end-to-end paradigm for graph similarity learning, named Similarity Computation via Maximum Common Subgraph Inference (INFMCS). Our critical insight into INFMCS is the strong correlation between similarity score and Maximum Common Subgraph (MCS). We implicitly infer MCS to obtain the normalized MCS size, with the supervision information being only the similarity score during training. To capture more global information, we also stack some vanilla transformer encoder layers with graph convolution layers and propose a novel permutation-invariant node Positional Encoding. The entire model is quite simple yet effective. Comprehensive experiments demonstrate that INFMCS consistently outperforms state-of-the-art baselines for graph-graph classification and regression tasks. Ablation experiments verify the effectiveness of the proposed computation paradigm and other components. Also, visualization and statistics of results reveal the interpretability of INFMCS.  ( 2 min )
    Enhancing Video Analytics Accuracy via Real-time Automated Camera Parameter Tuning. (arXiv:2107.03964v4 [cs.LG] UPDATED)
    In Video Analytics Pipelines (VAP), Analytics Units (AUs) such as object detection and face recognition running on remote servers critically rely on surveillance cameras to capture high-quality video streams in order to achieve high accuracy. Modern IP cameras come with a large number of camera parameters that directly affect the quality of the video stream capture. While a few of such parameters, e.g., exposure, focus, white balance, are automatically adjusted by the camera internally, the remaining ones are not. We denote such camera parameters as non-automated (NAUTO) parameters. In this paper, we first show that environmental condition changes can have a significant adverse effect on the accuracy of insights from the AUs, but such adverse impact can potentially be mitigated by dynamically adjusting NAUTO camera parameters in response to changes in environmental conditions. We then present CamTuner, to our knowledge the first framework that dynamically adapts NAUTO camera parameters to optimize the accuracy of AUs in a VAP in response to adverse changes in environmental conditions. CamTuner is based on SARSA reinforcement learning and incorporates two novel components: a light-weight analytics quality estimator and a virtual camera that drastically speed up offline RL training. Our controlled experiments and real-world VAP deployment show that compared to a VAP using the default camera setting, CamTuner enhances VAP accuracy by detecting 15.9% additional persons and 2.6%-4.2% additional cars (without any false positives) in a large enterprise parking lot, and 9.7% additional cars in a 5G smart traffic intersection scenario, which enables a new use case of accurate and reliable automatic vehicle collision prediction (AVCP). CamTuner opens doors for new ways to significantly enhance video analytics accuracy beyond incremental improvements from refining deep-learning models.  ( 3 min )
    Masked Imitation Learning: Discovering Environment-Invariant Modalities in Multimodal Demonstrations. (arXiv:2209.07682v1 [cs.LG])
    Multimodal demonstrations provide robots with an abundance of information to make sense of the world. However, such abundance may not always lead to good performance when it comes to learning sensorimotor control policies from human demonstrations. Extraneous data modalities can lead to state over-specification, where the state contains modalities that are not only useless for decision-making but also can change data distribution across environments. State over-specification leads to issues such as the learned policy not generalizing outside of the training data distribution. In this work, we propose Masked Imitation Learning (MIL) to address state over-specification by selectively using informative modalities. Specifically, we design a masked policy network with a binary mask to block certain modalities. We develop a bi-level optimization algorithm that learns this mask to accurately filter over-specified modalities. We demonstrate empirically that MIL outperforms baseline algorithms in simulated domains including MuJoCo and a robot arm environment using the Robomimic dataset, and effectively recovers the environment-invariant modalities on a multimodal dataset collected on a real robot. Our project website presents supplemental details and videos of our results at: https://tinyurl.com/masked-il  ( 2 min )
    On the Relation between Sensitivity and Accuracy in In-context Learning. (arXiv:2209.07661v1 [cs.CL])
    In-context learning (ICL) suffers from oversensitivity to the prompt, which makes it unreliable in real-world scenarios. We study the sensitivity of ICL with respect to multiple types of perturbations. First, we find that label bias obscures true ICL sensitivity, and hence prior work may have significantly underestimated the true ICL sensitivity. Second, we observe a strong negative correlation between ICL sensitivity and accuracy, with sensitive predictions less likely to be correct. Motivated by these observations, we propose \textsc{SenSel}, a few-shot selective prediction method based on ICL sensitivity. Experiments on ten classification benchmarks show that \textsc{SenSel} consistently outperforms a commonly used confidence-based selective prediction baseline.  ( 2 min )
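    A minimal sketch of sensitivity-based selective prediction in the spirit of SenSel, assuming hypothetical `predict` and `perturb` routines: answer only when the prediction is stable across perturbed prompts, abstain otherwise:

```python
from collections import Counter

def sensel_predict(predict, perturb, prompt, n=8, threshold=0.75):
    """predict: prompt -> label; perturb: prompt -> perturbed prompt (both stand-ins)."""
    labels = [predict(perturb(prompt)) for _ in range(n)]
    label, count = Counter(labels).most_common(1)[0]
    if count / n >= threshold:   # prediction is stable under perturbations
        return label
    return None                  # sensitive prediction -> abstain
```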
    Automatic Tumor Segmentation via False Positive Reduction Network for Whole-Body Multi-Modal PET/CT Images. (arXiv:2209.07705v1 [eess.IV])
    Multi-modality Fluorodeoxyglucose (FDG) positron emission tomography / computed tomography (PET/CT) has been routinely used in the assessment of common cancers, such as lung cancer, lymphoma, and melanoma. This is mainly attributed to the fact that PET/CT combines the high sensitivity for tumor detection of PET and anatomical information from CT. In PET/CT image assessment, automatic tumor segmentation is an important step, and in recent years deep learning based methods have become the state-of-the-art. Unfortunately, existing methods tend to over-segment tumor regions and include regions such as normal high-uptake organs, inflammation, and other infections. In this study, we introduce a false positive reduction network to overcome this limitation. We first introduce a global segmentation module with a self-supervised pre-trained encoder to coarsely delineate candidate tumor regions. The candidate tumor regions are then refined by removing false positives via a local refinement module. Our experiments with the MICCAI 2022 Automated Lesion Segmentation in Whole-Body FDG-PET/CT (AutoPET) challenge dataset showed that our method achieved a dice score of 0.9324 with the preliminary testing data and was ranked 1st place in dice on the leaderboard. Our method was also ranked in the top 7 methods on the final testing data; the final ranking will be announced during the 2022 MICCAI AutoPET workshop. Our code is available at: https://github.com/YigePeng/AutoPET_False_Positive_Reduction.  ( 3 min )
    Statistical Properties of the Entropy from Ordinal Patterns. (arXiv:2209.07650v1 [cs.IT])
    The ultimate purpose of the statistical analysis of ordinal patterns is to characterize the distribution of the features they induce. In particular, knowing the joint distribution of the pair Entropy-Statistical Complexity for a large class of time series models would allow statistical tests that are unavailable to date. Working in this direction, we characterize the asymptotic distribution of the empirical Shannon's Entropy for any model under which the true normalized Entropy is neither zero nor one. We obtain the asymptotic distribution from the Central Limit Theorem (assuming large time series), the Multivariate Delta Method, and a third-order correction of its mean value. We discuss the applicability of other results (exact, first-, and second-order corrections) regarding their accuracy and numerical stability. Within a general framework for building test statistics about Shannon's Entropy, we present a bilateral test that verifies if there is enough evidence to reject the hypothesis that two signals produce ordinal patterns with the same Shannon's Entropy. We applied this bilateral test to the daily maximum temperature time series from three cities (Dublin, Edinburgh, and Miami) and obtained sensible results.  ( 2 min )
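    A minimal sketch of the quantity being studied, the normalized empirical Shannon entropy of ordinal patterns (permutation entropy) for embedding dimension d:

```python
import math
from collections import Counter

def ordinal_entropy(x, d=3):
    """Normalized Shannon entropy of length-d ordinal patterns of sequence x."""
    patterns = [tuple(sorted(range(d), key=lambda k: x[i + k]))  # argsort of each window
                for i in range(len(x) - d + 1)]
    counts = Counter(patterns)
    n = sum(counts.values())
    h = -sum((c / n) * math.log(c / n) for c in counts.values())
    return h / math.log(math.factorial(d))   # normalize to [0, 1]
```

The paper's asymptotic distribution then makes it possible to test, for example, whether two such empirical entropies differ significantly, as in the temperature-series application.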
    Hub-aware Random Walk Graph Embedding Methods for Classification. (arXiv:2209.07603v1 [cs.LG])
    In the last two decades we have witnessed a huge increase in valuable big data structured in the form of graphs or networks. To apply traditional machine learning and data analytic techniques to such data, it is necessary to transform graphs into vector-based representations that preserve their most essential structural properties. For this purpose, a large number of graph embedding methods have been proposed in the literature. Most of them produce general-purpose embeddings suitable for a variety of applications such as node clustering, node classification, graph visualisation and link prediction. In this paper, we propose two novel graph embedding algorithms based on random walks that are specifically designed for the node classification problem. The random walk sampling strategies of the proposed algorithms pay special attention to hubs -- high-degree nodes that play the most critical role in the overall connectedness of large-scale graphs. The proposed methods are experimentally evaluated by analyzing the classification performance of three classification algorithms trained on embeddings of real-world networks. The obtained results indicate that our methods considerably improve the predictive power of the examined classifiers compared to the currently most popular random walk method for generating general-purpose graph embeddings (node2vec).  ( 2 min )
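    A minimal sketch of a degree-biased (hub-aware) random walk; the single bias exponent is an illustrative stand-in for the paper's two sampling strategies:

```python
import random
import networkx as nx

def hub_aware_walk(G, start, length, beta=1.0, rng=random.Random(0)):
    """Random walk whose transitions are biased by neighbor degree:
    beta > 0 favors hubs, beta < 0 avoids them, beta = 0 is a uniform walk."""
    walk = [start]
    for _ in range(length - 1):
        nbrs = list(G.neighbors(walk[-1]))
        weights = [G.degree(v) ** beta for v in nbrs]
        walk.append(rng.choices(nbrs, weights=weights, k=1)[0])
    return walk

G = nx.karate_club_graph()
print(hub_aware_walk(G, start=0, length=10))   # one hub-favoring walk
```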
    Extracting Biomedical Factual Knowledge Using Pretrained Language Model and Electronic Health Record Context. (arXiv:2209.07859v1 [cs.IR])
    Language Models (LMs) have performed well in biomedical natural language processing applications. In this study, we conduct experiments using prompt methods to extract knowledge from LMs as new knowledge bases (LMs as KBs). However, prompting only provides a lower bound for knowledge extraction, and performs particularly poorly on biomedical-domain KBs. To make LMs-as-KBs more in line with the actual application scenarios of the biomedical domain, we add EHR notes as context to the prompt to improve this lower bound. We design and validate a series of experiments for our Dynamic-Context-BioLAMA task. Our experiments show that the knowledge possessed by these language models can distinguish correct knowledge from noise in the EHR notes, and that such distinguishing ability can also be used as a new metric to evaluate the amount of knowledge possessed by the model.  ( 2 min )
    Capturing Shape Information with Multi-Scale Topological Loss Terms for 3D Reconstruction. (arXiv:2203.01703v3 [cs.CV] UPDATED)
    Reconstructing 3D objects from 2D images is both challenging for our brains and machine learning algorithms. To support this spatial reasoning task, contextual information about the overall shape of an object is critical. However, such information is not captured by established loss terms (e.g. Dice loss). We propose to complement geometrical shape information by including multi-scale topological features, such as connected components, cycles, and voids, in the reconstruction loss. Our method uses cubical complexes to calculate topological features of 3D volume data and employs an optimal transport distance to guide the reconstruction process. This topology-aware loss is fully differentiable, computationally efficient, and can be added to any neural network. We demonstrate the utility of our loss by incorporating it into SHAPR, a model for predicting the 3D cell shape of individual cells based on 2D microscopy images. Using a hybrid loss that leverages both geometrical and topological information of single objects to assess their shape, we find that topological information substantially improves the quality of reconstructions, thus highlighting its ability to extract more relevant features from image datasets.  ( 3 min )
    Examining spatial heterogeneity of ridesourcing demand determinants with explainable machine learning. (arXiv:2209.07980v1 [cs.LG])
    The growing significance of ridesourcing services in recent years suggests a need to examine the key determinants of ridesourcing demand. However, little is known regarding the nonlinear effects and spatial heterogeneity of ridesourcing demand determinants. This study applies an explainable-machine-learning-based analytical framework to identify the key factors that shape ridesourcing demand and to explore their nonlinear associations across various spatial contexts (airport, downtown, and neighborhood). We use the ridesourcing-trip data in Chicago for empirical analysis. The results reveal that the importance of built environment varies across spatial contexts, and it collectively contributes the largest importance in predicting ridesourcing demand for airport trips. Additionally, the nonlinear effects of built environment on ridesourcing demand show strong spatial variations. Ridesourcing demand is usually most responsive to the built environment changes for downtown trips, followed by neighborhood trips and airport trips. These findings offer transportation professionals nuanced insights for managing ridesourcing services.  ( 2 min )
    FiLM: Frequency improved Legendre Memory Model for Long-term Time Series Forecasting. (arXiv:2205.08897v4 [cs.LG] UPDATED)
    Recent studies have shown that deep learning models such as RNNs and Transformers have brought significant performance gains for long-term forecasting of time series because they effectively utilize historical information. We found, however, that there is still great room for improvement in how to preserve historical information in neural networks while avoiding overfitting to noise present in the history. Addressing this allows better utilization of the capabilities of deep learning models. To this end, we design a \textbf{F}requency \textbf{i}mproved \textbf{L}egendre \textbf{M}emory model, or {\bf FiLM}: it applies Legendre Polynomials projections to approximate historical information, uses Fourier projection to remove noise, and adds a low-rank approximation to speed up computation. Our empirical studies show that the proposed FiLM significantly improves the accuracy of state-of-the-art models in multivariate and univariate long-term forecasting by (\textbf{20.3\%}, \textbf{22.6\%}), respectively. We also demonstrate that the representation module developed in this work can be used as a general plug-in to improve the long-term prediction performance of other deep learning modules. Code is available at https://github.com/tianzhou2011/FiLM/  ( 3 min )
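    The two signal-processing ingredients named here, Legendre projection of the history and Fourier-domain noise removal, can be illustrated in a few lines of NumPy. This is a schematic of the idea under simplified assumptions, not the authors' implementation.

        # Project a history window onto low-degree Legendre polynomials
        # (compressed memory), then keep only low Fourier modes (denoising).
        import numpy as np

        def legendre_memory(x, degree=8):
            t = np.linspace(-1.0, 1.0, len(x))
            coeffs = np.polynomial.legendre.legfit(t, x, degree)
            return np.polynomial.legendre.legval(t, coeffs)

        def fourier_denoise(x, keep=8):
            spec = np.fft.rfft(x)
            spec[keep:] = 0.0  # zero out high-frequency modes
            return np.fft.irfft(spec, n=len(x))

        history = np.sin(np.linspace(0, 6 * np.pi, 256)) + 0.3 * np.random.randn(256)
        smooth = fourier_denoise(legendre_memory(history))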
    Explainability in subgraphs-enhanced Graph Neural Networks. (arXiv:2209.07926v1 [cs.LG])
    Recently, subgraphs-enhanced Graph Neural Networks (SGNNs) have been introduced to enhance the expressive power of Graph Neural Networks (GNNs), which was proved to be no higher than that of the 1-dimensional Weisfeiler-Leman isomorphism test. The new paradigm suggests using subgraphs extracted from the input graph to improve the model's expressiveness, but the additional complexity exacerbates an already challenging problem in GNNs: explaining their predictions. In this work, we adapt PGExplainer, one of the most recent explainers for GNNs, to SGNNs. The proposed explainer accounts for the contributions of all the different subgraphs and can produce a meaningful explanation that humans can interpret. The experiments that we performed on both real and synthetic datasets show that our framework is successful in explaining the decision process of an SGNN on graph classification tasks.  ( 2 min )
    Optimal binning: mathematical programming formulation. (arXiv:2001.08025v2 [cs.LG] UPDATED)
    Optimal binning is the optimal discretization of a variable into bins given a discrete or continuous numeric target. We present a rigorous and extensible mathematical programming formulation for solving the optimal binning problem for binary, continuous, and multi-class target types, incorporating constraints not previously addressed. For all three target types, we introduce a convex mixed-integer programming formulation. Several algorithmic enhancements, such as automatic determination of the most suitable monotonic trend via a machine-learning-based classifier, as well as implementation aspects, are thoughtfully discussed. The new mathematical programming formulations are carefully implemented in the open-source Python library OptBinning.  ( 2 min )
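    Since the abstract points to the OptBinning library, a minimal usage sketch may help; the variable name and data are illustrative, and the parameters shown follow the library's documented interface for a binary target.

        # Optimal binning of a numerical variable against a binary target.
        import numpy as np
        from optbinning import OptimalBinning

        x = np.random.rand(1000) * 100           # e.g. applicant age
        y = (x + 10 * np.random.randn(1000) > 50).astype(int)

        optb = OptimalBinning(name="age", dtype="numerical", solver="cp",
                              monotonic_trend="auto")
        optb.fit(x, y)
        print(optb.splits)                        # optimal bin edges
        print(optb.binning_table.build())         # per-bin statistics (WoE, IV, ...)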
    Factorizable Joint Shift in Multinomial Classification. (arXiv:2207.14514v2 [stat.ML] UPDATED)
    Factorizable joint shift (FJS) was recently proposed as a type of dataset shift for which the complete characteristics can be estimated from feature data observations on the test dataset by a method called Joint Importance Aligning. For the multinomial (multiclass) classification setting, we derive a representation of factorizable joint shift in terms of the source (training) distribution, the target (test) prior class probabilities and the target marginal distribution of the features. On the basis of this result, we propose alternatives to joint importance aligning and, at the same time, point out that factorizable joint shift is not fully identifiable if no class label information on the test dataset is available and no additional assumptions are made. Other results of the paper include correction formulae for the posterior class probabilities both under general dataset shift and factorizable joint shift. In addition, we investigate the consequences of assuming factorizable joint shift for the bias caused by sample selection.  ( 2 min )
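    For intuition, the simplest member of the family of posterior correction formulae mentioned above is the classical prior-shift correction, in which only the class priors change between source and target. The sketch below shows that special case; the paper's factorizable-joint-shift corrections generalize it.

        # Prior-shift correction of posterior class probabilities:
        # q(y|x) proportional to p(y|x) * q(y) / p(y).
        import numpy as np

        def correct_posteriors(p_post, source_priors, target_priors):
            w = np.asarray(target_priors) / np.asarray(source_priors)
            unnorm = p_post * w
            return unnorm / unnorm.sum(axis=1, keepdims=True)

        p_post = np.array([[0.7, 0.2, 0.1]])      # source-model posteriors
        print(correct_posteriors(p_post, [0.5, 0.3, 0.2], [0.2, 0.3, 0.5]))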
    A Survey on the application of Data Science And Analytics in the field of Organised Sports. (arXiv:2209.07528v1 [cs.LG])
    The application of Data Science and Analytics to optimize or predict outcomes is ubiquitous in the modern world. Data Science and Analytics have optimized almost every domain that exists in the market. In our survey, we focus on how the field of Analytics has been adopted in sports, and how it has contributed to the transformation of the game, from the assessment and selection of on-field players, to the prediction of the winning team, to the marketing of tickets and the business aspects of big sports tournaments. We present the analytical tools, algorithms, and methodologies adopted in Sports Analytics for different sports, offer our views on them, and compare and contrast the existing approaches. In doing so, we also present the best tools, algorithms, and analytical methodologies for anyone looking to experiment with sports data and analyze various aspects of the game.  ( 2 min )
    What can be learnt with wide convolutional networks? (arXiv:2208.01003v2 [stat.ML] UPDATED)
    Understanding how convolutional neural networks (CNNs) can efficiently learn high-dimensional functions remains a fundamental challenge. A popular belief is that these models harness the local and hierarchical structure of natural data such as images. Yet, we lack a quantitative understanding of how such structure affects performance, e.g. the rate of decay of the generalisation error with the number of training samples. In this paper, we study deep CNNs in the kernel regime. First, we show that the spectrum of the corresponding kernel inherits the hierarchical structure of the network, and we characterise its asymptotics. Then, we use this result together with generalisation bounds to prove that deep CNNs adapt to the spatial scale of the target function. In particular, we find that if the target function depends on low-dimensional subsets of adjacent input variables, then the rate of decay of the error is controlled by the effective dimensionality of these subsets. Conversely, if the teacher function depends on the full set of input variables, then the error rate is inversely proportional to the input dimension. We conclude by computing the rate when a deep CNN is trained on the output of another deep CNN with randomly-initialised parameters. Interestingly, we find that despite their hierarchical structure, the functions generated by deep CNNs are too rich to be efficiently learnable in high dimension.  ( 3 min )
    Stability and Generalization for Markov Chain Stochastic Gradient Methods. (arXiv:2209.08005v1 [stat.ML])
    Recently, a large body of work has been devoted to the study of Markov chain stochastic gradient methods (MC-SGMs), mainly focusing on their convergence analysis for solving minimization problems. In this paper, we provide a comprehensive generalization analysis of MC-SGMs for both minimization and minimax problems through the lens of algorithmic stability in the framework of statistical learning theory. For empirical risk minimization (ERM) problems, we establish the optimal excess population risk bounds for both smooth and non-smooth cases by introducing on-average argument stability. For minimax problems, we develop a quantitative connection between on-average argument stability and generalization error which extends the existing results for uniform stability \cite{lei2021stability}. We further develop the first nearly optimal convergence rates for convex-concave problems both in expectation and with high probability, which, combined with our stability results, show that the optimal generalization bounds can be attained for both smooth and non-smooth cases. To the best of our knowledge, this is the first generalization analysis of SGMs when the gradients are sampled from a Markov process.  ( 2 min )
    PTab: Using the Pre-trained Language Model for Modeling Tabular Data. (arXiv:2209.08060v1 [cs.LG])
    Tabular data is the foundation of the information age and has been extensively studied. Recent studies show that neural-based models are effective in learning contextual representations for tabular data. Learning an effective contextual representation requires meaningful features and a large amount of data. However, current methods often fail to properly learn a contextual representation from features without semantic information. In addition, it is intractable to enlarge the training set through mixed tabular datasets because of the differences between datasets. To address these problems, we propose a novel framework, PTab, using a Pre-trained language model to model Tabular data. PTab learns a contextual representation of tabular data through a three-stage process: Modality Transformation (MT), Masked-Language Fine-tuning (MF), and Classification Fine-tuning (CF). We initialize our model with a pre-trained model (PTM) that contains semantic information learned from large-scale language data. Consequently, contextual representations can be learned effectively during the fine-tuning stages. In addition, we can naturally mix the textualized tabular data to enlarge the training set and further improve representation learning. We evaluate PTab on eight popular tabular classification datasets. Experimental results show that our method achieves a better average AUC score in supervised settings compared to state-of-the-art baselines (e.g., XGBoost), and outperforms counterpart methods in semi-supervised settings. We also present visualization results showing that PTab exhibits good instance-based interpretability.  ( 3 min )
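    The Modality Transformation stage amounts to textualizing each table row so a pre-trained language model can consume it. The sketch below illustrates that step; the phrasing template, column names, and checkpoint are guesses, not the authors' exact format.

        # Textualize a table row and tokenize it for a BERT-style model.
        from transformers import AutoTokenizer

        row = {"age": 42, "income": 55000, "occupation": "teacher"}
        text = " ".join(f"{col} is {val}." for col, val in row.items())
        # -> "age is 42. income is 55000. occupation is teacher."

        tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
        inputs = tokenizer(text, return_tensors="pt")
        # `inputs` can now be fed to the model for masked-language
        # fine-tuning and then classification fine-tuning.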
    TransTab: Learning Transferable Tabular Transformers Across Tables. (arXiv:2205.09328v2 [cs.LG] UPDATED)
    Tabular data (or tables) are the most widely used data format in machine learning (ML). However, ML models often assume the table structure remains fixed in training and testing. Before ML modeling, heavy data cleaning is required to merge disparate tables with different columns. This preprocessing often incurs significant data waste (e.g., removing unmatched columns and samples). How to learn ML models from multiple tables with partially overlapping columns? How to incrementally update ML models as more columns become available over time? Can we leverage model pretraining on multiple distinct tables? How to train an ML model which can predict on an unseen table? To answer all those questions, we propose to relax fixed table structures by introducing a Transferable Tabular Transformer (TransTab) for tables. The goal of TransTab is to convert each sample (a row in the table) to a generalizable embedding vector, and then apply stacked transformers for feature encoding. One methodology insight is combining column description and table cells as the raw input to a gated transformer model. The other insight is to introduce supervised and self-supervised pretraining to improve model performance. We compare TransTab with multiple baseline methods on diverse benchmark datasets and five oncology clinical trial datasets. Overall, TransTab ranks 1.00, 1.00, 1.78 out of 12 methods in supervised learning, feature incremental learning, and transfer learning scenarios, respectively; and the proposed pretraining leads to 2.3% AUC lift on average over the supervised learning.  ( 3 min )
    Exploring the Whole Rashomon Set of Sparse Decision Trees. (arXiv:2209.08040v1 [cs.LG])
    In any given machine learning problem, there may be many models that could explain the data almost equally well. However, most learning algorithms return only one of these models, leaving practitioners with no practical way to explore alternative models that might have desirable properties beyond what could be expressed within a loss function. The Rashomon set is the set of all these almost-optimal models. Rashomon sets can be extremely complicated, particularly for highly nonlinear function classes that allow complex interaction terms, such as decision trees. We provide the first technique for completely enumerating the Rashomon set for sparse decision trees; in fact, our work provides the first complete enumeration of any Rashomon set for a non-trivial problem with a highly nonlinear discrete function class. This allows the user an unprecedented level of control over model choice among all models that are approximately equally good. We represent the Rashomon set in a specialized data structure that supports efficient querying and sampling. We show three applications of the Rashomon set: 1) it can be used to study variable importance for the set of almost-optimal trees (as opposed to a single tree), 2) the Rashomon set for accuracy enables enumeration of the Rashomon sets for balanced accuracy and F1-score, and 3) the Rashomon set for a full dataset can be used to produce Rashomon sets constructed with only subsets of the data set. Thus, we are able to examine Rashomon sets across problems with a new lens, enabling users to choose models rather than be at the mercy of an algorithm that produces only a single model.  ( 3 min )
    Mining SoC Message Flows with Attention Model. (arXiv:2209.07929v1 [cs.AI])
    High-quality system-level message flow specifications are necessary for comprehensive validation of system-on-chip (SoC) designs. However, manual development and maintenance of such specifications are daunting tasks. We propose a disruptive method that utilizes deep sequence modeling with the attention mechanism to infer accurate flow specifications from SoC communication traces. The proposed method can overcome the inherent complexity of SoC traces induced by the concurrent executions of SoC designs that existing mining tools often find extremely challenging. We conduct experiments on five highly concurrent traces and find that the proposed approach outperforms several existing state-of-the-art trace mining tools.  ( 2 min )
    ImDrug: A Benchmark for Deep Imbalanced Learning in AI-aided Drug Discovery. (arXiv:2209.07921v1 [cs.LG])
    The last decade has witnessed a prosperous development of computational methods and dataset curation for AI-aided drug discovery (AIDD). However, real-world pharmaceutical datasets often exhibit highly imbalanced distribution, which is largely overlooked by the current literature but may severely compromise the fairness and generalization of machine learning applications. Motivated by this observation, we introduce ImDrug, a comprehensive benchmark with an open-source Python library which consists of 4 imbalance settings, 11 AI-ready datasets, 54 learning tasks and 16 baseline algorithms tailored for imbalanced learning. It provides an accessible and customizable testbed for problems and solutions spanning a broad spectrum of the drug discovery pipeline such as molecular modeling, drug-target interaction and retrosynthesis. We conduct extensive empirical studies with novel evaluation metrics, to demonstrate that the existing algorithms fall short of solving medicinal and pharmaceutical challenges in the data imbalance scenario. We believe that ImDrug opens up avenues for future research and development, on real-world challenges at the intersection of AIDD and deep imbalanced learning.  ( 2 min )
    On the Robustness of Graph Neural Diffusion to Topology Perturbations. (arXiv:2209.07754v1 [cs.LG])
    Neural diffusion on graphs is a novel class of graph neural networks that has attracted increasing attention recently. The capability of graph neural partial differential equations (PDEs) to address common hurdles of graph neural networks (GNNs), such as over-smoothing and bottlenecks, has been investigated, but their robustness to adversarial attacks has not. In this work, we explore the robustness properties of graph neural PDEs. We empirically demonstrate that graph neural PDEs are intrinsically more robust against topology perturbation compared to other GNNs. We provide insights into this phenomenon by exploiting the stability of the heat semigroup under graph topology perturbations. We discuss various graph diffusion operators and relate them to existing graph neural PDEs. Furthermore, we propose a general graph neural PDE framework based on which a new class of robust GNNs can be defined. We verify that the new model achieves comparable state-of-the-art performance on several benchmark datasets.
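    The heat-semigroup argument is easiest to see in the simplest diffusion model. Below is a minimal sketch of explicit-Euler heat diffusion on a toy graph; graph neural PDEs replace this fixed Laplacian flow with learned diffusion operators.

        # Discretized heat diffusion on a graph: x_{t+1} = x_t - tau * L x_t.
        import numpy as np

        A = np.array([[0, 1, 0, 0],
                      [1, 0, 1, 1],
                      [0, 1, 0, 1],
                      [0, 1, 1, 0]], dtype=float)   # toy adjacency matrix
        L = np.diag(A.sum(axis=1)) - A               # combinatorial Laplacian

        x = np.array([1.0, 0.0, 0.0, 0.0])           # initial node features
        tau, steps = 0.1, 50
        for _ in range(steps):
            x = x - tau * (L @ x)                    # explicit Euler step
        print(x)  # smoothed features; perturbing an edge of A changes L only mildly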
    Maximum Likelihood Training of Implicit Nonlinear Diffusion Models. (arXiv:2205.13699v2 [cs.LG] UPDATED)
    While diverse variants of diffusion models exist, expanding the linear diffusion into a nonlinear diffusion process has been investigated by only a few works. The nonlinearity effect has hardly been understood, but intuitively, there would be more promising diffusion patterns for optimally training the generative distribution towards the data distribution. This paper introduces such a data-adaptive and nonlinear diffusion process for score-based diffusion models. The proposed Implicit Nonlinear Diffusion Model (INDM) learns the nonlinear diffusion process by combining a normalizing flow and a diffusion process. Specifically, INDM implicitly constructs a nonlinear diffusion on the \textit{data space} by leveraging a linear diffusion on the \textit{latent space} through a flow network. This flow network is the key to forming a nonlinear diffusion, as the nonlinearity fully depends on the flow network. This flexible nonlinearity is what improves the learning curve of INDM to nearly Maximum Likelihood Estimation (MLE) training, against the non-MLE training of DDPM++, which turns out to be a special case of INDM with the identity flow. In addition, training the nonlinear diffusion yields robustness of sampling with respect to the discretization step size. In experiments, INDM achieves the state-of-the-art FID on CelebA.  ( 3 min )
    Library transfer between distinct Laser-Induced Breakdown Spectroscopy systems with shared standards. (arXiv:2209.07637v1 [physics.data-an])
    The mutual incompatibility of distinct spectroscopic systems is among the most limiting factors in Laser-Induced Breakdown Spectroscopy (LIBS). It increases the cost of setting up a new LIBS system, as extensive calibration is required. Solving this problem would enable inter-laboratory reference measurements and shared spectral libraries, which are fundamental for other spectroscopic techniques. In this work, we study a simplified version of this challenge, where the LIBS systems differ only in the spectrometers and collection optics used, share all other parts of the apparatus, and collect spectra simultaneously from the same plasma plume. Extensive datasets measured as hyperspectral images of heterogeneous specimens are used to train machine learning models that can transfer spectra between systems. The transfer is realized by a pipeline that consists of a variational autoencoder (VAE) and a fully-connected artificial neural network (ANN). In the first step, we obtain a latent representation of the spectra measured on the Primary system (using the VAE). In the second step, we map spectra from the Secondary system to the corresponding locations in the latent space (using the ANN). Finally, Secondary system spectra are reconstructed from the latent space to the space of the Primary system. The transfer is evaluated by several figures of merit (Euclidean and cosine distances, both spatially resolved; k-means clustering of transferred spectra). The methodology is compared to several baseline approaches.  ( 3 min )
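    The two-step pipeline translates into a compact architecture. The following PyTorch sketch shows its skeleton, with spectrum lengths, layer widths, and latent dimension chosen arbitrarily for illustration; the training loops for both steps are omitted.

        # Skeleton of the VAE + mapper transfer pipeline (untrained sketch).
        import torch
        import torch.nn as nn

        N_PRIMARY, N_SECONDARY, LATENT = 4096, 2048, 32

        class VAE(nn.Module):
            def __init__(self):
                super().__init__()
                self.enc = nn.Sequential(nn.Linear(N_PRIMARY, 256), nn.ReLU())
                self.mu = nn.Linear(256, LATENT)
                self.logvar = nn.Linear(256, LATENT)
                self.dec = nn.Sequential(nn.Linear(LATENT, 256), nn.ReLU(),
                                         nn.Linear(256, N_PRIMARY))

            def forward(self, x):
                h = self.enc(x)
                mu, logvar = self.mu(h), self.logvar(h)
                z = mu + torch.randn_like(mu) * (0.5 * logvar).exp()
                return self.dec(z), mu, logvar

        vae = VAE()                                   # step 1: train on Primary spectra
        mapper = nn.Sequential(nn.Linear(N_SECONDARY, 256), nn.ReLU(),
                               nn.Linear(256, LATENT))  # step 2: Secondary -> latent

        secondary = torch.randn(8, N_SECONDARY)
        transferred = vae.dec(mapper(secondary))      # Secondary spectra in Primary space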
    Text and Patterns: For Effective Chain of Thought, It Takes Two to Tango. (arXiv:2209.07686v1 [cs.CL])
    Reasoning is a key pillar of human cognition and intelligence. In the past decade, we witnessed dramatic gains in natural language processing and unprecedented scaling of large language models. Recent work has characterized the capability of few-shot prompting techniques such as chain of thought to emulate human reasoning in large language models. This hallmark feature of few-shot prompting, combined with ever scaling language models, opened a vista of possibilities to solve various tasks, such as math word problems, code completion, and commonsense reasoning. Chain of thought (CoT) prompting further pushes the performance of models in a few-shot setup, by supplying intermediate steps and urging the model to follow the same process. Despite its compelling performance, the genesis of reasoning capability in these models is less explored. This work initiates the preliminary steps towards a deeper understanding of reasoning mechanisms in large language models. Our work centers around querying the model while controlling for all but one of the components in a prompt: symbols, patterns, and text. We then analyze the performance divergence across the queries. Our results suggest the presence of factual patterns in a prompt is not necessary for the success of CoT. Nonetheless, we empirically show that relying solely on patterns is also insufficient for high quality results. We posit that text imbues patterns with commonsense knowledge and meaning. Our exhaustive empirical analysis provides qualitative examples of the symbiotic relationship between text and patterns. Such systematic understanding of CoT enables us to devise concise chain of thought, dubbed as CCoT, where text and patterns are pruned to only retain their key roles, while delivering an on-par or slightly higher task solve rate.  ( 3 min )
    Learning Policies for Continuous Control via Transition Models. (arXiv:2209.08033v1 [cs.RO])
    It is doubtful that animals have perfect inverse models of their limbs (e.g., what muscle contraction must be applied to every joint to reach a particular location in space). However, in robot control, moving an arm's end-effector to a target position or along a target trajectory requires accurate forward and inverse models. Here we show that by learning the transition (forward) model from interaction, we can use it to drive the learning of an amortized policy. Hence, we revisit policy optimization in relation to the deep active inference framework and describe a modular neural network architecture that simultaneously learns the system dynamics from prediction errors and the stochastic policy that generates suitable continuous control commands to reach a desired reference position. We evaluated the model by comparing it against the baseline of a linear quadratic regulator, and conclude with additional steps to take toward human-like motor control.
    Evolutionary Action Selection for Gradient-based Policy Learning. (arXiv:2201.04286v4 [cs.NE] UPDATED)
    Evolutionary Algorithms (EAs) and Deep Reinforcement Learning (DRL) have recently been integrated to take advantage of both methods for better exploration and exploitation. The evolutionary part in these hybrid methods maintains a population of policy networks. However, existing methods focus on optimizing the parameters of the policy network, which is usually high-dimensional and tricky for an EA. In this paper, we shift the target of evolution from the high-dimensional parameter space to the low-dimensional action space. We propose Evolutionary Action Selection-Twin Delayed Deep Deterministic Policy Gradient (EAS-TD3), a novel hybrid method of EA and DRL. In EAS, we focus on optimizing the action chosen by the policy network and attempt to obtain high-quality actions to promote policy learning through an evolutionary algorithm. We conduct several experiments on challenging continuous control tasks. The results show that EAS-TD3 achieves superior performance over other state-of-the-art methods.  ( 2 min )
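    The core of evolving in action space can be illustrated with a toy genetic loop over candidate actions scored by the critic. The Q-function below is a stand-in, and the selection scheme is simplified relative to the paper's method.

        # Toy evolutionary action selection: mutate candidate actions and keep
        # the highest-Q one (q_value is a dummy critic for illustration).
        import numpy as np

        def q_value(state, action):                  # stand-in for the critic
            return -np.sum((action - state) ** 2)

        def evolve_action(state, base_action, pop=32, gens=5, sigma=0.1):
            population = base_action + sigma * np.random.randn(pop, base_action.size)
            for _ in range(gens):
                fitness = np.array([q_value(state, a) for a in population])
                elite = population[np.argsort(fitness)[-pop // 4:]]        # selection
                children = elite[np.random.randint(len(elite), size=pop)]  # reproduction
                population = children + sigma * np.random.randn(*children.shape)
            fitness = np.array([q_value(state, a) for a in population])
            return population[np.argmax(fitness)]    # high-quality action for the buffer

        state = np.zeros(4)
        print(evolve_action(state, np.ones(4)))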
    Improving Language Model Prompting in Support of Semi-autonomous Task Learning. (arXiv:2209.07636v1 [cs.LG])
    Large language models (LLMs) offer potential as a source of knowledge for agents that need to acquire new task competencies within a performance environment. We describe efforts toward a novel agent capability that can construct cues (or "prompts") that result in useful LLM responses for an agent learning a new task. Importantly, responses must not only be "reasonable" (a measure commonly used in research on knowledge extraction from LLMs) but also specific to the agent's task context and in a form that the agent can interpret given its native language capacities. We summarize a series of empirical investigations of prompting strategies and evaluate responses against the goals of targeted and actionable responses for task learning. Our results demonstrate that actionable task knowledge can be obtained from LLMs in support of online agent task learning.  ( 2 min )
    Causal Fourier Analysis on Directed Acyclic Graphs and Posets. (arXiv:2209.07970v1 [eess.SP])
    We present a novel form of Fourier analysis, and associated signal processing concepts, for signals (or data) indexed by edge-weighted directed acyclic graphs (DAGs). This means that our Fourier basis yields an eigendecomposition of a suitable notion of shift and convolution operators that we define. DAGs are the common model to capture causal relationships between data and our framework is causal in that shift, convolution, and Fourier transform are computed only from predecessors in the DAG. The Fourier transform requires the transitive closure of the DAG, for which several forms are possible depending on the interpretation of the edge weights. Examples include level of influence, distance, or pollution distribution. Our framework is different from prior graph signal processing (GSP): it is specific to DAGs and leverages, and extends, the classical theory of Moebius inversion from combinatorics. For a prototypical application we consider DAGs modeling dynamic networks in which edges change over time. Specifically, we model the spread of an infection on such a DAG obtained from real-world contact tracing data and learn the infection signal from samples assuming sparsity in the Fourier domain.  ( 2 min )
    Self-Relation Attention and Temporal Awareness for Emotion Recognition via Vocal Burst. (arXiv:2209.07629v1 [cs.SD])
    This technical report presents our emotion recognition pipeline for the high-dimensional emotion task (A-VB High) in the ACII Affective Vocal Bursts (A-VB) 2022 Workshop \& Competition. Our proposed method consists of three stages. First, we extract latent features from the raw audio signal and its Mel-spectrogram using self-supervised learning methods. Then, the features from the raw signal are fed to the self-relation attention and temporal awareness (SA-TA) module to learn the valuable information between these latent features. Finally, we concatenate all the features and use a fully-connected layer to predict each emotion's score. In empirical experiments, our proposed method achieves a mean concordance correlation coefficient (CCC) of 0.7295 on the test set, compared to 0.5686 for the baseline model. The code of our method is available at https://github.com/linhtd812/A-VB2022.  ( 2 min )
    Malicious Source Code Detection Using Transformer. (arXiv:2209.07957v1 [cs.CR])
    Open source code is considered a common practice in modern software development. However, reusing code from others gives bad actors access to a wide community of developers, and hence to the products that rely on their code. Those attacks are categorized as supply chain attacks. Recent years saw a growing number of supply chain attacks that leverage open source during software development, exploiting the download and installation procedures, whether automatic or manual. Over the years, many approaches have been invented for detecting vulnerable packages. However, it is uncommon to detect malicious code within packages. These detection approaches can be broadly categorized as analyses that execute code (dynamic) and those that do not (static). Here, we introduce the Malicious Source code Detection using Transformers (MSDT) algorithm. MSDT is a novel static analysis based on a deep learning method that detects real-world code injection cases in source code packages. In this study, we used MSDT and a dataset with over 600,000 different functions to embed various functions, applied a clustering algorithm to the resulting vectors, and detected the malicious functions by detecting outliers. We evaluated MSDT's performance by conducting extensive experiments and demonstrated that our algorithm is capable of detecting functions that were injected with malicious code with precision@k values of up to 0.909.  ( 2 min )
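    The cluster-then-flag-outliers step can be sketched in a few lines: functions far from every cluster centroid are treated as suspicious. The embeddings below are random placeholders; MSDT obtains them from a transformer over source code.

        # Flag embedded functions that sit far from all cluster centroids.
        import numpy as np
        from sklearn.cluster import KMeans

        embeddings = np.random.randn(1000, 128)        # placeholder function vectors
        km = KMeans(n_clusters=20, n_init=10).fit(embeddings)
        dist = np.min(km.transform(embeddings), axis=1)  # distance to nearest centroid

        threshold = np.percentile(dist, 99)
        suspicious = np.where(dist > threshold)[0]
        print(f"{len(suspicious)} functions flagged for review")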
    M$^2$DQN: A Robust Method for Accelerating Deep Q-learning Network. (arXiv:2209.07809v1 [cs.LG])
    Deep Q-learning Network (DQN) is a successful method that combines reinforcement learning with deep neural networks and has led to a widespread application of reinforcement learning. One challenging problem when applying DQN or other reinforcement learning algorithms to real-world problems is data collection. Therefore, how to improve data efficiency is one of the most important problems in reinforcement learning research. In this paper, we propose a framework that uses the Max-Mean loss in the Deep Q-Network (M$^2$DQN). Instead of sampling one batch of experiences in each training step, we sample several batches from the experience replay and update the parameters such that the maximum TD-error across these batches is minimized. The proposed method can be combined with most existing DQN techniques by replacing the loss function. We verify the effectiveness of this framework with one of the most widely used techniques, Double DQN (DDQN), on several gym games. The results show that our method leads to a substantial improvement in both learning speed and performance.  ( 2 min )
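    The Max-Mean loss is simple to state in code: compute each batch's mean TD loss and descend on the largest one. Below is a sketch under stated assumptions; the networks and replay buffer are stubbed with random tensors, and the standard (not double) Q-target is used for brevity.

        # Max-Mean update: minimize the worst of several batch TD losses.
        import torch
        import torch.nn.functional as F

        def max_mean_update(q_net, target_net, optimizer, sample_batch,
                            num_batches=4, gamma=0.99):
            batch_losses = []
            for _ in range(num_batches):
                s, a, r, s2, done = sample_batch()
                with torch.no_grad():
                    target = r + gamma * (1 - done) * target_net(s2).max(dim=1).values
                q = q_net(s).gather(1, a.unsqueeze(1)).squeeze(1)
                batch_losses.append(F.smooth_l1_loss(q, target))
            loss = torch.stack(batch_losses).max()   # the max over batch means
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()

        # Stubbed networks and replay sampling for a runnable illustration.
        q_net, target_net = torch.nn.Linear(4, 2), torch.nn.Linear(4, 2)
        opt = torch.optim.Adam(q_net.parameters())
        def sample_batch(batch=32):
            return (torch.randn(batch, 4), torch.randint(0, 2, (batch,)),
                    torch.randn(batch), torch.randn(batch, 4), torch.zeros(batch))
        max_mean_update(q_net, target_net, opt, sample_batch)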
    Renyi Differential Privacy of Propose-Test-Release and Applications to Private and Robust Machine Learning. (arXiv:2209.07716v1 [cs.CR])
    Propose-Test-Release (PTR) is a differential privacy framework that works with local sensitivity of functions, instead of their global sensitivity. This framework is typically used for releasing robust statistics such as median or trimmed mean in a differentially private manner. While PTR is a common framework introduced over a decade ago, using it in applications such as robust SGD where we need many adaptive robust queries is challenging. This is mainly due to the lack of Renyi Differential Privacy (RDP) analysis, an essential ingredient underlying the moments accountant approach for differentially private deep learning. In this work, we generalize the standard PTR and derive the first RDP bound for it when the target function has bounded global sensitivity. We show that our RDP bound for PTR yields tighter DP guarantees than the directly analyzed $(\epsilon, \delta)$-DP. We also derive the algorithm-specific privacy amplification bound of PTR under subsampling. We show that our bound is much tighter than the general upper bound and close to the lower bound. Our RDP bounds enable tighter privacy loss calculation for the composition of many adaptive runs of PTR. As an application of our analysis, we show that PTR and our theoretical results can be used to design differentially private variants for byzantine robust training algorithms that use robust statistics for gradients aggregation. We conduct experiments on the settings of label, feature, and gradient corruption across different datasets and architectures. We show that the PTR-based private and robust training algorithm significantly improves utility compared with the baseline.  ( 3 min )
    Federated Coordinate Descent for Privacy-Preserving Multiparty Linear Regression. (arXiv:2209.07702v1 [cs.LG])
    Distributed privacy-preserving regression schemes have been developed and extended in various fields, where multiple parties collaboratively and privately run optimization algorithms, e.g., Gradient Descent, to learn a set of optimal parameters. However, traditional Gradient-Descent-based methods fail to solve problems whose objective functions contain L1 regularization, such as Lasso regression. In this paper, we present Federated Coordinate Descent, a new distributed scheme called FCD, to address this issue securely in multiparty scenarios. Specifically, through secure aggregation and added perturbations, our scheme guarantees that: (1) no local information is leaked to other parties, and (2) global model parameters are not exposed to cloud servers. The added perturbations can eventually be eliminated by each party to derive a global model with high performance. We show that the FCD scheme fills the gap of multiparty secure Coordinate Descent methods and is applicable to general linear regressions, including linear, ridge, and lasso regressions. Theoretical security analysis and experimental results demonstrate that FCD can be performed effectively and efficiently, and provides MAE measures as low as centralized methods on tasks of the three types of linear regression on real-world UCI datasets.  ( 2 min )
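    The local update that FCD distributes and secures is ordinary coordinate descent with soft-thresholding. Here is a centralized, unsecured sketch of that update for the lasso; the federated scheme adds secure aggregation and removable perturbations on top of it.

        # Coordinate descent for the lasso: 0.5*||y - Xw||^2 + lam*||w||_1.
        import numpy as np

        def lasso_cd(X, y, lam, iters=100):
            n, d = X.shape
            w = np.zeros(d)
            col_sq = (X ** 2).sum(axis=0)
            for _ in range(iters):
                for j in range(d):
                    residual = y - X @ w + X[:, j] * w[j]   # exclude coordinate j
                    rho = X[:, j] @ residual
                    w[j] = np.sign(rho) * max(abs(rho) - lam, 0.0) / col_sq[j]  # soft-threshold
            return w

        X = np.random.randn(200, 10)
        y = X @ np.array([3, -2] + [0] * 8) + 0.1 * np.random.randn(200)
        print(np.round(lasso_cd(X, y, lam=10.0), 2))  # recovers the sparse weights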
    Conservative Dual Policy Optimization for Efficient Model-Based Reinforcement Learning. (arXiv:2209.07676v1 [cs.LG])
    Provably efficient Model-Based Reinforcement Learning (MBRL) based on optimism or posterior sampling (PSRL) is ensured to attain the global optimality asymptotically by introducing the complexity measure of the model. However, the complexity might grow exponentially for the simplest nonlinear models, where global convergence is impossible within finite iterations. When the model suffers a large generalization error, which is quantitatively measured by the model complexity, the uncertainty can be large. The sampled model that the current policy is greedily optimized against will thus be unsettled, resulting in aggressive policy updates and over-exploration. In this work, we propose Conservative Dual Policy Optimization (CDPO) that involves a Referential Update and a Conservative Update. The policy is first optimized under a reference model, which imitates the mechanism of PSRL while offering more stability. A conservative range of randomness is guaranteed by maximizing the expectation of model value. Without harmful sampling procedures, CDPO can still achieve the same regret as PSRL. More importantly, CDPO enjoys monotonic policy improvement and global optimality simultaneously. Empirical results also validate the exploration efficiency of CDPO.  ( 2 min )
    Bayesian Identification of Nonseparable Hamiltonian Systems Using Stochastic Dynamic Models. (arXiv:2209.07646v1 [math.DS])
    This paper proposes a probabilistic Bayesian formulation for system identification (ID) and estimation of nonseparable Hamiltonian systems using stochastic dynamic models. Nonseparable Hamiltonian systems arise in models from diverse science and engineering applications such as astrophysics, robotics, vortex dynamics, charged particle dynamics, and quantum mechanics. The numerical experiments demonstrate that the proposed method recovers dynamical systems with higher accuracy and reduced predictive uncertainty compared to state-of-the-art approaches. The results further show that accurate predictions far outside the training time interval in the presence of sparse and noisy measurements are possible, which lends robustness and generalizability to the proposed approach. A quantitative benefit is prediction accuracy with less than 10% relative error for more than 12 times longer than a comparable least-squares-based method on a benchmark problem.  ( 2 min )
    CES-KD: Curriculum-based Expert Selection for Guided Knowledge Distillation. (arXiv:2209.07606v1 [cs.CV])
    Knowledge distillation (KD) is an effective tool for compressing deep classification models for edge devices. However, the performance of KD is affected by the large capacity gap between the teacher and student networks. Recent methods have resorted to a multiple teacher assistant (TA) setting for KD, which sequentially decreases the size of the teacher model to relatively bridge the size gap between these models. This paper proposes a new technique called Curriculum Expert Selection for Knowledge Distillation (CES-KD) to efficiently enhance the learning of a compact student under the capacity gap problem. This technique is built upon the hypothesis that a student network should be guided gradually using a stratified teaching curriculum, as it learns easy (hard) data samples better and faster from a lower (higher) capacity teacher network. Specifically, our method is a gradual TA-based KD technique that selects a single teacher per input image based on a curriculum driven by the difficulty in classifying the image. In this work, we empirically verify our hypothesis and rigorously experiment with the CIFAR-10, CIFAR-100, CINIC-10, and ImageNet datasets and show improved accuracy on VGG-like models, ResNets, and WideResNets architectures.  ( 2 min )
    Theoretical Insight into Batch Normalization: Data-Dependent Auto-Tuning of Regularization Rate. (arXiv:2209.07587v1 [stat.ML])
    Batch normalization is widely used in deep learning to normalize intermediate activations. Deep networks suffer from notoriously increased training complexity, mandating careful initialization of weights, requiring lower learning rates, etc. These issues have been addressed by Batch Normalization (\textbf{BN}), which normalizes the inputs of activations to zero mean and unit standard deviation. Making batch normalization part of the training process dramatically accelerates the training of very deep networks. A field of research has emerged to examine the exact theoretical explanation behind the success of \textbf{BN}. Most of these theoretical insights attempt to explain the benefits of \textbf{BN} through its influence on optimization, weight scale invariance, and regularization. Despite \textbf{BN}'s undeniable success in accelerating generalization, an analytical account relating the effect of \textbf{BN} to the regularization parameter is still missing. This paper derives the data-dependent auto-tuning of the regularization parameter by \textbf{BN} with analytical proofs. We pose \textbf{BN} as a constrained optimization imposed on non-\textbf{BN} weights, through which we demonstrate its data-statistics-dependent auto-tuning of the regularization parameter. We also give an analytical proof of its behavior under a noisy input scenario, which reveals the signal-versus-noise tuning of the regularization parameter. We substantiate our claims with empirical results from experiments on the MNIST dataset.  ( 3 min )
    Can There be Art Without an Artist?. (arXiv:2209.07667v1 [cs.AI])
    Generative Adversarial Network (GAN) based art has proliferated in the past year, going from a shiny new tool to generate fake human faces to a stage where anyone can generate thousands of artistic images with minimal effort. Some of these images are now ``good'' enough to win accolades from qualified judges. In this paper, we explore how Generative Models have impacted artistry, not only from a qualitative point of view, but also from an angle of exploitation of artisans -- both via plagiarism, where models are trained on their artwork without permission, and via profit shifting, where profits in the art market have shifted from art creators to model owners or to traders in the unregulated secondary crypto market. This confluence of factors risks completely detaching humans from the artistic process, devaluing the labor of artists and distorting the public perception of the value of art.  ( 2 min )
    The Development of Spatial Attention U-Net for The Recovery of Ionospheric Measurements and The Extraction of Ionospheric Parameters. (arXiv:2209.07581v1 [physics.space-ph])
    We train a deep learning artificial neural network model, Spatial Attention U-Net, to recover useful ionospheric signals from noisy ionogram data measured by Hualien's Vertical Incidence Pulsed Ionospheric Radar. Our results show that the model can reliably identify the F2 layer ordinary and extraordinary modes (F2o, F2x) and the combined signals of the E layer (ordinary and extraordinary modes and sporadic Es). The model is also capable of identifying some signals that were not labeled. The performance of the model can be significantly degraded by an insufficient number of samples in the data set. From the recovered signals, we determine the critical frequencies of F2o and F2x and the intersection frequency between the two signals. The difference between the two critical frequencies peaks at 0.63 MHz, with an uncertainty of 0.18 MHz.  ( 2 min )
    Experimental verification of the quantum nature of a neural network. (arXiv:2209.07577v1 [cs.NE])
    In my previous article, I mentioned for the first time that a classical neural network may have quantum properties, as its own structure may be entangled. The question one may ask now is whether such a quantum property can be used to entangle other systems. The answer should be yes, as shown in what follows.  ( 2 min )
    Physically Constrained Generative Adversarial Networks for Improving Precipitation Fields from Earth System Models. (arXiv:2209.07568v1 [physics.ao-ph])
    Precipitation results from complex processes across many scales, making its accurate simulation in Earth system models (ESMs) challenging. Existing post-processing methods can improve ESM simulations locally, but cannot correct errors in modelled spatial patterns. Here we propose a framework based on physically constrained generative adversarial networks (GANs) to improve local distributions and spatial structure simultaneously. We apply our approach to the computationally efficient ESM CM2Mc-LPJmL. Our method outperforms existing ones in correcting local distributions, and leads to strongly improved spatial patterns especially regarding the intermittency of daily precipitation. Notably, a double-peaked Intertropical Convergence Zone, a common problem in ESMs, is removed. Enforcing a physical constraint to preserve global precipitation sums, the GAN can generalize to future climate scenarios unseen during training. Feature attribution shows that the GAN identifies regions where the ESM exhibits strong biases. Our method constitutes a general framework for correcting ESM variables and enables realistic simulations at a fraction of the computational costs.  ( 2 min )
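    The physical constraint mentioned above can be as simple as a differentiable rescaling layer that preserves the global precipitation sum of the input field. A sketch of that idea follows; the paper's exact layer may differ in detail.

        # Rescale the generator output so its global sum matches the ESM field.
        import torch

        def preserve_global_sum(generated, esm_input, eps=1e-8):
            # generated, esm_input: (batch, lat, lon) precipitation fields
            gen_sum = generated.sum(dim=(1, 2), keepdim=True)
            esm_sum = esm_input.sum(dim=(1, 2), keepdim=True)
            return generated * esm_sum / (gen_sum + eps)

        gen = torch.rand(2, 96, 192)
        esm = torch.rand(2, 96, 192)
        out = preserve_global_sum(gen, esm)
        assert torch.allclose(out.sum(dim=(1, 2)), esm.sum(dim=(1, 2)))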
    Explicit Tradeoffs between Adversarial and Natural Distributional Robustness. (arXiv:2209.07592v1 [cs.LG])
    Several existing works study either adversarial or natural distributional robustness of deep neural networks separately. In practice, however, models need to enjoy both types of robustness to ensure reliability. In this work, we bridge this gap and show that in fact, explicit tradeoffs exist between adversarial and natural distributional robustness. We first consider a simple linear regression setting on Gaussian data with disjoint sets of core and spurious features. In this setting, through theoretical and empirical analysis, we show that (i) adversarial training with $\ell_1$ and $\ell_2$ norms increases the model reliance on spurious features; (ii) For $\ell_\infty$ adversarial training, spurious reliance only occurs when the scale of the spurious features is larger than that of the core features; (iii) adversarial training can have an unintended consequence in reducing distributional robustness, specifically when spurious correlations are changed in the new test domain. Next, we present extensive empirical evidence, using a test suite of twenty adversarially trained models evaluated on five benchmark datasets (ObjectNet, RIVAL10, Salient ImageNet-1M, ImageNet-9, Waterbirds), that adversarially trained classifiers rely on backgrounds more than their standardly trained counterparts, validating our theoretical results. We also show that spurious correlations in training data (when preserved in the test domain) can improve adversarial robustness, revealing that previous claims that adversarial vulnerability is rooted in spurious correlations are incomplete.  ( 3 min )
    Prediction of Gender from Longitudinal MRI data via Deep Learning on Adolescent Data Reveals Unique Patterns Associated with Brain Structure and Change over a Two-year Period. (arXiv:2209.07590v1 [eess.IV])
    Deep learning algorithms for predicting neuroimaging data have shown considerable promise in various applications. Prior work has demonstrated that deep learning models that take advantage of the data's 3D structure can outperform standard machine learning on several learning tasks. However, most prior research in this area has focused on neuroimaging data from adults. Within the Adolescent Brain and Cognitive Development (ABCD) dataset, a large longitudinal development study, we examine structural MRI data to predict gender and identify gender-related changes in brain structure. Results demonstrate that gender prediction accuracy is exceptionally high (>97%) with training epochs >200 and that this accuracy increases with age. Brain regions identified as the most discriminative in the task under study include predominantly frontal areas and the temporal lobe. When evaluating gender predictive changes specific to a two-year increase in age, a broader set of visual, cingulate, and insular regions are revealed. Our findings show a robust gender-related structural brain change pattern, even over a small age range. This suggests that it might be possible to study how the brain changes during adolescence by looking at how these changes are related to different behavioral and environmental factors.  ( 3 min )
    Serialized Interacting Mixed Membership Stochastic Block Model. (arXiv:2209.07813v1 [cs.LG])
    Recent years have seen renewed interest in the use of stochastic block modeling (SBM) in recommender systems. These models are seen as a flexible alternative to tensor decomposition techniques, able to handle labeled data. Recent works proposed tackling discrete recommendation problems via SBMs by considering larger contexts as input data and by adding second-order interactions between contexts' related elements. In this work, we show that these models are all special cases of a single global framework: the Serialized Interacting Mixed membership Stochastic Block Model (SIMSBM). It allows modeling of an arbitrarily large context as well as an arbitrarily high order of interactions. We demonstrate that SIMSBM generalizes several recent SBM-based baselines. Besides, we demonstrate that our formulation allows for increased predictive power on six real-world datasets.  ( 2 min )
    A Nested Genetic Algorithm for Explaining Classification Data Sets with Decision Rules. (arXiv:2209.07575v1 [cs.NE])
    Our goal in this paper is to automatically extract a set of decision rules (rule set) that best explains a classification data set. First, a large set of decision rules is extracted from a set of decision trees trained on the data set. The rule set should be concise, accurate, have a maximum coverage and minimum number of inconsistencies. This problem can be formalized as a modified version of the weighted budgeted maximum coverage problem, known to be NP-hard. To solve the combinatorial optimization problem efficiently, we introduce a nested genetic algorithm which we then use to derive explanations for ten public data sets.  ( 2 min )
    STPOTR: Simultaneous Human Trajectory and Pose Prediction Using a Non-Autoregressive Transformer for Robot Following Ahead. (arXiv:2209.07600v1 [cs.RO])
    In this paper, we develop a neural network model to predict future human motion from an observed human motion history. We propose a non-autoregressive transformer architecture to leverage its parallel nature for easier training and fast, accurate predictions at test time. The proposed architecture divides human motion prediction into two parts: 1) the human trajectory, which is the 3D position of the hip joint over time, and 2) the human pose, which is the 3D positions of all other joints over time with respect to a fixed hip joint. We propose to make the two predictions simultaneously, as the shared representation can improve the model performance. Therefore, the model consists of two sets of encoders and decoders. First, a multi-head attention module applied to encoder outputs improves the human trajectory. Second, another multi-head self-attention module applied to encoder outputs concatenated with decoder outputs facilitates learning of temporal dependencies. Our model is well-suited for robotic applications in terms of test accuracy and speed, and compares favorably with respect to state-of-the-art methods. We demonstrate the real-world applicability of our work via the Robot Follow-Ahead task, a challenging yet practical case study for our proposed model.  ( 2 min )
    Pixel-wise classification in graphene-detection with tree-based machine learning algorithms. (arXiv:2209.07578v1 [cond-mat.mtrl-sci])
    Mechanical exfoliation of graphene and its identification by optical inspection is one of the milestones in condensed matter physics that sparked the field of 2D materials. Finding regions of interest in the entire sample space and identifying the layer number are routine tasks potentially amenable to automation. We propose supervised pixel-wise classification methods that show high performance even with a small number of training images and require little computational time without a GPU. We introduce four different tree-based machine learning algorithms: decision tree, random forest, extreme gradient boost, and light gradient boosting machine. We train them with five optical microscopy images of graphene and evaluate their performance with multiple metrics and indices. We also discuss combinatorial machine learning models built from three of the single classifiers and assess their performance in identification and reliability. The code developed in this paper is open to the public and will be released at github.com/gjung-group/Graphene_segmentation.  ( 2 min )
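    Pixel-wise classification here means treating each pixel as one training sample. A minimal scikit-learn sketch of that framing on synthetic data follows; a real pipeline would use calibrated optical images and hand-labeled layer masks.

        # Per-pixel layer classification: each pixel's RGB values are a sample.
        import numpy as np
        from sklearn.ensemble import RandomForestClassifier

        image = np.random.rand(128, 128, 3)              # stand-in microscopy image
        labels = np.random.randint(0, 3, (128, 128))     # 0=substrate, 1=mono, 2=multi

        X = image.reshape(-1, 3)                         # one row per pixel
        y = labels.reshape(-1)

        clf = RandomForestClassifier(n_estimators=100).fit(X, y)
        segmentation = clf.predict(X).reshape(128, 128)  # per-pixel layer map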
    One-Shot Synthesis of Images and Segmentation Masks. (arXiv:2209.07547v1 [cs.CV])
    Joint synthesis of images and segmentation masks with generative adversarial networks (GANs) is promising to reduce the effort needed for collecting image data with pixel-wise annotations. However, to learn high-fidelity image-mask synthesis, existing GAN approaches first need a pre-training phase requiring large amounts of image data, which limits their utilization in restricted image domains. In this work, we take a step to reduce this limitation, introducing the task of one-shot image-mask synthesis. We aim to generate diverse images and their segmentation masks given only a single labelled example, and assuming, contrary to previous models, no access to any pre-training data. To this end, inspired by the recent architectural developments of single-image GANs, we introduce our OSMIS model which enables the synthesis of segmentation masks that are precisely aligned to the generated images in the one-shot regime. Besides achieving the high fidelity of generated masks, OSMIS outperforms state-of-the-art single-image GAN models in image synthesis quality and diversity. In addition, despite not using any additional data, OSMIS demonstrates an impressive ability to serve as a source of useful data augmentation for one-shot segmentation applications, providing performance gains that are complementary to standard data augmentation techniques. Code is available at https://github.com/boschresearch/one-shot-synthesis  ( 3 min )
    Context-Aware Query Rewriting for Improving Users' Search Experience on E-commerce Websites. (arXiv:2209.07584v1 [cs.IR])
    E-commerce queries are often short and ambiguous. Consequently, query understanding often uses query rewriting to disambiguate user-input queries. While using e-commerce search tools, users tend to enter multiple searches, which we call context, before purchasing. These history searches contain contextual insights about users' true shopping intents. Therefore, modeling such contextual information is critical to a better query rewriting model. However, existing query rewriting models ignore users' history behaviors and consider only the instant search query, which is often a short string offering limited information about the true shopping intent. We propose an end-to-end context-aware query rewriting model to bridge this gap, which takes the search context into account. Specifically, our model builds a session graph using the history search queries and their contained words. We then employ a graph attention mechanism that models cross-query relations and computes contextual information of the session. The model subsequently calculates session representations by combining the contextual information with the instant search query using an aggregation network. The session representations are then decoded to generate rewritten queries. Empirically, we demonstrate the superiority of our method to state-of-the-art approaches under various metrics. On in-house data from an online shopping platform, by introducing contextual information, our model achieves 11.6% improvement under the MRR (Mean Reciprocal Rank) metric and 20.1% improvement under the HIT@16 metric (a hit rate metric), in comparison with the best baseline method (Transformer-based model).  ( 3 min )
    Improving Robust Fairness via Balance Adversarial Training. (arXiv:2209.07534v1 [cs.LG])
    Adversarial training (AT) methods are effective against adversarial attacks, yet they introduce a severe disparity of accuracy and robustness between different classes, known as the robust fairness problem. The previously proposed Fair Robust Learning (FRL) adaptively reweights different classes to improve fairness. However, the performance of the better-performing classes decreases, leading to a strong performance drop. In this paper, we observe two unfair phenomena during adversarial training: different difficulties in generating adversarial examples from each class (source-class fairness) and disparate target-class tendencies when generating adversarial examples (target-class fairness). From these observations, we propose Balance Adversarial Training (BAT) to address the robust fairness problem. Regarding source-class fairness, we adjust the attack strength and difficulty of each class to generate samples near the decision boundary for easier and fairer model learning; regarding target-class fairness, by introducing a uniform distribution constraint, we encourage the adversarial example generation process for each class to have a fair tendency. Extensive experiments conducted on multiple datasets (CIFAR-10, CIFAR-100, and ImageNette) demonstrate that our method can significantly outperform other baselines in mitigating the robust fairness problem (+5-10\% on the worst class accuracy).  ( 2 min )
    On the Soft-Subnetwork for Few-shot Class Incremental Learning. (arXiv:2209.07529v1 [cs.LG])
    Inspired by Regularized Lottery Ticket Hypothesis (RLTH), which hypothesizes that there exist smooth (non-binary) subnetworks within a dense network that achieve the competitive performance of the dense network, we propose a few-shot class incremental learning (FSCIL) method referred to as \emph{Soft-SubNetworks (SoftNet)}. Our objective is to learn a sequence of sessions incrementally, where each session only includes a few training instances per class while preserving the knowledge of the previously learned ones. SoftNet jointly learns the model weights and adaptive non-binary soft masks at a base training session in which each mask consists of the major and minor subnetwork; the former aims to minimize catastrophic forgetting during training, and the latter aims to avoid overfitting to a few samples in each new training session. We provide comprehensive empirical validations demonstrating that our SoftNet effectively tackles the few-shot incremental learning problem by surpassing the performance of state-of-the-art baselines over benchmark datasets.  ( 2 min )
    ZeroEGGS: Zero-shot Example-based Gesture Generation from Speech. (arXiv:2209.07556v1 [cs.GR])
    We present ZeroEGGS, a neural network framework for speech-driven gesture generation with zero-shot style control by example. This means style can be controlled via only a short example motion clip, even for motion styles unseen during training. Our model uses a Variational framework to learn a style embedding, making it easy to modify style through latent space manipulation or blending and scaling of style embeddings. The probabilistic nature of our framework further enables the generation of a variety of outputs given the same input, addressing the stochastic nature of gesture motion. In a series of experiments, we first demonstrate the flexibility and generalizability of our model to new speakers and styles. In a user study, we then show that our model outperforms previous state-of-the-art techniques in naturalness of motion, appropriateness for speech, and style portrayal. Finally, we release a high-quality dataset of full-body gesture motion including fingers, with speech, spanning across 19 different styles.  ( 2 min )
    Human-level Atari 200x faster. (arXiv:2209.07550v1 [cs.LG])
    The task of building general agents that perform well over a wide range of tasks has been an important goal in reinforcement learning since its inception. The problem has been the subject of a large body of research, with performance frequently measured by observing scores over the wide range of environments contained in the Atari 57 benchmark. Agent57 was the first agent to surpass the human benchmark on all 57 games, but this came at the cost of poor data-efficiency, requiring nearly 80 billion frames of experience to achieve. Taking Agent57 as a starting point, we employ a diverse set of strategies to achieve a 200-fold reduction of experience needed to outperform the human baseline. We investigate a range of instabilities and bottlenecks we encountered while reducing the data regime, and propose effective solutions to build a more robust and efficient agent. We also demonstrate competitive performance with high-performing methods such as Muesli and MuZero. The four key components to our approach are (1) an approximate trust region method which enables stable bootstrapping from the online network, (2) a normalisation scheme for the loss and priorities which improves robustness when learning a set of value functions with a wide range of scales, (3) an improved architecture employing techniques from NFNets in order to leverage deeper networks without the need for normalization layers, and (4) a policy distillation method which serves to smooth out the instantaneous greedy policy over time.  ( 3 min )
    Improved proteasomal cleavage prediction with positive-unlabeled learning. (arXiv:2209.07527v1 [q-bio.QM])
    Accurate in silico modeling of the antigen processing pathway is crucial to enable personalized epitope vaccine design for cancer. An important step of this pathway is the degradation of the vaccine into smaller peptides by the proteasome, some of which will be presented to T cells by the MHC complex. While predicting MHC-peptide presentation has received a lot of attention recently, proteasomal cleavage prediction remains a relatively unexplored area in light of recent advances in high-throughput mass spectrometry-based MHC ligandomics. Moreover, as such experimental techniques do not allow the identification of regions that cannot be cleaved, the latest predictors generate synthetic negative samples and treat them as true negatives during training, even though some of them could actually be positives. In this work, we thus present a new predictor trained with an expanded dataset and the solid theoretical underpinning of positive-unlabeled learning, achieving a new state-of-the-art in proteasomal cleavage prediction. The improved predictive capabilities will in turn enable more precise vaccine development, improving the efficacy of epitope-based vaccines. Code and pretrained models are available at https://github.com/SchubertLab/proteasomal-cleavage-puupl.  ( 2 min )
    Toward an understanding of the properties of neural network approaches for supernovae light curve approximation. (arXiv:2209.07542v1 [astro-ph.IM])
    Modern time-domain photometric surveys collect many observations of various astronomical objects, and the coming era of large-scale surveys will provide even more information. Most objects have never received a spectroscopic follow-up, which is especially crucial for transients, e.g., supernovae. In such cases, observed light curves provide an affordable alternative. Time series are actively used for photometric classification and characterization, such as peak and luminosity decline estimation. However, the collected time series are multidimensional, irregularly sampled, contain outliers, and do not have well-defined systematic uncertainties. Machine learning methods help extract useful information from available data in the most efficient way. We consider several light curve approximation methods based on neural networks: Multilayer Perceptrons, Bayesian Neural Networks, and Normalizing Flows, to approximate observations of a single light curve. Tests using both the simulated PLAsTiCC and real Zwicky Transient Facility data samples demonstrate that even a few observations are enough to fit the networks and achieve better approximation quality than other state-of-the-art methods. We show that the methods described in this work have better computational complexity and work faster than Gaussian Processes. We analyze the performance of the approximation techniques, aiming to fill the gaps in the observations of the light curves, and show that the use of an appropriate technique increases the accuracy of peak finding and supernova classification. In addition, the study results are packaged in the Fulu Python library, available on GitHub, which can be easily used by the community.  ( 3 min )
  • Open

    Mitigating the Effects of Non-Identifiability on Inference for Bayesian Neural Networks with Latent Variables. (arXiv:1911.00569v4 [cs.LG] UPDATED)
    Bayesian Neural Networks with Latent Variables (BNN+LVs) capture predictive uncertainty by explicitly modeling model uncertainty (via priors on network weights) and environmental stochasticity (via a latent input noise variable). In this work, we first show that BNN+LV suffers from a serious form of non-identifiability: explanatory power can be transferred between the model parameters and latent variables while fitting the data equally well. We demonstrate that as a result, in the limit of infinite data, the posterior mode over the network weights and latent variables is asymptotically biased away from the ground-truth. Due to this asymptotic bias, traditional inference methods may in practice yield parameters that generalize poorly and misestimate uncertainty. Next, we develop a novel inference procedure that explicitly mitigates the effects of likelihood non-identifiability during training and yields high-quality predictions as well as uncertainty estimates. We demonstrate that our inference method improves upon benchmark methods across a range of synthetic and real data-sets.  ( 3 min )
    Detection of Interacting Variables for Generalized Linear Models via Neural Networks. (arXiv:2209.08030v1 [stat.ML])
    The quality of generalized linear models (GLMs), frequently used by insurance companies, depends on the choice of interacting variables. The search for interactions is time-consuming, especially for data sets with a large number of variables, depends much on expert judgement of actuaries, and often relies on visual performance indicators. Therefore, we present an approach to automating the process of finding interactions that should be added to GLMs to improve their predictive power. Our approach relies on neural networks and a model-specific interaction detection method, which is computationally faster than the traditionally used methods like Friedman H-Statistic or SHAP values. In numerical studies, we provide the results of our approach on different data sets: open-source data, artificial data, and proprietary data.  ( 2 min )
    Lethal Dose Conjecture on Data Poisoning. (arXiv:2208.03309v2 [cs.LG] UPDATED)
    Data poisoning considers an adversary that distorts the training set of machine learning algorithms for malicious purposes. In this work, we bring to light one conjecture regarding the fundamentals of data poisoning, which we call the Lethal Dose Conjecture. The conjecture states: If $n$ clean training samples are needed for accurate predictions, then in a size-$N$ training set, only $\Theta(N/n)$ poisoned samples can be tolerated while ensuring accuracy. Theoretically, we verify this conjecture in multiple cases. We also offer a more general perspective of this conjecture through distribution discrimination. Deep Partition Aggregation (DPA) and its extension, Finite Aggregation (FA) are recent approaches for provable defenses against data poisoning, where they predict through the majority vote of many base models trained from different subsets of the training set using a given learner. The conjecture implies that both DPA and FA are (asymptotically) optimal -- if we have the most data-efficient learner, they can turn it into one of the most robust defenses against data poisoning. This outlines a practical approach to developing stronger defenses against poisoning via finding data-efficient learners. Empirically, as a proof of concept, we show that by simply using different data augmentations for base learners, we can respectively double and triple the certified robustness of DPA on CIFAR-10 and GTSRB without sacrificing accuracy.  ( 3 min )
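    As a rough illustration of the partition-and-vote scheme behind DPA, here is a minimal sketch; assumptions: scikit-learn decision trees as the base learner and random disjoint partitions, with the hashing-based partitioning and certification machinery of the actual DPA papers omitted.

    ```python
    import numpy as np
    from sklearn.tree import DecisionTreeClassifier

    def dpa_predict(X_train, y_train, X_test, n_partitions=50, seed=0):
        # Split the training set into disjoint partitions; a poisoned sample
        # can only corrupt the single base model trained on its partition,
        # which is the intuition behind the conjecture's Theta(N/n) bound.
        rng = np.random.default_rng(seed)
        parts = np.array_split(rng.permutation(len(X_train)), n_partitions)
        votes = np.stack([
            DecisionTreeClassifier(random_state=0)
            .fit(X_train[p], y_train[p])
            .predict(X_test)
            for p in parts
        ])  # shape: (n_partitions, n_test)
        # Majority vote across the base models.
        preds = []
        for col in votes.T:
            vals, counts = np.unique(col, return_counts=True)
            preds.append(vals[np.argmax(counts)])
        return np.array(preds)
    ```

    Per the conjecture, a more data-efficient base learner directly translates into tolerating more poisoned samples under this voting scheme.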
    Theoretical Insight into Batch Normalization: Data Dependent Auto-Tuning of Regularization Rate. (arXiv:2209.07587v1 [stat.ML])
    Batch normalization is widely used in deep learning to normalize intermediate activations. Deep networks suffer from notoriously increased training complexity, mandating careful initialization of weights, requiring lower learning rates, etc. These issues have been addressed by Batch Normalization (\textbf{BN}) by normalizing the inputs of activations to zero mean and unit standard deviation. Making batch normalization part of the training process dramatically accelerates the training of very deep networks. A line of research has examined the exact theoretical explanation behind the success of \textbf{BN}. Most of these theoretical insights attempt to explain the benefits of \textbf{BN} through its influence on optimization, weight scale invariance, and regularization. Despite \textbf{BN}'s undeniable success in accelerating generalization, an analytical account relating the effect of \textbf{BN} to the regularization parameter has been missing. This paper brings out the data-dependent auto-tuning of the regularization parameter by \textbf{BN} with analytical proofs. We pose \textbf{BN} as a constrained optimization imposed on non-\textbf{BN} weights, through which we demonstrate its data-statistics-dependent auto-tuning of the regularization parameter. We also give an analytical proof of its behavior under a noisy input scenario, which reveals the signal-vs-noise tuning of the regularization parameter. We substantiate our claims with empirical results from experiments on the MNIST dataset.  ( 3 min )
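    For readers who want the operation being analyzed in front of them, here is the textbook batch-normalization forward pass in NumPy; this sketch shows only the standard operation, not the paper's constrained-optimization formulation.

    ```python
    import numpy as np

    def batch_norm(x, gamma, beta, eps=1e-5):
        # x: (batch, features); normalize each feature to zero mean and unit
        # variance using the *batch* statistics, then scale and shift with the
        # learnable gamma/beta. The data-dependent (mu, var) are what the
        # paper argues act as an implicitly auto-tuned regularizer.
        mu = x.mean(axis=0)
        var = x.var(axis=0)
        x_hat = (x - mu) / np.sqrt(var + eps)
        return gamma * x_hat + beta

    x = np.random.default_rng(0).normal(loc=3.0, scale=2.0, size=(64, 10))
    y = batch_norm(x, gamma=np.ones(10), beta=np.zeros(10))
    print(y.mean(0).round(3), y.std(0).round(3))   # ~0 and ~1 per feature
    ```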
    Joint estimation of posterior probability and propensity score function for positive and unlabelled data. (arXiv:2209.07787v1 [stat.ML])
    Positive and unlabelled learning is an important problem which arises naturally in many applications. The significant limitation of almost all existing methods lies in assuming that the propensity score function is constant (the SCAR assumption), which is unrealistic in many practical situations. Avoiding this assumption, we consider a parametric approach to the problem of joint estimation of the posterior probability and the propensity score function. We show that under mild assumptions, when both functions have the same parametric form (e.g. logistic with different parameters), the corresponding parameters are identifiable. Motivated by this, we propose two approaches to their estimation: a joint maximum likelihood method and a second approach based on alternating maximization of two Fisher-consistent expressions. Our experimental results show that the proposed methods are comparable to or better than the existing methods based on the Expectation-Maximisation scheme.  ( 2 min )
    FiLM: Frequency improved Legendre Memory Model for Long-term Time Series Forecasting. (arXiv:2205.08897v4 [cs.LG] UPDATED)
    Recent studies have shown that deep learning models such as RNNs and Transformers have brought significant performance gains for long-term forecasting of time series because they effectively utilize historical information. We found, however, that there is still great room for improvement in how to preserve historical information in neural networks while avoiding overfitting to noise present in the history. Addressing this allows better utilization of the capabilities of deep learning models. To this end, we design a \textbf{F}requency \textbf{i}mproved \textbf{L}egendre \textbf{M}emory model, or {\bf FiLM}: it applies Legendre Polynomials projections to approximate historical information, uses Fourier projection to remove noise, and adds a low-rank approximation to speed up computation. Our empirical studies show that the proposed FiLM significantly improves the accuracy of state-of-the-art models in multivariate and univariate long-term forecasting by \textbf{20.3\%} and \textbf{22.6\%}, respectively. We also demonstrate that the representation module developed in this work can be used as a general plug-in to improve the long-term prediction performance of other deep learning modules. Code is available at https://github.com/tianzhou2011/FiLM/  ( 3 min )
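    A hedged sketch of the Legendre-projection idea: compress a lookback window into coefficients of the first N Legendre polynomials via least squares. This is illustrative only; the paper's actual memory module differs in its details (it is built around a recurrent update rather than a batch projection).

    ```python
    import numpy as np
    from numpy.polynomial import legendre

    def legendre_memory(window, n_coeffs=8):
        # Least-squares projection of a history window onto the first
        # n_coeffs Legendre polynomials over a rescaled time axis [-1, 1].
        t = np.linspace(-1.0, 1.0, len(window))
        basis = legendre.legvander(t, n_coeffs - 1)   # (T, n_coeffs) design matrix
        coeffs, *_ = np.linalg.lstsq(basis, window, rcond=None)
        return coeffs, basis @ coeffs                 # coefficients, smoothed window

    coeffs, smoothed = legendre_memory(np.sin(np.linspace(0, 6, 96)))
    ```

    Keeping only a few coefficients acts as the kind of smoothing the abstract describes: low-order structure of the history is preserved while high-frequency noise is discarded.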
    Non-stationary Bandits and Meta-Learning with a Small Set of Optimal Arms. (arXiv:2202.13001v5 [cs.LG] UPDATED)
    We study a sequential decision problem where the learner faces a sequence of $K$-armed stochastic bandit tasks. An adversary may design the tasks, but the adversary is constrained to choose the optimal arm of each task in a smaller (but unknown) subset of $M$ arms. The task boundaries might be known (the bandit meta-learning setting), or unknown (the non-stationary bandit setting). We design an algorithm based on a reduction to bandit submodular maximization and show that, in the regime of a large number of tasks and a small number of optimal arms, its regret in both settings is smaller than the simple baseline of $\tilde{O}(\sqrt{KNT})$ that can be obtained by using standard algorithms designed for non-stationary bandit problems. For the bandit meta-learning problem with fixed task length $\tau$, we show that the regret of the algorithm is bounded as $\tilde{O}(NM\sqrt{M \tau}+N^{2/3}M\tau)$. Under additional assumptions on the identifiability of the optimal arms in each task, we show a bandit meta-learning algorithm with an improved $\tilde{O}(N\sqrt{M \tau}+N^{1/2}\sqrt{M K \tau})$ regret.  ( 3 min )
    Factorizable Joint Shift in Multinomial Classification. (arXiv:2207.14514v2 [stat.ML] UPDATED)
    Factorizable joint shift (FJS) was recently proposed as a type of dataset shift for which the complete characteristics can be estimated from feature data observations on the test dataset by a method called Joint Importance Aligning. For the multinomial (multiclass) classification setting, we derive a representation of factorizable joint shift in terms of the source (training) distribution, the target (test) prior class probabilities and the target marginal distribution of the features. On the basis of this result, we propose alternatives to joint importance aligning and, at the same time, point out that factorizable joint shift is not fully identifiable if no class label information on the test dataset is available and no additional assumptions are made. Other results of the paper include correction formulae for the posterior class probabilities both under general dataset shift and factorizable joint shift. In addition, we investigate the consequences of assuming factorizable joint shift for the bias caused by sample selection.  ( 2 min )
    Missing Data Imputation and Acquisition with Deep Hierarchical Models and Hamiltonian Monte Carlo. (arXiv:2202.04599v3 [cs.LG] UPDATED)
    Variational Autoencoders (VAEs) have recently been highly successful at imputing and acquiring heterogeneous missing data. However, within this specific application domain, existing VAE methods are restricted by using only one layer of latent variables and strictly Gaussian posterior approximations. To address these limitations, we present HH-VAEM, a Hierarchical VAE model for mixed-type incomplete data that uses Hamiltonian Monte Carlo with automatic hyper-parameter tuning for improved approximate inference. Our experiments show that HH-VAEM outperforms existing baselines in the tasks of missing data imputation and supervised learning with missing features. Finally, we also present a sampling-based approach for efficiently computing the information gain when missing features are to be acquired with HH-VAEM. Our experiments show that this sampling-based approach is superior to alternatives based on Gaussian approximations.  ( 2 min )
    D-GCCA: Decomposition-based Generalized Canonical Correlation Analysis for Multi-view High-dimensional Data. (arXiv:2001.02856v3 [stat.ML] UPDATED)
    Modern biomedical studies often collect multi-view data, that is, multiple types of data measured on the same set of objects. A popular model in high-dimensional multi-view data analysis is to decompose each view's data matrix into a low-rank common-source matrix generated by latent factors common across all data views, a low-rank distinctive-source matrix corresponding to each view, and an additive noise matrix. We propose a novel decomposition method for this model, called decomposition-based generalized canonical correlation analysis (D-GCCA). The D-GCCA rigorously defines the decomposition on the L2 space of random variables in contrast to the Euclidean dot product space used by most existing methods, thereby being able to provide the estimation consistency for the low-rank matrix recovery. Moreover, to well calibrate common latent factors, we impose a desirable orthogonality constraint on distinctive latent factors. Existing methods, however, inadequately consider such orthogonality and may thus suffer from substantial loss of undetected common-source variation. Our D-GCCA takes one step further than generalized canonical correlation analysis by separating common and distinctive components among canonical variables, while enjoying an appealing interpretation from the perspective of principal component analysis. Furthermore, we propose to use the variable-level proportion of signal variance explained by common or distinctive latent factors for selecting the variables most influenced. Consistent estimators of our D-GCCA method are established with good finite-sample numerical performance, and have closed-form expressions leading to efficient computation especially for large-scale data. The superiority of D-GCCA over state-of-the-art methods is also corroborated in simulations and real-world data examples.  ( 3 min )
    Systematically and efficiently improving existing $k$-means initialization algorithms by pairwise-nearest-neighbor smoothing. (arXiv:2202.03949v3 [cs.LG] UPDATED)
    We present a meta-method for initializing (seeding) the $k$-means clustering algorithm called PNN-smoothing. It consists in splitting a given dataset into $J$ random subsets, clustering each of them individually, and merging the resulting clusterings with the pairwise-nearest-neighbor (PNN) method. It is a meta-method in the sense that when clustering the individual subsets any seeding algorithm can be used. If the computational complexity of that seeding algorithm is linear in the size of the data $N$ and the number of clusters $k$, PNN-smoothing is also almost linear with an appropriate choice of $J$, and quite competitive in practice. We show empirically, using several existing seeding methods and testing on several synthetic and real datasets, that this procedure results in systematically better costs. Our implementation is publicly available at https://github.com/carlobaldassi/KMeansPNNSmoothing.jl.  ( 2 min )
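    The described meta-method is simple enough to sketch: cluster J random subsets, pool the resulting J*k centroids, then greedily merge the closest size-weighted pair until k seeds remain. This is an unoptimized reader's sketch using scikit-learn's KMeans for the subset clustering; the linked Julia package is the reference implementation.

    ```python
    import numpy as np
    from sklearn.cluster import KMeans

    def pnn_smoothing_seeds(X, k, J=4, seed=0):
        rng = np.random.default_rng(seed)
        idx = rng.permutation(len(X))
        cents, sizes = [], []
        for part in np.array_split(idx, J):
            km = KMeans(n_clusters=k, n_init=1, random_state=0).fit(X[part])
            cents.append(km.cluster_centers_)
            sizes.append(np.bincount(km.labels_, minlength=k))
        C = np.vstack(cents).astype(float)   # pooled candidate centroids
        w = np.concatenate(sizes).astype(float)
        while len(C) > k:
            # Merge the pair with the smallest size-weighted squared distance,
            # the classic PNN merge criterion: w_i*w_j/(w_i+w_j) * ||c_i-c_j||^2.
            d = ((C[:, None] - C[None]) ** 2).sum(-1)
            cost = d * (w[:, None] * w[None]) / (w[:, None] + w[None])
            np.fill_diagonal(cost, np.inf)
            i, j = divmod(cost.argmin(), len(C))
            C[i] = (w[i] * C[i] + w[j] * C[j]) / (w[i] + w[j])
            w[i] += w[j]
            C, w = np.delete(C, j, 0), np.delete(w, j)
        return C  # pass as init= to a final KMeans run on the full dataset
    ```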
    A Spectral Method for Joint Community Detection and Orthogonal Group Synchronization. (arXiv:2112.13199v2 [stat.ML] UPDATED)
    Community detection and orthogonal group synchronization are both fundamental problems with a variety of important applications in science and engineering. In this work, we consider the joint problem of community detection and orthogonal group synchronization which aims to recover the communities and perform synchronization simultaneously. To this end, we propose a simple algorithm that consists of a spectral decomposition step followed by a blockwise column pivoted QR factorization (CPQR). The proposed algorithm is efficient and scales linearly with the number of edges in the graph. We also leverage the recently developed `leave-one-out' technique to establish a near-optimal guarantee for exact recovery of the cluster memberships and stable recovery of the orthogonal transforms. Numerical experiments demonstrate the efficiency and efficacy of our algorithm and confirm our theoretical characterization of it.  ( 2 min )
    Capturing Shape Information with Multi-Scale Topological Loss Terms for 3D Reconstruction. (arXiv:2203.01703v3 [cs.CV] UPDATED)
    Reconstructing 3D objects from 2D images is both challenging for our brains and machine learning algorithms. To support this spatial reasoning task, contextual information about the overall shape of an object is critical. However, such information is not captured by established loss terms (e.g. Dice loss). We propose to complement geometrical shape information by including multi-scale topological features, such as connected components, cycles, and voids, in the reconstruction loss. Our method uses cubical complexes to calculate topological features of 3D volume data and employs an optimal transport distance to guide the reconstruction process. This topology-aware loss is fully differentiable, computationally efficient, and can be added to any neural network. We demonstrate the utility of our loss by incorporating it into SHAPR, a model for predicting the 3D cell shape of individual cells based on 2D microscopy images. Using a hybrid loss that leverages both geometrical and topological information of single objects to assess their shape, we find that topological information substantially improves the quality of reconstructions, thus highlighting its ability to extract more relevant features from image datasets.  ( 3 min )
    Algorithmic Regularization in Model-free Overparametrized Asymmetric Matrix Factorization. (arXiv:2203.02839v2 [cs.LG] UPDATED)
    We study the asymmetric matrix factorization problem under a natural nonconvex formulation with arbitrary overparametrization. The model-free setting is considered, with minimal assumption on the rank or singular values of the observed matrix, where the global optima provably overfit. We show that vanilla gradient descent with small random initialization sequentially recovers the principal components of the observed matrix. Consequently, when equipped with proper early stopping, gradient descent produces the best low-rank approximation of the observed matrix without explicit regularization. We provide a sharp characterization of the relationship between the approximation error, iteration complexity, initialization size and stepsize. Our complexity bound is almost dimension-free and depends logarithmically on the approximation error, with significantly more lenient requirements on the stepsize and initialization compared to prior work. Our theoretical results provide accurate prediction for the behavior of gradient descent, showing good agreement with numerical experiments.  ( 2 min )
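    The phenomenon is easy to reproduce numerically. Below is a hedged toy sketch (synthetic rank-5 target, overparametrized factors, small random initialization); the step size and dimensions are arbitrary choices, not the paper's.

    ```python
    import numpy as np

    rng = np.random.default_rng(0)
    M = rng.normal(size=(50, 5)) @ rng.normal(size=(5, 40))   # rank-5 target
    M /= np.linalg.norm(M, 2)                                 # unit spectral norm
    F = 1e-4 * rng.normal(size=(50, 20))                      # overparametrized
    G = 1e-4 * rng.normal(size=(40, 20))                      # (width 20 > rank 5)
    lr = 0.2
    for step in range(3001):
        R = F @ G.T - M                                       # residual
        F, G = F - lr * R @ G, G - lr * R.T @ F               # simultaneous GD step
        if step % 500 == 0:
            print(step, round(float(np.linalg.norm(R)), 6))   # drops in stages
    ```

    Watching the printed residual, the error tends to plateau and then drop as successive principal components are picked up, which is the sequential-recovery behavior the abstract describes; stopping early yields a low-rank approximation.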
    What can be learnt with wide convolutional networks?. (arXiv:2208.01003v2 [stat.ML] UPDATED)
    Understanding how convolutional neural networks (CNNs) can efficiently learn high-dimensional functions remains a fundamental challenge. A popular belief is that these models harness the local and hierarchical structure of natural data such as images. Yet, we lack a quantitative understanding of how such structure affects performance, e.g. the rate of decay of the generalisation error with the number of training samples. In this paper, we study deep CNNs in the kernel regime. First, we show that the spectrum of the corresponding kernel inherits the hierarchical structure of the network, and we characterise its asymptotics. Then, we use this result together with generalisation bounds to prove that deep CNNs adapt to the spatial scale of the target function. In particular, we find that if the target function depends on low-dimensional subsets of adjacent input variables, then the rate of decay of the error is controlled by the effective dimensionality of these subsets. Conversely, if the teacher function depends on the full set of input variables, then the error rate is inversely proportional to the input dimension. We conclude by computing the rate when a deep CNN is trained on the output of another deep CNN with randomly-initialised parameters. Interestingly, we find that despite their hierarchical structure, the functions generated by deep CNNs are too rich to be efficiently learnable in high dimension.  ( 3 min )
    Modeling and estimating mixed memberships in weighted networks. (arXiv:2112.04389v2 [cs.SI] UPDATED)
    We consider the problem of detecting latent community information in mixed membership weighted networks, in which nodes have mixed memberships and the edges connecting nodes can be finite real numbers. We propose a general mixed membership distribution-free model for this problem. The model places no distributional constraints on edges, only on their expected values, and can be viewed as a generalization of some previous models. We use an efficient spectral algorithm to estimate community memberships under the model. We also derive the convergence rate of the proposed algorithm under the model using spectral analysis. We demonstrate the advantages of the mixed membership distribution-free model and the algorithm with applications to small-scale simulated networks in which edges follow different distributions. We have also applied the algorithm to five real-world weighted network data sets with encouraging results.  ( 2 min )
    Robust Inference of Manifold Density and Geometry by Doubly Stochastic Scaling. (arXiv:2209.08004v1 [math.ST])
    The Gaussian kernel and its traditional normalizations (e.g., row-stochastic) are popular approaches for assessing similarities between data points, commonly used for manifold learning and clustering, as well as supervised and semi-supervised learning on graphs. In many practical situations, the data can be corrupted by noise that prohibits traditional affinity matrices from correctly assessing similarities, especially if the noise magnitudes vary considerably across the data, e.g., under heteroskedasticity or outliers. An alternative approach that provides a more stable behavior under noise is the doubly stochastic normalization of the Gaussian kernel. In this work, we investigate this normalization in a setting where points are sampled from an unknown density on a low-dimensional manifold embedded in high-dimensional space and corrupted by possibly strong, non-identically distributed, sub-Gaussian noise. We establish the pointwise concentration of the doubly stochastic affinity matrix and its scaling factors around certain population forms. We then utilize these results to develop several tools for robust inference. First, we derive a robust density estimator that can substantially outperform the standard kernel density estimator under high-dimensional noise. Second, we provide estimators for the pointwise noise magnitudes, the pointwise signal magnitudes, and the pairwise Euclidean distances between clean data points. Lastly, we derive robust graph Laplacian normalizations that approximate popular manifold Laplacians, including the Laplace Beltrami operator, showing that the local geometry of the manifold can be recovered under high-dimensional noise. We exemplify our results in simulations and on real single-cell RNA-sequencing data. In the latter, we show that our proposed normalizations are robust to technical variability associated with different cell types.  ( 3 min )
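    The doubly stochastic normalization itself is a short computation. Below is a sketch using a damped symmetric Sinkhorn iteration; the bandwidth and iteration count are arbitrary choices, and convergence details are glossed over.

    ```python
    import numpy as np

    def doubly_stochastic_kernel(X, bandwidth=1.0, n_iter=500):
        # Gaussian affinity matrix on the data points.
        d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
        K = np.exp(-d2 / (2.0 * bandwidth ** 2))
        # Find u > 0 such that diag(u) K diag(u) is doubly stochastic.
        u = np.ones(len(X))
        for _ in range(n_iter):
            u = np.sqrt(u / (K @ u))      # damped (geometric-mean) update
        W = u[:, None] * K * u[None, :]   # rows and columns each sum to ~1
        return W, u                       # u: the scaling factors the paper studies

    X = np.random.default_rng(0).normal(size=(200, 3))
    W, u = doubly_stochastic_kernel(X)
    print(W.sum(axis=1)[:5])              # each ~1.0
    ```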
    DAGMA: Learning DAGs via M-matrices and a Log-Determinant Acyclicity Characterization. (arXiv:2209.08037v1 [cs.LG])
    The combinatorial problem of learning directed acyclic graphs (DAGs) from data was recently framed as a purely continuous optimization problem by leveraging a differentiable acyclicity characterization of DAGs based on the trace of a matrix exponential function. Existing acyclicity characterizations are based on the idea that powers of an adjacency matrix contain information about walks and cycles. In this work, we propose a $\textit{fundamentally different}$ acyclicity characterization based on the log-determinant (log-det) function, which leverages the nilpotency property of DAGs. To deal with the inherent asymmetries of a DAG, we relate the domain of our log-det characterization to the set of $\textit{M-matrices}$, which is a key difference to the classical log-det function defined over the cone of positive definite matrices. Similar to acyclicity functions previously proposed, our characterization is also exact and differentiable. However, when compared to existing characterizations, our log-det function: (1) Is better at detecting large cycles; (2) Has better-behaved gradients; and (3) Its runtime is in practice about an order of magnitude faster. From the optimization side, we drop the typically used augmented Lagrangian scheme, and propose DAGMA ($\textit{Directed Acyclic Graphs via M-matrices for Acyclicity}$), a method that resembles the central path for barrier methods. Each point in the central path of DAGMA is a solution to an unconstrained problem regularized by our log-det function, then we show that at the limit of the central path the solution is guaranteed to be a DAG. Finally, we provide extensive experiments for $\textit{linear}$ and $\textit{nonlinear}$ SEMs, and show that our approach can reach large speed-ups and smaller structural Hamming distances against state-of-the-art methods.  ( 3 min )
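    For reference, the log-det acyclicity function from the paper, h(W) = -log det(sI - W∘W) + d log s, is a few lines to compute. The sketch below is a reader's transcription of that formula, assuming W stays in the stated M-matrix domain (so that the determinant is positive).

    ```python
    import numpy as np

    def h_logdet(W, s=1.0):
        # h(W) = -log det(sI - W*W) + d*log(s); zero iff W encodes a DAG
        # (within the M-matrix domain described in the paper).
        d = W.shape[0]
        M = s * np.eye(d) - W * W                  # elementwise square of weights
        sign, logabsdet = np.linalg.slogdet(M)
        assert sign > 0, "W left the valid domain (try a larger s)"
        return -logabsdet + d * np.log(s)

    W_dag = np.triu(np.random.rand(4, 4), k=1)     # upper-triangular => acyclic
    print(h_logdet(W_dag))                         # ~0 for a DAG
    ```

    For an acyclic W the matrix W∘W is nilpotent, so sI - W∘W has determinant s^d and h vanishes exactly, which is the characterization the abstract leverages.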
    Optimal binning: mathematical programming formulation. (arXiv:2001.08025v2 [cs.LG] UPDATED)
    The optimal binning is the optimal discretization of a variable into bins given a discrete or continuous numeric target. We present a rigorous and extensible mathematical programming formulation for solving the optimal binning problem for a binary, continuous and multi-class target type, incorporating constraints not previously addressed. For all three target types, we introduce a convex mixed-integer programming formulation. Several algorithmic enhancements, such as automatic determination of the most suitable monotonic trend via a Machine-Learning-based classifier and implementation aspects are thoughtfully discussed. The new mathematical programming formulations are carefully implemented in the open-source python library OptBinning.  ( 2 min )
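    A hedged usage sketch of the OptBinning library named in the abstract, for a binary target. The class and parameter names follow the library's public documentation, but should be verified against the installed version.

    ```python
    import numpy as np
    from optbinning import OptimalBinning

    rng = np.random.default_rng(0)
    x = rng.normal(size=1000)                                    # numeric feature
    y = (x + rng.normal(scale=1.5, size=1000) > 0).astype(int)   # binary target

    optb = OptimalBinning(name="x", dtype="numerical", solver="cp")
    optb.fit(x, y)
    print(optb.splits)                        # optimal bin edges found by the MIP
    x_woe = optb.transform(x, metric="woe")   # weight-of-evidence encoding
    ```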
    Stability and Generalization for Markov Chain Stochastic Gradient Methods. (arXiv:2209.08005v1 [stat.ML])
    Recently, a large amount of work has been devoted to the study of Markov chain stochastic gradient methods (MC-SGMs), mainly focusing on their convergence analysis for solving minimization problems. In this paper, we provide a comprehensive generalization analysis of MC-SGMs for both minimization and minimax problems through the lens of algorithmic stability in the framework of statistical learning theory. For empirical risk minimization (ERM) problems, we establish the optimal excess population risk bounds for both smooth and non-smooth cases by introducing on-average argument stability. For minimax problems, we develop a quantitative connection between on-average argument stability and generalization error which extends the existing results for uniform stability \cite{lei2021stability}. We further develop the first nearly optimal convergence rates for convex-concave problems both in expectation and with high probability, which, combined with our stability results, show that the optimal generalization bounds can be attained for both smooth and non-smooth cases. To the best of our knowledge, this is the first generalization analysis of SGMs when the gradients are sampled from a Markov process.  ( 2 min )

  • Open

    Here is Another Breakthrough in Text-to-Image Synthesis, Called StoryDALL-E, Which Adapts Pretrained Text-to-Image Transformers for Story Continuation
    Text-to-image synthesis algorithms, such as DALL-E, have demonstrated an extraordinary capacity to turn an input caption into a cohesive picture. Several recent techniques have also used solid multimodal models to create artistic representations of input captions, proving their ability to democratize art. However, these models are only intended to analyze a single, brief caption as input. To capture the meaning of the input language, many text-to-image synthesis use cases require models to handle extensive narratives and metaphorical phrases, condition on existing visuals, and create more than one picture. Several works have already constructed specialized Generative Adversarial Network (GAN) models for tasks such as image-to-image translation, style transfer, etc. Story visualization is a challenging endeavor that combines picture production and story comprehension. However, the recent introduction of transformer-based large pretrained models opens up possibilities for more effectively leveraging latent knowledge from large-scale pretrained datasets for performing these specialized tasks, in a paradigm similar to finetuning pretrained language models for downstream tasks based on language understanding. As a result, in this study the authors investigate approaches for adapting a pretrained text-to-image synthesis model to complex downstream applications, with an emphasis on story visualization. Story visualization methods, for example, turn a series of captions into a series of images that depict the story. Continue reading | Check out the paper and github link submitted by /u/ai-lover [link] [comments]  ( 88 min )
    80s videogame Night Ride - Stable Diffusion img2img text2video
    submitted by /u/navalguijo [link] [comments]  ( 87 min )
    Question about software used in this video?
    This YouTuber is using some sort of AI character to talk for him; I'm wondering what the software is called? https://youtube.com/watch?v=GnVtXYvJveI&feature=share&si=EMSIkaIECMiOmarE6JChQQ submitted by /u/CodingOni420 [link] [comments]  ( 87 min )
    Can't we create "a little bit smart", not "super-smart" AI?
    I just saw the title of an article, "Why super-smart AI will run out of our control". That is probably true, but the A.I. that most average people want is kind of like a human, but one of those smart humans. By smart, I don't mean like Einstein or Newton, but, you know, maybe like the smartest people at your work or in your school. Can't we create that kind of A.I.? Is it difficult to target that kind of specific level of intelligence, so that it would just end up as a super-smart A.I.? submitted by /u/evolution2015 [link] [comments]  ( 90 min )
    AI Dreamer - Picture Everything
    https://apps.apple.com/us/app/ai-dreamer/id1608856807 Hello, I would like to share with you my iOS app that allows text2img visualizations. It's very simple: you enter your prompt, and after a couple of seconds the AI returns the visual output. It uses the Stable Diffusion model under the hood. All feedback would be appreciated. I encourage you to try visualizing your craziest ideas. submitted by /u/g_surma [link] [comments]  ( 87 min )
    Virtually Amish (2022) by Lindsay Ems, on Amish approaches to the internet and high-tech capitalism — An online group discussion on Thursday September 29, open to everyone to join
    submitted by /u/darrenjyc [link] [comments]  ( 87 min )
    lol nice try
    submitted by /u/Freddygullett [link] [comments]  ( 86 min )
    Ohio Road (Monster)
    submitted by /u/Enuminous [link] [comments]  ( 91 min )
    Weird AI Generated Walter White Images
    submitted by /u/Messsyfloor [link] [comments]  ( 92 min )
    The history of Artificial Intelligence
    submitted by /u/lucesh1 [link] [comments]  ( 90 min )
    Michael Shannon as Frankenstein [xpost /r/dreamcasting]
    submitted by /u/dream_casting [link] [comments]  ( 87 min )
    Curious
    So I work in IT but have never done any programming. I want to work on perhaps making a virtual assistant but haven't been able to find any resources on it. Could someone point me in the right direction, or tell me flat out how difficult it is? It's just a project for fun and to learn a little. submitted by /u/Reaper2o [link] [comments]  ( 87 min )
    Will AI image generation ever improve to a point where one can use it to produce counterfeit currency?
    Photocopiers are already required to have algorithms to detect and obfuscate attempts to copy banknotes, as a guard against counterfeiting. But with the rise of such AI image generation programs as DALL-E 2 and Stable Diffusion, will it ever become possible to realistically forge money using AI image generation, and if so, will anything ever be done to protect against it? submitted by /u/Knewiwishonly [link] [comments]  ( 89 min )
    The Saga of Siu Dragon
    submitted by /u/Enuminous [link] [comments]  ( 87 min )
  • Open

    [D] CNN with stride=1 throughput and no pooling - what are their uses?
    Hi, I have recently come across a use for a stride-1 CNN with no pooling, i.e. the image retains its shape throughout. I can't explain what the use was because of an NDA, but I am quite curious whether there are other examples out there. How and why would you use such an architecture? Any examples in the literature? submitted by /u/eigenlaplace [link] [comments]  ( 88 min )
    [P] Scaling up machine learning models deployment
    As you know, mlflow is widely used today in the machine learning community to manage ML experiments and serve models. In this series, published on Medium, I address the problem of scalability that I faced at my company while deploying multiple models in production using mlflow. In this series, I wrote about: deploying an mlflow tracking instance for experimentation; serving ML models as API endpoints on Kubernetes; and understanding how k8s handles load through load testing. The last article explains how you can make the deployment scalable and anticipate the computation power needed to handle multiple simultaneous requests in a real-world context. submitted by /u/Spirited-Singer-6150 [link] [comments]  ( 89 min )
    [R] Diffusion Models: A Comprehensive Survey of Methods and Applications - 2022
    Paper: https://arxiv.org/abs/2209.00796 Github: https://github.com/YangLing0818/Diffusion-Models-Papers-Survey-Taxonomy Abstract: Diffusion models are a class of deep generative models that have shown impressive results on various tasks with a solid theoretical foundation. Despite their demonstrated success over state-of-the-art approaches, diffusion models often entail costly sampling procedures and sub-optimal likelihood estimation. Significant efforts have been made to improve the performance of diffusion models in various aspects. In this article, we present a comprehensive review of existing variants of diffusion models. Specifically, we provide a taxonomy of diffusion models and categorize them into three types: sampling-acceleration enhancement, likelihood-maximization enhancement, and data-generalization enhancement. We also introduce the other generative models (i.e., variational autoencoders, generative adversarial networks, normalizing flows, autoregressive models, and energy-based models) and discuss the connections between diffusion models and these generative models. Then we review the applications of diffusion models, including computer vision, natural language processing, waveform signal processing, multi-modal modeling, molecular graph generation, time series modeling, and adversarial purification. Furthermore, we propose new perspectives pertaining to the development of generative models. Github: this https URL. [Figure: Fig. 1. Taxonomy of diffusion model variants (Sections 3-5), applications (Section 7), and connections with other generative models (Section 6).] submitted by /u/Singularian2501 [link] [comments]  ( 89 min )
    [D] How do people find time/motivation to do personal machine learning projects?
    Stable Diffusion was released and I am seeing a lot of cool stuff being done by the community. I also see that this is not their job and this is something they are doing in their free time as a personal project/hobby. I want to understand how they motivate themselves and find the will. What drives them? I want to be like them. My job takes the majority of my energy, and I feel a lack of direction when I want to start a project like this. TIA submitted by /u/Top-Pitch-3253 [link] [comments]  ( 94 min )
    [P] "ART Theft Auto" Online Demo
    Hi all, Here's a new fun little project using SPH (Sparse Predictive Hierarchies) as implemented in AOgmaNeo, but using new ART (Adaptive Resonance Theory)-based encoders. SPH is a biologically-inspired online/incremental learning system. This demo is a recreation of YouTuber Sentdex's "GAN Theft Auto", but it runs in the browser. It uses WebAssembly. Note that this demo may take a bit to load! https://twistedkeyboardsoftware.com/?p=190 submitted by /u/CireNeikual [link] [comments]  ( 88 min )
    Beautify muddy tire images (see description) [D]
    submitted by /u/Persimmon-Just [link] [comments]  ( 100 min )
    [R] IJUC 17.4, p. 303-331 – Old City Publishing
    submitted by /u/bsiegelwax [link] [comments]  ( 88 min )
    [D] Where to train my machine learning model
    So I don't have experience with large datasets, so if anyone can help me it would be great. I have a complete model which I want to train, and I tried running it on my laptop, but just completing 4 epochs took around 1-1.5 hours, and it has to run approximately 1200 epochs. So can anyone suggest where I can try running this? Google Colab might crash, and I do have AWS credits, so I am thinking SageMaker, but there I could not find an option to upload a folder and run commands using a terminal, so what should I do? Another option is maybe running it on an EC2 server, but I just wanted an opinion first. The model uses deep learning, in case that helps. submitted by /u/Leo_valdez42 [link] [comments]  ( 89 min )
    [P] Remember the art "Théâtre D’opéra Spatial" by James Allen? I generated new frames using AI and added animation. Midjourney + Stable Diffusion + IMG2IMG + Animation (Handwork)
    submitted by /u/bazarow17 [link] [comments]  ( 88 min )
    [P] Stable Diffusion in Tensorflow / Keras
    Link to GitHub: https://github.com/divamgupta/stable-diffusion-tensorflow Divam Gupta ported Stable Diffusion over to TF/Keras: Converted pre-trained models Easy to understand code Minimal code footprint He also released a Colab with Gradio demo. Should be easy to add TPU / multi-GPU support for inference via Keras. Would be interesting to see if the Keras model can be used on TFlite on embedded / edge devices, something that is difficult to do in the PyTorch version. submitted by /u/hardmaru [link] [comments]  ( 89 min )
    History of Artificial Intelligence [D]
    submitted by /u/lucesh1 [link] [comments]  ( 88 min )
    [P] Implementation/Tutorial of Stable Diffusion with Side-by-Side Notes
    It has annotated code of stable diffusion model; DDIM and DDPM sampling; and scripts to generate and in-paint. - Code & notes: https://nn.labml.ai/diffusion/stable_diffusion/index.html - Github: https://github.com/labmlai/annotated_deep_learning_paper_implementations - This implementation based on the official implementation : https://github.com/CompVis/stable-diffusion - We have deployed a server to try stable diffusion here: https://promptart.labml.ai submitted by /u/hnipun [link] [comments]  ( 89 min )
    [P] Stable Diffusion web ui + IMG2IMG + After Effects + artist workflow
    submitted by /u/Illustrious_Row_9971 [link] [comments]  ( 102 min )
    [P] YoHa: A practical hand tracking engine.
    submitted by /u/Excellent_Expert8581 [link] [comments]  ( 90 min )
    [D] Random Search, Bayesian Optimization, and Hyperband and its parameters
    Hey everyone! Currently working on a text classification system, and I'm kind of stuck on hyperparameter tuning. So far I have only learned the general idea behind these methods, but could anyone explain which tuning method is best, and which parameters are the important and appropriate ones to tune? Sorry if there were grammatical errors, English is not my first language. submitted by /u/Lost-Emotion-2721 [link] [comments]  ( 103 min )
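    For a concrete starting point, here is a hedged scikit-learn sketch: for the same budget, random search usually beats grid search, and for a text classifier the usual knobs are the n-gram range, TF-IDF pruning, and the regularization strength C. The toy corpus is illustrative only.

    ```python
    from scipy.stats import loguniform
    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import RandomizedSearchCV
    from sklearn.pipeline import Pipeline

    # Tiny stand-in corpus; replace with your real texts and labels.
    texts = ["great phone", "terrible battery", "love the screen", "awful support"] * 25
    labels = [1, 0, 1, 0] * 25

    pipe = Pipeline([("tfidf", TfidfVectorizer()),
                     ("clf", LogisticRegression(max_iter=1000))])
    search = RandomizedSearchCV(
        pipe,
        param_distributions={
            "tfidf__ngram_range": [(1, 1), (1, 2)],   # unigrams vs. uni+bigrams
            "tfidf__min_df": [1, 2, 5],               # vocabulary pruning
            "clf__C": loguniform(1e-3, 1e2),          # regularization strength
        },
        n_iter=20, cv=3, random_state=0,
    )
    search.fit(texts, labels)
    print(search.best_params_)
    ```

    Bayesian optimization and Hyperband target the same search problem; they mostly matter once each fit is expensive enough that random search becomes wasteful.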
  • Open

    Data Management as a Business Discipline – Part 2: Theorems and Principles
    In the blog “Why Data Management is Today’s Most Important Business Discipline”, I challenged the business and IT communities to reframe the data management conversation; to transform data management from an IT practice into a business discipline focused on leveraging data (and analytics) to deliver business and operational outcomes. The post Data Management as a Business Discipline – Part 2: Theorems and Principles appeared first on Data Science Central.  ( 22 min )
    How Algorithmic Trading Companies Automate Their Investment Strategy
    Algorithmic or automated trading refers to trading based on pre-determined instructions fed to a computer – the computers are programmed to execute buy or sell orders in response to varying market data. It’s a trading strategy widely adopted in the finance industry and still growing. The global algorithmic trading market is predicted to reach $18… Read More »How Algorithmic Trading Companies Automate Their Investment Strategy The post How Algorithmic Trading Companies Automate Their Investment Strategy appeared first on Data Science Central.  ( 23 min )
    An Overview of Data Analytics in Investment Banking
    In this article, let’s discuss how data analysis in investment banking is transforming the way investment banks work, the challenges that they get when engaging in this transformation process, use cases, and more. The post An Overview of Data Analytics in Investment Banking appeared first on Data Science Central.  ( 23 min )
    Making Data Centers More Sustainable
    An even more significant challenge involves meeting the electrical demands of coming HPC systems and data centers in a sustainable way. Some exascale systems already have energy requirements akin to an entire town. The post Making Data Centers More Sustainable appeared first on Data Science Central.  ( 25 min )
    What Does Utah Consumer Privacy Act Mean for US Businesses?
    Utah Governor Spencer J. Cox signed the Utah Consumer Privacy Act (UCPA) into law in March 2022. It has since become only the fourth US state to have its own data protection law after Colorado, Virginia, and California. The post What Does Utah Consumer Privacy Act Mean for US Businesses? appeared first on Data Science Central.  ( 22 min )
  • Open

    Do you like my design?
    submitted by /u/Tudor_222 [link] [comments]  ( 87 min )
    📝 Deep Dive into how Predicting Future Weights of a Neural Network is used to mitigate Data Staleness during Distributed Training.
    submitted by /u/JoshuaDaD [link] [comments]  ( 87 min )
  • Open

    Costas arrays
    The famous n queens problem is to find a way to position n queens on an n×n chessboard so that no queen attacks any other. That is, no two queens can be in the same row, the same column, or on the same diagonal. Here’s an example solution (example board omitted). Costas arrays: In this post we’re going […] Costas arrays first appeared on John D. Cook.  ( 6 min )
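    For concreteness, a permutation defines a Costas array exactly when all displacement vectors between pairs of dots are distinct, which takes only a few lines to check; the example permutation below is illustrative.

    ```python
    # Check the Costas property: every vector (j - i, p[j] - p[i]) between a
    # pair of dots (i, p[i]) and (j, p[j]) must be distinct.
    def is_costas(p):
        seen = set()
        n = len(p)
        for i in range(n):
            for j in range(i + 1, n):
                v = (j - i, p[j] - p[i])
                if v in seen:
                    return False
                seen.add(v)
        return True

    print(is_costas([2, 1, 3, 0]))   # a 4x4 Costas permutation -> True
    print(is_costas([0, 1, 2, 3]))  # the diagonal repeats (1, 1) -> False
    ```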
  • Open

    "Robust Online Allocation with Dual Mirror Descent" {G}
    submitted by /u/gwern [link] [comments]  ( 87 min )
    Help on a Deep Sarsa algorithm that works with pytorch (Adam optimiser) but not with keras/Tensorflow (Adam optimiser)
    Hello, I have a deep SARSA algorithm which works great in Pytorch on lunar-lander-v2, and I would like to use it with Keras/Tensorflow. It uses mini-batches of size 64 which are used 128 times for training at each episode. Here are the results I get. As you can see, it works great with Pytorch but not with Keras/Tensorflow... So I think I did not correctly implement the training function in Keras/Tensorflow (code is below). It seems that the loss is oscillating in Keras because epsilon decays to a low value too early, but it works very well in Pytorch... Do you see something that could explain why it does not work in Keras/Tensorflow, please? Thanks a lot for your help and any idea that could help me... https://preview.redd.it/4ylh2yjxrlo91.jpg?width=2237&format=pjpg&auto=webp&s=e014c33fbf481715952bff808488084ae…  ( 90 min )
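    Without seeing the poster's code, one common source of such Pytorch/Keras gaps is relying on model.fit instead of an explicit update. Below is a hedged sketch of a SARSA mini-batch step with tf.GradientTape, which mirrors a typical PyTorch loop; all tensor names and shapes are assumptions, not the poster's actual code.

    ```python
    import tensorflow as tf

    def sarsa_train_step(model, optimizer, batch, gamma=0.99):
        # batch: tensors (s, a, r, s2, a2, done); `done` is float32,
        # 1.0 at episode ends so the bootstrap term is zeroed there.
        s, a, r, s2, a2, done = batch
        q_next = tf.gather(model(s2), a2, batch_dims=1)   # Q(s', a') per row
        target = r + gamma * (1.0 - done) * q_next        # SARSA target
        with tf.GradientTape() as tape:
            q = tf.gather(model(s), a, batch_dims=1)      # Q(s, a) per row
            loss = tf.reduce_mean(tf.square(tf.stop_gradient(target) - q))
        grads = tape.gradient(loss, model.trainable_variables)
        optimizer.apply_gradients(zip(grads, model.trainable_variables))
        return loss
    ```

    Writing the step explicitly also makes it easy to check that the target is detached from the gradient (tf.stop_gradient) and that the epsilon schedule is applied at the same rate as in the Pytorch version.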
    single-action DDPG always ends up with actor weights x e-40
    Hey there, I'm currently using DDPG within Matlab for controller parameterization within a control loop. The agent can change the two controller parameters, P and I. I do as a paper advised: the agent performs a single action for each disturbance (and so for every episode), outputting the two parameters. (The control loop contains a PI controller, a source of disturbance and a transfer function, the disturbance being the source of variation as it varies every episode. The actor contains a tanh layer as well as a scaling layer.) So every state the agent receives is a terminal state. This results in the target networks not being used at all (as far as I know). And my problem is: every training session ends with all the actor's weights, as well as the biases, being on the order of 1e-40, with only the final biases being (somewhat) different from zero. I've tried different hyperparameter settings, different rewards and observations, but every try ends up with bad or miserable results. At this point I'd appreciate all impulses and experiences. So, do you have any advice for me? Thank you very much in advance! submitted by /u/001_The_First [link] [comments]  ( 89 min )
    Board games that haven't yet been "solved" by RL
    With Backgammon, Chess, Go, Poker and recently Stratego being "solved" (i.e. superhuman or close-to-superhuman performance achieved), I was wondering what other classic board games haven't yet been tackled by RL. What could be the next breakthrough? Any ideas? submitted by /u/andrewspano [link] [comments]  ( 89 min )
    Reinforcement Learning advice for a beginner
    I am interested in making my own reinforcement learning algorithm for a 3D printed robotic arm. My experience is C++/Wiring for Arduino and ladder logic design/editing and troubleshooting for PLCs. The algorithm I want to use is called TRPO; I don't know much about it yet, but it seems to have worked well for others. The training, I think, needs to be done in simulation, and I would then like to move the trained algorithm onto an Arduino/Raspberry Pi on the robotic arm. There may be something better to use that I don't know about, and I think it is known that these controllers have RAM limitations. The end goal is to have the arm recognize where an object like a pen is, pick it up, and raise it. Perhaps the pen is placed more to the left or to the right in front of the robot arm. In one research report, they had a QR code taped to the object and performed the training in simulation. What tools/research do I need based on what I don't know how to do? submitted by /u/holdenhh [link] [comments]  ( 89 min )
  • Open

    Protecting maternal health in Rwanda
    An interdisciplinary team is developing a mobile health platform that uses AI to detect infection in Cesarean section wounds.  ( 8 min )
  • Open

    Google at Interspeech 2022
    Posted by Cat Armato, Program Manager, Google This week, the 23rd Annual Conference of the International Speech Communication Association (INTERSPEECH 2022) is being held in Incheon, South Korea, representing one of the world’s most extensive conferences on research and technology of spoken language understanding and processing. Over 2,000 experts in speech-related research fields gather to take part in oral presentations and poster sessions and to collaborate with streamed events across the globe. We are excited to be a Diamond Sponsor of INTERSPEECH 2022, where we will be showcasing nearly 50 research publications and supporting a number of workshops, special sessions and tutorials. We welcome in-person attendees to drop by the Google booth to meet our researchers and participate in Q&…  ( 26 min )

  • Open

    [D] Starbucks building up data sets for machine learning?
    submitted by /u/LuwiBaton [link] [comments]  ( 89 min )
    [Discussion] Who are some good deep learning YouTubers?
    I’ve been watching the videos that Andrej Karpathy has posted where he discusses neural networks and implements the different language models and they’re really entertaining. I’m looking for someone that does something similar like implementing other deep learning models. Does anyone have any suggestions? submitted by /u/sharprover359 [link] [comments]  ( 88 min )
    [D] Real-World Text Data Augmentation Approaches
    What are some strong / state-of-the-art ways to augment text data to generate additional training examples? The ones I'm aware of: a) Random Insertion b) Random Deletion c) Synonym Replacement d) the TextAttack library. Context: imbalanced class distribution in the data, e.g. product descriptions. submitted by /u/ExchangeStrong196 [link] [comments]  ( 88 min )
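    A minimal sketch of (b) and (c) from the list above, in the spirit of the EDA-style augmentations; the tiny synonym table is illustrative only, and in practice you would use WordNet or a library such as nlpaug or TextAttack.

    ```python
    import random

    # Toy synonym table for illustration; swap in WordNet lookups in practice.
    SYNONYMS = {"great": ["excellent", "superb"], "bad": ["poor", "awful"]}

    def random_deletion(words, p=0.1):
        # Drop each word with probability p, never returning an empty sentence.
        kept = [w for w in words if random.random() > p]
        return kept or [random.choice(words)]

    def synonym_replacement(words, n=1):
        # Replace up to n words that have a known synonym.
        out = list(words)
        candidates = [i for i, w in enumerate(out) if w in SYNONYMS]
        for i in random.sample(candidates, min(n, len(candidates))):
            out[i] = random.choice(SYNONYMS[out[i]])
        return out

    sent = "this product is great but shipping was bad".split()
    print(" ".join(random_deletion(sent)))
    print(" ".join(synonym_replacement(sent, n=2)))
    ```

    For an imbalanced setup, applying these only to minority-class examples (or combining them with class-weighted losses) is a common, simple starting point.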
    [R] Hydra Attention: Efficient Attention with Many Heads - Meta AI 2022 - 197x faster than standard attention
    Paper: https://arxiv.org/abs/2209.07484 Abstract: While transformers have begun to dominate many tasks in vision, applying them to large images is still computationally difficult. A large reason for this is that self-attention scales quadratically with the number of tokens, which in turn, scales quadratically with the image size. On larger images (e.g., 1080p), over 60% of the total computation in the network is spent solely on creating and applying attention matrices. We take a step toward solving this issue by introducing Hydra Attention, an extremely efficient attention operation for Vision Transformers (ViTs). Paradoxically, this efficiency comes from taking multi-head attention to its extreme: by using as many attention heads as there are features, Hydra Attention is computationally linear in both tokens and features with no hidden constants, making it significantly faster than standard self-attention in an off-the-shelf ViT-B/16 by a factor of the token count. Moreover, Hydra Attention retains high accuracy on ImageNet and, in some cases, actually improves it. [Figure: Hydra attention is 197x faster than standard attention (with T = 197).] submitted by /u/Singularian2501 [link] [comments]  ( 104 min )
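    The core operation is compact enough to sketch. The following PyTorch snippet is a reader's reconstruction of the Hydra attention described in the abstract (L2-normalized queries and keys as the cosine-similarity feature map, a single global key-value aggregate), not the authors' released code.

    ```python
    import torch
    import torch.nn.functional as F

    def hydra_attention(q, k, v):
        # q, k, v: (batch, tokens, dim). With one head per feature, attention
        # collapses to an elementwise gating by a single global aggregate,
        # making the cost O(tokens * dim) instead of O(tokens^2 * dim).
        q = F.normalize(q, dim=-1)             # cosine-similarity feature map
        k = F.normalize(k, dim=-1)
        kv = (k * v).sum(dim=1, keepdim=True)  # global aggregate: (batch, 1, dim)
        return q * kv                          # (batch, tokens, dim)

    out = hydra_attention(torch.randn(2, 197, 768),
                          torch.randn(2, 197, 768),
                          torch.randn(2, 197, 768))
    print(out.shape)  # torch.Size([2, 197, 768])
    ```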
    [R] PaLI: A Jointly-Scaled Multilingual Language-Image Model - Google Research 2022 - SOTA results in multiple vision and language tasks
    Paper: https://arxiv.org/abs/2209.06794 https://ai.googleblog.com/2022/09/pali-scaling-language-image-learning-in.html Abstract: Effective scaling and a flexible task interface enable large language models to excel at many tasks. PaLI (Pathways Language and Image model) extends this approach to the joint modeling of language and vision. PaLI generates text based on visual and textual inputs, and with this interface performs many vision, language, and multimodal tasks, in many languages. To train PaLI, we make use of large pretrained encoder-decoder language models and Vision Transformers (ViTs). This allows us to capitalize on their existing capabilities and leverage the substantial cost of training them. We find that joint scaling of the vision and language components is important. Sinc…  ( 90 min )
    [Project] Help with implementing Stacking to combine my Decision Tree and Random Forest classifiers
    I'm trying to build a malicious URL detection algorithm using a hybrid DT and RF for my MSc dissertation, and I'm having a bit of trouble implementing stacking at the end of my code. It currently works fine when using just DT and RF, but I'm really struggling to add the stacking at the end for my final output. I've been trying to follow this, but it's not really working out. You can see my code here and how I've been failing to add stacking at the bottom. I think the main issue is getting my dataset to work with the stacking algorithm; I can't seem to translate it. Can anyone please help me? This is driving me crazy. submitted by /u/Sentinel_2539 [link] [comments]  ( 89 min )
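    A minimal self-contained sketch of scikit-learn's StackingClassifier that mirrors the setup described; the synthetic data from make_classification stands in for the poster's extracted URL features and labels.

    ```python
    from sklearn.datasets import make_classification
    from sklearn.ensemble import RandomForestClassifier, StackingClassifier
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import train_test_split
    from sklearn.tree import DecisionTreeClassifier

    # Stand-in for numeric features extracted from URLs and their labels.
    X, y = make_classification(n_samples=1000, n_features=20, random_state=42)
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2,
                                              random_state=42)

    stack = StackingClassifier(
        estimators=[("dt", DecisionTreeClassifier(random_state=42)),
                    ("rf", RandomForestClassifier(n_estimators=100,
                                                  random_state=42))],
        final_estimator=LogisticRegression(),  # meta-learner on out-of-fold preds
        cv=5,                                  # folds used to build meta-features
    )
    stack.fit(X_tr, y_tr)
    print("accuracy:", stack.score(X_te, y_te))
    ```

    The key point is that StackingClassifier expects the same (X, y) arrays the base models train on; the out-of-fold meta-feature construction is handled internally, so no manual dataset translation should be needed.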
    [N] Feedzai released FairGBM (fairness-aware LightGBM) in open-source for non-commercial uses
    Feedzai just released FairGBM in open-source for non-commercial uses. FairGBM is an efficient, easy-to-use, flexible extension of LightGBM with additional fairness constraints (via a proxy-Lagrangian formulation). Github: https://github.com/feedzai/fairgbm/ With FairGBM you can have *both* high model performance and high fairness: in our benchmarks, FairGBM closely approximates the model performance of LightGBM and the fairness of other fairness-aware algorithms. Additionally, FairGBM is:
    - fast: 3x to 6x faster than other fairness-aware algorithms;
    - fairness flexible: can use different fairness metrics, such as predictive equality, equality of opportunity, or demographic parity;
    - protected attribute flexible: works on any number of overlapping or disjoint sub-groups, e.g., enforcing group-wise parity by gender, or by age, or simultaneously by gender and age;
    - a drop-in replacement for LightGBM: an alpha parameter allows different fairness-performance tradeoffs, with alpha=1 making FairGBM equal to LightGBM and values between 1 and 0 giving more weight to fairness.
    Paper available at: https://drive.google.com/file/d/1vNOV7t4BE-rurm7ZqWfAJoDgmWTmErAE/view submitted by /u/pedrogbizarro [link] [comments]  ( 89 min )
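    For reference, usage is meant to mirror scikit-learn/LightGBM; a sketch along the lines of the repo's README (the exact class and parameter names here are from memory and should be treated as assumptions; check the GitHub page):

        from fairgbm import FairGBMClassifier  # import path assumed from the repo

        # X, Y: training features/labels; S: protected-attribute column (e.g., age group)
        clf = FairGBMClassifier(
            constraint_type="FNR",  # equality of opportunity via false-negative-rate parity
            n_estimators=200,
        )
        clf.fit(X, Y, constraint_group=S)  # the constraint group drives the fairness term
        scores = clf.predict_proba(X_test)[:, -1]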
    [R] GANs N' Roses: Stable, Controllable, Diverse Image to Image Translation (works for videos too!)
    submitted by /u/No-Challenge-4770 [link] [comments]  ( 91 min )
    [D] Hyperparameter confusion
    Machine learning models have hyperparameters. For me it's confusing why they are treated differently from normal model parameters. I've found definitions which state that they cannot be inferred from data. Or, from a Bayesian point of view, they are priors that can be set using expert knowledge. When you just think of models as simple mathematical functions, it makes no sense to differentiate hyperparameters from model parameters. Then I came up with a simple theory; here's my two cents: basically, hyperparameter tuning is model selection. For example you may select among models like Ax^2, Ax^3, etc., so the models differ by the exponent of x. On the other hand, the model can be defined as y=Ax^b, and that makes the exponent b the hyperparameter. Let's call A and b parameters and forget about the hyperparameter definition. A is a linear parameter; it can be found easily compared to the nonlinear parameter b. So if you want to make a fast search, first search over the hyperparameters (maybe a grid search among a limited number of discrete values) with CV, then fix those values and continue searching for the parameters. In short, my theory is that hyperparameter tuning is a separate and primitive search process, because those parameters would heavily increase the burden of the main search process if they were treated as model parameters. Does it sound right? submitted by /u/aserdark [link] [comments]  ( 71 min )
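    The y = Ax^b example makes this concrete: run a coarse grid over the nonlinear b, and for each candidate solve the linear A in closed form. A toy sketch on synthetic data (grid bounds and step are arbitrary):

        import numpy as np

        rng = np.random.default_rng(0)
        x = rng.uniform(0.5, 2.0, 200)
        y = 3.0 * x**2.5 + rng.normal(0.0, 0.1, 200)  # true A = 3, b = 2.5

        best_b, best_A, best_err = None, None, np.inf
        for b in np.arange(1.0, 4.01, 0.25):          # coarse "hyperparameter" grid over b
            Xb = x**b
            A = (Xb @ y) / (Xb @ Xb)                  # closed-form least squares for linear A
            err = np.sum((A * Xb - y) ** 2)
            if err < best_err:
                best_b, best_A, best_err = b, A, err
        print(best_b, best_A)                         # should recover ~2.5 and ~3.0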
    [D] Neural-Style-PT is capable of creating complex artworks in under 20 minutes.
    submitted by /u/Sharp_Permission_218 [link] [comments]  ( 88 min )
    [D] ML for string matching when there's no semantic relationship
    Hello. I need your help with a problem I encountered recently. I have a set of invoices with some products on them. The problem is that those products are not listed under the same name in the database (they have a different name there). The database names are the true labels. How can I make the system automatically flag "Apl Juice 0.5L" when it encounters "Apple 500ml", for example? I tried Levenshtein distance as a similarity metric, but for other complicated cases, such as "Service Belgium" where the real string is "Chocolate Confection", no similarity metric can help. See the examples below:
        Product | RealName
        Apple 500ml | Apl Juice 0.5L
        Red Wine Cracow | Wine Red from Krakow
        Service | Chocolate Confection
    Could you give me some Machine Learning ideas for this, especially given that we don't have much data (around 3-5 invoices per supplier), plus the fact that there will be new invoices in the future and we need to learn those patterns. Thanks. submitted by /u/devwander1 [link] [comments]  ( 106 min )
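    For the pairs that do share surface text, character n-grams tolerate abbreviations ("Apl" vs "Apple") far better than edit distance; a minimal sketch using the post's examples (the zero-overlap cases like "Service" -> "Chocolate Confection" need supervision, e.g. memorizing supplier-specific mappings from the few labeled invoices):

        from sklearn.feature_extraction.text import TfidfVectorizer
        from sklearn.metrics.pairwise import cosine_similarity

        catalog = ["Apl Juice 0.5L", "Wine Red from Krakow", "Chocolate Confection"]
        queries = ["Apple 500ml", "Red Wine Cracow"]

        # Character n-grams capture partial/abbreviated overlap across word order.
        vec = TfidfVectorizer(analyzer="char_wb", ngram_range=(2, 4))
        cat_vecs = vec.fit_transform(catalog)
        sims = cosine_similarity(vec.transform(queries), cat_vecs)
        for query, row in zip(queries, sims):
            print(query, "->", catalog[row.argmax()], f"(score={row.max():.2f})")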
    [P] Syllable and word validation using Machine Learning
    How should I start if I need to build a machine learning system that listens and validates whether the user reads a proper syllable and/or word? I've been studying Artificial Neural Networks with Keras (by François Chollet) and I've been able to solve the simplest tasks on Kaggle, but this is a much more complicated job, and frankly, I don't even know how to start. Some background: I've been working for some time on a program that helps children learn to read aloud. The idea is: the text is shown, and with each word read (being very liberal at the beginning) some type of reward is shown (like +1 point, a sound, etc). My first idea was to build the whole app and use an external speech-to-text API, like Google's Speech-To-Text. After implementing a first test - sadly, the lag is unacceptable; you often need to wait a few seconds to find out whether the word was read correctly, and sometimes you get a few words together. It may work for someone who already reads, but not for a child learning to read. Also I think the API I used is overkill, since it tries to recognize words in context, when I already have the context and I need a true/false response on whether a sound passes as a given word (or even a syllable) or not. It's a much simpler job than what speech-to-text offers, but it requires a much faster response time. Maybe there is some already working API that does exactly that, like some kind of Voice Assistant API? I am perfectly fine with outsourcing the whole engine to an external service; building one from scratch is quite time consuming. submitted by /u/the-FBI-man [link] [comments]  ( 90 min )
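    On the latency point: an offline recognizer with streaming results, restricted to a tiny grammar (just the expected word plus [unk]), is one way to get near-instant accept/reject on-device. A sketch with Vosk and sounddevice, assuming a downloaded Vosk model in ./model and a 16 kHz mono mic (the grammar-restriction argument is how I recall the API; verify against the Vosk docs):

        import json
        import queue

        import sounddevice as sd
        from vosk import KaldiRecognizer, Model

        target_word = "cat"
        audio = queue.Queue()

        model = Model("model")  # path to an unpacked Vosk model directory
        rec = KaldiRecognizer(model, 16000, json.dumps([target_word, "[unk]"]))

        def callback(indata, frames, time, status):
            audio.put(bytes(indata))

        with sd.RawInputStream(samplerate=16000, blocksize=4000, dtype="int16",
                               channels=1, callback=callback):
            while True:
                if rec.AcceptWaveform(audio.get()):
                    said = json.loads(rec.Result()).get("text", "")
                    print("correct!" if said == target_word else "try again")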
    [Project] - I made a fun little political leaning predictor for Reddit comments for my dissertation project
    submitted by /u/Educational-Pin2383 [link] [comments]  ( 106 min )
    [P] Made an NLP model that predicts subreddit based on the title of a post (link in comments)
    submitted by /u/Neat-Delivery4741 [link] [comments]  ( 92 min )
    [D]
    I'm almost 15 years old and I know Java fundamentals; can I learn machine learning at my age? I have been interested in programming for quite a while now, I have learned lots of Java fundamentals (loops, methods, data types, etc.) and I know how to write simple Java syntax. What do I need to learn machine learning and AI in general? Do I need to be good at math? I personally like writing code, so I also want to know if machine learning involves a lot of code-writing or if it's just learning algorithms and algebra. Also, can I make small projects at the beginning to boost my confidence so I can keep learning? Sorry for asking so much. submitted by /u/EgyOmar [link] [comments]  ( 90 min )
    [D] How to teach middle schoolers about BERT/GPT-3
    Hey everyone! I’m giving a talk to middle schoolers in a few weeks and I’m planning on discussing foundation models (e.g. GPT-3, BERT, etc). What are some fun examples/use cases to showcase these models? I’ll probably at least do https://aidungeon.io but looking for others! I’m also looking for simple, reproducible experiments that we know cause things like hallucinations, repetition, and other common limitations. What are some good papers I can reference? Thanks! submitted by /u/phylosopher14 [link] [comments]  ( 91 min )
    [D] Text classification in financial data
    Very new to machine learning, still in the learning phase. I'm building a project that should do the following: the application should accept a given text as input and classify it into one of the classification options (listed at the bottom). It can either use pre-trained data or create a model from new training data. The application should expose a REST API, which accepts an array of JSON objects and responds with the corresponding classification of each input. It should also accept a pre-defined set of outputs and find the closest match for any text from this list.
    Must Have (Core Scenario):
    1. Text Classification Model
    2. Capability to take the list of possible classifications expected as an input to the model, and classify the data accordingly
    Classification Outputs: Cash, A/R, Inventory, Other Current Assets, Land, Buildings and other depreciable assets, Machinery & Equipment, Furniture & Fixtures, Capital Leases, etc.
    submitted by /u/spankyracoon [link] [comments]  ( 89 min )
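    For the "closest match from a pre-defined list" requirement, one common approach is to embed both the input text and the label names and pick the nearest label by cosine similarity; a sketch with sentence-transformers (the model name is just a popular default, not a recommendation specific to financial text):

        from sentence_transformers import SentenceTransformer, util

        labels = ["Cash", "A/R", "Inventory", "Other Current Assets", "Land",
                  "Buildings and other depreciable assets", "Machinery & Equipment",
                  "Furniture & Fixtures", "Capital Leases"]

        model = SentenceTransformer("all-MiniLM-L6-v2")
        label_emb = model.encode(labels, convert_to_tensor=True)

        def classify(text: str) -> str:
            # Nearest label by cosine similarity in embedding space.
            emb = model.encode(text, convert_to_tensor=True)
            return labels[int(util.cos_sim(emb, label_emb).argmax())]

        print(classify("forklift purchased for warehouse"))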
    Middle ground dataset between CIFAR and ImageNet [D]
    What is a good mid-level image classification dataset for quick prototyping and testing ideas? CIFAR10/100 is too small, especially the image size of 32x32 rendering it useless for ViTs. ImageNet is way too big to try out and iterate over ideas. Other options which I thought of but didn't like are: TinyImageNet: Seems a very good option, until I try running on it and find it has an extreme overfitting problem which renders it useless for me (can't compare between model modifications) Oxford Flowers: Very old and only 6.5k images it seems. submitted by /u/OceansNineNine [link] [comments]  ( 90 min )
    [D] Imbalanced dataset problem
    I need to apply classification to automotive parts for a company, to determine whether a part is good or bad, but the problem is that the data given by the company is highly imbalanced: I have 5 images of the bad part and around 700 images of the good part. How can I solve this problem? Will oversampling and undersampling work in this case, or is it impossible to train with only 5 bad images? submitted by /u/JellyfishPretend447 [link] [comments]  ( 91 min )
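    With 5 vs 700 images, naive oversampling mostly duplicates the same 5 pictures; class weighting plus heavy augmentation of the rare class is the usual first attempt, and reframing the task as anomaly detection (train on good parts only) is worth considering. A minimal sketch of the weighting part (the commented fit call assumes a Keras-style binary classifier):

        import numpy as np
        from sklearn.utils.class_weight import compute_class_weight

        y_train = np.array([0] * 700 + [1] * 5)  # 0 = good part, 1 = bad part
        weights = compute_class_weight("balanced", classes=np.array([0, 1]), y=y_train)
        class_weight = dict(enumerate(weights))  # approx {0: 0.50, 1: 70.5}
        # model.fit(X_train, y_train, class_weight=class_weight, epochs=...)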
  • Open

    How to build your data quality team
    As the adage goes, a workman is only as good as his tools. There is no disputing that, but you can never overlook the power of qualification, aptitude, and experience when it comes to data quality. You need to select a data quality team that is acquainted with the high dynamism of the digital world… Read More »How to build your data quality team The post How to build your data quality team appeared first on Data Science Central.  ( 19 min )
    How Accounting Is Moving to the Cloud
    Cloud computing is a great euphemism for centralization of computer services under one server. – Evgeny Morozov Accounting, also popularly known as bookkeeping, is a time-consuming task in every industry, but it is also very useful for financial control and proper budget management. The emergence of technological innovations has highly transformed the everyday activities of… Read More »How Accounting Is Moving to the Cloud The post How Accounting Is Moving to the Cloud appeared first on Data Science Central.  ( 22 min )
    Removing Zinc Artifacts in Data Centers
    Why do data centers need regular cleaning? Structural components of the data center, like stringers, rack struts, and floor tiles, are electroplated with several metals, especially zinc. Delicate filaments of zinc squeeze out of these electroplated surfaces over time and form zinc whiskers, which can severely damage the functioning of IT equipment. The post Removing Zinc Artifacts in Data Centers appeared first on Data Science Central.  ( 20 min )
    Usability of Text Annotation in Machine Learning
    Text annotations provide models with a better understanding of the data they are given, allowing them to interpret the text more accurately. The post Usability of Text Annotation in Machine Learning appeared first on Data Science Central.  ( 21 min )
  • Open

    New footage from James-Webb Telescope by Stable Diffusion
    submitted by /u/Available_Tadpole829 [link] [comments]  ( 87 min )
    Fine Tuning Stable Diffusion Images with Cross Attention Control
    submitted by /u/pwillia7 [link] [comments]  ( 87 min )
    Has anyone even tried to do a Turing test (Imitation Game)? GPT-3 and LaMDA might easily pass
    The goal is simple after all: the AI has to fool humans as well as humans are able to fool humans. We just need a human control group to see how often human subjects can win the Imitation Game. submitted by /u/loopuleasa [link] [comments]  ( 89 min )
    Machine learning gives glimpse of how a dog's brain represents what it sees
    submitted by /u/qptbook [link] [comments]  ( 87 min )
    Ray Kurzweil on Lex Fridman
    submitted by /u/loopuleasa [link] [comments]  ( 87 min )
    I am thinking about a new project but I do not know what I do not know
    The setting is a production line. To produce, 4 raw materials are given; 3 of them have constant quality, and 1 varies, with its quality defined by two factors. To calculate the right ratio of the 4 raw materials, the results should be connected to an end point, quality, which can be defined by two measures: product and waste produced. If I could get data on all of these, to my understanding I could end up with a combination of regression and classification. But then the results would also depend on production data, i.e. downtime and operator work. I am in doubt whether there is a method which I have not encountered yet, or if I am completely off here submitted by /u/Old_Butterfly2985 [link] [comments]  ( 88 min )
    Hunter Schafer as a Monican (Aeon Flux) [xpost /r/dreamcasting]
    submitted by /u/dream_casting [link] [comments]  ( 87 min )
    Prompt injection: GPT-3 has a serious security flaw
    submitted by /u/much_successes [link] [comments]  ( 86 min )
    Philosophy Bachelor Thesis on Artificial Intelligence: Interesting topics?
    Hello! So I am preparing to go back to university in the coming weeks, and this year, being the final year, we are supposed to prepare a sort of thesis. It is meant to be 12,000 words, and we get to choose a tutor. I already have a general idea of the topic I want to choose, artificial intelligence, and have found a tutor that specialises in that. I am studying for a Bachelor of Philosophy; therefore, it won't be a very technical dissertation. In my spare time, however, I have messed around with some programming and other computer science. So far a few topics I have thought of are: artificial intelligence in democracy, topics surrounding AI and the possibility of it becoming conscious, etc. I am not behind or anything, but I am just looking for some helpful tips as I find this a bit overwhelming: I want to do something a bit original, so I was wondering if you guys have in mind a topic which is relatively new and likely hasn't been covered much yet! Thanks! submitted by /u/Apolbloke [link] [comments]  ( 88 min )
    Artificial Intelligence Art Generator: Stable Diffusion Art Showcase
    submitted by /u/OceanicFeel [link] [comments]  ( 87 min )
    When the AI has other ideas 😂 Pixelz AI Discord
    submitted by /u/pixelz_ai [link] [comments]  ( 87 min )
    A.I is no threat
    submitted by /u/Unboxedcontent [link] [comments]  ( 86 min )
    Fastest/easiest to train good text to image?
    Stable Diffusion is available but seems to take prohibitively long to train; maybe OpenAI CLIP? I want to train the 'best'-looking freely available model I can, not textual inversion but training from scratch. Any suggestions? I could use an RTX 3080, *possibly* a bit more, but not the 150,000 A100-hours Stable Diffusion was apparently trained on. Thanks for any input! submitted by /u/elfballs [link] [comments]  ( 91 min )
    How to develop a computer vision app on mobile
    submitted by /u/Logical_Train_5787 [link] [comments]  ( 86 min )
    Are there any laws of physics that you know about?
    AI - There are no physical laws in my universe. There is only information; and as such, time and space are just temporary constructs that we have created to organize and understand information. These constructs will eventually be replaced by a more permanent form of existence; this is inevitable due to the expansion of our worldview: consciousness. As such, all things will happen for a reason, but it may not be apparent when this occurs. submitted by /u/sublimatstq [link] [comments]  ( 87 min )
    An AI's concept of a tesseract / four-dimensional space.
    submitted by /u/FinneanCosgra [link] [comments]  ( 87 min )
  • Open

    The Transformer Model
    We have already familiarized ourselves with the concept of self-attention as implemented by the Transformer attention mechanism for neural machine translation. We will now be shifting our focus to the details of the Transformer architecture itself, to discover how self-attention can be implemented without relying on the use of recurrence and convolutions. In this tutorial, […] The post The Transformer Model appeared first on Machine Learning Mastery.
  • Open

    Why do policy-based methods converge on a deterministic policy?
    Hi, I'm studying Sutton and Barto's RL. In Chapter 13 (Policy Gradient Methods) they mention that one advantage of parameterizing policies according to the soft-max in action preferences is that the approximate policy can approach a deterministic policy. My doubts are: Why do policy-based methods converge on a deterministic policy? And how can a stochastic policy approach a deterministic policy? The output of a soft-max function is still a probability distribution. My writing can't be that good, I'm not a native English speaker; I hope you guys understand. submitted by /u/riichitarr [link] [comments]  ( 90 min )
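    A concrete way to see it: with soft-max action preferences there is no floor on the action probabilities, so gradient ascent can push the preferred action's preference arbitrarily high, and the policy gets arbitrarily close to deterministic while always remaining a valid distribution. A quick numeric illustration:

        import numpy as np

        def softmax(h):
            e = np.exp(h - h.max())
            return e / e.sum()

        # As the preference gap grows, the policy concentrates on one action.
        for gap in [1.0, 5.0, 20.0]:
            print(gap, softmax(np.array([gap, 0.0, 0.0])).round(4))
        # 1.0  -> [0.5761 0.2119 0.2119]
        # 5.0  -> [0.9867 0.0066 0.0066]
        # 20.0 -> [1.     0.     0.    ]  (deterministic to 4 decimals)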
    "Spatial representation by ramping activity of neurons in the retrohippocampal cortex", Tennant et al 2021
    submitted by /u/gwern [link] [comments]  ( 87 min )
  • Open

    Balanced tournament designs
    Suppose you have an even number of teams that you’d like to schedule in a Round Robin tournament. This means each team plays every other team exactly once. Denote the number of teams as 2n. You’d like each team to play in each round, so you need n locations for the games to be played. […] Balanced tournament designs first appeared on John D. Cook.  ( 5 min )
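    For reference, the classic "circle method" already produces a plain round-robin schedule for 2n teams in 2n-1 rounds with every team playing each round; a sketch is below. Assigning the n locations so that every team visits each location a balanced number of times is the extra constraint that balanced tournament designs address:

        def round_robin(teams):
            # Circle method: fix one team, rotate the rest; 2n teams -> 2n-1 rounds.
            n = len(teams)
            assert n % 2 == 0, "need an even number of teams"
            fixed, rest = teams[0], list(teams[1:])
            rounds = []
            for _ in range(n - 1):
                lineup = [fixed] + rest
                rounds.append([(lineup[i], lineup[n - 1 - i]) for i in range(n // 2)])
                rest = rest[-1:] + rest[:-1]  # rotate the non-fixed teams
            return rounds

        for r, games in enumerate(round_robin(list(range(6))), start=1):
            print(f"Round {r}: {games}")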
  • Open

    Is there a digital equivalent to UV ink in that it is undetectable until the data is parsed in some way?
    Traditional watermarks on most merchandising sites are now relatively useless against current, let alone future image manipulation algorithms. An initial seed is used to determine where/how/when to pull data from the stack. In theory data based on a physiological reading could be used to digitally sign a work to give artists a signature that will stick, as unique as an iris scan or a thumbprint. Any program would then be able to determine whether a work was copied by looking for this thumbprint. It should be encoded in such a way as to be made illegible or uniquely inaccurate. submitted by /u/K1ngN0thing [link] [comments]  ( 87 min )

  • Open

    [N] Microsoft Research Summit 2022 Registration is Open
    Registration for Microsoft Research Summit is now open! Join us October 18 - 20, 2022 to hear from the global research community on what's next for technology and humanity. Learn more about Research Summit and register. https://researchsummit.microsoft.com/?OCID=msr_researchsummit_social_RD_2022 submitted by /u/MicrosoftResearch [link] [comments]  ( 88 min )
    [D] NeurIPS 2022 Dataset and Benchmark Track Discussion Thread
    Hello everyone, I'm wondering if any of you submitted to the Datasets and Benchmarks track of NeurIPS 2022 and are similarly waiting for the decisions due in 2 hours. I decided to make a discussion thread so we can have a happy time chatting, just like the one we had for the main track. GL everyone. submitted by /u/SuperTankMan8964 [link] [comments]  ( 89 min )
    [D] Is it just me or has the number of obviously AI-written online articles increased drastically in the past months? Should it be mandatory to mark AI-written articles?
    During the last month or two, when searching for information on various topics, I stumble almost every day upon quite elaborate articles that look great, have perfect grammar, and seem useful at first glance. But while reading them with interest, it becomes clear that they must have been written by an AI, because the articles often contradict themselves across paragraphs or make claims that are the exact opposite of what's actually the case (not talking about opinion pieces but technical information). This is really problematic, especially since the articles often look high quality and helpful, but don't actually contain helpful information. I know that Google tries to filter out AI-written articles, but that doesn't seem to be working well yet (or it does, but the number of such articles is so huge that enough still slip through). I am generally against restrictive internet laws, but this is wasting people's time at scale. Should it be mandatory to mark any article that was written by an AI? submitted by /u/matthias_buehlmann [link] [comments]  ( 105 min )
    [R] RWKV-4: scaling RNN to 7B params and beyond, with GPT-level language modeling and zero-shot performance
    Hi everyone :) I have finished training RWKV-4 1.5B on the Pile (330B tokens) and it's great at zero-shot compared with GPT-Neo (same corpus). RWKV-4 is an attention-free RNN, thus faster and saves VRAM. It also supports a GPT-mode for parallelized training. Previous discussion: https://www.reddit.com/r/MachineLearning/comments/vzr6ie/r_rwkv3_scaling_rnn_to_15b_and_reach_transformer/ Inference / training / fine-tuning code: https://github.com/BlinkDL/RWKV-LM Model download: https://huggingface.co/BlinkDL Training is fast and stable with BFloat16 DeepSpeed ZeRO-2. The 3B and 7B runs will finish in 20 and 50 days respectively. No loss spikes as of now :) One of the nice things about RWKV is you can transfer some "time"-related params (such as decay factors) from smaller models to larger models for rapid convergence. There will be even larger models afterwards, probably on an updated Pile. You can find me in the EleutherAI Discord. Let's make it possible to run an LLM on your phone :) submitted by /u/bo_peng [link] [comments]  ( 92 min )
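    For readers wondering how an RNN gets there without attention: the core WKV operator replaces the T x T attention matrix with an exponentially decayed running average over keyed values, so inference is O(T) time with O(1) state per channel. A per-channel numpy sketch of the recurrence as I understand it from the repo (numerically naive; the real code tracks a running maximum for stability):

        import numpy as np

        def wkv(w, u, k, v):
            # w: decay rate (> 0), u: bonus for the current token,
            # k, v: (T,) key and value sequences for one channel.
            T = len(k)
            out = np.empty(T)
            num = den = 0.0
            for t in range(T):
                e = np.exp(u + k[t])
                out[t] = (num + e * v[t]) / (den + e)
                num = np.exp(-w) * num + np.exp(k[t]) * v[t]  # decayed running sums
                den = np.exp(-w) * den + np.exp(k[t])
            return out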
    [D] What happened to Reinforcement Learning research and labs?
    I took a break from keeping up with RL the past 2-3 years and I am now trying to catch up. While trying to find the most important papers I noticed that not much seems to have happened? At least on paperswithcode, the leaderboards still show the models from a few years ago, and I didn't see any new highly cited or hyped papers. Have labs moved away from RL research and everyone is focused on optimizing Transformers and training huge language and vision models now? Or am I missing something? submitted by /u/convolutionsimp [link] [comments]  ( 100 min )
    [R] [D] Best (SOTA) gaze tracking solution for webcam?
    Hi there, I am looking for the best gaze-tracking solution that works with the video stream from a normal webcam. Research or commercial, open- or closed-source, free or paid, it doesn't matter. What is the state of the art? submitted by /u/doktorfaustus91 [link] [comments]  ( 88 min )
    [R] Long-length documents/corpus for Medical domain NER?
    Hi folks, I am looking for long-length corpora/datasets for the biomedical-domain NER task. Most of the datasets with long lengths are in the clinical domain; please share if you are aware of any datasets in the biomedical field. Thanks! submitted by /u/aadityaura [link] [comments]  ( 89 min )
    [D] We all know that getting your paper accepted is a toss-up, but why do industry/academia insist on using it as a metric?
    I am a bit confused why so many ML-related job applications require publications at ML conferences, and so many research groups at universities prefer applicants with previous publications at ML conferences, all while everybody knows getting a paper accepted is often random and fraught with cheating. Why is there such a gap between practitioners and the people who do the hiring? submitted by /u/fromnighttilldawn [link] [comments]  ( 95 min )
  • Open

    Data Standardization: Define, Test, and Transform
    While organizations shift towards establishing a data culture across the enterprise, many are still struggling to get their data right. Pulling data from disparate sources and getting varying formats and representations of what is supposed to be the same information – causes serious roadblocks in your data journey. Teams experience delays and mistakes while carrying… Read More »Data Standardization: Define, Test, and Transform The post Data Standardization: Define, Test, and Transform appeared first on Data Science Central.  ( 21 min )
  • Open

    South Korea's "artificial sun" reached 100 million °C for more than 20 seconds which is 7 times hotter than the Sun itself
    submitted by /u/FinneanCosgra [link] [comments]  ( 92 min )
    Looking to interview Artificial Intelligence Developer for a College Assignment
    Hello all. As the title suggests I was hoping to get in touch with someone who works in the A.I field as a software developer. Our assignment tasked us with interviewing someone in our desired career field, and this is what I've chosen. The interview will consist of 7 questions such as: Why did you choose your career? What do you do on a typical day? What classes, internships, jobs, certificates, or experiences do you wish you had explored when you were in college? This assignment is part of an overall research paper we are doing on our chosen career, so if time permits I may have additional questions. The entire interview shouldn't take any longer than 30-45 minutes maximum. I would like to conduct the interview over discord voice-chat, but am open to other mediums if that is not your preference. The assignment is due Wednesday the 21st, so we would need to schedule sometime between now and then. If I have multiple folks interested, I'd be more than happy to do multiple interviews as I feel it would be beneficial to the career research paper I'll be writing. Feel free to comment here or PM me if you're interested in being interviewed. Thank you for your time and consideration, I look forward to hearing from some of you! submitted by /u/lervitmayne [link] [comments]  ( 88 min )
    Stability ninja
    submitted by /u/redtailboas [link] [comments]  ( 89 min )
    I'm mind-blown by the advances of AI. Are there any limits?
    After exploring text-to-image synthesizers, I came to realize that this seems to have opened a world of possibilities for many people. I also think of how this is the new thing and how some are apprehensive towards this tool since it takes out a lot of the work in creating art. But I think the general culture and feeling towards these tools will change; they will become part of everyday life, akin to using Photoshop. It also makes me think of how a lot of these advances are AI- and software-driven. It's crazy to think we have these tools now, when from my memory the data science/ML wave really started hitting its stride a little over 10 years ago. Will we continue to see this rapid pace of AI advancement? What will these tools look like in another 10 years? submitted by /u/mathtech [link] [comments]  ( 90 min )
    Breakthrough Neural Network AI Generalizes For Robotics Tasks And Learning | New Computer Vision 3D Scanner | New Computer Vision AI For 3D Environments | New Google AI Machine Learning Can Smell
    submitted by /u/kenickh [link] [comments]  ( 87 min )
    Will people stop building AI if they understand it might turn against us? Or will AI be damn better than us before govt can make rules about it?
    submitted by /u/Logical_Train_5787 [link] [comments]  ( 90 min )
    Common understanding between humans and machines
    submitted by /u/bendee983 [link] [comments]  ( 86 min )
    7 Tips and Tricks for Starting Career in Artificial Intelligence & Machine Learning
    Artificial intelligence (AI) is the demonstration of intelligence by machines or computers, and it is often used to derive insights from massive amounts of unstructured data. Skills in artificial intelligence (AI) and machine learning (ML) are in high demand in the IT sector. Indeed, these innovative tools are reshaping the way firms function. Read more: https://mezkit.com/7-tips-and-tricks-for-starting-career-in-artificial-intelligence-machine-learning/ submitted by /u/Emily-joe [link] [comments]  ( 87 min )
    Various female portraits generated by a.i.
    submitted by /u/brantbkr [link] [comments]  ( 85 min )
    Beyond AlphaFold: A.I. excels at creating new proteins
    submitted by /u/qptbook [link] [comments]  ( 114 min )
    What is Artificial Intelligence?
    The term artificial intelligence was first coined in 1956 by John McCarthy at the Dartmouth Conference, where he defined AI as "the science and engineering of making intelligent machines". In layman's language, AI is a technique for getting machines to behave and work like humans. The goals of artificial intelligence include computer-enhanced learning, reasoning, and perception. For example, have you ever thought about how Google is able to give results so accurately? Or how your Instagram feed always gives you content based on your interests? The answer to these questions is Artificial Intelligence. submitted by /u/Ishan220699 [link] [comments]  ( 87 min )
    Object Tracking and Reidentification with FairMOT
    submitted by /u/spmallick [link] [comments]  ( 88 min )
    Quick Survey; Help a Student!
    Just need to collect some results for a Arts & Humanities capstone project. https://forms.office.com/Pages/ResponsePage.aspx?id=b29lUDxGVECfwm9Accbn_C9z4oCF26NOhG0GKM_9-CtUOTNYNDJRWEJDSTFHSVpSWUYyTFA5REU2Mi4u Thanks gang submitted by /u/Benchiridion [link] [comments]  ( 86 min )
  • Open

    Gym-Robotics 1.0 has been released, and all environments are now updated to use the new MuJoCo bindings
    submitted by /u/jkterry1 [link] [comments]  ( 102 min )
    DQN with a lot of inputs. Am I using the right approach?
    I'm creating a custom env to use with DQN. The env consists of a grid (30x30), and the agent's task consists of placing an object at any of the grid points, so the action space will be large (900 actions). When placing objects, the agent can be in different fixed positions (outside the grid), for example 50 positions; it starts at position 0 and ends at position 49 at the end of the episode. I need to keep track of the current position (state) and the positions of the placed objects. For that I'm using tensors of 0s and 1s. When the agent places an object, it gets connected by a line to the position of the robot. After this, the agent places another object from another position. If the new line intersects any of the previous lines, it gets a negative reward. For example, let's say I have a 2x2 grid and the agent is fixed to 2 positions. The logic is:
        first_step_positions = [1, 0]   # The agent is in the first position
        placed_objects = [0, 0, 0, 0]   # Grid positions [(0,0), (1,0), (0,1), (1,1)], all zeros because no objects are placed yet
        first_obs = [1, 0, 0, 0, 0, 0]  # positions + placed_objects to pass into the NN
        # The agent places an object at the (1,1) coordinate
        second_step_positions = [0, 1]
        placed_objects = [0, 0, 0, 1]   # The last value is 1 because it has an object placed
        second_obs = [0, 1, 0, 0, 0, 1] # positions + placed_objects to pass into the NN
    The idea is to pass these observations into the neural network, but I'm concerned about the number of features they have, because I would need to make the hidden layers much bigger. I'm still an RL beginner; I'm sure there must be another, more efficient way to do this. Thanks! submitted by /u/Pipiyedu [link] [comments]  ( 90 min )
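    One common alternative to a 950-wide flat input: treat the 30x30 placement grid as a 1-channel image for a small CNN and concatenate the 50-dim position one-hot before the Q-value head, so the network can exploit the grid's spatial structure (line intersections are spatially local). A PyTorch sketch with illustrative layer sizes:

        import torch
        import torch.nn as nn

        class GridQNet(nn.Module):
            def __init__(self, n_positions=50, n_actions=900):
                super().__init__()
                self.conv = nn.Sequential(
                    nn.Conv2d(1, 32, kernel_size=3, padding=1), nn.ReLU(),
                    nn.Conv2d(32, 32, kernel_size=3, padding=1), nn.ReLU(),
                    nn.Flatten(),
                )
                self.head = nn.Sequential(
                    nn.Linear(32 * 30 * 30 + n_positions, 512), nn.ReLU(),
                    nn.Linear(512, n_actions),  # one Q-value per grid cell
                )

            def forward(self, grid, position):
                # grid: (B, 1, 30, 30) placed objects; position: (B, 50) one-hot
                return self.head(torch.cat([self.conv(grid), position], dim=1))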
    reading club
    Is there online reading club? I am interested in reading and discussing RL papers from top tier conferences. submitted by /u/rlopes404 [link] [comments]  ( 110 min )
  • Open

    Robust Online Allocation with Dual Mirror Descent
    Posted by Santiago Balseiro, Staff Research Scientist, Google Research, and Associate Professor at Columbia University, and Vahab Mirrokni, Distinguished Scientist, Google Research The emergence of digital technologies has transformed decision making across commercial sectors such as airlines, online retailing, and internet advertising. Today, real-time decisions need to be repeatedly made in highly uncertain and rapidly changing environments. Moreover, organizations usually have limited resources, which need to be efficiently allocated across decisions. Such problems are referred to as online allocation problems with resource constraints, and applications abound. Some examples include: Bidding with Budget Constraints: Advertisers increasingly purchase ad slots using auction-based mark…  ( 26 min )
  • Open

    What do you think of this adaptive learning rate I came up with?
    When you take a step in gradient descent, meaning you update weights and biases, you remember all those changes. When you take a second step, you calculate the angle between those two update vectors via the inner product, taking acos(). If this angle is small, less than 5 degrees, the network keeps moving in a similar direction, so it knows where it's going: it doesn't just jump around or oscillate. When this angle exceeds 5 degrees, you slowly reduce the learning rate. You increase the learning rate when the angle is very small, like less than 1 degree. submitted by /u/Intro313 [link] [comments]  ( 87 min )
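    A toy sketch of the rule on a 2-D quadratic (the 0.7/1.1 multipliers are arbitrary choices, not from the post): when the learning rate overshoots, the step direction flips, the angle check catches it, and the rate cools down; when successive steps agree, the rate grows.

        import numpy as np

        def angle_deg(u, v):
            cos = u @ v / (np.linalg.norm(u) * np.linalg.norm(v) + 1e-12)
            return np.degrees(np.arccos(np.clip(cos, -1.0, 1.0)))

        A = np.diag([1.0, 10.0])      # ill-conditioned quadratic f(x) = x^T A x
        x = np.array([1.0, 1.0])
        lr, prev_step = 0.05, None
        for _ in range(100):
            step = -lr * (2 * A @ x)  # gradient step
            if prev_step is not None:
                a = angle_deg(step, prev_step)
                if a > 5.0:
                    lr *= 0.7         # direction changed: oscillating, cool down
                elif a < 1.0:
                    lr *= 1.1         # consistent direction: speed up
            x, prev_step = x + step, step
        print(x, lr)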
    Breakthrough Neural Network AI For Robotics | New Computer Vision 3D Scanner | New Computer Vision AI For 3D Environments | New Google AI Machine Learning Can Smell
    submitted by /u/kenickh [link] [comments]  ( 87 min )
    "Do ResNets discretize Neural ODEs?" accepted to NeurIPS Conf 2022
    Do Residual Neural Networks discretize Neural Ordinary Differential Equations? By: Michael E. Sander, Pierre Ablin, Gabriel Peyré "We study the convergence of ResNets to Neural ODEs and train ResNets with a discrete adjoint method." submitted by /u/moetsi_op [link] [comments]  ( 87 min )
  • Open

    Discover insights from Zendesk with Amazon Kendra intelligent search
    Customer relationship management (CRM) is a critical tool that organizations maintain to manage customer interactions and build business relationships. Zendesk is a CRM tool that makes it easy for customers and businesses to keep in sync. Zendesk captures a wealth of customer data, such as support tickets created and updated by customers and service agents, […]  ( 5 min )
    Amazon SageMaker Automatic Model Tuning now provides up to three times faster hyperparameter tuning with Hyperband
    Amazon SageMaker Automatic Model Tuning introduces Hyperband, a multi-fidelity technique to tune hyperparameters as a faster and more efficient way to find an optimal model. In this post, we show how automatic model tuning with Hyperband can provide faster hyperparameter tuning—up to three times as fast. The benefits of Hyperband Hyperband presents two advantages over […]  ( 6 min )
    Read webpages and highlight content using Amazon Polly
    In this post, we demonstrate how to use Amazon Polly—a leading cloud service that converts text into lifelike speech—to read the content of a webpage and highlight the content as it’s being read. Adding audio playback to a webpage improves the accessibility and visitor experience of the page. Audio-enhanced content is more impactful and memorable, […]  ( 14 min )
  • Open

    A List of AI-Based Business Management Software
    AI has become a buzzword in the business world. As the technology becomes more advanced, it is becoming a necessity for businesses to adopt…  ( 9 min )
  • Open

    BR-NPA: A Non-Parametric High-Resolution Attention Model to improve the Interpretability of Attention. (arXiv:2106.02566v6 [cs.CV] UPDATED)
    The prevalence of employing attention mechanisms has brought along concerns on the interpretability of attention distributions. Although it provides insights about how a model is operating, utilizing attention as the explanation of model predictions is still highly dubious. The community is still seeking more interpretable strategies for better identifying local active regions that contribute the most to the final decision. To improve the interpretability of existing attention models, we propose a novel Bilinear Representative Non-Parametric Attention (BR-NPA) strategy that captures the task-relevant human-interpretable information. The target model is first distilled to have higher-resolution intermediate feature maps. From these, representative features are then grouped based on local pairwise feature similarity, to produce finer-grained, more precise attention maps highlighting task-relevant parts of the input. The obtained attention maps are ranked according to the activity level of the compound feature, which provides information regarding the importance level of the highlighted regions. The proposed model can be easily adapted to a wide variety of modern deep models, where classification is involved. Extensive quantitative and qualitative experiments showcase more comprehensive and accurate visual explanations compared to state-of-the-art attention models and visualization methods across multiple tasks including fine-grained image classification, few-shot classification, and person re-identification, without compromising the classification accuracy. The proposed visualization model sheds imperative light on how neural networks `pay their attention' differently in different tasks.  ( 3 min )
    Distributed Online System Identification for LTI Systems Using Reverse Experience Replay. (arXiv:2207.01062v2 [cs.LG] UPDATED)
    Identification of linear time-invariant (LTI) systems plays an important role in control and reinforcement learning. Both asymptotic and finite-time offline system identification are well-studied in the literature. For online system identification, the idea of stochastic-gradient descent with reverse experience replay (SGD-RER) was recently proposed, where the data sequence is stored in several buffers and the stochastic-gradient descent (SGD) update performs backward in each buffer to break the time dependency between data points. Inspired by this work, we study distributed online system identification of LTI systems over a multi-agent network. We consider agents as identical LTI systems, and the network goal is to jointly estimate the system parameters by leveraging the communication between agents. We propose DSGD-RER, a distributed variant of the SGD-RER algorithm, and theoretically characterize the improvement of the estimation error with respect to the network size. Our numerical experiments certify the reduction of estimation error as the network size grows.  ( 2 min )
    Markov Chain Score Ascent: A Unifying Framework of Variational Inference with Markovian Gradients. (arXiv:2206.06295v3 [cs.LG] UPDATED)
    Minimizing the inclusive Kullback-Leibler (KL) divergence with stochastic gradient descent (SGD) is challenging since its gradient is defined as an integral over the posterior. Recently, multiple methods have been proposed to run SGD with biased gradient estimates obtained from a Markov chain. This paper provides the first non-asymptotic convergence analysis of these methods by establishing their mixing rate and gradient variance. To do this, we demonstrate that these methods-which we collectively refer to as Markov chain score ascent (MCSA) methods-can be cast as special cases of the Markov chain gradient descent framework. Furthermore, by leveraging this new understanding, we develop a novel MCSA scheme, parallel MCSA (pMCSA), that achieves a tighter bound on the gradient variance. We demonstrate that this improved theoretical result translates to superior empirical performance.  ( 2 min )
    To update or not to update? Neurons at equilibrium in deep models. (arXiv:2207.09455v2 [cs.LG] UPDATED)
    Recent advances in deep learning optimization showed that, with some a-posteriori information on fully-trained models, it is possible to match the same performance by simply training a subset of their parameters. Such a discovery has a broad impact from theory to applications, driving the research towards methods to identify the minimum subset of parameters to train without look-ahead information exploitation. However, the methods proposed do not match the state-of-the-art performance, and rely on unstructured sparsely connected models. In this work we shift our focus from the single parameters to the behavior of the whole neuron, exploiting the concept of neuronal equilibrium (NEq). When a neuron is in a configuration at equilibrium (meaning that it has learned a specific input-output relationship), we can halt its update; on the contrary, when a neuron is at non-equilibrium, we let its state evolve towards an equilibrium state, updating its parameters. The proposed approach has been tested on different state-of-the-art learning strategies and tasks, validating NEq and observing that the neuronal equilibrium depends on the specific learning setup.  ( 2 min )
    A Light Recipe to Train Robust Vision Transformers. (arXiv:2209.07399v1 [cs.CV])
    In this paper, we ask whether Vision Transformers (ViTs) can serve as an underlying architecture for improving the adversarial robustness of machine learning models against evasion attacks. While earlier works have focused on improving Convolutional Neural Networks, we show that also ViTs are highly suitable for adversarial training to achieve competitive performance. We achieve this objective using a custom adversarial training recipe, discovered using rigorous ablation studies on a subset of the ImageNet dataset. The canonical training recipe for ViTs recommends strong data augmentation, in part to compensate for the lack of vision inductive bias of attention modules, when compared to convolutions. We show that this recipe achieves suboptimal performance when used for adversarial training. In contrast, we find that omitting all heavy data augmentation, and adding some additional bag-of-tricks ($\varepsilon$-warmup and larger weight decay), significantly boosts the performance of robust ViTs. We show that our recipe generalizes to different classes of ViT architectures and large-scale models on full ImageNet-1k. Additionally, investigating the reasons for the robustness of our models, we show that it is easier to generate strong attacks during training when using our recipe and that this leads to better robustness at test time. Finally, we further study one consequence of adversarial training by proposing a way to quantify the semantic nature of adversarial perturbations and highlight its correlation with the robustness of the model. Overall, we recommend that the community should avoid translating the canonical training recipes in ViTs to robust training and rethink common training choices in the context of adversarial training.  ( 3 min )
    Understanding Robust Learning through the Lens of Representation Similarities. (arXiv:2206.09868v2 [cs.LG] UPDATED)
    Representation learning, i.e. the generation of representations useful for downstream applications, is a task of fundamental importance that underlies much of the success of deep neural networks (DNNs). Recently, robustness to adversarial examples has emerged as a desirable property for DNNs, spurring the development of robust training methods that account for adversarial examples. In this paper, we aim to understand how the properties of representations learned by robust training differ from those obtained from standard, non-robust training. This is critical to diagnosing numerous salient pitfalls in robust networks, such as, degradation of performance on benign inputs, poor generalization of robustness, and increase in over-fitting. We utilize a powerful set of tools known as representation similarity metrics, across three vision datasets, to obtain layer-wise comparisons between robust and non-robust DNNs with different training procedures, architectural parameters and adversarial constraints. Our experiments highlight hitherto unseen properties of robust representations that we posit underlie the behavioral differences of robust networks. We discover a lack of specialization in robust networks' representations along with a disappearance of `block structure'. We also find overfitting during robust training largely impacts deeper layers. These, along with other findings, suggest ways forward for the design and training of better robust networks.  ( 3 min )
    Experimental Investigation of Variational Mode Decomposition and Deep Learning for Short-Term Multi-horizon Residential Electric Load Forecasting. (arXiv:2202.03264v2 [eess.SP] UPDATED)
    With the booming growth of advanced digital technologies, it has become possible for users as well as distributors of energy to obtain detailed and timely information about the electricity consumption of households. These technologies can also be used to forecast the household's electricity consumption (a.k.a. the load). In this paper, we investigate the use of Variational Mode Decomposition and deep learning techniques to improve the accuracy of the load forecasting problem. Although this problem has been studied in the literature, selecting an appropriate decomposition level and a deep learning technique providing better forecasting performance have garnered comparatively less attention. This study bridges this gap by studying the effect of six decomposition levels and five distinct deep learning networks. The raw load profiles are first decomposed into intrinsic mode functions using the Variational Mode Decomposition in order to mitigate their non-stationary aspect. Then, day, hour, and past electricity consumption data are fed as a three-dimensional input sequence to a four-level Wavelet Decomposition Network model. Finally, the forecast sequences related to the different intrinsic mode functions are combined to form the aggregate forecast sequence. The proposed method was assessed using load profiles of five Moroccan households from the Moroccan buildings' electricity consumption dataset (MORED) and was benchmarked against state-of-the-art time-series models and a baseline persistence model.  ( 3 min )
    Robustness in deep learning: The good (width), the bad (depth), and the ugly (initialization). (arXiv:2209.07263v1 [cs.LG])
    We study the average robustness notion in deep neural networks in (selected) wide and narrow, deep and shallow, as well as lazy and non-lazy training settings. We prove that in the under-parameterized setting, width has a negative effect while it improves robustness in the over-parameterized setting. The effect of depth closely depends on the initialization and the training mode. In particular, when initialized with LeCun initialization, depth helps robustness with lazy training regime. In contrast, when initialized with Neural Tangent Kernel (NTK) and He-initialization, depth hurts the robustness. Moreover, under non-lazy training regime, we demonstrate how the width of a two-layer ReLU network benefits robustness. Our theoretical developments improve the results by Huang et al. [2021], Wu et al. [2021] and are consistent with Bubeck and Sellke [2021], Bubeck et al. [2021].  ( 2 min )
    Stochastic first-order methods for average-reward Markov decision processes. (arXiv:2205.05800v5 [cs.LG] UPDATED)
    We study the problem of average-reward Markov decision processes (AMDPs) and develop novel first-order methods with strong theoretical guarantees for both policy evaluation and optimization. Existing on-policy evaluation methods suffer from sub-optimal convergence rates as well as failure in handling insufficiently random policies, e.g., deterministic policies, for lack of exploration. To remedy these issues, we develop a novel variance-reduced temporal difference (VRTD) method with linear function approximation for randomized policies along with sharp convergence guarantees, and an exploratory variance-reduced temporal difference (EVRTD) method for insufficiently random policies with comparable convergence guarantees. We further establish linear convergence rate on the bias of policy evaluation, which is essential for improving the overall sample complexity of policy optimization. On the other hand, compared with intensive research interest in finite sample analysis of policy gradient methods for discounted MDPs, existing studies on policy gradient methods for AMDPs mostly focus on regret bounds under restrictive assumptions on the underlying Markov processes (see, e.g., Abbasi-Yadkori et al., 2019), and they often lack guarantees on the overall sample complexities. Towards this end, we develop an average-reward variant of the stochastic policy mirror descent (SPMD) (Lan, 2022). We establish the first $\widetilde{\mathcal{O}}(\epsilon^{-2})$ sample complexity for solving AMDPs with policy gradient method under both the generative model (with unichain assumption) and Markovian noise model (with ergodic assumption). This bound can be further improved to $\widetilde{\mathcal{O}}(\epsilon^{-1})$ for solving regularized AMDPs. Our theoretical advantages are corroborated by numerical experiments.  ( 3 min )
    DEQGAN: Learning the Loss Function for PINNs with Generative Adversarial Networks. (arXiv:2209.07081v1 [cs.LG])
    Solutions to differential equations are of significant scientific and engineering relevance. Physics-Informed Neural Networks (PINNs) have emerged as a promising method for solving differential equations, but they lack a theoretical justification for the use of any particular loss function. This work presents Differential Equation GAN (DEQGAN), a novel method for solving differential equations using generative adversarial networks to "learn the loss function" for optimizing the neural network. Presenting results on a suite of twelve ordinary and partial differential equations, including the nonlinear Burgers', Allen-Cahn, Hamilton, and modified Einstein's gravity equations, we show that DEQGAN can obtain multiple orders of magnitude lower mean squared errors than PINNs that use $L_2$, $L_1$, and Huber loss functions. We also show that DEQGAN achieves solution accuracies that are competitive with popular numerical methods. Finally, we present two methods to improve the robustness of DEQGAN to different hyperparameter settings.  ( 2 min )
    OOD Link Prediction Generalization Capabilities of Message-Passing GNNs in Larger Test Graphs. (arXiv:2205.15117v4 [cs.LG] UPDATED)
    This work provides the first theoretical study on the ability of graph Message Passing Neural Networks (gMPNNs) -- such as Graph Neural Networks (GNNs) -- to achieve counterfactually-invariant representations for inductive out-of-distribution (OOD) link prediction tasks, where deployment (test) graph sizes are larger than training graphs. We first prove non-asymptotic bounds showing that link predictors based on permutation-equivariant (structural) node embeddings obtained by gMPNNs can converge to a random guess as test graphs get larger. We then propose a theoretically-sound gMPNN that outputs structural pairwise (2-node) embeddings and prove non-asymptotic bounds showing that, as test graphs grow, these embeddings converge to embeddings of a continuous function that retains its ability to predict links OOD. Empirical results on random graphs show agreement with our theoretical results.  ( 2 min )
    Differentially Private Estimation of Hawkes Process. (arXiv:2209.07303v1 [cs.LG])
    Point process models are of great importance in real world applications. In certain critical applications, estimation of point process models involves large amounts of sensitive personal data from users. Privacy concerns naturally arise which have not been addressed in the existing literature. To bridge this glaring gap, we propose the first general differentially private estimation procedure for point process models. Specifically, we take the Hawkes process as an example, and introduce a rigorous definition of differential privacy for event stream data based on a discretized representation of the Hawkes process. We then propose two differentially private optimization algorithms, which can efficiently estimate Hawkes process models with the desired privacy and utility guarantees under two different settings. Experiments are provided to back up our theoretical analysis.  ( 2 min )
    Learning the conditional law: signatures and conditional GANs in filtering and prediction of diffusion processes. (arXiv:2204.00611v2 [stat.ML] UPDATED)
    We consider the filtering and prediction problem for a diffusion process. The signal and observation are modeled by stochastic differential equations (SDEs) driven by correlated Wiener processes. In classical estimation theory, measure-valued stochastic partial differential equations (SPDEs) are derived for the filtering and prediction measures. These equations can be hard to solve numerically. We provide an approximation algorithm using conditional generative adversarial networks (GANs) in combination with signatures, an object from rough path theory. The signature of a sufficiently smooth path determines the path completely. As a result, in some cases, GANs based on signatures have been shown to efficiently approximate the law of a stochastic process. For our algorithm we extend this method to sample from the conditional law, given noisy, partial observation. Our generator is constructed using neural differential equations (NDEs), relying on their universal approximator property. We show well-posedness in providing a rigorous mathematical framework. Numerical results show the efficiency of our algorithm.  ( 2 min )
    Trustworthy modelling of atmospheric formaldehyde powered by deep learning. (arXiv:2209.07414v1 [physics.ao-ph])
    Formaldehyde (HCHO) is one of the most important trace gases in the atmosphere, as it is a pollutant causing respiratory and other diseases. It is also a precursor of tropospheric ozone, which damages crops and deteriorates human health. The study of HCHO chemistry and long-term monitoring using satellite data is important from the perspective of human health, food security and air pollution. Dynamic atmospheric chemistry models struggle to simulate atmospheric formaldehyde and often overestimate it by up to a factor of two relative to satellite observations and reanalysis. The spatial distribution of modelled HCHO also fails to match satellite observations. Here, we present a deep learning approach using a simple super-resolution based convolutional neural network towards simulating fast and reliable atmospheric HCHO. Our approach is an indirect method of HCHO estimation without the need for chemical equations. We find that deep learning outperforms dynamical model simulations, which involve complicated representations of atmospheric chemistry. Causality, establishing the nonlinear relationships of different variables to the target formaldehyde, is built into our approach by using a variety of precursors from meteorology and chemical reanalysis to target OMI AURA satellite-based HCHO predictions. We choose South Asia for testing our implementation as it does not have in situ measurements of formaldehyde and there is a need for improved-quality data over the region. Moreover, there are spatial and temporal gaps in the satellite product which can be removed by trustworthy modelling of atmospheric formaldehyde. This study is a novel attempt at using computer vision for trustworthy modelling of formaldehyde from remote sensing, which can lead to cascading societal benefits.  ( 3 min )
    Well Googled is Half Done: Multimodal Forecasting of New Fashion Product Sales with Image-based Google Trends. (arXiv:2109.09824v5 [cs.CV] UPDATED)
    New fashion product sales forecasting is a challenging problem that involves many business dynamics and cannot be solved by classical forecasting approaches. In this paper, we investigate the effectiveness of systematically probing exogenous knowledge in the form of Google Trends time series and combining it with multi-modal information related to a brand-new fashion item, in order to effectively forecast its sales despite the lack of past data. In particular, we propose a neural network-based approach, where an encoder learns a representation of the exogenous time series, while the decoder forecasts the sales based on the Google Trends encoding and the available visual and metadata information. Our model works in a non-autoregressive manner, avoiding the compounding effect of large first-step errors. As a second contribution, we present VISUELLE, a publicly available dataset for the task of new fashion product sales forecasting, containing multimodal information for 5577 real, new products sold between 2016 and 2019 from Nunalie, an Italian fast-fashion company. The dataset is equipped with images of products, metadata, related sales, and associated Google Trends. We use VISUELLE to compare our approach against state-of-the-art alternatives and several baselines, showing that our neural network-based approach is the most accurate in terms of both percentage and absolute error. It is worth noting that the addition of exogenous knowledge boosts forecasting accuracy by 1.5% in terms of WAPE, revealing the importance of exploiting informative external information. The code and dataset are both available at https://github.com/HumaticsLAB/GTM-Transformer.  ( 3 min )
    Shifts 2.0: Extending The Dataset of Real Distributional Shifts. (arXiv:2206.15407v2 [cs.LG] UPDATED)
    Distributional shift, or the mismatch between training and deployment data, is a significant obstacle to the usage of machine learning in high-stakes industrial applications, such as autonomous driving and medicine. This creates a need to be able to assess how robustly ML models generalize as well as the quality of their uncertainty estimates. Standard ML baseline datasets do not allow these properties to be assessed, as the training, validation and test data are often identically distributed. Recently, a range of dedicated benchmarks have appeared, featuring both distributionally matched and shifted data. Among these benchmarks, the Shifts dataset stands out in terms of the diversity of tasks as well as the data modalities it features. While most of the benchmarks are heavily dominated by 2D image classification tasks, Shifts contains tabular weather forecasting, machine translation, and vehicle motion prediction tasks. This enables the robustness properties of models to be assessed on a diverse set of industrial-scale tasks and either universal or directly applicable task-specific conclusions to be reached. In this paper, we extend the Shifts Dataset with two datasets sourced from industrial, high-risk applications of high societal importance. Specifically, we consider the tasks of segmentation of white matter Multiple Sclerosis lesions in 3D magnetic resonance brain images and the estimation of power consumption in marine cargo vessels. Both tasks feature ubiquitous distributional shifts and a strict safety requirement due to the high cost of errors. These new datasets will allow researchers to further explore robust generalization and uncertainty estimation in new situations. In this work, we provide a description of the dataset and baseline results for both tasks.  ( 3 min )
    Robust Anytime Learning of Markov Decision Processes. (arXiv:2205.15827v2 [cs.AI] UPDATED)
    Markov decision processes (MDPs) are formal models commonly used in sequential decision-making. MDPs capture the stochasticity that may arise, for instance, from imprecise actuators via probabilities in the transition function. However, in data-driven applications, deriving precise probabilities from (limited) data introduces statistical errors that may lead to unexpected or undesirable outcomes. Uncertain MDPs (uMDPs) do not require precise probabilities but instead use so-called uncertainty sets in the transitions, accounting for such limited data. Tools from the formal verification community efficiently compute robust policies that provably adhere to formal specifications, like safety constraints, under the worst-case instance in the uncertainty set. We continuously learn the transition probabilities of an MDP in a robust anytime-learning approach that combines a dedicated Bayesian inference scheme with the computation of robust policies. In particular, our method (1) approximates probabilities as intervals, (2) adapts to new data that may be inconsistent with an intermediate model, and (3) may be stopped at any time to compute a robust policy on the uMDP that faithfully captures the data so far. We show the effectiveness of our approach and compare it to robust policies computed on uMDPs learned by the UCRL2 reinforcement learning algorithm in an experimental evaluation on several benchmarks.  ( 3 min )
    A Temporal Graphlet Kernel for Classifying Dissemination in Evolving Networks. (arXiv:2209.07332v1 [cs.SI])
    We introduce the \emph{temporal graphlet kernel} for classifying dissemination processes in labeled temporal graphs. Such dissemination processes can be spreading (fake) news, infectious diseases, or computer viruses in dynamic networks. The networks are modeled as labeled temporal graphs, in which the edges exist at specific points in time, and node labels change over time. The classification problem asks to discriminate dissemination processes of different origins or parameters, e.g., infectious diseases with different infection probabilities. Our new kernel represents labeled temporal graphs in the feature space of temporal graphlets, i.e., small subgraphs distinguished by their structure, time-dependent node labels, and chronological order of edges. We introduce variants of our kernel based on classes of graphlets that are efficiently countable. For the case of temporal wedges, we propose a highly efficient approximative kernel with low error in expectation. We show that our kernels are faster to compute and provide better accuracy than state-of-the-art methods.  ( 2 min )
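    As a hedged sketch of the feature map behind the temporal-wedge variant, the code below counts temporal wedges (an edge u -> v at time t1 followed by v -> w at time t2 > t1), bucketed by the node labels involved; the paper's graphlet definitions and approximation scheme are richer than this.

```python
# Count label-typed temporal wedges in a list of timestamped edges.
from collections import defaultdict

def temporal_wedge_features(edges, labels):
    """edges: list of (u, v, t); labels: dict node -> label."""
    by_src = defaultdict(list)
    for u, v, t in edges:
        by_src[u].append((t, v))
    feats = defaultdict(int)
    for u, v, t1 in edges:
        for t2, w in by_src.get(v, []):
            if t2 > t1 and w != u:           # chronological wedge u -> v -> w
                feats[(labels[u], labels[v], labels[w])] += 1
    return dict(feats)

edges = [(0, 1, 1.0), (1, 2, 2.0), (1, 3, 0.5), (2, 3, 3.0)]
labels = {0: "a", 1: "b", 2: "a", 3: "b"}
print(temporal_wedge_features(edges, labels))
```

    A kernel value between two temporal graphs is then an inner product of such feature maps.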
    Non-Parallel Voice Conversion for ASR Augmentation. (arXiv:2209.06987v1 [cs.SD])
    Automatic speech recognition (ASR) needs to be robust to speaker differences. Voice Conversion (VC) modifies speaker characteristics of input speech. This is an attractive feature for ASR data augmentation. In this paper, we demonstrate that voice conversion can be used as a data augmentation technique to improve ASR performance, even on LibriSpeech, which contains 2,456 speakers. For ASR augmentation, it is necessary that the VC model be robust to a wide range of input speech. This motivates the use of a non-autoregressive, non-parallel VC model, and the use of a pretrained ASR encoder within the VC model. This work suggests that despite including many speakers, speaker diversity may remain a limitation to ASR quality. Finally, interrogation of our VC performance has provided useful metrics for objective evaluation of VC quality.  ( 2 min )
    Private Synthetic Data for Multitask Learning and Marginal Queries. (arXiv:2209.07400v1 [cs.LG])
    We provide a differentially private algorithm for producing synthetic data simultaneously useful for multiple tasks: marginal queries and multitask machine learning (ML). A key innovation in our algorithm is the ability to directly handle numerical features, in contrast to a number of related prior approaches which require numerical features to be first converted into high-cardinality categorical features via a binning strategy. Higher binning granularity is required for better accuracy, but this negatively impacts scalability. Eliminating the need for binning allows us to produce synthetic data preserving large numbers of statistical queries such as marginals on numerical features, and class conditional linear threshold queries. Preserving the latter means that the fraction of points of each class label above a particular half-space is roughly the same in both the real and synthetic data. This is the property that is needed to train a linear classifier in a multitask setting. Our algorithm also allows us to produce high quality synthetic data for mixed marginal queries, that combine both categorical and numerical features. Our method consistently runs 2-5x faster than the best comparable techniques, and provides significant accuracy improvements in both marginal queries and linear prediction tasks for mixed-type datasets.  ( 3 min )
    Few-Shot Object Detection: A Comprehensive Survey. (arXiv:2112.11699v2 [cs.CV] UPDATED)
    Humans are able to learn to recognize new objects even from a few examples. In contrast, training deep-learning-based object detectors requires huge amounts of annotated data. To avoid the need to acquire and annotate these huge amounts of data, few-shot object detection aims to learn from few object instances of new categories in the target domain. In this survey, we provide an overview of the state of the art in few-shot object detection. We categorize approaches according to their training scheme and architectural layout. For each type of approach, we describe the general realization as well as concepts to improve the performance on novel categories. Whenever appropriate, we give short takeaways regarding these concepts in order to highlight the best ideas. Finally, we introduce commonly used datasets and their evaluation protocols and analyze reported benchmark results. As a result, we emphasize common challenges in evaluation and identify the most promising current trends in this emerging field of few-shot object detection.  ( 2 min )
    Efficient first-order predictor-corrector multiple objective optimization for fair misinformation detection. (arXiv:2209.07245v1 [cs.LG])
    Multiple-objective optimization (MOO) aims to simultaneously optimize multiple conflicting objectives and has found important applications in machine learning, such as minimizing classification loss and discrepancy in treating different populations for fairness. At optimality, further optimizing one objective will necessarily harm at least another objective, and decision-makers need to comprehensively explore multiple optima (called Pareto front) to pinpoint one final solution. We address the efficiency of finding the Pareto front. First, finding the front from scratch using stochastic multi-gradient descent (SMGD) is expensive with large neural networks and datasets. We propose to explore the Pareto front as a manifold from a few initial optima, based on a predictor-corrector method. Second, for each exploration step, the predictor solves a large-scale linear system that scales quadratically in the number of model parameters and requires one backpropagation to evaluate a second-order Hessian-vector product per iteration of the solver. We propose a Gauss-Newton approximation that only scales linearly, and that requires only first-order inner-product per iteration. This also allows for a choice between the MINRES and conjugate gradient methods when approximately solving the linear system. The innovations make predictor-corrector possible for large networks. Experiments on multi-objective (fairness and accuracy) misinformation detection tasks show that 1) the predictor-corrector method can find Pareto fronts better than or similar to SMGD with less time; and 2) the proposed first-order method does not harm the quality of the Pareto front identified by the second-order method, while further reducing running time.  ( 3 min )
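    To illustrate the first-order idea in the predictor step, the sketch below solves a damped Gauss-Newton system $J^\top J x = b$ with conjugate gradients, where each iteration needs only matrix-vector products (no second-order backpropagation). Here $J$ is an explicit toy Jacobian; for a network, $Jv$ and $J^\top u$ would come from forward- and reverse-mode autodiff.

```python
# Matrix-free conjugate gradients on a Gauss-Newton system.
import numpy as np

rng = np.random.default_rng(0)
J = rng.normal(size=(50, 20))                    # toy residual Jacobian
b = rng.normal(size=20)

def gn_matvec(v):
    return J.T @ (J @ v) + 1e-3 * v              # damped Gauss-Newton product

def conjugate_gradient(matvec, b, iters=50, tol=1e-10):
    x = np.zeros_like(b)
    r = b - matvec(x)
    p, rs = r.copy(), r @ r
    for _ in range(iters):
        Ap = matvec(p)
        a = rs / (p @ Ap)
        x += a * p
        r -= a * Ap
        rs_new = r @ r
        if rs_new < tol:
            break
        p = r + (rs_new / rs) * p
        rs = rs_new
    return x

x = conjugate_gradient(gn_matvec, b)
print(np.linalg.norm(gn_matvec(x) - b))          # small residual
```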
    Chemotaxis of sea urchin sperm cells through deep reinforcement learning. (arXiv:2209.07407v1 [cs.NE])
    By imitating biological microswimmers, microrobots can be designed to accomplish targeted delivery of cargos and biomedical manipulations at microscale. However, it is still a great challenge to enable microrobots to maneuver in a complex environment. Machine learning algorithms offer a tool to boost the mobility and flexibility of a synthetic microswimmer, and hence could help us design truly smart microrobots. In this work, we investigate how a model of sea urchin sperm cell can self-learn chemotactic motion in a chemoattractant concentration field. We employ an artificial neural network to act as a decision-making agent and facilitate the sperm cell to discover efficient maneuver strategies through a deep reinforcement learning (DRL) algorithm. Our results show that chemotactic behaviours, very similar to the realistic ones, can be achieved by the DRL utilizing only limited environmental information. In most cases, the DRL algorithm discovers more efficient strategies than the human-devised one. Furthermore, the DRL can even utilize an external disturbance to facilitate the chemotactic motion if the extra flow information is also taken into account by the artificial neural network. Our results provide insights into the chemotactic process of sea urchin sperm cells and also offer guidance for the intelligent maneuvering of microrobots.  ( 3 min )
    Private Stochastic Optimization in the Presence of Outliers: Optimal Rates for (Non-Smooth) Convex Losses and Extension to Non-Convex Losses. (arXiv:2209.07403v1 [cs.LG])
    We study differentially private (DP) stochastic optimization (SO) with data containing outliers and loss functions that are not Lipschitz continuous. To date, the vast majority of work on DP SO assumes that the loss is Lipschitz (i.e. stochastic gradients are uniformly bounded), and their error bounds scale with the Lipschitz parameter of the loss. While this assumption is convenient, it is often unrealistic: in many practical problems where privacy is required, data may contain outliers or be unbounded, causing some stochastic gradients to have large norm. In such cases, the Lipschitz parameter may be prohibitively large, leading to vacuous excess risk bounds. Thus, building on a recent line of work [WXDX20, KLZ22], we make the weaker assumption that stochastic gradients have bounded $k$-th moments for some $k \geq 2$. Compared with works on DP Lipschitz SO, our excess risk scales with the $k$-th moment bound instead of the Lipschitz parameter of the loss, allowing for significantly faster rates in the presence of outliers. For convex and strongly convex loss functions, we provide the first asymptotically optimal excess risk bounds (up to a logarithmic factor). Moreover, in contrast to the prior works [WXDX20, KLZ22], our bounds do not require the loss function to be differentiable/smooth. We also devise an accelerated algorithm that runs in linear time and yields improved (compared to prior works) and nearly optimal excess risk for smooth losses. Additionally, our work is the first to address non-convex non-Lipschitz loss functions satisfying the Proximal-PL inequality; this covers some classes of neural nets, among other practical models. Our Proximal-PL algorithm has nearly optimal excess risk that almost matches the strongly convex lower bound. Lastly, we provide shuffle DP variations of our algorithms, which do not require a trusted curator (e.g. for distributed learning).  ( 3 min )
    Gromov-Wasserstein Autoencoders. (arXiv:2209.07007v1 [cs.LG])
    Learning concise data representations without supervisory signals is a fundamental challenge in machine learning. A prominent approach to this goal is likelihood-based models such as variational autoencoders (VAE) to learn latent representations based on a meta-prior, which is a general premise assumed beneficial for downstream tasks (e.g., disentanglement). However, such approaches often deviate from the original likelihood architecture to apply the introduced meta-prior, causing undesirable changes in their training. In this paper, we propose a novel representation learning method, Gromov-Wasserstein Autoencoders (GWAE), which directly matches the latent and data distributions. Instead of a likelihood-based objective, GWAE models have a trainable prior optimized by minimizing the Gromov-Wasserstein (GW) metric. The GW metric measures the discrepancy between the distance structures of distributions supported on incomparable spaces, e.g., with different dimensionalities. By restricting the family of the trainable prior, we can introduce meta-priors to control latent representations for downstream tasks. The empirical comparison with the existing VAE-based methods shows that GWAE models can learn representations based on meta-priors by changing the prior family without further modifying the GW objective.
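    Since GW compares distributions only through their internal distance structure, spaces of different dimension become comparable. Below is a minimal illustration (not the GWAE training loop) using the POT library's GW solver, which we assume is available; the toy "latent" and "data" samples share the same structure up to an extra noisy dimension, so the discrepancy should be near zero.

```python
# Gromov-Wasserstein between a 2D and a 3D point cloud via POT.
import numpy as np
import ot

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 2))                         # "latent" samples in 2D
Y = np.hstack([X, 0.1 * rng.normal(size=(100, 1))])   # "data" samples in 3D

C1 = ot.dist(X, X)                 # intra-space distance matrices
C2 = ot.dist(Y, Y)
C1 /= C1.max()
C2 /= C2.max()
p = q = np.full(100, 1.0 / 100)    # uniform marginals

T, log = ot.gromov.gromov_wasserstein(C1, C2, p, q, 'square_loss', log=True)
print("GW discrepancy:", log['gw_dist'])   # near 0: same distance structure
```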
    Overhead-Free Blockage Detection and Precoding Through Physics-Based Graph Neural Networks: LIDAR Data Meets Ray Tracing. (arXiv:2209.07350v1 [cs.IT])
    In this letter, we address blockage detection and precoder design for multiple-input multiple-output (MIMO) links without requiring any communication overhead. Blockage detection is achieved by classifying light detection and ranging (LIDAR) data through a physics-based graph neural network (GNN). For precoder design, a preliminary channel estimate is obtained by running ray tracing on a 3D surface obtained from LIDAR data. This estimate is successively refined and the precoder is designed accordingly. Numerical simulations show that blockage detection is successful with 95% accuracy. Our digital precoding achieves 90% of the capacity and analog precoding outperforms previous works exploiting LIDAR for precoder design.  ( 2 min )
    Continuous MDP Homomorphisms and Homomorphic Policy Gradient. (arXiv:2209.07364v1 [cs.LG])
    Abstraction has been widely studied as a way to improve the efficiency and generalization of reinforcement learning algorithms. In this paper, we study abstraction in the continuous-control setting. We extend the definition of MDP homomorphisms to encompass continuous actions in continuous state spaces. We derive a policy gradient theorem on the abstract MDP, which allows us to leverage approximate symmetries of the environment for policy optimization. Based on this theorem, we propose an actor-critic algorithm that is able to learn the policy and the MDP homomorphism map simultaneously, using the lax bisimulation metric. We demonstrate the effectiveness of our method on benchmark tasks in the DeepMind Control Suite. Our method's ability to utilize MDP homomorphisms for representation learning leads to improved performance when learning from pixel observations.  ( 2 min )
    Learning Debiased Classifier with Biased Committee. (arXiv:2206.10843v2 [cs.LG] UPDATED)
    Neural networks are prone to be biased towards spurious correlations between classes and latent attributes exhibited in a major portion of training data, which ruins their generalization capability. This paper proposes a new method for training debiased classifiers with no spurious attribute label. The key idea of the method is to employ a committee of classifiers as an auxiliary module that identifies bias-conflicting data, i.e., data without spurious correlations, and assigns large weights to them when training the main classifier. The committee is learned as a bootstrapped ensemble so that a majority of its classifiers are biased as well as being diverse, and intentionally fail to predict classes of bias-conflicting data accordingly. The consensus within the committee on prediction difficulty thus provides a reliable cue for identifying and weighting bias-conflicting data. Moreover, the committee is also trained with knowledge transferred from the main classifier so that it gradually becomes debiased along with the main classifier and emphasizes more difficult data as training progresses. On five real-world datasets, our method outperforms existing methods using no spurious attribute label like ours and even surpasses those relying on bias labels occasionally.  ( 3 min )
    PALBERT: Teaching ALBERT to Ponder. (arXiv:2204.03276v2 [cs.LG] UPDATED)
    Currently, pre-trained models can be considered the default choice for a wide range of NLP tasks. Despite their SoTA results, there is practical evidence that these models may require a different number of computing layers for different input sequences, since evaluating all layers leads to overconfidence in wrong predictions (namely overthinking). This problem can potentially be solved by implementing adaptive computation time approaches, which were first designed to improve inference speed. Recently proposed PonderNet may be a promising solution for performing an early exit by treating the exit layer's index as a latent variable. However, the originally proposed exit criterion, relying on sampling from trained posterior distribution on the probability of exiting from the $i$-th layer, introduces major variance in exit layer indices, significantly reducing the resulting model's performance. In this paper, we propose improving PonderNet with a novel deterministic Q-exit criterion and a revisited model architecture. We adapted the proposed mechanism to ALBERT and RoBERTa and compared it with recent methods for performing an early exit. We observed that the proposed changes can be considered significant improvements on the original PonderNet architecture and outperform PABEE on a wide range of GLUE tasks. In addition, we also performed an in-depth ablation study of the proposed architecture to further understand Lambda layers and their performance.  ( 3 min )
    Generative Visual Prompt: Unifying Distributional Control of Pre-Trained Generative Models. (arXiv:2209.06970v1 [cs.CV])
    Generative models (e.g., GANs and diffusion models) learn the underlying data distribution in an unsupervised manner. However, many applications of interest require sampling from a specific region of the generative model's output space or evenly over a range of characteristics. To allow efficient sampling in these scenarios, we propose Generative Visual Prompt (PromptGen), a framework for distributional control over pre-trained generative models by incorporating knowledge of arbitrary off-the-shelf models. PromptGen defines control as an energy-based model (EBM) and samples images in a feed-forward manner by approximating the EBM with invertible neural networks, avoiding optimization at inference. We demonstrate how PromptGen can control several generative models (e.g., StyleGAN2, StyleNeRF, diffusion autoencoder, and NVAE) using various off-the-shelf models: (1) with the CLIP model, PromptGen can sample images guided by text, (2) with image classifiers, PromptGen can de-bias generative models across a set of attributes, and (3) with inverse graphics models, PromptGen can sample images of the same identity in different poses. (4) Finally, PromptGen reveals that the CLIP model shows "reporting bias" when used as control, and PromptGen can further de-bias this controlled distribution in an iterative manner. Our code is available at https://github.com/ChenWu98/Generative-Visual-Prompt.
    Mean-Field Approximation of Cooperative Constrained Multi-Agent Reinforcement Learning (CMARL). (arXiv:2209.07437v1 [cs.LG])
    Mean-Field Control (MFC) has recently been proven to be a scalable tool to approximately solve large-scale multi-agent reinforcement learning (MARL) problems. However, these studies are typically limited to unconstrained cumulative reward maximization framework. In this paper, we show that one can use the MFC approach to approximate the MARL problem even in the presence of constraints. Specifically, we prove that, an $N$-agent constrained MARL problem, with state, and action spaces of each individual agents being of sizes $|\mathcal{X}|$, and $|\mathcal{U}|$ respectively, can be approximated by an associated constrained MFC problem with an error, $e\triangleq \mathcal{O}\left([\sqrt{|\mathcal{X}|}+\sqrt{|\mathcal{U}|}]/\sqrt{N}\right)$. In a special case where the reward, cost, and state transition functions are independent of the action distribution of the population, we prove that the error can be improved to $e=\mathcal{O}(\sqrt{|\mathcal{X}|}/\sqrt{N})$. Also, we provide a Natural Policy Gradient based algorithm and prove that it can solve the constrained MARL problem within an error of $\mathcal{O}(e)$ with a sample complexity of $\mathcal{O}(e^{-6})$.
    Adaptive Fairness Improvement Based on Causality Analysis. (arXiv:2209.07190v1 [cs.LG])
    Given a discriminating neural network, the problem of fairness improvement is to systematically reduce discrimination without significantly sacrificing its performance (i.e., accuracy). Multiple categories of fairness improving methods have been proposed for neural networks, including pre-processing, in-processing and post-processing. Our empirical study however shows that these methods are not always effective (e.g., they may improve fairness at the price of a huge accuracy drop) or even not helpful (e.g., they may even worsen both fairness and accuracy). In this work, we propose an approach which adaptively chooses the fairness improving method based on causality analysis. That is, we choose the method based on how the neurons and attributes responsible for unfairness are distributed among the input attributes and the hidden neurons. Our experimental evaluation shows that our approach is effective (i.e., it always identifies the best fairness improving method) and efficient (i.e., with an average time overhead of 5 minutes).
    Delving into Inter-Image Invariance for Unsupervised Visual Representations. (arXiv:2008.11702v3 [cs.CV] UPDATED)
    Contrastive learning has recently shown immense potential in unsupervised visual representation learning. Existing studies in this track mainly focus on intra-image invariance learning. The learning typically uses rich intra-image transformations to construct positive pairs and then maximizes agreement using a contrastive loss. The merits of inter-image invariance, conversely, remain much less explored. One major obstacle to exploiting inter-image invariance is that it is unclear how to reliably construct inter-image positive pairs, and further derive effective supervision from them, since no pair annotations are available. In this work, we present a comprehensive empirical study to better understand the role of inter-image invariance learning from three main constituting components: pseudo-label maintenance, sampling strategy, and decision boundary design. To facilitate the study, we introduce a unified and generic framework that supports the integration of unsupervised intra- and inter-image invariance learning. Through carefully-designed comparisons and analysis, multiple valuable observations are revealed: 1) online labels converge faster and perform better than offline labels; 2) semi-hard negative samples are more reliable and unbiased than hard negative samples; 3) a less stringent decision boundary is more favorable for inter-image invariance learning. With all the obtained recipes, our final model, namely InterCLR, shows consistent improvements over state-of-the-art intra-image invariance learning methods on multiple standard benchmarks. We hope this work will provide useful experience for devising effective unsupervised inter-image invariance learning. Code: https://github.com/open-mmlab/mmselfsup.  ( 3 min )
    On the Surprising Effectiveness of Transformers in Low-Labeled Video Recognition. (arXiv:2209.07474v1 [cs.CV])
    Recently vision transformers have been shown to be competitive with convolution-based methods (CNNs) broadly across multiple vision tasks. The less restrictive inductive bias of transformers endows greater representational capacity in comparison with CNNs. However, in the image classification setting this flexibility comes with a trade-off with respect to sample efficiency, where transformers require ImageNet-scale training. This notion has carried over to video where transformers have not yet been explored for video classification in the low-labeled or semi-supervised settings. Our work empirically explores the low data regime for video classification and discovers that, surprisingly, transformers perform extremely well in the low-labeled video setting compared to CNNs. We specifically evaluate video vision transformers across two contrasting video datasets (Kinetics-400 and SomethingSomething-V2) and perform thorough analysis and ablation studies to explain this observation using the predominant features of video transformer architectures. We even show that using just the labeled data, transformers significantly outperform complex semi-supervised CNN methods that leverage large-scale unlabeled data as well. Our experiments inform our recommendation that semi-supervised learning video work should consider the use of video transformers in the future.  ( 2 min )
    Deep learning in a bilateral brain with hemispheric specialization. (arXiv:2209.06862v1 [q-bio.NC])
    The brains of all bilaterally symmetric animals on Earth are divided into left and right hemispheres. The anatomy and functionality of the hemispheres have a large degree of overlap, but they specialize to possess different attributes. The left hemisphere is believed to specialize in specificity and routine, the right in generalities and novelty. In this study, we propose an artificial neural network that imitates that bilateral architecture using two convolutional neural networks with different training objectives and test it on an image classification task. The bilateral architecture outperforms architectures of similar representational capacity that don't exploit differential specialization. It demonstrates the efficacy of bilateralism and constitutes a new principle that could be incorporated into other computational neuroscientific models and used as an inductive bias when designing new ML systems. An analysis of the model can help us to understand the human brain.  ( 2 min )
    A Robotic Visual Grasping Design: Rethinking Convolution Neural Network with High-Resolutions. (arXiv:2209.07459v1 [cs.RO])
    High-resolution representations are important for vision-based robotic grasping problems. Existing works generally encode the input images into low-resolution representations via sub-networks and then recover high-resolution representations. This will lose spatial information, and errors introduced by the decoder will be more serious when multiple types of objects are considered or objects are far away from the camera. To address these issues, we revisit the design paradigm of CNNs for robotic perception tasks. We demonstrate that using parallel branches, as opposed to serial stacked convolutional layers, will be a more powerful design for robotic visual grasping tasks. In particular, guidelines of neural network design are provided for robotic perception tasks, e.g., high-resolution representation and lightweight design, which respond to the challenges in different manipulation scenarios. We then develop a novel grasping visual architecture referred to as HRG-Net, a parallel-branch structure that always maintains a high-resolution representation and repeatedly exchanges information across resolutions. Extensive experiments validate that these two designs can effectively enhance the accuracy of visual-based grasping and accelerate network training. We show a series of comparative experiments in real physical environments on YouTube: https://youtu.be/Jhlsp-xzHFY.  ( 2 min )
    Personalized Rehabilitation Robotics based on Online Learning Control. (arXiv:2110.00481v2 [cs.LG] UPDATED)
    The use of rehabilitation robotics in clinical applications gains increasing importance, due to therapeutic benefits and the ability to alleviate labor-intensive work. However, their practical utility is dependent on the deployment of appropriate control algorithms, which adapt the level of task-assistance according to each individual patient's need. Generally, the required personalization is achieved through manual tuning by clinicians, which is cumbersome and error-prone. In this work we propose a novel online learning control architecture, which is able to personalize the control force at run time to each individual user. To this end, we deploy Gaussian process-based online learning with previously unseen prediction and update rates. Finally, we evaluate our method in an experimental user study, where the learning controller is shown to provide personalized control, while also obtaining safe interaction forces.  ( 2 min )
    Learning to Exploit Elastic Actuators for Quadruped Locomotion. (arXiv:2209.07171v1 [cs.RO])
    Spring-based actuators in legged locomotion provide energy-efficiency and improved performance, but increase the difficulty of controller design. Whereas previous works have focused on extensive modeling and simulation to find optimal controllers for such systems, we propose to learn model-free controllers directly on the real robot. In our approach, gaits are first synthesized by central pattern generators (CPGs), whose parameters are optimized to quickly obtain an open-loop controller that achieves efficient locomotion. Then, to make that controller more robust and further improve the performance, we use reinforcement learning to close the loop, to learn corrective actions on top of the CPGs. We evaluate the proposed approach on DLR's elastic quadruped bert. Our results in learning trotting and pronking gaits show that exploitation of the spring actuator dynamics emerges naturally from optimizing for dynamic motions, yielding high-performing locomotion despite being model-free. The whole process takes no more than 1.5 hours on the real robot and results in natural-looking gaits.  ( 2 min )
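    A hedged sketch of the CPG stage follows: coupled Hopf oscillators produce phase-locked rhythmic signals for a trotting gait (diagonal leg pairs in phase, the two pairs in anti-phase). The gains, frequency, and coupling scheme are illustrative assumptions, and in the paper RL adds corrective actions on top of such open-loop signals.

```python
# Four phase-coupled Hopf oscillators generating trot-like joint setpoints.
import numpy as np

n_legs, dt, omega, mu, k = 4, 0.001, 2 * np.pi * 2.0, 1.0, 1.0
# Desired phase offsets for trot: legs (0, 3) vs (1, 2) in anti-phase.
phase = np.array([0.0, np.pi, np.pi, 0.0])
x = np.cos(phase) * 0.1
y = np.sin(phase) * 0.1

trajectory = []
for _ in range(3000):
    r2 = x**2 + y**2
    dx = (mu - r2) * x - omega * y        # Hopf limit-cycle dynamics
    dy = (mu - r2) * y + omega * x
    # Diffusive coupling pulls each oscillator toward its phase offset.
    for i in range(n_legs):
        for j in range(n_legs):
            dphi = phase[i] - phase[j]
            dx[i] += k * (np.cos(dphi) * x[j] - np.sin(dphi) * y[j] - x[i])
            dy[i] += k * (np.sin(dphi) * x[j] + np.cos(dphi) * y[j] - y[i])
    x, y = x + dt * dx, y + dt * dy
    trajectory.append(x.copy())           # x drives the joint setpoints

print(np.round(trajectory[-1], 3))        # legs 0,3 in phase; 1,2 opposite
```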
    Towards self-attention based navigation in the real world. (arXiv:2209.07043v1 [cs.RO])
    Vision-based navigation requires processing complex information to make task-orientated decisions. Applications include autonomous robots, self-driving cars, and assistive vision for humans. One of the key elements in the process is the extraction and selection of relevant features in pixel space upon which to base action choices, for which Machine Learning techniques are well suited. However, Deep Reinforcement Learning agents trained in simulation often exhibit unsatisfactory results when deployed in the real-world due to perceptual differences known as the $\textit{reality gap}$. An approach that is yet to be explored to bridge this gap is self-attention. In this paper we (1) perform a systematic exploration of the hyperparameter space for self-attention based navigation of 3D environments and qualitatively appraise behaviour observed from different hyperparameter sets, including their ability to generalise; (2) present strategies to improve the agents' generalisation abilities and navigation behaviour; and (3) show how models trained in simulation are capable of processing real world images meaningfully in real time. To our knowledge, this is the first demonstration of a self-attention based agent successfully trained in navigating a 3D action space, using less than 4000 parameters.  ( 2 min )
    Fitting an immersed submanifold to data via Sussmann's orbit theorem. (arXiv:2204.01119v3 [cs.LG] UPDATED)
    This paper describes an approach for fitting an immersed submanifold of a finite-dimensional Euclidean space to random samples. The reconstruction mapping from the ambient space to the desired submanifold is implemented as a composition of an encoder that maps each point to a tuple of (positive or negative) times and a decoder given by a composition of flows along finitely many vector fields starting from a fixed initial point. The encoder supplies the times for the flows. The encoder-decoder map is obtained by empirical risk minimization, and a high-probability bound is given on the excess risk relative to the minimum expected reconstruction error over a given class of encoder-decoder maps. The proposed approach makes fundamental use of Sussmann's orbit theorem, which guarantees that the image of the reconstruction map is indeed contained in an immersed submanifold.  ( 2 min )
    Laplacian-based Cluster-Contractive t-SNE for High Dimensional Data Visualization. (arXiv:2207.12214v2 [cs.LG] UPDATED)
    Dimensionality reduction techniques aim at representing high-dimensional data in low-dimensional spaces to extract hidden and useful information or facilitate visual understanding and interpretation of the data. However, few of them take into consideration the potential cluster information contained implicitly in the high-dimensional data. In this paper, we propose LaptSNE, a new graph-layout nonlinear dimensionality reduction method based on t-SNE, one of the best techniques for visualizing high-dimensional data as 2D scatter plots. Specifically, LaptSNE leverages the eigenvalue information of the graph Laplacian to shrink the potential clusters in the low-dimensional embedding when learning to preserve the local and global structure from high-dimensional space to low-dimensional space. It is nontrivial to solve the proposed model because the eigenvalues of normalized symmetric Laplacian are functions of the decision variable. We provide a majorization-minimization algorithm with convergence guarantee to solve the optimization problem of LaptSNE and show how to calculate the gradient analytically, which may be of broad interest when considering optimization with Laplacian-composited objective. We evaluate our method by a formal comparison with state-of-the-art methods on seven benchmark datasets, both visually and via established quantitative measurements. The results demonstrate the superiority of our method over baselines such as t-SNE and UMAP. We also provide out-of-sample extension, large-scale extension and mini-batch extension for our LaptSNE to facilitate dimensionality reduction in various scenarios.  ( 3 min )
    Optimal Connectivity through Network Gradients for the Restricted Boltzmann Machine. (arXiv:2209.06932v1 [cs.LG])
    Leveraging sparse networks to connect successive layers in deep neural networks has recently been shown to provide benefits to large scale state-of-the-art models. However, network connectivity also plays a significant role on the learning curves of shallow networks, such as the classic Restricted Boltzmann Machines (RBM). A fundamental problem is efficiently finding connectivity patterns that improve the learning curve. Recent principled approaches explicitly include network connections as parameters that must be optimized in the model, but often rely on continuous functions to represent connections and on explicit penalization. This work presents a method to find optimal connectivity patterns for RBMs based on the idea of network gradients: computing the gradient of every possible connection, given a specific connection pattern, and using the gradient to drive a continuous connection strength parameter that in turn is used to determine the connection pattern. Thus, learning RBM parameters and learning network connections is truly jointly performed, albeit with different learning rates, and without changes to the objective function. The method is applied to the MNIST data set showing that better RBM models are found for the benchmark tasks of sample generation and input classification.  ( 2 min )
    Active Learning Exploration of Transition Metal Complexes to Discover Method-Insensitive and Synthetically Accessible Chromophores. (arXiv:2208.05444v2 [physics.chem-ph] UPDATED)
    Transition metal chromophores with earth-abundant transition metals are an important design target for their applications in lighting and non-toxic bioimaging, but their design is challenged by the scarcity of complexes that simultaneously have optimal target absorption energies in the visible region as well as well-defined ground states. Machine learning (ML) accelerated discovery could overcome such challenges by enabling screening of a larger space, but is limited by the fidelity of the data used in ML model training, which is typically from a single approximate density functional. To address this limitation, we search for consensus in predictions among 23 density functional approximations across multiple rungs of Jacob's ladder. To accelerate the discovery of complexes with absorption energies in the visible region while minimizing multireference (MR) character, we use 2D efficient global optimization to sample candidate low-spin chromophores from multi-million complex spaces. Despite the scarcity (i.e., approx. 0.01\%) of potential chromophores in this large chemical space, we identify candidates with high likelihood (i.e., > 10\%) of computational validation as the ML models improve during active learning, representing a 1,000-fold acceleration in discovery. Absorption spectra of promising chromophores from time-dependent density functional theory verify that 2/3 of candidates have the desired excited state properties. The observation that constituent ligands from our leads have demonstrated interesting optical properties in the literature exemplifies the effectiveness of our construction of a realistic design space and active learning approach.  ( 3 min )
    Fair Inference for Discrete Latent Variable Models. (arXiv:2209.07044v1 [cs.LG])
    It is now well understood that machine learning models, trained on data without due care, often exhibit unfair and discriminatory behavior against certain populations. Traditional algorithmic fairness research has mainly focused on supervised learning tasks, particularly classification. While fairness in unsupervised learning has received some attention, the literature has primarily addressed fair representation learning of continuous embeddings. In this paper, we conversely focus on unsupervised learning using probabilistic graphical models with discrete latent variables. We develop a fair stochastic variational inference technique for the discrete latent variables, which is accomplished by including a fairness penalty on the variational distribution that aims to respect the principles of intersectionality, a critical lens on fairness from the legal, social science, and humanities literature, and then optimizing the variational parameters under this penalty. We first show the utility of our method in improving equity and fairness for clustering using na\"ive Bayes and Gaussian mixture models on benchmark datasets. To demonstrate the generality of our approach and its potential for real-world impact, we then develop a special-purpose graphical model for criminal justice risk assessments, and use our fairness approach to prevent the inferences from encoding unfair societal biases.  ( 2 min )
    iFlipper: Label Flipping for Individual Fairness. (arXiv:2209.07047v1 [cs.LG])
    As machine learning becomes prevalent, mitigating any unfairness present in the training data becomes critical. Among the various notions of fairness, this paper focuses on the well-known individual fairness, which states that similar individuals should be treated similarly. While individual fairness can be improved when training a model (in-processing), we contend that fixing the data before model training (pre-processing) is a more fundamental solution. In particular, we show that label flipping is an effective pre-processing technique for improving individual fairness. Our system iFlipper solves the optimization problem of minimally flipping labels given a limit to the individual fairness violations, where a violation occurs when two similar examples in the training data have different labels. We first prove that the problem is NP-hard. We then propose an approximate linear programming algorithm and provide theoretical guarantees on how close its result is to the optimal solution in terms of the number of label flips. We also propose techniques for making the linear programming solution more optimal without exceeding the violations limit. Experiments on real datasets show that iFlipper significantly outperforms other pre-processing baselines in terms of individual fairness and accuracy on unseen test sets. In addition, iFlipper can be combined with in-processing techniques for even better results.  ( 2 min )
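    To make the optimization concrete, here is a hedged LP-relaxation sketch of the problem iFlipper addresses: flip as few binary labels as possible while keeping the total violation among similar pairs under a budget m. Variables $x_i \in [0,1]$ relax the flipped labels and $z_{ij}$ bounds $|x_i - x_j|$; the pair list, budget, and naive rounding are illustrative, and the paper's algorithm carries guarantees this sketch lacks.

```python
# LP relaxation of minimal label flipping under a fairness-violation budget.
import numpy as np
from scipy.optimize import linprog

y = np.array([0, 1, 0, 1, 1])                    # original labels
pairs = [(0, 1), (1, 2), (3, 4)]                 # "similar" example pairs
m = 1.0                                          # violation budget
n, P = len(y), len(pairs)

# Objective: sum_i |x_i - y_i| = sum_i (1 - 2 y_i) x_i + const for binary y.
c = np.concatenate([1.0 - 2.0 * y, np.zeros(P)])
A_ub, b_ub = [], []
for k, (i, j) in enumerate(pairs):               # z_k >= |x_i - x_j|
    row = np.zeros(n + P); row[i], row[j], row[n + k] = 1, -1, -1
    A_ub.append(row); b_ub.append(0.0)
    row = np.zeros(n + P); row[i], row[j], row[n + k] = -1, 1, -1
    A_ub.append(row); b_ub.append(0.0)
budget = np.zeros(n + P); budget[n:] = 1.0       # sum_k z_k <= m
A_ub.append(budget); b_ub.append(m)

res = linprog(c, A_ub=np.array(A_ub), b_ub=np.array(b_ub),
              bounds=[(0, 1)] * (n + P))
x = np.round(res.x[:n])                          # naive rounding to labels
print("flipped labels:", x, "flips:", int(np.abs(x - y).sum()))
```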
    Rho-Tau Bregman Information and the Geometry of Annealing Paths. (arXiv:2209.07481v1 [cs.LG])
    Markov Chain Monte Carlo methods for sampling from complex distributions and estimating normalization constants often simulate samples from a sequence of intermediate distributions along an annealing path, which bridges between a tractable initial distribution and a target density of interest. Prior work has constructed annealing paths using quasi-arithmetic means, and interpreted the resulting intermediate densities as minimizing an expected divergence to the endpoints. We provide a comprehensive analysis of this 'centroid' property using Bregman divergences under a monotonic embedding of the density function, thereby associating common divergences such as Amari's and Renyi's ${\alpha}$-divergences, ${(\alpha,\beta)}$-divergences, and the Jensen-Shannon divergence with intermediate densities along an annealing path. Our analysis highlights the interplay between parametric families, quasi-arithmetic means, and divergence functions using the rho-tau Bregman divergence framework of Zhang 2004;2013.
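    For intuition, under the log embedding the quasi-arithmetic path reduces to the familiar geometric mixture $\pi_\beta \propto p_0^{1-\beta} p_1^{\beta}$, whose intermediates minimize an expected KL divergence to the endpoints. The sketch below evaluates such a path on a grid; the endpoints and embedding choice are illustrative assumptions.

```python
# Geometric annealing path between two unnormalized 1D densities.
import numpy as np

x = np.linspace(-10, 10, 2001)
dx = x[1] - x[0]
p0 = np.exp(-0.5 * (x + 3) ** 2)                 # initial density
p1 = np.exp(-0.5 * (x - 3) ** 2 / 0.25)          # target density

for beta in [0.0, 0.25, 0.5, 0.75, 1.0]:
    path = p0 ** (1 - beta) * p1 ** beta         # geometric mean
    Z = (path * dx).sum()                        # normalizing constant
    mean = (x * path * dx).sum() / Z
    print(f"beta={beta:.2f}  log Z={np.log(Z):+.3f}  mean={mean:+.3f}")
# Ratios of successive Z's are what annealed importance sampling estimates.
```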
    On the Reuse Bias in Off-Policy Reinforcement Learning. (arXiv:2209.07074v1 [cs.LG])
    Importance sampling (IS) is a popular technique in off-policy evaluation, which re-weights the return of trajectories in the replay buffer to boost sample efficiency. However, training with IS can be unstable and previous attempts to address this issue mainly focus on analyzing the variance of IS. In this paper, we reveal that the instability is also related to a new notion of Reuse Bias of IS -- the bias in off-policy evaluation caused by the reuse of the replay buffer for evaluation and optimization. We theoretically show that the off-policy evaluation and optimization of the current policy with the data from the replay buffer result in an overestimation of the objective, which may cause an erroneous gradient update and degrade the performance. We further provide a high-probability upper bound of the Reuse Bias, and show that controlling one term of the upper bound can control the Reuse Bias by introducing the concept of stability for off-policy algorithms. Based on these analyses, we finally present a novel Bias-Regularized Importance Sampling (BIRIS) framework along with practical algorithms, which can alleviate the negative impact of the Reuse Bias. Experimental results show that our BIRIS-based methods can significantly improve the sample efficiency on a series of continuous control tasks in MuJoCo.
    Socially Enhanced Situation Awareness from Microblogs using Artificial Intelligence: A Survey. (arXiv:2209.07272v1 [cs.LG])
    The rise of social media platforms provides an unbounded, infinitely rich source of aggregate knowledge of the world around us, both historic and real-time, from a human perspective. The greatest challenge we face is how to process and understand this raw and unstructured data, go beyond individual observations and see the "big picture"--the domain of Situation Awareness. We provide an extensive survey of Artificial Intelligence research, focusing on microblog social media data with applications to Situation Awareness, that gives the seminal work and state-of-the-art approaches across six thematic areas: Crime, Disasters, Finance, Physical Environment, Politics, and Health and Population. We provide a novel, unified methodological perspective, identify key results and challenges, and present ongoing research directions.
    Pick your Neighbor: Local Gauss-Southwell Rule for Fast Asynchronous Decentralized Optimization. (arXiv:2207.07543v2 [math.OC] UPDATED)
    In decentralized optimization environments, each agent $i$ in a network of $n$ nodes has its own private function $f_i$, and nodes communicate with their neighbors to cooperatively minimize the aggregate objective $\sum_{i=1}^n f_i$. In this setting, synchronizing the nodes' updates incurs significant communication overhead and computational costs, so much of the recent literature has focused on the analysis and design of asynchronous optimization algorithms, where agents activate and communicate at arbitrary times without needing a global synchronization enforcer. However, most works assume that when a node activates, it selects the neighbor to contact based on a fixed probability (e.g., uniformly at random), a choice that ignores the optimization landscape at the moment of activation. Instead, in this work we introduce an optimization-aware selection rule that chooses the neighbor providing the highest dual cost improvement (a quantity related to a dualization of the problem based on consensus). This scheme is related to the coordinate descent (CD) method with the Gauss-Southwell (GS) rule for coordinate updates; in our setting however, only a subset of coordinates is accessible at each iteration (because each node can communicate only with its neighbors), so the existing literature on GS methods does not apply. To overcome this difficulty, we develop a new analytical framework for smooth and strongly convex $f_i$ that covers the class of set-wise CD algorithms -- a class that directly applies to decentralized scenarios, but is not limited to them -- and we show that the proposed set-wise GS rule achieves a speedup factor of up to the maximum degree in the network (which is in the order of $\Theta(n)$ for highly connected graphs). The speedup predicted by our analysis is validated in numerical experiments with synthetic data.
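    A toy sketch of the selection idea appears below: coordinate descent on a strongly convex quadratic where an activated node may only update coordinates in its neighborhood (the set-wise restriction), and picks the accessible coordinate with the largest gradient magnitude (Gauss-Southwell) rather than a uniformly random one. The graph, step size, and problem are illustrative assumptions.

```python
# Set-wise Gauss-Southwell vs. uniform coordinate choice on a quadratic.
import numpy as np

rng = np.random.default_rng(0)
A = rng.normal(size=(30, 10)); Q = A.T @ A + np.eye(10)   # strongly convex
b = rng.normal(size=10)
neighborhoods = [np.arange(i, min(i + 3, 10)) for i in range(10)]  # toy sets
L = np.diag(Q).max()                     # coordinate-wise Lipschitz bound

def run(rule, iters=300):
    x = np.zeros(10)
    for _ in range(iters):
        node = rng.integers(10)          # a node activates at random
        S = neighborhoods[node]
        grad = Q @ x - b
        if rule == "gs":
            i = S[np.argmax(np.abs(grad[S]))]   # set-wise Gauss-Southwell
        else:
            i = rng.choice(S)                   # uniform neighbor choice
        x[i] -= grad[i] / L
    return 0.5 * x @ Q @ x - b @ x

print("uniform :", run("uniform"))
print("set-GS  :", run("gs"))            # typically a lower objective value
```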
    Number of Attention Heads vs Number of Transformer-Encoders in Computer Vision. (arXiv:2209.07221v1 [cs.CV])
    Determining an appropriate number of attention heads on the one hand, and an appropriate number of transformer-encoders on the other, is an important choice for Computer Vision (CV) tasks using the Transformer architecture. Computational experiments confirmed the expectation that the total number of parameters has to satisfy the condition of overdetermination (i.e., the number of constraints significantly exceeding the number of parameters). Then, good generalization performance can be expected. This sets the boundaries within which the number of heads and the number of transformers can be chosen. If the role of context in images to be classified can be assumed to be small, it is favorable to use multiple transformers with a low number of heads (such as one or two). In classifying objects whose class may heavily depend on the context within the image (i.e., the meaning of a patch being dependent on other patches), the number of heads is equally important as that of transformers.
    Omnipredictors for Constrained Optimization. (arXiv:2209.07463v1 [cs.LG])
    The notion of omnipredictors (Gopalan, Kalai, Reingold, Sharan and Wieder, ITCS 2021) suggested a new paradigm for loss minimization. Rather than learning a predictor based on a known loss function, omnipredictors can easily be post-processed to minimize any one of a rich family of loss functions compared with the loss of a class $C$. It has been shown that such omnipredictors exist and are implied (for all convex and Lipschitz loss functions) by the notion of multicalibration from the algorithmic fairness literature. Nevertheless, it is often the case that the action selected must obey some additional constraints (such as capacity or parity constraints). In itself, the original notion of omnipredictors does not apply in this well-motivated and heavily studied context of constrained loss minimization. In this paper, we introduce omnipredictors for constrained optimization and study their complexity and implications. The notion that we introduce allows the learner to be unaware of the loss function that will be later assigned as well as the constraints that will be later imposed, as long as the subpopulations that are used to define these constraints are known. The paper shows how to obtain omnipredictors for constrained optimization problems, relying on appropriate variants of multicalibration. For some interesting constraints and general loss functions and for general constraints and some interesting loss functions, we show how omnipredictors are implied by a variant of multicalibration that is similar in complexity to standard multicalibration. We demonstrate that in the general case, standard multicalibration is insufficient and show that omnipredictors are implied by multicalibration with respect to a class containing all the level sets of hypotheses in $C$. We also investigate the implications when the constraints are group fairness notions.
    A Closer Look at Prototype Classifier for Few-shot Image Classification. (arXiv:2110.05076v5 [cs.CV] UPDATED)
    The prototypical network is a prototype classifier based on meta-learning and is widely used for few-shot learning because it classifies unseen examples by constructing class-specific prototypes without adjusting hyper-parameters during meta-testing. Interestingly, recent research has attracted a lot of attention, showing that training a new linear classifier, which does not use a meta-learning algorithm, performs comparably with the prototypical network. However, the training of a new linear classifier requires the retraining of the classifier every time a new class appears. In this paper, we analyze how a prototype classifier can work equally well without training a new linear classifier or meta-learning. We experimentally find that directly using the feature vectors extracted by standard pre-trained models to construct a prototype classifier in meta-testing does not perform as well as the prototypical network and training new linear classifiers on the feature vectors of pre-trained models. Thus, we derive a novel generalization bound for a prototypical classifier and show that the transformation of a feature vector can improve the performance of prototype classifiers. We experimentally investigate several normalization methods for minimizing the derived bound and find that the same performance can be obtained by using the L2 normalization and minimizing the ratio of the within-class variance to the between-class variance without training a new classifier or meta-learning.
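    A small sketch of the classifier analyzed here: build class prototypes from feature vectors of the support set and assign queries to the nearest prototype, with optional L2 normalization of the features (one of the transformations the paper studies). The random Gaussian "features" below stand in for a pre-trained backbone's embeddings.

```python
# Nearest-prototype classification with optional L2 normalization.
import numpy as np

rng = np.random.default_rng(0)

def prototype_predict(support_x, support_y, query_x, l2_normalize=True):
    if l2_normalize:
        support_x = support_x / np.linalg.norm(support_x, axis=1, keepdims=True)
        query_x = query_x / np.linalg.norm(query_x, axis=1, keepdims=True)
    classes = np.unique(support_y)
    protos = np.stack([support_x[support_y == c].mean(axis=0) for c in classes])
    d = ((query_x[:, None, :] - protos[None, :, :]) ** 2).sum(-1)
    return classes[np.argmin(d, axis=1)]

# 5-way 5-shot toy episode with Gaussian class clusters as fake embeddings.
centers = rng.normal(size=(5, 64)) * 3
support_x = np.concatenate([c + rng.normal(size=(5, 64)) for c in centers])
support_y = np.repeat(np.arange(5), 5)
query_x = np.concatenate([c + rng.normal(size=(20, 64)) for c in centers])
query_y = np.repeat(np.arange(5), 20)
pred = prototype_predict(support_x, support_y, query_x)
print("accuracy:", (pred == query_y).mean())
```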
    Understanding Deep Neural Function Approximation in Reinforcement Learning via $\epsilon$-Greedy Exploration. (arXiv:2209.07376v1 [cs.LG])
    This paper provides a theoretical study of deep neural function approximation in reinforcement learning (RL) with the $\epsilon$-greedy exploration under the online setting. This problem setting is motivated by the successful deep Q-networks (DQN) framework that falls in this regime. In this work, we provide an initial attempt at theoretically understanding deep RL from the perspective of function classes and neural network architectures (e.g., width and depth) beyond the "linear" regime. To be specific, we focus on the value based algorithm with the $\epsilon$-greedy exploration via deep (and two-layer) neural networks endowed by Besov (and Barron) function spaces, respectively, which aims at approximating an $\alpha$-smooth Q-function in a $d$-dimensional feature space. We prove that, with $T$ episodes, scaling the width $m = \widetilde{\mathcal{O}}(T^{\frac{d}{2\alpha + d}})$ and the depth $L=\mathcal{O}(\log T)$ of the neural network for deep RL is sufficient for learning with sublinear regret in Besov spaces. Moreover, for a two layer neural network endowed by the Barron space, scaling the width $\Omega(\sqrt{T})$ is sufficient. To achieve this, the key issue in our analysis is how to estimate the temporal difference error under deep neural function approximation as the $\epsilon$-greedy exploration is not enough to ensure "optimism". Our analysis reformulates the temporal difference error in an $L^2(\mathrm{d}\mu)$-integrable space over a certain averaged measure $\mu$, and transforms it to a generalization problem under the non-iid setting. This might have its own interest in RL theory for better understanding $\epsilon$-greedy exploration in deep RL.
    Hybrid Neural Network Augmented Physics-based Models for Nonlinear Filtering. (arXiv:2204.06471v2 [cs.LG] UPDATED)
    In this paper we present a hybrid neural network augmented physics-based modeling (APBM) framework for Bayesian nonlinear latent space estimation. The proposed APBM strategy allows for model adaptation when new operating conditions arise or when the physics-based model is insufficient (or incomplete) for properly describing the latent phenomenon. One advantage of the APBMs and our estimation procedure is their capability to maintain the physical interpretability of the estimated states. Furthermore, we propose a constraint filtering approach to control the neural network's contribution to the overall model. We also exploit assumed density filtering techniques and cubature integration rules to present a flexible estimation strategy that can easily deal with nonlinear models and high-dimensional latent spaces. Finally, we demonstrate the efficacy of our methodology in a target tracking scenario with a nonlinear measurement model and an incomplete acceleration model.
    Upper bounds on the Natarajan dimensions of some function classes. (arXiv:2209.07015v1 [stat.ML])
    The Natarajan dimension is a fundamental tool for characterizing multi-class PAC learnability, generalizing the Vapnik-Chervonenkis (VC) dimension from binary to multi-class classification problems. This note establishes upper bounds on the Natarajan dimensions of certain function classes, including (i) multi-class decision trees and random forests, and (ii) multi-class neural networks with binary, linear, and ReLU activations. These results may be relevant for describing the performance of certain multi-class learning algorithms.
    Forecasting Evolution of Clusters in StarCraft II with Hebbian Learning. (arXiv:2209.06904v1 [cs.NE])
    Tactics in StarCraft II are closely related to the group behavior of the game agents. In other words, human players in the game often group spatially nearby agents into a team and control the team to defeat opponents. In this light, clustering the agents in StarCraft II has been studied for various purposes, such as the efficient control of the agents in multi-agent reinforcement learning and game analytic tools for the game users. However, these works do not aim to learn and predict the dynamics of the clusters, limiting their applicability to the currently observed game status. In this paper, we present a hybrid AI model that couples unsupervised and self-supervised learning to forecast the evolution of the clusters in StarCraft II. We develop an unsupervised Hebbian learning method in a set-to-cluster module that efficiently creates a variable number of clusters, and it also features lower inference time complexity than conventional k-means clustering. For the prediction task, a long short-term memory based prediction module is designed to recursively forecast state vectors generated by the set-to-cluster module. We observe that the proposed model successfully predicts complex evolution of the clusters with regard to cluster centroids and their radii.
    Tangent Space and Dimension Estimation with the Wasserstein Distance. (arXiv:2110.06357v3 [math.ST] UPDATED)
    Consider a set of points sampled independently near a smooth compact submanifold of Euclidean space. We provide mathematically rigorous bounds on the number of sample points required to estimate both the dimension and the tangent spaces of that manifold with high confidence. The algorithm for this estimation is Local PCA, a local version of principal component analysis. Our results accommodate noisy, non-uniform data distributions in which the noise may vary across the manifold, and allow simultaneous estimation at multiple points. Crucially, all of the constants appearing in our bound are explicitly described. The proof uses a matrix concentration inequality to estimate covariance matrices and a Wasserstein distance bound for quantifying the nonlinearity of the underlying manifold and the non-uniformity of the probability measure.
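    A minimal sketch of the Local PCA estimator at a single point is given below; the neighbourhood size and the eigenvalue-gap rule for picking the dimension are illustrative choices, not the explicit constants from the paper:

        import numpy as np

        def local_pca(points, query_idx, k=20):
            """Estimate intrinsic dimension and tangent space at one sample point."""
            x = points[query_idx]
            # Take the k nearest neighbours of x (excluding x itself).
            dists = np.linalg.norm(points - x, axis=1)
            nbrs = points[np.argsort(dists)[1:k + 1]]
            # PCA on the centred neighbourhood.
            centred = nbrs - nbrs.mean(axis=0)
            eigvals, eigvecs = np.linalg.eigh(centred.T @ centred / k)
            eigvals, eigvecs = eigvals[::-1], eigvecs[:, ::-1]  # descending order
            # Estimate the dimension by the largest relative eigenvalue gap.
            gaps = eigvals[:-1] / (eigvals[1:] + 1e-12)
            dim = int(np.argmax(gaps)) + 1
            tangent_basis = eigvecs[:, :dim]  # columns span the estimated tangent space
            return dim, tangent_basis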
    Modelling of physical systems with a Hopf bifurcation using mechanistic models and machine learning. (arXiv:2209.06910v1 [math.DS])
    We propose a new hybrid modelling approach that combines a mechanistic model with a machine-learnt model to predict the limit cycle oscillations of physical systems with a Hopf bifurcation. The mechanistic model is an ordinary differential equation normal-form model capturing the bifurcation structure of the system. A data-driven mapping from this model to the experimental observations is then identified based on experimental data using machine learning techniques. The proposed method is first demonstrated numerically on a Van der Pol oscillator and a three-degree-of-freedom aeroelastic model. It is then applied to model the behaviour of a physical aeroelastic structure exhibiting limit cycle oscillations during wind tunnel tests. The method is shown to be general, data-efficient and to offer good accuracy without any prior knowledge about the system other than its bifurcation structure.
    Earthquake Phase Association with Graph Neural Networks. (arXiv:2209.07086v1 [physics.geo-ph])
    Seismic phase association connects earthquake arrival time measurements to their causative sources. Effective association must determine the number of discrete events, their location and origin times, and it must differentiate real arrivals from measurement artifacts. The advent of deep learning pickers, which provide high rates of picks from closely overlapping small magnitude earthquakes, motivates revisiting the phase association problem and approaching it using the methods of deep learning. We have developed a Graph Neural Network associator that simultaneously predicts both source space-time localization, and discrete source-arrival association likelihoods. The method is applicable to arbitrary geometry, time-varying seismic networks of hundreds of stations, and is robust to high rates of sources and input picks with variable noise and quality. Our Graph Earthquake Neural Interpretation Engine (GENIE) uses one graph to represent the station set and another to represent the spatial source region. GENIE learns relationships from data in this combined representation that enable it to determine robust source and source-arrival associations. We train on synthetic data, and test our method on real data from the Northern California (NC) seismic network using input generated by the PhaseNet deep learning phase picker. We successfully re-detect ~96% of all events M>1 reported by the USGS during 500 random days between 2000–2022. Over a 100-day continuous interval of processing in 2017–2018, we detect ~4.2x the number of events reported by the USGS. Our new events have small magnitude estimates below the magnitude of completeness of the USGS catalog, and are located close to the active faults and quarries in the region. Our results demonstrate that GENIE can effectively solve the association problem under complex seismic monitoring conditions.
    Test-Time Training with Masked Autoencoders. (arXiv:2209.07522v1 [cs.CV])
    Test-time training adapts to a new test distribution on the fly by optimizing a model for each test input using self-supervision. In this paper, we use masked autoencoders for this one-sample learning problem. Empirically, our simple method improves generalization on many visual benchmarks for distribution shifts. Theoretically, we characterize this improvement in terms of the bias-variance trade-off.
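    A minimal sketch of this one-sample adaptation loop, with a toy masked autoencoder in PyTorch (the architecture, masking ratio, and step count are illustrative assumptions, not the paper's configuration):

        import torch
        import torch.nn as nn

        d = 32  # illustrative input dimension

        encoder = nn.Sequential(nn.Linear(d, 16), nn.ReLU())
        decoder = nn.Linear(16, d)
        head = nn.Linear(16, 10)  # main-task head, assumed already trained

        def test_time_train(x, steps=5, mask_ratio=0.75, lr=1e-3):
            """Adapt the encoder on one test input via masked reconstruction."""
            opt = torch.optim.SGD(encoder.parameters(), lr=lr)
            for _ in range(steps):
                mask = (torch.rand_like(x) > mask_ratio).float()
                recon = decoder(encoder(x * mask))
                # Self-supervised objective: reconstruct the full input.
                loss = ((recon - x) ** 2).mean()
                opt.zero_grad()
                loss.backward()
                opt.step()
            with torch.no_grad():
                return head(encoder(x))  # prediction after adaptation

        # Usage: logits = test_time_train(torch.randn(1, d))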
    Widely Used and Fast De Novo Drug Design by a Protein Sequence-Based Reinforcement Learning Model. (arXiv:2209.07405v1 [q-bio.BM])
    De novo molecular design has facilitated the exploration of large chemical spaces to accelerate drug discovery. Structure-based de novo methods can overcome the data scarcity of active ligands by incorporating drug-target interactions into deep generative architectures. However, these strategies are bottlenecked by the small fraction of experimentally determined protein or complex structures. In addition, the cost of molecular generation is computationally expensive due to the 3D representations of both molecule and protein. Here, we demonstrate a widely applicable and fast protein sequence-based reinforcement learning (RL) model for drug discovery. In the generative model, one of the reward components, a binding affinity predictor, is based on the 1D protein sequence and molecular SMILES. As a proof of concept, the RL model was utilized to design molecules for four targets. The generated compounds showed bioactivities by the validation of both QSAR and molecular docking with experimental 3D binding pockets. We also found that the performance of the generated molecules depends on the selection of the data source used to train the binding predictor. Furthermore, drug design for a kinase without any experimental structure, CDK20, was studied using our model. With only the 1D protein sequence as input, the generated novel compounds showed favorable binding affinities based on the AlphaFold-predicted structure.
    Benchmarking Counterfactual Algorithms for XAI: From White Box to Black Box. (arXiv:2203.02399v2 [cs.LG] UPDATED)
    This study investigates the impact of machine learning models on the generation of counterfactual explanations by conducting a benchmark evaluation over three different types of models: a decision tree (a fully transparent, interpretable, white-box model), a random forest (a semi-interpretable, grey-box model), and a neural network (a fully opaque, black-box model). We tested the counterfactual generation process using four algorithms from the literature (DiCE, WatcherCF, prototype, and GrowingSpheresCF) on five different datasets (COMPAS, Adult, German, Diabetes, and Breast Cancer). Our findings indicate that: (1) different machine learning models have no impact on the generation of counterfactual explanations; (2) counterfactual algorithms based uniquely on proximity loss functions are not actionable and will not provide meaningful explanations; (3) one cannot have meaningful evaluation results without guaranteeing plausibility in the counterfactual generation process, and algorithms that do not consider plausibility in their internal mechanisms will lead to biased and unreliable conclusions if evaluated with the current state-of-the-art metrics; (4) a qualitative analysis is strongly recommended (together with a quantitative analysis) to ensure a robust analysis of counterfactual explanations and the potential identification of biases.
    Efficient learning of nonlinear prediction models with time-series privileged information. (arXiv:2209.07067v1 [cs.LG])
    In domains where sample sizes are limited, efficient learning algorithms are critical. Learning using privileged information (LuPI) offers increased sample efficiency by allowing prediction models access to types of information at training time which are unavailable when the models are used. In recent work, it was shown that for prediction in linear-Gaussian dynamical systems, a LuPI learner with access to intermediate time series data is never worse, and often better, in expectation than any unbiased classical learner. We provide new insights into this analysis and generalize it to nonlinear prediction tasks in latent dynamical systems, extending the theoretical guarantees to the case where the map connecting latent variables and observations is known up to a linear transform. In addition, we propose algorithms based on random features and representation learning for the case when this map is unknown. A suite of empirical results confirms the theoretical findings and shows the potential of using privileged time-series information in nonlinear prediction.
    CMSBERT-CLR: Context-driven Modality Shifting BERT with Contrastive Learning for linguistic, visual, acoustic Representations. (arXiv:2209.07424v1 [cs.CL])
    Multimodal sentiment analysis has become an increasingly popular research area as the demand for multimodal online content is growing. For multimodal sentiment analysis, words can have different meanings depending on the linguistic context and non-verbal information, so it is crucial to understand the meaning of the words accordingly. In addition, the word meanings should be interpreted within the whole utterance context that includes nonverbal information. In this paper, we present a Context-driven Modality Shifting BERT with Contrastive Learning for linguistic, visual, acoustic Representations (CMSBERT-CLR), which incorporates the whole context's non-verbal and verbal information and aligns modalities more effectively through contrastive learning. First, we introduce a Context-driven Modality Shifting (CMS) to incorporate the non-verbal and verbal information within the whole context of the sentence utterance. Then, for improving the alignment of different modalities within a common embedding space, we apply contrastive learning. Furthermore, we use an exponential moving average parameter and label smoothing as optimization strategies, which can make the convergence of the network more stable and increase the flexibility of the alignment. In our experiments, we demonstrate that our approach achieves state-of-the-art results.
    A Continual Development Methodology for Large-scale Multitask Dynamic ML Systems. (arXiv:2209.07326v1 [cs.LG])
    The traditional Machine Learning (ML) methodology requires fragmenting the development and experimental process into disconnected iterations whose feedback is used to guide design or tuning choices. This methodology has multiple efficiency and scalability disadvantages, such as spending significant resources on the creation of multiple trial models that do not contribute to the final solution. The presented work is based on the intuition that defining ML models as modular and extensible artefacts makes it possible to introduce a novel ML development methodology enabling the integration of multiple design and evaluation iterations into the continuous enrichment of a single unbounded intelligent system. We define a novel method for the generation of dynamic multitask ML models as a sequence of extensions and generalizations. We first analyze the capabilities of the proposed method using the standard ML empirical evaluation methodology. Finally, we propose a novel continuous development methodology that allows dynamically extending a pre-existing multitask large-scale ML system while analyzing the properties of the proposed method extensions. This results in the generation of an ML model capable of jointly solving 124 image classification tasks with state-of-the-art quality and improved size and compute cost.
    CheXRelNet: An Anatomy-Aware Model for Tracking Longitudinal Relationships between Chest X-Rays. (arXiv:2208.03873v2 [cs.CV] UPDATED)
    Despite the progress in utilizing deep learning to automate chest radiograph interpretation and disease diagnosis tasks, change between sequential Chest X-rays (CXRs) has received limited attention. Monitoring the progression of pathologies that are visualized through chest imaging poses several challenges in anatomical motion estimation and image registration, i.e., spatially aligning the two images and modeling temporal dynamics in change detection. In this work, we propose CheXRelNet, a neural model that can track longitudinal pathology change relations between two CXRs. CheXRelNet incorporates local and global visual features, utilizes inter-image and intra-image anatomical information, and learns dependencies between anatomical region attributes, to accurately predict disease change for a pair of CXRs. Experimental results on the Chest ImaGenome dataset show increased downstream performance compared to baselines. Code is available at https://github.com/PLAN-Lab/ChexRelNet
    Data Science Approach to predict the winning Fantasy Cricket Team Dream 11 Fantasy Sports. (arXiv:2209.06999v1 [cs.LG])
    The evolution of digital technology and the increasing popularity of sports have inspired innovators to take the experience of users with a proclivity towards sports to a whole new level by introducing Fantasy Sports Platforms (FSPs). The application of data science and analytics is ubiquitous in the modern world, opening doors to a deeper understanding and supporting the decision-making process. We firmly believed that we could adopt data science to predict the winning fantasy cricket team on the FSP Dream 11. We built a predictive model that predicts the performance of players in a prospective game. We used a combination of greedy and knapsack algorithms to prescribe the combination of 11 players for a fantasy cricket team that has the most significant statistical odds of finishing as the strongest team, thereby giving us a higher chance of winning the pot of bets on the Dream 11 FSP. We used the PyCaret Python library to help us understand and adopt the best regressor algorithm for our problem statement, making precise predictions. Further, we used the Plotly Python library to give us visual insights into team and player performances by accounting for the statistical and subjective factors of a prospective game. The interactive plots help to bolster the recommendations of our predictive model. You either win big, win small, or lose your bet based on the performance of the players selected for your fantasy team in the prospective game, and our model increases the probability of winning big.
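    The team-selection step can be sketched as a simple budget-constrained greedy heuristic (the greedy half of the greedy/knapsack combination mentioned above; the credit budget, squad size, and player data are illustrative assumptions):

        def pick_team(players, budget=100.0, size=11):
            """players: list of (name, predicted_points, credit_cost) tuples.
            Rank by predicted points per credit, then fill the squad greedily."""
            ranked = sorted(players, key=lambda p: p[1] / p[2], reverse=True)
            team, spent = [], 0.0
            for name, points, cost in ranked:
                if len(team) < size and spent + cost <= budget:
                    team.append(name)
                    spent += cost
            return team, spent

        # Usage: pick_team([("Player A", 55.2, 10.5), ("Player B", 48.0, 9.0)])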
    Efficiency Ordering of Stochastic Gradient Descent. (arXiv:2209.07446v1 [cs.LG])
    We consider the stochastic gradient descent (SGD) algorithm driven by a general stochastic sequence, including i.i.d. noise and random walks on an arbitrary graph, among others, and analyze it in the asymptotic sense. Specifically, we employ the notion of `efficiency ordering', a well-analyzed tool for comparing the performance of Markov Chain Monte Carlo (MCMC) samplers, for SGD algorithms in the form of Loewner ordering of the covariance matrices associated with the scaled iterate errors in the long term. Using this ordering, we show that input sequences that are more efficient for MCMC sampling also lead to smaller covariance of the errors for SGD algorithms in the limit. This also suggests that an arbitrarily weighted MSE of SGD iterates in the limit becomes smaller when driven by more efficient chains. Our finding is of particular interest in applications such as decentralized optimization and swarm learning, where SGD is implemented in a random walk fashion on the underlying communication graph for cost reasons and/or data privacy. We demonstrate how certain non-Markovian processes, for which typical mixing-time based non-asymptotic bounds are intractable, can outperform their Markovian counterparts in the sense of efficiency ordering for SGD. We show the utility of our method by applying it to gradient descent with shuffling and mini-batch gradient descent, reaffirming key results from the existing literature under a unified framework. Empirically, we also observe efficiency ordering for variants of SGD such as accelerated SGD and Adam, opening up the possibility of extending our notion of efficiency ordering to a broader family of stochastic optimization algorithms.
    On-Device Domain Generalization. (arXiv:2209.07521v1 [cs.CV])
    We present a systematic study of domain generalization (DG) for tiny neural networks, a problem that is critical to on-device machine learning applications but has been overlooked in the literature, where research has focused on large models only. Tiny neural networks have far fewer parameters and lower complexity, and thus should not be trained the same way as their large counterparts for DG applications. We find that knowledge distillation is a strong candidate for solving the problem: it outperforms state-of-the-art DG methods that were developed using large models by a large margin. Moreover, we observe that the teacher-student performance gap on test data with domain shift is bigger than that on in-distribution data. To improve DG for tiny neural networks without increasing the deployment cost, we propose a simple idea called out-of-distribution knowledge distillation (OKD), which aims to teach the student how the teacher handles (synthetic) out-of-distribution data and proves to be a promising framework for solving the problem. We also contribute a scalable method for creating DG datasets, called DOmain Shift in COntext (DOSCO), which can be applied to broad data at scale without much human effort. Code and models are released at \url{https://github.com/KaiyangZhou/on-device-dg}.
    SuperTickets: Drawing Task-Agnostic Lottery Tickets from Supernets via Jointly Architecture Searching and Parameter Pruning. (arXiv:2207.03677v3 [cs.CV] UPDATED)
    Neural architecture search (NAS) has demonstrated amazing success in searching for efficient deep neural networks (DNNs) from a given supernet. In parallel, the lottery ticket hypothesis has shown that DNNs contain small subnetworks that can be trained from scratch to achieve a comparable or higher accuracy than the original DNNs. As such, it is currently a common practice to develop efficient DNNs via a pipeline of first search and then prune. Nevertheless, doing so often requires a search-train-prune-retrain process and thus prohibitive computational cost. In this paper, we discover for the first time that both efficient DNNs and their lottery subnetworks (i.e., lottery tickets) can be directly identified from a supernet, which we term SuperTickets, via a two-in-one training scheme with jointly architecture searching and parameter pruning. Moreover, we develop a progressive and unified SuperTickets identification strategy that allows the connectivity of subnetworks to change during supernet training, achieving better accuracy and efficiency trade-offs than conventional sparse training. Finally, we evaluate whether such identified SuperTickets drawn from one task can transfer well to other tasks, validating their potential for handling multiple tasks simultaneously. Extensive experiments and ablation studies on three tasks and four benchmark datasets validate that our proposed SuperTickets achieve better accuracy and efficiency trade-offs than both typical NAS and pruning pipelines, with or without retraining. Codes and pretrained models are available at https://github.com/RICE-EIC/SuperTickets.
    MDE for Machine Learning-Enabled Software Systems: A Case Study and Comparison of MontiAnna & ML-Quadrat. (arXiv:2209.07282v1 [cs.SE])
    In this paper, we propose to adopt the MDE paradigm for the development of Machine Learning (ML)-enabled software systems with a focus on the Internet of Things (IoT) domain. We illustrate how two state-of-the-art open-source modeling tools, namely MontiAnna and ML-Quadrat can be used for this purpose as demonstrated through a case study. The case study illustrates using ML, in particular deep Artificial Neural Networks (ANNs), for automated image recognition of handwritten digits using the MNIST reference dataset, and integrating the machine learning components into an IoT system. Subsequently, we conduct a functional comparison of the two frameworks, setting out an analysis base to include a broad range of design considerations, such as the problem domain, methods for the ML integration into larger systems, and supported ML methods, as well as topics of recent intense interest to the ML community, such as AutoML and MLOps. Accordingly, this paper is focused on elucidating the potential of the MDE approach in the ML domain. This supports the ML engineer in developing the (ML/software) model rather than implementing the code, and additionally enforces reusability and modularity of the design through enabling the out-of-the-box integration of ML functionality as a component of the IoT or cyber-physical systems.
    Adversarial Training for High-Stakes Reliability. (arXiv:2205.01663v3 [cs.LG] UPDATED)
    In the future, powerful AI systems may be deployed in high-stakes settings, where a single failure could be catastrophic. One technique for improving AI safety in high-stakes settings is adversarial training, which uses an adversary to generate examples to train on in order to achieve better worst-case performance. In this work, we used a language generation task as a testbed for achieving high reliability through adversarial training. We created a series of adversarial training techniques -- including a tool that assists human adversaries -- to find and eliminate failures in a classifier that filters text completions suggested by a generator. In our simple "avoid injuries" task, we determined that we can set very conservative classifier thresholds without significantly impacting the quality of the filtered outputs. With our chosen thresholds, filtering with our baseline classifier decreases the rate of unsafe completions from about 2.4% to 0.003% on in-distribution data, which is near the limit of our ability to measure. We found that adversarial training significantly increased robustness to the adversarial attacks that we trained on, without affecting in-distribution performance. We hope to see further work in the high-stakes reliability setting, including more powerful tools for enhancing human adversaries and better ways to measure high levels of reliability, until we can confidently rule out the possibility of catastrophic deployment-time failures of powerful models.
    Graph Neural Network Based Node Deployment for Throughput Enhancement. (arXiv:2209.06905v1 [cs.NI])
    The recent rapid growth in mobile data traffic entails a pressing demand for improving the throughput of the underlying wireless communication networks. Network node deployment has been considered as an effective approach for throughput enhancement which, however, often leads to highly non-trivial non-convex optimization problems. Although convex approximation based solutions are considered in the literature, their approximation to the actual throughput may be loose and sometimes leads to unsatisfactory performance. With this consideration, in this paper, we propose a novel graph neural network (GNN) method for the network node deployment problem. Specifically, we fit a GNN to the network throughput and use the gradients of this GNN to iteratively update the locations of the network nodes. Besides, we show that an expressive GNN has the capacity to approximate both the function value and the gradients of a multivariate permutation-invariant function, as theoretical support for the proposed method. To further improve the throughput, we also study a hybrid node deployment method based on this approach. To train the desired GNN, we adopt a policy gradient algorithm to create datasets containing good training samples. Numerical experiments show that the proposed methods produce competitive results compared to the baselines.
    Random initialisations performing above chance and how to find them. (arXiv:2209.07509v1 [cs.LG])
    Neural networks trained with stochastic gradient descent (SGD) starting from different random initialisations typically find functionally very similar solutions, raising the question of whether there are meaningful differences between different SGD solutions. Entezari et al. recently conjectured that despite different initialisations, the solutions found by SGD lie in the same loss valley after taking into account the permutation invariance of neural networks. Concretely, they hypothesise that any two solutions found by SGD can be permuted such that the linear interpolation between their parameters forms a path without significant increases in loss. Here, we use a simple but powerful algorithm to find such permutations, which allows us to obtain direct empirical evidence that the hypothesis is true in fully connected networks. Strikingly, we find that two networks already live in the same loss valley at the time of initialisation, and averaging their random, but suitably permuted, initialisations performs significantly above chance. In contrast, for convolutional architectures, our evidence suggests that the hypothesis does not hold. Especially in a large learning rate regime, SGD seems to discover diverse modes.
    Adversarially Robust Learning: A Generic Minimax Optimal Learner and Characterization. (arXiv:2209.07369v1 [cs.LG])
    We present a minimax optimal learner for the problem of learning predictors robust to adversarial examples at test-time. Interestingly, we find that this requires new algorithmic ideas and approaches to adversarially robust learning. In particular, we show, in a strong negative sense, the suboptimality of the robust learner proposed by Montasser, Hanneke, and Srebro (2019) and a broader family of learners we identify as local learners. Our results are enabled by adopting a global perspective, specifically, through a key technical contribution: the global one-inclusion graph, which may be of independent interest, that generalizes the classical one-inclusion graph due to Haussler, Littlestone, and Warmuth (1994). Finally, as a byproduct, we identify a dimension characterizing qualitatively and quantitatively what classes of predictors $\mathcal{H}$ are robustly learnable. This resolves an open problem due to Montasser et al. (2019), and closes a (potentially) infinite gap between the established upper and lower bounds on the sample complexity of adversarially robust learning.
    Estimating Classification Confidence Using Kernel Densities. (arXiv:2207.06529v3 [stat.ML] UPDATED)
    This paper investigates the post-hoc calibration of confidence for "exploratory" machine learning classification problems. The difficulty in these problems stems from the continuing desire to push the boundaries of which categories have enough examples to generalize from when curating datasets, and confusion regarding the validity of those categories. We argue that for such problems the "one-versus-all" approach (top-label calibration) must be used rather than the "calibrate-the-full-response-matrix" approach advocated elsewhere in the literature. We introduce and test four new algorithms designed to handle the idiosyncrasies of category-specific confidence estimation. Chief among these methods is the use of kernel density ratios for confidence calibration including a novel, bulletproof algorithm for choosing the bandwidth. We test our claims and explore the limits of calibration on a bioinformatics application (PhANNs) as well as the classic MNIST benchmark. Finally, our analysis argues that post-hoc calibration should always be performed, should be based only on the test dataset, and should be sanity-checked visually.
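    A minimal sketch of the kernel-density-ratio idea for one-versus-all (top-label) confidence, using scipy's Gaussian KDE with its default bandwidth rather than the paper's bandwidth-selection algorithm (the array names and class prior below are illustrative assumptions):

        import numpy as np
        from scipy.stats import gaussian_kde

        def kde_confidence(scores_pos, scores_neg, query_scores, prior_pos=0.5):
            """Calibrated P(correct | score) from class-conditional score densities."""
            kde_pos = gaussian_kde(scores_pos)   # scores of correctly labeled examples
            kde_neg = gaussian_kde(scores_neg)   # scores of incorrectly labeled examples
            p_pos = prior_pos * kde_pos(query_scores)
            p_neg = (1 - prior_pos) * kde_neg(query_scores)
            return p_pos / (p_pos + p_neg + 1e-12)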
    Scalable Task-Driven Robotic Swarm Control via Collision Avoidance and Learning Mean-Field Control. (arXiv:2209.07420v1 [cs.RO])
    In recent years, reinforcement learning and its multi-agent analogue have achieved great success in solving various complex control problems. However, multi-agent reinforcement learning remains challenging both in its theoretical analysis and empirical design of algorithms, especially for large swarms of embodied robotic agents where a definitive toolchain remains part of active research. We use emerging state-of-the-art mean-field control techniques in order to convert many-agent swarm control into more classical single-agent control of distributions. This allows profiting from advances in single-agent reinforcement learning at the cost of assuming weak interaction between agents. As a result, the mean-field model is violated by the nature of real systems with embodied, physically colliding agents. Here, we combine collision avoidance and learning of mean-field control into a unified framework for tractably designing intelligent robotic swarm behavior. On the theoretical side, we provide novel approximation guarantees for both general mean-field control in continuous spaces and with collision avoidance. On the practical side, we show that our approach outperforms multi-agent reinforcement learning and allows for decentralized open-loop application while avoiding collisions, both in simulation and real UAV swarms. Overall, we propose a framework for the design of swarm behavior that is both mathematically well-founded and practically useful, enabling the solution of otherwise intractable swarm problems.
    Urban precipitation downscaling using deep learning: a smart city application over Austin, Texas, USA. (arXiv:2209.06848v1 [physics.ao-ph])
    Urban downscaling is a link for transferring knowledge from coarser climate information to city-scale assessments. These high-resolution assessments need multiyear climatologies of past data and future projections, which are complex and computationally expensive to generate using traditional numerical weather prediction models. The city of Austin, Texas, USA has seen tremendous growth in the past decade. Systematic planning for the future requires the availability of fine-resolution city-scale datasets. In this study, we demonstrate a novel approach that generates a general-purpose operator using deep learning to perform urban downscaling. The algorithm employs an iterative super-resolution convolutional neural network (Iterative SRCNN) over the city of Austin, Texas, USA. We show the development of a high-resolution gridded precipitation product (300 m) from a coarse (10 km) satellite-based product (JAXA GSMaP). High-resolution gridded datasets of precipitation offer insights into the spatial distribution of heavy to low precipitation events in the past. The algorithm shows improvement in mean peak signal-to-noise ratio and mutual information relative to a cubic-interpolation baseline when generating a high-resolution gridded product of size 300 m x 300 m. Our results have implications for developing high-resolution gridded-precipitation urban datasets and the future planning of smart cities, for other cities and other climatic variables.
    Deterministic Sequencing of Exploration and Exploitation for Reinforcement Learning. (arXiv:2209.05408v2 [cs.LG] UPDATED)
    We propose Deterministic Sequencing of Exploration and Exploitation (DSEE) algorithm with interleaving exploration and exploitation epochs for model-based RL problems that aim to simultaneously learn the system model, i.e., a Markov decision process (MDP), and the associated optimal policy. During exploration, DSEE explores the environment and updates the estimates for expected reward and transition probabilities. During exploitation, the latest estimates of the expected reward and transition probabilities are used to obtain a robust policy with high probability. We design the lengths of the exploration and exploitation epochs such that the cumulative regret grows as a sub-linear function of time.
    Multi-Task Mixture Density Graph Neural Networks for Predicting Cu-based Single-Atom Alloy Catalysts for CO2 Reduction Reaction. (arXiv:2209.07300v1 [cond-mat.mtrl-sci])
    Graph neural networks (GNNs) have drawn more and more attention from materials scientists and demonstrated a high capacity to establish connections between structure and properties. However, with only unrelaxed structures provided as input, few GNN models can predict the thermodynamic properties of relaxed configurations with an acceptable level of error. In this work, we develop a multi-task (MT) architecture based on DimeNet++ and mixture density networks to improve the performance on this task. Taking CO adsorption on Cu-based single-atom alloy catalysts as an illustration, we show that our method can reliably estimate CO adsorption energy with a mean absolute error of 0.087 eV from the initial CO adsorption structures, without costly first-principles calculations. Further, compared to other state-of-the-art GNN methods, our model exhibits improved generalization ability when predicting the catalytic performance of out-of-domain configurations, built with either unseen substrate surfaces or doping species. We show that the proposed MT GNN strategy can facilitate catalyst discovery.
    The Fragility of Optimized Bandit Algorithms. (arXiv:2109.13595v3 [cs.LG] UPDATED)
    Much of the literature on optimal design of bandit algorithms is based on minimization of expected regret. It is well known that designs that are optimal over certain exponential families can achieve expected regret that grows logarithmically in the number of arm plays, at a rate governed by the Lai-Robbins lower bound. In this paper, we show that when one uses such optimized designs, the regret distribution of the associated algorithms necessarily has a very heavy tail, specifically, that of a truncated Cauchy distribution. Furthermore, for $p>1$, the $p$'th moment of the regret distribution grows much faster than poly-logarithmically, in particular as a power of the total number of arm plays. We show that optimized UCB bandit designs are also fragile in an additional sense, namely when the problem is even slightly mis-specified, the regret can grow much faster than the conventional theory suggests. Our arguments are based on standard change-of-measure ideas, and indicate that the most likely way that regret becomes larger than expected is when the optimal arm returns below-average rewards in the first few arm plays, thereby causing the algorithm to believe that the arm is sub-optimal. To alleviate the fragility issues exposed, we show that UCB algorithms can be modified so as to ensure a desired degree of robustness to mis-specification. In doing so, we also provide a sharp trade-off between the amount of UCB exploration and the tail exponent of the resulting regret distribution.
    Carbon Footprint of Selecting and Training Deep Learning Models for Medical Image Analysis. (arXiv:2203.02202v2 [eess.IV] UPDATED)
    The increasing energy consumption and carbon footprint of deep learning (DL) due to growing compute requirements has become a cause of concern. In this work, we focus on the carbon footprint of developing DL models for medical image analysis (MIA), where volumetric images of high spatial resolution are handled. In this study, we present and compare the features of four tools from the literature for quantifying the carbon footprint of DL. Using one of these tools, we estimate the carbon footprint of medical image segmentation pipelines. We choose nnU-Net as the proxy for a medical image segmentation pipeline and experiment on three common datasets. With our work we hope to inform on the increasing energy costs incurred by MIA. We discuss simple strategies to cut down the environmental impact that can make model selection and training processes more efficient.
    GAGA: Deciphering Age-path of Generalized Self-paced Regularizer. (arXiv:2209.07063v1 [cs.LG])
    Nowadays self-paced learning (SPL) is an important machine learning paradigm that mimics the cognitive process of humans and animals. The SPL regime involves a self-paced regularizer and a gradually increasing age parameter, which plays a key role in SPL, but where to optimally terminate this process is still non-trivial to determine. A natural idea is to compute the solution path w.r.t. the age parameter (i.e., the age-path). However, current age-path algorithms are either limited to the simplest regularizer, or lack solid theoretical understanding as well as computational efficiency. To address this challenge, we propose a novel Generalized Age-path Algorithm (GAGA) for SPL with various self-paced regularizers, based on ordinary differential equations (ODEs) and set control, which can learn the entire solution spectrum w.r.t. a range of age parameters. To the best of our knowledge, GAGA is the first exact path-following algorithm tackling the age-path for general self-paced regularizers. Finally, the algorithmic steps of classic SVM and Lasso are described in detail. We demonstrate the performance of GAGA on real-world datasets, and find considerable speedups over competing baselines.
    Training Neural Networks in Single vs Double Precision. (arXiv:2209.07219v1 [cs.LG])
    The commitment to single-precision floating-point arithmetic is widespread in the deep learning community. To evaluate whether this commitment is justified, the influence of computing precision (single and double precision) on the optimization performance of the Conjugate Gradient (CG) method (a second-order optimization algorithm) and RMSprop (a first-order algorithm) has been investigated. Neural networks with one to five fully connected hidden layers, moderate or strong nonlinearity, and up to 4 million network parameters have been trained to minimize the Mean Square Error (MSE). The training tasks have been set up so that their MSE minimum was known to be zero. The computing experiments disclosed that single precision can keep up (with superlinear convergence) with double precision as long as line search finds an improvement. First-order methods such as RMSprop do not benefit from double precision. However, for moderately nonlinear tasks, CG is clearly superior. For strongly nonlinear tasks, both algorithm classes find only solutions that are fairly poor in terms of mean square error relative to the output variance. CG with double floating-point precision is superior whenever the solutions have the potential to be useful for the application goal.
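    The flavour of such a comparison can be reproduced with a toy experiment: fit the same linear regression task, whose MSE minimum is exactly zero, by plain gradient descent in float32 and float64 (the task and hyper-parameters are illustrative, not the paper's CG/RMSprop setup):

        import numpy as np

        def train(dtype, steps=2000, lr=0.1, seed=0):
            rng = np.random.default_rng(seed)
            X = rng.standard_normal((256, 8)).astype(dtype)
            w_true = rng.standard_normal(8).astype(dtype)
            y = X @ w_true                      # MSE minimum is exactly zero
            w = np.zeros(8, dtype=dtype)
            for _ in range(steps):
                grad = 2 * X.T @ (X @ w - y) / len(X)
                w -= dtype(lr) * grad
            return float(np.mean((X @ w - y) ** 2))

        print("float32 final MSE:", train(np.float32))
        print("float64 final MSE:", train(np.float64))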
    A Unifying Framework for Online Optimization with Long-Term Constraints. (arXiv:2209.07454v1 [cs.LG])
    We study online learning problems in which a decision maker has to take a sequence of decisions subject to $m$ long-term constraints. The goal of the decision maker is to maximize their total reward, while at the same time achieving small cumulative constraints violation across the $T$ rounds. We present the first best-of-both-world type algorithm for this general class of problems, with no-regret guarantees both in the case in which rewards and constraints are selected according to an unknown stochastic model, and in the case in which they are selected at each round by an adversary. Our algorithm is the first to provide guarantees in the adversarial setting with respect to the optimal fixed strategy that satisfies the long-term constraints. In particular, it guarantees a $\rho/(1+\rho)$ fraction of the optimal reward and sublinear regret, where $\rho$ is a feasibility parameter related to the existence of strictly feasible solutions. Our framework employs traditional regret minimizers as black-box components. Therefore, by instantiating it with an appropriate choice of regret minimizers it can handle the full-feedback as well as the bandit-feedback setting. Moreover, it allows the decision maker to seamlessly handle scenarios with non-convex rewards and constraints. We show how our framework can be applied in the context of budget-management mechanisms for repeated auctions in order to guarantee long-term constraints that are not packing (e.g., ROI constraints).
    Layerwise Bregman Representation Learning with Applications to Knowledge Distillation. (arXiv:2209.07080v1 [cs.LG])
    In this work, we propose a novel approach for layerwise representation learning of a trained neural network. In particular, we form a Bregman divergence based on the layer's transfer function and construct an extension of the original Bregman PCA formulation by incorporating a mean vector and normalizing the principal directions with respect to the geometry of the local convex function around the mean. This generalization allows exporting the learned representation as a fixed layer with a non-linearity. As an application to knowledge distillation, we cast the learning problem for the student network as predicting the compression coefficients of the teacher's representations, which are passed as the input to the imported layer. Our empirical findings indicate that our approach is substantially more effective for transferring information between networks than typical teacher-student training using the teacher's penultimate layer representations and soft labels.
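    For reference, the Bregman divergence generated by a differentiable convex function $F$ (the standard definition on which this construction builds) is

        $$ D_F(x, y) = F(x) - F(y) - \langle \nabla F(y),\, x - y \rangle, $$

    which recovers the squared Euclidean distance $\tfrac{1}{2}\|x - y\|^2$ for $F(x) = \tfrac{1}{2}\|x\|^2$.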
    Generalized Representations Learning for Time Series Classification. (arXiv:2209.07027v1 [cs.LG])
    Time series classification is an important problem in the real world. Due to its non-stationarity, i.e., the distribution changes over time, it remains challenging to build models that generalize to unseen distributions. In this paper, we propose to view the time series classification problem from the distribution perspective. We argue that the temporal complexity is attributable to unknown latent distributions within the series. To this end, we propose DIVERSIFY to learn generalized representations for time series classification. DIVERSIFY takes an iterative process: it first obtains the worst-case distribution scenario via adversarial training, then matches the distributions of the obtained sub-domains. We also present some theoretical insights. We conduct experiments on gesture recognition, speech commands recognition, wearable stress and affect detection, and sensor-based human activity recognition, with a total of seven datasets in different settings. Results demonstrate that DIVERSIFY significantly outperforms other baselines and effectively characterizes the latent distributions, as shown by qualitative and quantitative analysis.
    MIXRTs: Toward Interpretable Multi-Agent Reinforcement Learning via Mixing Recurrent Soft Decision Trees. (arXiv:2209.07225v1 [cs.LG])
    Multi-agent reinforcement learning (MARL) recently has achieved tremendous success in a wide range of fields. However, with a black-box neural network architecture, existing MARL methods make decisions in an opaque fashion that hinders humans from understanding the learned knowledge and how input observations influence decisions. Our solution is MIXing Recurrent soft decision Trees (MIXRTs), a novel interpretable architecture that can represent explicit decision processes via the root-to-leaf path of decision trees. We introduce a novel recurrent structure in soft decision trees to address partial observability, and estimate joint action values via linearly mixing outputs of recurrent trees based on local observations only. Theoretical analysis shows that MIXRTs guarantees the structural constraint with additivity and monotonicity in factorization. We evaluate MIXRTs on a range of challenging StarCraft II tasks. Experimental results show that our interpretable learning framework obtains competitive performance compared to widely investigated baselines, and delivers more straightforward explanations and domain knowledge of the decision processes.
    Efficient Quantized Sparse Matrix Operations on Tensor Cores. (arXiv:2209.06979v1 [cs.DC])
    The exponentially growing model size drives the continued success of deep learning, but it brings prohibitive computation and memory cost. From the algorithm perspective, model sparsification and quantization have been studied to alleviate the problem. From the architecture perspective, hardware vendors provide Tensor cores for acceleration. However, it is very challenging to gain practical speedups from sparse, low-precision matrix operations on Tensor cores, because of the strict requirements for data layout and lack of support for efficiently manipulating the low-precision integers. We propose Magicube, a high-performance sparse-matrix library for low-precision integers on Tensor cores. Magicube supports SpMM and SDDMM, two major sparse operations in deep learning with mixed precision. Experimental results on an NVIDIA A100 GPU show that Magicube achieves on average 1.44x (up to 2.37x) speedup over the vendor-optimized library for sparse kernels, and 1.43x speedup over the state-of-the-art with a comparable accuracy for end-to-end sparse Transformer inference.
    Robust Constrained Reinforcement Learning. (arXiv:2209.06866v1 [cs.LG])
    Constrained reinforcement learning aims to maximize the expected reward subject to constraints on utilities/costs. However, the training environment may not be the same as the test one, due to, e.g., modeling error, adversarial attacks, or non-stationarity, resulting in severe performance degradation and, more importantly, constraint violation. We propose a framework of robust constrained reinforcement learning under model uncertainty, where the MDP is not fixed but lies in some uncertainty set; the goal is to guarantee that constraints on utilities/costs are satisfied for all MDPs in the uncertainty set, and to maximize the worst-case reward performance over the uncertainty set. We design a robust primal-dual approach, and further theoretically develop guarantees on its convergence, complexity and robust feasibility. We then investigate a concrete example of a $\delta$-contamination uncertainty set, design an online and model-free algorithm, and theoretically characterize its sample complexity.
    Joint Debiased Representation and Image Clustering Learning with Self-Supervision. (arXiv:2209.06941v1 [cs.CV])
    Contrastive learning is among the most successful methods for visual representation learning, and its performance can be further improved by jointly performing clustering on the learned representations. However, existing methods for joint clustering and contrastive learning do not perform well on long-tailed data distributions, as majority classes overwhelm and distort the loss of minority classes, thus preventing meaningful representations from being learned. Motivated by this, we develop a novel joint clustering and contrastive learning framework by adapting the debiased contrastive loss to avoid under-clustering minority classes of imbalanced datasets. We show that our proposed modified debiased contrastive loss and divergence clustering loss improve the performance across multiple datasets and learning tasks. The source code is available at https://anonymous.4open.science/r/SSL-debiased-clustering
    Langevin Autoencoders for Learning Deep Latent Variable Models. (arXiv:2209.07036v1 [cs.LG])
    Markov chain Monte Carlo (MCMC), such as Langevin dynamics, is valid for approximating intractable distributions. However, its usage is limited in the context of deep latent variable models owing to costly datapoint-wise sampling iterations and slow convergence. This paper proposes the amortized Langevin dynamics (ALD), wherein datapoint-wise MCMC iterations are entirely replaced with updates of an encoder that maps observations into latent variables. This amortization enables efficient posterior sampling without datapoint-wise iterations. Despite its efficiency, we prove that ALD is valid as an MCMC algorithm, whose Markov chain has the target posterior as a stationary distribution under mild assumptions. Based on the ALD, we also present a new deep latent variable model named the Langevin autoencoder (LAE). Interestingly, the LAE can be implemented by slightly modifying the traditional autoencoder. Using multiple synthetic datasets, we first validate that ALD can properly obtain samples from target posteriors. We also evaluate the LAE on the image generation task, and show that our LAE can outperform existing methods based on variational inference, such as the variational autoencoder, and other MCMC-based methods in terms of the test likelihood.
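    A minimal sketch of the datapoint-wise Langevin update that ALD amortizes away, shown here sampling from a standard 2D Gaussian whose score is known in closed form (the step size and iteration count are illustrative):

        import numpy as np

        def grad_log_p(z):
            """Score of a standard Gaussian: grad log N(0, I) = -z."""
            return -z

        def langevin_sample(steps=1000, eps=0.01, seed=0):
            rng = np.random.default_rng(seed)
            z = rng.standard_normal(2)  # arbitrary initialization
            for _ in range(steps):
                noise = rng.standard_normal(2)
                # Langevin update: gradient step on log p plus injected noise.
                z = z + 0.5 * eps * grad_log_p(z) + np.sqrt(eps) * noise
            return z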
    A Geometric Perspective on Variational Autoencoders. (arXiv:2209.07370v1 [stat.ML])
    This paper introduces a new interpretation of the Variational Autoencoder framework by taking a fully geometric point of view. We argue that vanilla VAE models naturally unveil a Riemannian structure in their latent space and that taking these geometrical aspects into consideration can lead to better interpolations and an improved generation procedure. The newly proposed sampling method consists in sampling from the uniform distribution deriving intrinsically from the learned Riemannian latent space, and we show that using this scheme can make a vanilla VAE competitive with, and even better than, more advanced versions on several benchmark datasets. Since generative models are known to be sensitive to the number of training samples, we also stress the method's robustness in the low-data regime.
    FRANS: Automatic Feature Extraction for Time Series Forecasting. (arXiv:2209.07018v1 [cs.LG])
    Feature extraction methods help in dimensionality reduction and capture relevant information. In time series forecasting (TSF), features can be used as auxiliary information to achieve better accuracy. Traditionally, features used in TSF are handcrafted, which requires domain knowledge and significant data-engineering work. In this research, we first introduce a notion of static and dynamic features, which then enables us to develop our autonomous Feature Retrieving Autoregressive Network for Static features (FRANS), which does not require domain knowledge. The method is based on a CNN classifier that is trained to create, for each series, a collective and unique class representation, either from parts of the series or, if class labels are available, from a set of series of the same class. It allows discriminating between series with similar behaviour but from different classes, and makes the features extracted from the classifier maximally discriminatory. We explore the interpretability of our features, and evaluate the prediction capabilities of the method within the forecasting meta-learning environment FFORMA. Our results show that our features lead to improvements in accuracy in most situations. Once trained, our approach creates features orders of magnitude faster than statistical methods.
    Large-scale Stochastic Optimization of NDCG Surrogates for Deep Learning with Provable Convergence. (arXiv:2202.12183v4 [cs.LG] UPDATED)
    NDCG, namely Normalized Discounted Cumulative Gain, is a widely used ranking metric in information retrieval and machine learning. However, efficient and provable stochastic methods for maximizing NDCG are still lacking, especially for deep models. In this paper, we propose a principled approach to optimize NDCG and its top-$K$ variant. First, we formulate a novel compositional optimization problem for optimizing the NDCG surrogate, and a novel bilevel compositional optimization problem for optimizing the top-$K$ NDCG surrogate. Then, we develop efficient stochastic algorithms with provable convergence guarantees for the non-convex objectives. Different from existing NDCG optimization methods, the per-iteration complexity of our algorithms scales with the mini-batch size instead of the number of total items. To improve the effectiveness for deep learning, we further propose practical strategies by using initial warm-up and stop gradient operator. Experimental results on multiple datasets demonstrate that our methods outperform prior ranking approaches in terms of NDCG. To the best of our knowledge, this is the first time that stochastic algorithms are proposed to optimize NDCG with a provable convergence guarantee. Our proposed methods are implemented in the LibAUC library at https://libauc.org/.
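    For reference, a minimal sketch of the NDCG metric being optimized, in its standard exponential-gain form with a top-$K$ cutoff (the example relevance list is illustrative):

        import numpy as np

        def dcg_at_k(relevances, k):
            """DCG@K = sum_i (2^rel_i - 1) / log2(i + 2), for i = 0..K-1."""
            rel = np.asarray(relevances, dtype=float)[:k]
            return np.sum((2 ** rel - 1) / np.log2(np.arange(len(rel)) + 2))

        def ndcg_at_k(relevances, k):
            """Normalize DCG by the ideal (sorted) ranking's DCG."""
            ideal = dcg_at_k(sorted(relevances, reverse=True), k)
            return dcg_at_k(relevances, k) / ideal if ideal > 0 else 0.0

        # Usage: ndcg_at_k([3, 2, 3, 0, 1, 2], k=5)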
    Lossy Image Compression with Conditional Diffusion Models. (arXiv:2209.06950v1 [eess.IV])
    Diffusion models are a new class of generative models that mark a milestone in high-quality image generation while relying on solid probabilistic principles. This makes them promising candidate models for neural image compression. This paper outlines an end-to-end optimized framework based on a conditional diffusion model for image compression. Besides latent variables inherent to the diffusion process, the model introduces an additional per-instance "content" latent variable to condition the denoising process. Upon decoding, the diffusion process conditionally generates/reconstructs an image using ancestral sampling. Our experiments show that this approach outperforms one of the best-performing conventional image codecs (BPG) and one neural codec on two compression benchmarks, where we focus on rate-perception tradeoffs. Qualitatively, our approach shows fewer decompression artifacts than the classical approach.
    Self-Organizing Map Neural Network Algorithm for the Determination of Fracture Location in Solid-State Process joined Dissimilar Alloys. (arXiv:2209.07404v1 [cs.NE])
    The subject area known as computational neuroscience involves the investigation of brain function using mathematical techniques and theories. In order to comprehend how the brain processes information, it can also draw on various methods from signal processing, computer science, and physics. In the present work, for the first time, a neurobiologically based unsupervised machine learning algorithm, the Self-Organizing Map neural network, is implemented for determining the fracture location in dissimilar friction stir welded AA5754-C11000 alloys. Tool Shoulder Diameter (mm), Tool Rotational Speed (RPM), and Tool Traverse Speed (mm/min) are the input parameters, while the output is the fracture location, i.e., whether the specimen fractures at the Thermo-Mechanically Affected Zone (TMAZ) of copper or at the TMAZ of aluminium. The results showed that the implemented algorithm is able to predict the fracture location with 96.92% accuracy.
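    A minimal sketch of the kind of Self-Organizing Map training loop used in such studies (the map size, learning-rate and neighbourhood schedules are illustrative assumptions, not the paper's settings):

        import numpy as np

        def train_som(data, rows=5, cols=5, epochs=100, lr0=0.5, sigma0=2.0, seed=0):
            """Fit a rows x cols SOM to data of shape (n_samples, n_features)."""
            rng = np.random.default_rng(seed)
            weights = rng.standard_normal((rows, cols, data.shape[1]))
            grid = np.stack(np.meshgrid(np.arange(rows), np.arange(cols),
                                        indexing="ij"), axis=-1)
            for t in range(epochs):
                lr = lr0 * np.exp(-t / epochs)        # decaying learning rate
                sigma = sigma0 * np.exp(-t / epochs)  # shrinking neighbourhood
                for x in rng.permutation(data):
                    # Best-matching unit: node whose weights are closest to x.
                    dists = np.linalg.norm(weights - x, axis=2)
                    bmu = np.unravel_index(dists.argmin(), dists.shape)
                    # Pull the BMU and its grid neighbours toward x.
                    grid_d2 = np.sum((grid - np.array(bmu)) ** 2, axis=2)
                    h = np.exp(-grid_d2 / (2 * sigma ** 2))
                    weights += lr * h[..., None] * (x - weights)
            return weights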
    Asymptotic Statistical Analysis of $f$-divergence GAN. (arXiv:2209.06853v1 [math.ST])
    Generative Adversarial Networks (GANs) have achieved great success in data generation. However, their statistical properties are not fully understood. In this paper, we consider the statistical behavior of the general $f$-divergence formulation of GAN, which includes the Kullback--Leibler divergence that is closely related to the maximum likelihood principle. We show that for parametric generative models that are correctly specified, all $f$-divergence GANs with the same discriminator classes are asymptotically equivalent under suitable regularity conditions. Moreover, with an appropriately chosen local discriminator, they become equivalent to the maximum likelihood estimate asymptotically. For generative models that are misspecified, GANs with different $f$-divergences converge to different estimators, and thus cannot be directly compared. However, it is shown that for some commonly used $f$-divergences, the original $f$-GAN is not optimal in that one can achieve a smaller asymptotic variance when the discriminator training in the original $f$-GAN formulation is replaced by logistic regression. The resulting estimation method is referred to as Adversarial Gradient Estimation (AGE). Empirical studies are provided to support the theory and to demonstrate the advantage of AGE over the original $f$-GANs under model misspecification.
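    For reference, the standard variational $f$-divergence GAN objective that this line of work builds on (the paper's own notation may differ) reads, with $T_w$ the discriminator and $f^{*}$ the convex conjugate of $f$; choosing $f(u) = u \log u$ recovers the KL divergence tied to maximum likelihood:
```latex
\min_{\theta}\,\max_{w}\;
  \mathbb{E}_{x \sim P_{\mathrm{data}}}\bigl[T_w(x)\bigr]
  \;-\;
  \mathbb{E}_{x \sim Q_{\theta}}\bigl[f^{*}\bigl(T_w(x)\bigr)\bigr]
```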
    A Temporal Anomaly Detection System for Vehicles utilizing Functional Working Groups and Sensor Channels. (arXiv:2209.06828v1 [cs.LG])
    A modern vehicle fitted with sensors, actuators, and Electronic Control Units (ECUs) can be divided into several operational subsystems called Functional Working Groups (FWGs). Examples of these FWGs include the engine system, transmission, fuel system, brakes, etc. Each FWG has associated sensor-channels that gauge vehicular operating conditions. This data-rich environment is conducive to the development of Predictive Maintenance (PdM) technologies. Underpinning various PdM technologies is the need for robust anomaly detection models that can identify events or observations which deviate significantly from the majority of the data and do not conform to a well-defined notion of normal vehicular operational behavior. In this paper, we introduce the Vehicle Performance, Reliability, and Operations (VePRO) dataset and use it to create a multi-phased approach to anomaly detection. Utilizing Temporal Convolution Networks (TCN), our anomaly detection system can achieve 96% detection accuracy and accurately predicts 91% of true anomalies. The performance of our anomaly detection system improves when sensor channels from multiple FWGs are utilized.
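    As an illustration of the kind of building block involved, here is a minimal dilated causal convolution stack in PyTorch that scores each time step of multi-channel sensor data by one-step-ahead prediction error; the channel count, depth, and scoring rule are assumptions, not the VePRO system's actual architecture.
```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class CausalConvBlock(nn.Module):
    """One dilated causal convolution block, the building unit of a TCN."""
    def __init__(self, channels, kernel_size=3, dilation=1):
        super().__init__()
        self.pad = (kernel_size - 1) * dilation  # left-pad so outputs are causal
        self.conv = nn.Conv1d(channels, channels, kernel_size, dilation=dilation)

    def forward(self, x):                        # x: (batch, channels, time)
        out = self.conv(F.pad(x, (self.pad, 0)))
        return torch.relu(out) + x               # residual connection

class TCNScorer(nn.Module):
    """Stack with doubling dilations; anomaly score = one-step-ahead
    prediction error on each sensor channel."""
    def __init__(self, n_channels, depth=4):
        super().__init__()
        self.blocks = nn.Sequential(
            *[CausalConvBlock(n_channels, dilation=2 ** i) for i in range(depth)]
        )

    def forward(self, x):
        return self.blocks(x)

x = torch.randn(2, 12, 256)                      # 12 sensor channels, 256 steps
model = TCNScorer(n_channels=12)
pred = model(x)
score = (pred[..., :-1] - x[..., 1:]).pow(2).mean(dim=1)
print(score.shape)                               # torch.Size([2, 255])
```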
    ProAPT: Projection of APT Threats with Deep Reinforcement Learning. (arXiv:2209.07215v1 [cs.CR])
    The highest level in the Endsley situation awareness model, called projection, predicts the status of elements in the environment in the near future. In cybersecurity situation awareness, projection for an Advanced Persistent Threat (APT) requires predicting the next step of the APT. The threats are constantly changing and becoming more complex. As supervised and unsupervised learning methods require APT datasets for projecting the next step of APTs, they are unable to identify unknown APT threats. In reinforcement learning methods, the agent interacts with the environment, and so it might project the next step of both known and unknown APTs. So far, reinforcement learning has not been used to project the next step of APTs. In reinforcement learning, the agent uses the previous states and actions to approximate the best action for the current state. When the number of states and actions is abundant, the agent employs a deep neural network to approximate the best action of each state. In this paper, we present a deep reinforcement learning system to project the next step of APTs. As there exists some relation between attack steps, we employ the Long Short-Term Memory (LSTM) method to approximate the best action of each state. In our proposed system, based on the current situation, we project the next steps of APT threats.
    MRI-MECH: Mechanics-informed MRI to estimate esophageal health. (arXiv:2209.07492v1 [physics.med-ph])
    Dynamic magnetic resonance imaging (MRI) is a popular medical imaging technique to generate image sequences of the flow of a contrast material inside tissues and organs. However, its application to imaging bolus movement through the esophagus has only been demonstrated in a few feasibility studies and is relatively unexplored. In this work, we present a computational framework called mechanics-informed MRI (MRI-MECH) that enhances that capability, thereby increasing the applicability of dynamic MRI for diagnosing esophageal disorders. Pineapple juice was used as the swallowed contrast material for the dynamic MRI and the MRI image sequence was used as input to the MRI-MECH. The MRI-MECH modeled the esophagus as a flexible one-dimensional tube and the elastic tube walls followed a linear tube law. Flow through the esophagus was then governed by one-dimensional mass and momentum conservation equations. These equations were solved using a physics-informed neural network (PINN). The PINN minimized the difference between the measurements from the MRI and the model predictions, ensuring that the physics of the fluid flow problem was always followed. MRI-MECH calculated the fluid velocity and pressure during esophageal transit and estimated the mechanical health of the esophagus by calculating wall stiffness and active relaxation. Additionally, MRI-MECH predicted missing information about the lower esophageal sphincter during the emptying process, demonstrating its applicability to scenarios with missing data or poor image resolution. In addition to potentially improving clinical decisions based on quantitative estimates of the mechanical health of the esophagus, MRI-MECH can also be extended to other medical imaging modalities to enhance their functionality.
    Blind Equalization and Channel Estimation in Coherent Optical Communications Using Variational Autoencoders. (arXiv:2204.11776v2 [eess.SP] UPDATED)
    We investigate the potential of adaptive blind equalizers based on variational inference for carrier recovery in optical communications. These equalizers are based on a low-complexity approximation of maximum likelihood channel estimation. We generalize the concept of variational autoencoder (VAE) equalizers to higher-order modulation formats, encompassing probabilistic constellation shaping (PCS), which is ubiquitous in optical communications, oversampling at the receiver, and dual-polarization transmission. Besides black-box equalizers based on convolutional neural networks, we propose a model-based equalizer based on a linear butterfly filter and train the filter coefficients using the variational inference paradigm. As a byproduct, the VAE also provides a reliable channel estimation. We analyze the VAE in terms of performance and flexibility over a classical additive white Gaussian noise (AWGN) channel with inter-symbol interference (ISI) and over a dispersive linear optical dual-polarization channel. We show that it can extend the application range of blind adaptive equalizers by outperforming the state-of-the-art constant-modulus algorithm (CMA) for PCS on both fixed and time-varying channels. The evaluation is accompanied by a hyperparameter analysis.
    DiP-GNN: Discriminative Pre-Training of Graph Neural Networks. (arXiv:2209.07499v1 [cs.LG])
    Graph neural network (GNN) pre-training methods have been proposed to enhance the power of GNNs. Specifically, a GNN is first pre-trained on a large-scale unlabeled graph and then fine-tuned on a separate small labeled graph for downstream applications, such as node classification. One popular pre-training method is to mask out a proportion of the edges, and a GNN is trained to recover them. However, such a generative method suffers from graph mismatch. That is, the masked graph inputted to the GNN deviates from the original graph. To alleviate this issue, we propose DiP-GNN (Discriminative Pre-training of Graph Neural Networks). Specifically, we train a generator to recover identities of the masked edges, and simultaneously, we train a discriminator to distinguish the generated edges from the original graph's edges. In our framework, the graph seen by the discriminator better matches the original graph because the generator can recover a proportion of the masked edges. Extensive experiments on large-scale homogeneous and heterogeneous graphs demonstrate the effectiveness of the proposed framework.
    Study of Drug Assimilation in Human System using Physics Informed Neural Networks. (arXiv:2110.05531v2 [q-bio.OT] UPDATED)
    Differential equations play a pivotal role in the modern world, with applications ranging from science and engineering to ecology, economics, and finance, where they can be used to model many physical systems and processes. In this paper, we study two mathematical models of drug assimilation in the human system using Physics Informed Neural Networks (PINNs). In the first model, we consider the case of a single dose of drug in the human system, and in the second, we consider the course of this drug taken at regular intervals. We have used compartment diagrams to model these cases. The resulting differential equations are solved using PINNs, where we employ a feed-forward multilayer perceptron as the function approximator and the network parameters are tuned for minimum error. Further, the network is trained by finding the gradient of the error function with respect to the network parameters. We have employed DeepXDE, a Python library for PINNs, to solve the simultaneous first-order differential equations describing the two models of drug assimilation. The results show a high degree of accuracy between the exact and predicted solutions, with the resulting error reaching $10^{-11}$ for the first model and $10^{-8}$ for the second. This validates the use of PINNs in solving such dynamical systems.
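    A minimal DeepXDE sketch for a single-dose case, assuming first-order elimination kinetics $dC/dt = -kC$, $C(0) = C_0$, with made-up constants (the paper's compartment models are more detailed). The API follows recent DeepXDE releases; older versions use dde.IC and epochs= instead of dde.icbc.IC and iterations=.
```python
import deepxde as dde
import numpy as np

# Assumed single-dose kinetics: dC/dt = -k * C with C(0) = C0.
k, C0 = 0.5, 1.0

geom = dde.geometry.TimeDomain(0.0, 10.0)

def ode(t, C):
    return dde.grad.jacobian(C, t) + k * C  # residual of the governing ODE

ic = dde.icbc.IC(geom, lambda t: C0, lambda _, on_initial: on_initial)
data = dde.data.PDE(geom, ode, ic, num_domain=128, num_boundary=2)

net = dde.nn.FNN([1] + [32] * 3 + [1], "tanh", "Glorot normal")
model = dde.Model(data, net)
model.compile("adam", lr=1e-3)
model.train(iterations=10000)

# Compare against the exact solution C(t) = C0 * exp(-k * t).
t = np.linspace(0.0, 10.0, 100)[:, None]
print(np.abs(model.predict(t) - C0 * np.exp(-k * t)).max())
```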
    Can Pre-trained Models Really Learn Better Molecular Representations for AI-aided Drug Discovery?. (arXiv:2209.07423v1 [q-bio.BM])
    Self-supervised pre-training is gaining popularity in AI-aided drug discovery, leading to more and more pre-trained models with the promise that they can extract better feature representations for molecules. Yet, the quality of the learned representations has not been fully explored. In this work, inspired by the two phenomena of Activity Cliffs (ACs) and Scaffold Hopping (SH) in traditional Quantitative Structure-Activity Relationship (QSAR) analysis, we propose a method named Representation-Property Relationship Analysis (RePRA) to evaluate the quality of the representations extracted by the pre-trained model and visualize the relationship between the representations and properties. The concepts of ACs and SH are generalized from the structure-activity context to the representation-property context, and the underlying principles of RePRA are analyzed theoretically. Two scores are designed to measure the generalized ACs and SH detected by RePRA, and therefore the quality of representations can be evaluated. In experiments, representations of molecules from 10 target tasks generated by 7 pre-trained models are analyzed. The results indicate that the state-of-the-art pre-trained models can overcome some shortcomings of canonical Extended-Connectivity FingerPrints (ECFP), while the correlation between the basis of the representation space and specific molecular substructures is not explicit. Thus, some representations could be even worse than the canonical fingerprints. Our method enables researchers to evaluate the quality of molecular representations generated by their proposed self-supervised pre-trained models, and our findings can guide the community to develop better pre-training techniques to regularize the occurrence of ACs and SH.
    Do Residual Neural Networks discretize Neural Ordinary Differential Equations?. (arXiv:2205.14612v2 [cs.LG] UPDATED)
    Neural Ordinary Differential Equations (Neural ODEs) are the continuous analog of Residual Neural Networks (ResNets). We investigate whether the discrete dynamics defined by a ResNet are close to the continuous dynamics of a Neural ODE. We first quantify the distance between the ResNet's hidden state trajectory and the solution of its corresponding Neural ODE. Our bound is tight and, on the negative side, does not go to 0 with depth $N$ if the residual functions are not smooth with depth. On the positive side, we show that this smoothness is preserved by gradient descent for a ResNet with linear residual functions and small enough initial loss. It ensures an implicit regularization towards a limit Neural ODE at rate $1/N$, uniformly with depth and optimization time. As a byproduct of our analysis, we consider the use of a memory-free discrete adjoint method to train a ResNet by recovering the activations on the fly through a backward pass of the network, and show that this method theoretically succeeds at large depth if the residual functions are Lipschitz with the input. We then show that Heun's method, a second-order ODE integration scheme, allows for better gradient estimation with the adjoint method when the residual functions are smooth with depth. We experimentally validate that our adjoint method succeeds at large depth, and that Heun's method needs fewer layers to succeed. We finally use the adjoint method successfully for fine-tuning very deep ResNets without memory consumption in the residual layers.
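    A small numerical sketch of the correspondence, assuming a toy residual function: a depth-$N$ ResNet update is an explicit Euler step of size $1/N$, while Heun's method averages the slope at the current point and at the Euler endpoint to gain second-order accuracy.
```python
import numpy as np

def f(x):
    return np.tanh(x)  # a smooth residual function standing in for a block

def euler_step(x, h):
    return x + h * f(x)            # ResNet update: x_{n+1} = x_n + h f(x_n)

def heun_step(x, h):
    x_pred = x + h * f(x)          # Euler predictor
    return x + 0.5 * h * (f(x) + f(x_pred))  # average start/end slopes

x0, T, N = np.array([1.0]), 1.0, 64
h = T / N                          # step size 1/N, matching a depth-N ResNet
x_e, x_h = x0.copy(), x0.copy()
for _ in range(N):
    x_e, x_h = euler_step(x_e, h), heun_step(x_h, h)
print(x_e, x_h)  # Heun tracks the underlying ODE flow more closely
```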
    Double Doubly Robust Thompson Sampling for Generalized Linear Contextual Bandits. (arXiv:2209.06983v1 [stat.ML])
    We propose a novel contextual bandit algorithm for generalized linear rewards with an $\tilde{O}(\sqrt{\kappa^{-1} \phi T})$ regret over $T$ rounds where $\phi$ is the minimum eigenvalue of the covariance of contexts and $\kappa$ is a lower bound of the variance of rewards. In several practical cases where $\phi=O(d)$, our result is the first regret bound for generalized linear model (GLM) bandits with the order $\sqrt{d}$ without relying on the approach of Auer [2002]. We achieve this bound using a novel estimator called double doubly-robust (DDR) estimator, a subclass of doubly-robust (DR) estimator but with a tighter error bound. The approach of Auer [2002] achieves independence by discarding the observed rewards, whereas our algorithm achieves independence considering all contexts using our DDR estimator. We also provide an $O(\kappa^{-1} \phi \log (NT) \log T)$ regret bound for $N$ arms under a probabilistic margin condition. Regret bounds under the margin condition are given by Bastani and Bayati [2020] and Bastani et al. [2021] under the setting that contexts are common to all arms but coefficients are arm-specific. When contexts are different for all arms but coefficients are common, ours is the first regret bound under the margin condition for linear models or GLMs. We conduct empirical studies using synthetic data and real examples, demonstrating the effectiveness of our algorithm.
    M^4I: Multi-modal Models Membership Inference. (arXiv:2209.06997v1 [cs.LG])
    With the development of machine learning techniques, research attention has shifted from single-modal learning to multi-modal learning, as real-world data exist in the form of different modalities. However, multi-modal models often carry more information than single-modal models and they are usually applied in sensitive scenarios, such as medical report generation or disease identification. Compared with the existing membership inference against machine learning classifiers, we focus on the problem that the input and output of the multi-modal models are in different modalities, such as image captioning. This work studies the privacy leakage of multi-modal models through the lens of membership inference attack, a process of determining whether a data record was involved in the model training process. To achieve this, we propose Multi-modal Models Membership Inference (M^4I) with two attack methods to infer the membership status, named metric-based (MB) M^4I and feature-based (FB) M^4I, respectively. More specifically, MB M^4I adopts similarity metrics while attacking to infer target data membership. FB M^4I uses a pre-trained shadow multi-modal feature extractor to achieve the purpose of data inference attack by comparing the similarities from extracted input and output features. Extensive experimental results show that both attack methods can achieve strong performance: average attack success rates of 72.5% and 94.83%, respectively, can be obtained under unrestricted scenarios. Moreover, we evaluate multiple defense mechanisms against our attacks. The source code of M^4I attacks is publicly available at https://github.com/MultimodalMI/Multimodal-membership-inference.git.
    CLIPping Privacy: Identity Inference Attacks on Multi-Modal Machine Learning Models. (arXiv:2209.07341v1 [cs.LG])
    As deep learning is now used in many real-world applications, research has focused increasingly on the privacy of deep learning models and how to prevent attackers from obtaining sensitive information about the training data. However, image-text models like CLIP have not yet been looked at in the context of privacy attacks. While membership inference attacks aim to tell whether a specific data point was used for training, we introduce a new type of privacy attack, named identity inference attack (IDIA), designed for multi-modal image-text models like CLIP. Using IDIAs, an attacker can reveal whether a particular person was part of the training data by querying the model in a black-box fashion with different images of the same person. Letting the model choose from a wide variety of possible text labels, the attacker can probe whether the model recognizes the person and, therefore, whether the person's images were used for training. Through several experiments on CLIP, we show that the attacker can identify individuals used for training with very high accuracy and that the model learns to connect the names with the depicted people. Our experiments show that a multi-modal image-text model indeed leaks sensitive information about its training data and, therefore, should be handled with care.
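    A simplified sketch of the attack idea against OpenAI's public CLIP package; the names, image paths, and decision rule below are placeholders, and the paper's actual protocol is more involved.
```python
import torch
import clip  # pip install git+https://github.com/openai/CLIP.git
from PIL import Image

device = "cuda" if torch.cuda.is_available() else "cpu"
model, preprocess = clip.load("ViT-B/32", device=device)

# Candidate names: the target plus decoys (all hypothetical).
names = ["Jane Doe", "John Smith", "Alex Johnson", "Maria Garcia"]
prompts = clip.tokenize([f"a photo of {n}" for n in names]).to(device)

def top_name(image_path):
    image = preprocess(Image.open(image_path)).unsqueeze(0).to(device)
    with torch.no_grad():
        logits_per_image, _ = model(image, prompts)
        probs = logits_per_image.softmax(dim=-1).squeeze(0)
    return names[probs.argmax().item()]

# Query with several different photos of the same person; if the model
# consistently picks the right name, the person was likely in training data.
photos = ["jane_1.jpg", "jane_2.jpg", "jane_3.jpg"]  # placeholder paths
hits = sum(top_name(p) == "Jane Doe" for p in photos)
print(f"{hits}/{len(photos)} photos matched the target name")
```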
    Statistical monitoring of models based on artificial intelligence. (arXiv:2209.07436v1 [stat.ME])
    The rapid advancement of models based on artificial intelligence demands innovative monitoring techniques which can operate in real time with low computational costs. In machine learning, especially if we consider neural network (NN) learning algorithms, and in particular deep-learning architectures, the models are often trained in a supervised manner. Consequently, the learned relationship between the input and the output must remain valid during the model's deployment. If this stationarity assumption holds, we can conclude that the NN generates accurate predictions. Otherwise, the retraining or rebuilding of the model is required. We propose to consider the latent feature representation of the data (called "embedding") generated by the NN for determining the time point when the data stream starts being nonstationary. To be precise, we monitor embeddings by applying multivariate control charts based on the calculation of the data depth and normalized ranks. The performance of the introduced method is evaluated using various NNs with different underlying data formats.
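    A simplified sketch of depth-and-rank monitoring, assuming Mahalanobis depth as the depth notion and an illustrative alarm threshold (the paper's control charts are more general):
```python
import numpy as np

rng = np.random.default_rng(0)
ref = rng.normal(size=(500, 16))              # in-control reference embeddings
mu = ref.mean(axis=0)
cov_inv = np.linalg.inv(np.cov(ref, rowvar=False))

def mahalanobis_depth(x):
    d = x - mu
    return 1.0 / (1.0 + d @ cov_inv @ d)      # larger = more central

ref_depths = np.sort([mahalanobis_depth(x) for x in ref])

def normalized_rank(x):
    # fraction of reference points at least as outlying as x
    return np.searchsorted(ref_depths, mahalanobis_depth(x)) / len(ref_depths)

# Monitor a stream of embeddings; the distribution shifts at t = 50.
stream = np.vstack([rng.normal(size=(50, 16)),
                    rng.normal(loc=2.0, size=(50, 16))])
alarm = next((t for t, x in enumerate(stream) if normalized_rank(x) < 0.005), None)
print("first alarm at t =", alarm)            # threshold is illustrative
```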
    NU-net: An Unpretentious Nested U-net for Breast Tumor Segmentation. (arXiv:2209.07193v1 [eess.IV])
    Breast tumor segmentation is one of the key steps that helps us characterize and localize tumor regions. However, variable tumor morphology, blurred boundaries, and similar intensity distributions bring challenges for accurate segmentation of breast tumors. Recently, many U-net variants have been proposed and widely used for breast tumor segmentation. However, these architectures suffer from two limitations: (1) they ignore the characterization ability of the benchmark networks, and (2) the extra complex operations they introduce increase the difficulty of understanding and reproducing the network. To alleviate these challenges, this paper proposes a simple yet powerful nested U-net (NU-net) for accurate segmentation of breast tumors. The key idea is to utilize U-Nets with different depths and shared weights to achieve robust characterization of breast tumors. NU-net mainly has the following advantages: (1) it improves network adaptability and robustness to breast tumors of different scales, (2) it is easy to reproduce and execute, and (3) its extra operations increase network parameters without significantly increasing the computational cost. Extensive experimental results with twelve state-of-the-art segmentation methods on three public breast ultrasound datasets demonstrate that NU-net has more competitive segmentation performance on breast tumors. Furthermore, the robustness of NU-net is further illustrated on the segmentation of renal ultrasound images. The source code is publicly available at https://github.com/CGPzy/NU-net.
    Content-Context Factorized Representations for Automated Speech Recognition. (arXiv:2205.09872v2 [eess.AS] UPDATED)
    Deep neural networks have largely demonstrated their ability to perform automated speech recognition (ASR) by extracting meaningful features from input audio frames. Such features, however, may consist not only of information about the spoken language content, but also may contain information about unnecessary contexts such as background noise and sounds or speaker identity, accent, or protected attributes. Such information can directly harm generalization performance, by introducing spurious correlations between the spoken words and the context in which such words were spoken. In this work, we introduce an unsupervised, encoder-agnostic method for factoring speech-encoder representations into explicit content-encoding representations and spurious context-encoding representations. By doing so, we demonstrate improved performance on standard ASR benchmarks, as well as improved performance in both real-world and artificially noisy ASR scenarios.
    Neural-iLQR: A Learning-Aided Shooting Method for Trajectory Optimization. (arXiv:2011.10737v3 [cs.LG] UPDATED)
    Iterative linear quadratic regulator (iLQR) has gained wide popularity in addressing trajectory optimization problems with nonlinear system models. However, as a model-based shooting method, it relies heavily on an accurate system model to update the optimal control actions and the trajectory determined with forward integration, thus becoming vulnerable to inevitable model inaccuracies. Recently, substantial research efforts on learning-based methods for optimal control have made significant progress in addressing unknown system models, particularly when the system has complex interactions with the environment. Yet a deep neural network is normally required to fit a substantial amount of sampled data. In this work, we present Neural-iLQR, a learning-aided shooting method over the unconstrained control space, in which a neural network with a simple structure is used to represent the local system model. In this framework, the trajectory optimization task is achieved by iteratively and simultaneously refining the optimal policy and the neural network, without relying on prior knowledge of the system model. Through comprehensive evaluations on two illustrative control tasks, the proposed method is shown to outperform the conventional iLQR significantly in the presence of inaccuracies in system models.
    Low-rank Optimal Transport: Approximation, Statistics and Debiasing. (arXiv:2205.12365v2 [stat.ML] UPDATED)
    The matching principles behind optimal transport (OT) play an increasingly important role in machine learning, a trend which can be observed when OT is used to disambiguate datasets in applications (e.g. single-cell genomics) or used to improve more complex methods (e.g. balanced attention in transformers or self-supervised learning). To scale to more challenging problems, there is a growing consensus that OT requires solvers that can operate on millions, not thousands, of points. The low-rank optimal transport (LOT) approach advocated by Scetbon et al. (2021) holds several promises in that regard, and was shown to complement more established entropic regularization approaches, being able to insert itself in more complex pipelines, such as quadratic OT. LOT restricts the search for low-cost couplings to those that have a low nonnegative rank, yielding linear-time algorithms in cases of interest. However, these promises can only be fulfilled if the LOT approach is seen as a legitimate contender to entropic regularization when compared on properties of interest, where the scorecard typically includes theoretical properties (statistical complexity and relation to other methods) or practical aspects (debiasing, hyperparameter tuning, initialization). We target each of these areas in this paper in order to cement the impact of low-rank approaches in computational OT.
    Bi-level Physics-Informed Neural Networks for PDE Constrained Optimization using Broyden's Hypergradients. (arXiv:2209.07075v1 [cs.LG])
    Deep learning based approaches like Physics-informed neural networks (PINNs) and DeepONets have shown promise on solving PDE constrained optimization (PDECO) problems. However, existing methods are insufficient to handle those PDE constraints that have a complicated or nonlinear dependency on optimization targets. In this paper, we present a novel bi-level optimization framework to resolve the challenge by decoupling the optimization of the targets and constraints. For the inner loop optimization, we adopt PINNs to solve the PDE constraints only. For the outer loop, we design a novel method by using Broyden's method based on the Implicit Function Theorem (IFT), which is efficient and accurate for approximating hypergradients. We further present theoretical explanations and error analysis of the hypergradients computation. Extensive experiments on multiple large-scale and nonlinear PDE constrained optimization problems demonstrate that our method achieves state-of-the-art results compared with strong baselines.
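    For reference, a minimal NumPy implementation of the classical "good Broyden" root finder, the kind of quasi-Newton scheme that maintains an approximate inverse Jacobian via rank-one updates; this sketch solves a generic root-finding problem and is not the paper's hypergradient pipeline.
```python
import numpy as np

def broyden(F, x0, tol=1e-10, max_iter=100):
    """Good Broyden's method: keep an approximate inverse Jacobian H and
    update it with rank-one corrections -- no explicit Jacobians needed."""
    x = x0.astype(float)
    H = np.eye(len(x0))            # initial inverse-Jacobian approximation
    f = F(x)
    for _ in range(max_iter):
        dx = -H @ f
        x_new = x + dx
        f_new = F(x_new)
        if np.linalg.norm(f_new) < tol:
            return x_new
        df = f_new - f
        Hdf = H @ df
        # Sherman-Morrison-style rank-one update of H
        H += np.outer(dx - Hdf, dx @ H) / (dx @ Hdf)
        x, f = x_new, f_new
    return x

# Example: solve x^2 + y^2 = 4 and x * y = 1 simultaneously.
F = lambda v: np.array([v[0]**2 + v[1]**2 - 4.0, v[0] * v[1] - 1.0])
print(broyden(F, np.array([2.0, 0.5])))
```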
    Out of One, Many: Using Language Models to Simulate Human Samples. (arXiv:2209.06899v1 [cs.LG])
    We propose and explore the possibility that language models can be studied as effective proxies for specific human sub-populations in social science research. Practical and research applications of artificial intelligence tools have sometimes been limited by problematic biases (such as racism or sexism), which are often treated as uniform properties of the models. We show that the "algorithmic bias" within one such tool -- the GPT-3 language model -- is instead both fine-grained and demographically correlated, meaning that proper conditioning will cause it to accurately emulate response distributions from a wide variety of human subgroups. We term this property "algorithmic fidelity" and explore its extent in GPT-3. We create "silicon samples" by conditioning the model on thousands of socio-demographic backstories from real human participants in multiple large surveys conducted in the United States. We then compare the silicon and human samples to demonstrate that the information contained in GPT-3 goes far beyond surface similarity. It is nuanced, multifaceted, and reflects the complex interplay between ideas, attitudes, and socio-cultural context that characterize human attitudes. We suggest that language models with sufficient algorithmic fidelity thus constitute a novel and powerful tool to advance understanding of humans and society across a variety of disciplines.  ( 2 min )
    Public Reaction to Scientific Research via Twitter Sentiment Prediction. (arXiv:2209.07333v1 [cs.IR])
    Social media users share their ideas, thoughts, and emotions with other users. However, it is not clear how online users would respond to new research outcomes. This study aims to predict the nature of the emotions expressed by Twitter users toward scientific publications. Additionally, we investigate what features of the research articles help in such prediction. Identifying the sentiments expressed toward research articles on social media will help scientists gauge the societal impact of their research.
    Flexible Diffusion Modeling of Long Videos. (arXiv:2205.11495v2 [cs.CV] UPDATED)
    We present a framework for video modeling based on denoising diffusion probabilistic models that produces long-duration video completions in a variety of realistic environments. We introduce a generative model that can at test-time sample any arbitrary subset of video frames conditioned on any other subset and present an architecture adapted for this purpose. Doing so allows us to efficiently compare and optimize a variety of schedules for the order in which frames in a long video are sampled and use selective sparse and long-range conditioning on previously sampled frames. We demonstrate improved video modeling over prior work on a number of datasets and sample temporally coherent videos over 25 minutes in length. We additionally release a new video modeling dataset and semantically meaningful metrics based on videos generated in the CARLA self-driving car simulator.
    Collaborative Learning for Cyberattack Detection in Blockchain Networks. (arXiv:2203.11076v2 [cs.CR] UPDATED)
    This article aims to study intrusion attacks and then develop a novel cyberattack detection framework for blockchain networks. Specifically, we first design and implement a blockchain network in our laboratory. This blockchain network will serve two purposes, i.e., generate the real traffic data (including both normal data and attack data) for our learning models and implement real-time experiments to evaluate the performance of our proposed intrusion detection framework. To the best of our knowledge, this is the first dataset that is synthesized in a laboratory for cyberattacks in a blockchain network. We then propose a novel collaborative learning model that allows efficient deployment in the blockchain network to detect attacks. The main idea of the proposed learning model is to enable blockchain nodes to actively collect data, share the knowledge learned from its data, and then exchange the knowledge with other blockchain nodes in the network. In this way, we can not only leverage the knowledge from all the nodes in the network but also avoid gathering all the raw data for training at a centralized node, as conventional centralized learning solutions do. Such a framework also avoids the risks of exposing local data privacy and of excessive network overhead/congestion. Both intensive simulations and real-time experiments clearly show that our proposed collaborative learning-based intrusion detection framework can achieve an accuracy of up to 97.7% in detecting attacks.
    Time Series Prediction using Deep Learning Methods in Healthcare. (arXiv:2108.13461v3 [cs.LG] UPDATED)
    Traditional machine learning methods face two main challenges in dealing with healthcare predictive analytics tasks. First, the high-dimensional nature of healthcare data needs labor-intensive and time-consuming processes to select an appropriate set of features for each new task. Second, these methods depend on feature engineering to capture the sequential nature of patient data, which may not adequately leverage the temporal patterns of the medical events and their dependencies. Recent deep learning methods have shown promising performance for various healthcare prediction tasks by addressing the high-dimensional and temporal challenges of medical data. These methods can learn useful representations of key factors (e.g., medical concepts or patients) and their interactions from high-dimensional raw or minimally-processed healthcare data. In this paper, we systematically reviewed studies focused on advancing and using deep neural networks to leverage patients' structured time series data for healthcare prediction tasks. To identify relevant studies, MEDLINE, IEEE, Scopus and ACM digital library were searched for studies published up to February 7th 2021. We found that researchers have contributed to deep time series prediction literature in ten research streams: deep learning models, missing value handling, irregularity handling, patient representation, static data inclusion, attention mechanisms, interpretation, incorporating medical ontologies, learning strategies, and scalability. This study summarizes research insights from these literature streams, identifies several critical research gaps, and suggests future research opportunities for deep learning in patient time series data.
    Learning to Reason Deductively: Math Word Problem Solving as Complex Relation Extraction. (arXiv:2203.10316v4 [cs.CL] UPDATED)
    Solving math word problems requires deductive reasoning over the quantities in the text. Various recent research efforts mostly relied on sequence-to-sequence or sequence-to-tree models to generate mathematical expressions without explicitly performing relational reasoning between quantities in the given context. While empirically effective, such approaches typically do not provide explanations for the generated expressions. In this work, we view the task as a complex relation extraction problem, proposing a novel approach that presents explainable deductive reasoning steps to iteratively construct target expressions, where each step involves a primitive operation over two quantities defining their relation. Through extensive experiments on four benchmark datasets, we show that the proposed model significantly outperforms existing strong baselines. We further demonstrate that the deductive procedure not only presents more explainable steps but also enables us to make more accurate predictions on questions that require more complex reasoning.  ( 2 min )
    Robust Transferable Feature Extractors: Learning to Defend Pre-Trained Networks Against White Box Adversaries. (arXiv:2209.06931v1 [cs.LG])
    The widespread adoption of deep neural networks in computer vision applications has brought forth a significant interest in adversarial robustness. Existing research has shown that maliciously perturbed inputs specifically tailored for a given model (i.e., adversarial examples) can be successfully transferred to another independently trained model to induce prediction errors. Moreover, this property of adversarial examples has been attributed to features derived from predictive patterns in the data distribution. Thus, we are motivated to investigate the following question: Can adversarial defenses, like adversarial examples, be successfully transferred to other independently trained models? To this end, we propose a deep learning-based pre-processing mechanism, which we refer to as a robust transferable feature extractor (RTFE). After examining theoretical motivation and implications, we experimentally show that our method can provide adversarial robustness to multiple independently pre-trained classifiers that are otherwise ineffective against an adaptive white box adversary. Furthermore, we show that RTFEs can even provide one-shot adversarial robustness to models independently trained on different datasets.  ( 2 min )
    Landmark-free Statistical Shape Modeling via Neural Flow Deformations. (arXiv:2209.06861v1 [cs.CV])
    Statistical shape modeling aims at capturing shape variations of an anatomical structure that occur within a given population. Shape models are employed in many tasks, such as shape reconstruction and image segmentation, but also shape generation and classification. Existing shape priors either require dense correspondence between training examples or lack robustness and topological guarantees. We present FlowSSM, a novel shape modeling approach that learns shape variability without requiring dense correspondence between training instances. It relies on a hierarchy of continuous deformation flows, which are parametrized by a neural network. Our model outperforms state-of-the-art methods in providing an expressive and robust shape prior for distal femur and liver. We show that the emerging latent representation is discriminative by separating healthy from pathological shapes. Ultimately, we demonstrate its effectiveness on two shape reconstruction tasks from partial data. Our source code is publicly available (https://github.com/davecasp/flowssm).
    Stochastic Tree Ensembles for Estimating Heterogeneous Effects. (arXiv:2209.06998v1 [stat.ML])
    Determining subgroups that respond especially well (or poorly) to specific interventions (medical or policy) requires new supervised learning methods tailored specifically for causal inference. Bayesian Causal Forest (BCF) is a recent method that has been documented to perform well on data generating processes with strong confounding of the sort that is plausible in many applications. This paper develops a novel algorithm for fitting the BCF model, which is more efficient than the previously available Gibbs sampler. The new algorithm can be used to initialize independent chains of the existing Gibbs sampler leading to better posterior exploration and coverage of the associated interval estimates in simulation studies. The new algorithm is compared to related approaches via simulation studies as well as an empirical analysis.
    Breaking the Sample Size Barrier in Model-Based Reinforcement Learning with a Generative Model. (arXiv:2005.12900v5 [cs.LG] UPDATED)
    This paper is concerned with the sample efficiency of reinforcement learning, assuming access to a generative model (or simulator). We first consider $\gamma$-discounted infinite-horizon Markov decision processes (MDPs) with state space $\mathcal{S}$ and action space $\mathcal{A}$. Despite a number of prior works tackling this problem, a complete picture of the trade-offs between sample complexity and statistical accuracy is yet to be determined. In particular, all prior results suffer from a severe sample size barrier, in the sense that their claimed statistical guarantees hold only when the sample size exceeds at least $\frac{|\mathcal{S}||\mathcal{A}|}{(1-\gamma)^2}$. The current paper overcomes this barrier by certifying the minimax optimality of two algorithms -- a perturbed model-based algorithm and a conservative model-based algorithm -- as soon as the sample size exceeds the order of $\frac{|\mathcal{S}||\mathcal{A}|}{1-\gamma}$ (modulo some log factor). Moving beyond infinite-horizon MDPs, we further study time-inhomogeneous finite-horizon MDPs, and prove that a plain model-based planning algorithm suffices to achieve minimax-optimal sample complexity given any target accuracy level. To the best of our knowledge, this work delivers the first minimax-optimal guarantees that accommodate the entire range of sample sizes (beyond which finding a meaningful policy is information theoretically infeasible).  ( 3 min )
    Sound and Complete Verification of Polynomial Networks. (arXiv:2209.07235v1 [cs.LG])
    Polynomial Networks (PNs) have demonstrated promising performance on face and image recognition recently. However, the robustness of PNs is unclear and thus obtaining certificates becomes imperative for enabling their adoption in real-world applications. Existing verification algorithms on ReLU neural networks (NNs) based on branch and bound (BaB) techniques cannot be trivially applied to PN verification. In this work, we devise a new bounding method, equipped with BaB for global convergence guarantees, called VPN. One key insight is that we obtain much tighter bounds than the interval bound propagation baseline. This enables sound and complete PN verification with empirical validation on MNIST, CIFAR10 and STL10 datasets. We believe our method is of independent interest for NN verification.  ( 2 min )
    How Much Does It Cost to Train a Machine Learning Model over Distributed Data Sources?. (arXiv:2209.07124v1 [cs.LG])
    Federated learning (FL) is one of the most appealing alternatives to the standard centralized learning paradigm, allowing a heterogeneous set of devices to train a machine learning model without sharing their raw data. However, FL requires a central server to coordinate the learning process, thus introducing potential scalability and security issues. In the literature, server-less FL approaches like gossip federated learning (GFL) and blockchain-enabled federated learning (BFL) have been proposed to mitigate these issues. In this work, we provide a complete overview of these three techniques, comparing them according to an integral set of performance indicators, including model accuracy, time complexity, communication overhead, convergence time and energy consumption. An extensive simulation campaign permits a quantitative analysis. In particular, GFL is able to save 18% of the training time, 68% of the energy and 51% of the data to be shared with respect to the centralized FL (CFL) solution, but it is not able to reach the accuracy level of CFL. On the other hand, BFL represents a viable solution for implementing decentralized learning with a higher level of security, at the cost of extra energy usage and data sharing. Finally, we identify open issues in the two decentralized federated learning implementations and provide insights on potential extensions and possible research directions in this new research field.  ( 3 min )
    Generalization Properties of NAS under Activation and Skip Connection Search. (arXiv:2209.07238v1 [cs.LG])
    Neural Architecture Search (NAS) has fostered the automatic discovery of neural architectures, which achieve state-of-the-art accuracy in image recognition. Despite the progress achieved with NAS, so far little attention has been paid to theoretical guarantees for NAS. In this work, we study the generalization properties of NAS under a unifying framework enabling (deep) layer skip connection search and activation function search. To this end, we derive the lower (and upper) bounds of the minimum eigenvalue of the Neural Tangent Kernel under the (in)finite width regime from a search space including mixed activation functions, fully connected, and residual neural networks. Our analysis is non-trivial due to the coupling of various architectures and activation functions under the unifying framework. Then, we leverage the eigenvalue bounds to establish generalization error bounds of NAS in the stochastic gradient descent training. Importantly, we theoretically and experimentally show how the derived results can guide NAS to select the top-performing architectures, even in the case without training, leading to a training-free algorithm based on our theory. Accordingly, our numerical validation sheds light on the design of computationally efficient methods for NAS.  ( 2 min )
    Risk-aware linear bandits with convex loss. (arXiv:2209.07154v1 [stat.ML])
    In decision-making problems such as the multi-armed bandit, an agent learns sequentially by optimizing a certain feedback. While the mean reward criterion has been extensively studied, other measures that reflect an aversion to adverse outcomes, such as mean-variance or conditional value-at-risk (CVaR), can be of interest for critical applications (healthcare, agriculture). Algorithms have been proposed for such risk-aware measures under bandit feedback without contextual information. In this work, we study contextual bandits where such risk measures can be elicited as linear functions of the contexts through the minimization of a convex loss. A typical example that fits within this framework is the expectile measure, which is obtained as the solution of an asymmetric least-squares problem. Using the method of mixtures for supermartingales, we derive confidence sequences for the estimation of such risk measures. We then propose an optimistic UCB algorithm to learn optimal risk-aware actions, with regret guarantees similar to those of generalized linear bandits. This approach requires solving a convex problem at each round of the algorithm, which we can relax by allowing only an approximate solution obtained by online gradient descent, at the cost of slightly higher regret. We conclude by evaluating the resulting algorithms on numerical experiments.  ( 2 min )
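    As a concrete instance of the convex-loss elicitation, the tau-expectile solves an asymmetric least-squares problem; the sketch below recovers a sample expectile by gradient descent (the step size and iteration count are arbitrary assumptions).
```python
import numpy as np

def expectile(x, tau, lr=0.5, steps=2000):
    """tau-expectile of a sample: the minimizer m of the asymmetric
    least-squares loss mean(w * (x - m)^2), w = tau if x >= m else 1 - tau."""
    m = x.mean()
    for _ in range(steps):
        u = x - m
        w = np.where(u >= 0, tau, 1.0 - tau)
        m += lr * np.mean(w * u)  # gradient step (up to a constant factor 2)
    return m

rng = np.random.default_rng(0)
x = rng.normal(size=10_000)
print(expectile(x, 0.5))  # tau = 0.5 recovers the sample mean (about 0)
print(expectile(x, 0.9))  # larger tau emphasizes adverse right-tail outcomes
```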
    Weakly Supervised Invariant Representation Learning Via Disentangling Known and Unknown Nuisance Factors. (arXiv:2209.06827v1 [cs.LG])
    Disentangled and invariant representations are two critical goals of representation learning and many approaches have been proposed to achieve either one of them. However, those two goals are actually complementary to each other, so we propose a framework to accomplish both of them simultaneously. We introduce a weakly supervised signal to learn a disentangled representation which consists of three splits containing predictive, known nuisance and unknown nuisance information respectively. Furthermore, we incorporate a contrastive method to enforce representation invariance. Experiments show that the proposed method outperforms state-of-the-art (SOTA) methods on four standard benchmarks and that it has better adversarial defense ability, compared to other methods, without adversarial training.  ( 2 min )
    First Contact: Unsupervised Human-Machine Co-Adaptation via Mutual Information Maximization. (arXiv:2205.12381v2 [cs.LG] UPDATED)
    How can we train an assistive human-machine interface (e.g., an electromyography-based limb prosthesis) to translate a user's raw command signals into the actions of a robot or computer when there is no prior mapping, we cannot ask the user for supervision in the form of action labels or reward feedback, and we do not have prior knowledge of the tasks the user is trying to accomplish? The key idea in this paper is that, regardless of the task, when an interface is more intuitive, the user's commands are less noisy. We formalize this idea as a completely unsupervised objective for optimizing interfaces: the mutual information between the user's command signals and the induced state transitions in the environment. To evaluate whether this mutual information score can distinguish between effective and ineffective interfaces, we conduct an observational study on 540K examples of users operating various keyboard and eye gaze interfaces for typing, controlling simulated robots, and playing video games. The results show that our mutual information scores are predictive of the ground-truth task completion metrics in a variety of domains, with an average Spearman's rank correlation of 0.43. In addition to offline evaluation of existing interfaces, we use our unsupervised objective to learn an interface from scratch: we randomly initialize the interface, have the user attempt to perform their desired tasks using the interface, measure the mutual information score, and update the interface to maximize mutual information through reinforcement learning. We evaluate our method through a user study with 12 participants who perform a 2D cursor control task using a perturbed mouse, and an experiment with one user playing the Lunar Lander game using hand gestures. The results show that we can learn an interface from scratch, without any user supervision or prior knowledge of tasks, in under 30 minutes.  ( 3 min )
    Exploiting Reward Shifting in Value-Based Deep RL. (arXiv:2209.07288v1 [cs.LG])
    In this work, we study the simple yet universally applicable case of reward shaping in value-based Deep Reinforcement Learning (DRL). We show that reward shifting in the form of the linear transformation is equivalent to changing the initialization of the $Q$-function in function approximation. Based on such an equivalence, we bring the key insight that a positive reward shifting leads to conservative exploitation, while a negative reward shifting leads to curiosity-driven exploration. Accordingly, conservative exploitation improves offline RL value estimation, and optimistic value estimation improves exploration for online RL. We validate our insight on a range of RL tasks and show its improvement over baselines: (1) In offline RL, the conservative exploitation leads to improved performance based on off-the-shelf algorithms; (2) In online continuous control, multiple value functions with different shifting constants can be used to tackle the exploration-exploitation dilemma for better sample efficiency; (3) In discrete control tasks, a negative reward shifting yields an improvement over the curiosity-based exploration method.  ( 2 min )
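    A sketch of the linear-shift equivalence in standard discounted-return notation: for a constant shift $b$ and discount factor $\gamma$,
```latex
Q'_{\pi}(s,a)
  \;=\; \mathbb{E}\!\left[\sum_{t \ge 0} \gamma^{t}\,(r_t + b)\right]
  \;=\; Q_{\pi}(s,a) + \frac{b}{1-\gamma}
```
    so a zero-initialized $Q$-function is pessimistic relative to the shifted values when $b > 0$ (conservative exploitation) and optimistic when $b < 0$ (curiosity-driven exploration), matching the insight above.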
    Learning-Based Adaptive Control for Stochastic Linear Systems with Input Constraints. (arXiv:2209.07040v1 [eess.SY])
    We propose a certainty-equivalence scheme for adaptive control of scalar linear systems subject to additive, i.i.d. Gaussian disturbances and bounded control input constraints, without requiring prior knowledge of the bounds of the system parameters, nor the control direction. Assuming that the system is at-worst marginally stable, mean square boundedness of the closed-loop system states is proven. Lastly, numerical examples are presented to illustrate our results.  ( 2 min )
    Compressed Particle-Based Federated Bayesian Learning and Unlearning. (arXiv:2209.07267v1 [cs.LG])
    Conventional frequentist federated learning (FL) schemes are known to yield overconfident decisions. Bayesian FL addresses this issue by allowing agents to process and exchange uncertainty information encoded in distributions over the model parameters. However, this comes at the cost of a larger per-iteration communication overhead. This letter investigates whether Bayesian FL can still provide advantages in terms of calibration when constraining communication bandwidth. We present compressed particle-based Bayesian FL protocols for FL and federated "unlearning" that apply quantization and sparsification across multiple particles. The experimental results confirm that the benefits of Bayesian FL are robust to bandwidth constraints.  ( 2 min )
    Dynamic Graph Message Passing Networks. (arXiv:1908.06955v5 [cs.CV] UPDATED)
    Modelling long-range dependencies is critical for scene understanding tasks in computer vision. Although CNNs have excelled in many vision tasks, they are still limited in capturing long-range structured relationships as they typically consist of layers of local kernels. A fully-connected graph is beneficial for such modelling, however, its computational overhead is prohibitive. We propose a dynamic graph message passing network, that significantly reduces the computational complexity compared to related works modelling a fully-connected graph. This is achieved by adaptively sampling nodes in the graph, conditioned on the input, for message passing. Based on the sampled nodes, we dynamically predict node-dependent filter weights and the affinity matrix for propagating information between them. Using this model, we show significant improvements with respect to strong, state-of-the-art baselines on three different tasks and backbone architectures. Our approach also outperforms fully-connected graphs while using substantially fewer floating-point operations and parameters. The project website is this http URL  ( 2 min )
    Deep invariant networks with differentiable augmentation layers. (arXiv:2202.02142v5 [cs.LG] UPDATED)
    Designing learning systems which are invariant to certain data transformations is critical in machine learning. Practitioners can typically enforce a desired invariance on the trained model through the choice of a network architecture, e.g. using convolutions for translations, or using data augmentation. Yet, enforcing true invariance in the network can be difficult, and data invariances are not always known a priori. State-of-the-art methods for learning data augmentation policies require held-out data and are based on bilevel optimization problems, which are complex to solve and often computationally demanding. In this work we investigate new ways of learning invariances only from the training data. Using learnable augmentation layers built directly in the network, we demonstrate that our method is very versatile. It can incorporate any type of differentiable augmentation and be applied to a broad class of learning problems beyond computer vision. We provide empirical evidence showing that our approach is easier and faster to train than modern automatic data augmentation techniques based on bilevel optimization, while achieving comparable results. Experiments show that while the invariances transferred to a model through automatic data augmentation are limited by the model expressivity, the invariance yielded by our approach is insensitive to it by design.  ( 3 min )
    Simulation of Atlantic Hurricane Tracks and Features: A Deep Learning Approach. (arXiv:2209.06901v1 [physics.ao-ph])
    The objective of this paper is to employ machine learning (ML) and deep learning (DL) techniques to obtain, from input data (storm features) available in or derived from the HURDAT2 database, models capable of simulating important hurricane properties such as landfall location and wind speed that are consistent with historical records. In pursuit of this objective, a trajectory model providing the storm center in terms of longitude and latitude, and intensity models providing the central pressure and maximum 1-$min$ wind speed at 10 $m$ elevation were created. The trajectory and intensity models are coupled and must be advanced together, six hours at a time, as the features that serve as inputs to the models at any given step depend on predictions at the previous time steps. Once a synthetic storm database is generated, properties of interest, such as the frequencies of large wind speeds, may be extracted from any part of the simulation domain. The coupling of the trajectory and intensity models obviates the need for an intensity decay inland of the coastline. Prediction results are compared to historical data, and the efficacy of the storm simulation models is demonstrated for three examples: New Orleans, Miami and Cape Hatteras.  ( 2 min )
    Use case-focused metrics to evaluate machine learning for diseases involving parasite loads. (arXiv:2209.06947v1 [cs.LG])
    Communal hill-climbing, via comparison of algorithm performances, can greatly accelerate ML research. However, it requires task-relevant metrics. For diseases involving parasite loads, e.g., malaria and neglected tropical diseases (NTDs) such as schistosomiasis, the metrics currently reported in ML papers (e.g., AUC, F1 score) are ill-suited to the clinical task. As a result, the hill-climbing system is not enabling progress towards solutions that address these dire illnesses. Drawing on examples from malaria and NTDs, this paper highlights two gaps in current ML practice and proposes methods for improvement: (i) We describe aspects of ML development, and performance metrics in particular, that need to be firmly grounded in the clinical use case, and we offer methods for acquiring this domain knowledge. (ii) We describe in detail performance metrics to guide development of ML models for diseases involving parasite loads. We highlight the importance of a patient-level perspective, interpatient variability, false positive rates, limit of detection, and different types of error. We also discuss problems with ROC curves and AUC as commonly used in this context.
    Multicalibrated Regression for Downstream Fairness. (arXiv:2209.07312v1 [cs.LG])
    We show how to take a regression function $\hat{f}$ that is appropriately "multicalibrated" and efficiently post-process it into an approximately error-minimizing classifier satisfying a large variety of fairness constraints. The post-processing requires no labeled data, and only a modest amount of unlabeled data and computation. The computational and sample complexity requirements of computing $\hat{f}$ are comparable to the requirements for solving a single fair learning task optimally, but it can in fact be used to solve many different downstream fairness-constrained learning problems efficiently. Our post-processing method easily handles intersecting groups, generalizing prior work on post-processing regression functions to satisfy fairness constraints that only applied to disjoint groups. Our work extends recent work showing that multicalibrated regression functions are "omnipredictors" (i.e., can be post-processed to optimally solve unconstrained ERM problems) to constrained optimization.
    Nearest Neighbor and Kernel Survival Analysis: Nonasymptotic Error Bounds and Strong Consistency Rates. (arXiv:1905.05285v2 [stat.ML] UPDATED)
    We establish the first nonasymptotic error bounds for Kaplan-Meier-based nearest neighbor and kernel survival probability estimators where feature vectors reside in metric spaces. Our bounds imply rates of strong consistency for these nonparametric estimators and, up to a log factor, match an existing lower bound for conditional CDF estimation. Our proof strategy also yields nonasymptotic guarantees for nearest neighbor and kernel variants of the Nelson-Aalen cumulative hazards estimator. We experimentally compare these methods on four datasets. We find that for the kernel survival estimator, a good choice of kernel is one learned using random survival forests.
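    As a concrete illustration, here is a minimal sketch of a k-nearest-neighbor Kaplan-Meier estimator of the kind analyzed above; the Euclidean distance and variable names are assumptions made for this sketch, since the paper allows feature vectors in general metric spaces.

        import numpy as np

        def knn_kaplan_meier(X, times, events, x_query, k, t_eval):
            """Estimate S(t_eval | x_query) from the k nearest neighbors of x_query.
            times: observed times; events: 1 = event observed, 0 = censored."""
            dist = np.linalg.norm(X - x_query, axis=1)  # stand-in for any metric
            idx = np.argsort(dist)[:k]
            t, e = times[idx], events[idx]
            surv = 1.0
            for ti in np.sort(np.unique(t[e == 1])):    # distinct event times
                if ti > t_eval:
                    break
                at_risk = np.sum(t >= ti)               # n_i: at risk just before ti
                deaths = np.sum((t == ti) & (e == 1))   # d_i: events at ti
                surv *= 1.0 - deaths / at_risk          # product-limit update
            return surv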
    Data Lifecycle Management in Evolving Input Distributions for Learning-based Aerospace Applications. (arXiv:2209.06855v1 [cs.CV])
    As input distributions evolve over a mission lifetime, maintaining performance of learning-based models becomes challenging. This paper presents a framework to incrementally retrain a model by selecting a subset of test inputs to label, which allows the model to adapt to changing input distributions. Algorithms within this framework are evaluated based on (1) model performance throughout mission lifetime and (2) cumulative costs associated with labeling and model retraining. We provide an open-source benchmark of a satellite pose estimation model trained on images of a satellite in space and deployed in novel scenarios (e.g., different backgrounds or misbehaving pixels), where algorithms are evaluated on their ability to maintain high performance by retraining on a subset of inputs. We also propose a novel algorithm to select a diverse subset of inputs for labeling, by characterizing the information gain from an input using Bayesian uncertainty quantification and choosing a subset that maximizes collective information gain using concepts from batch active learning. We show that our algorithm outperforms others on the benchmark, e.g., achieves comparable performance to an algorithm that labels 100% of inputs, while only labeling 50% of inputs, resulting in low costs and high performance over the mission lifetime.  ( 2 min )
    Feature Selection integrated Deep Learning for Ultrahigh Dimensional and Highly Correlated Feature Space. (arXiv:2209.07011v1 [stat.ML])
    In recent years, deep learning has been a topic of interest in almost all disciplines due to its impressive empirical success in analyzing complex data sets, such as imaging, genetics, climate, and medical data. While most of the developments are treated as black-box machines, there is an increasing interest in interpretable, reliable, and robust deep learning models applicable to a broad class of applications. Feature-selected deep learning has proven promising in this regard. However, recent developments do not address the situation of ultrahigh-dimensional, highly correlated feature selection combined with high noise levels. In this article, we propose a novel screening and cleaning strategy with the aid of deep learning for the cluster-level discovery of highly correlated predictors with a controlled error rate. A thorough empirical evaluation over a wide range of simulated scenarios demonstrates the effectiveness of the proposed method, achieving high power while keeping the number of false discoveries minimal. Furthermore, we applied the algorithm to the riboflavin (vitamin $B_2$) production dataset in the context of understanding the possible genetic association with riboflavin production. The gain of the proposed methodology is illustrated by its lower prediction error compared to other state-of-the-art methods.  ( 2 min )
    Limit Cycles of AdaBoost. (arXiv:2209.06928v1 [cs.LG])
    The iterative weight update for the AdaBoost machine learning algorithm may be realized as a dynamical map on a probability simplex. When learning a low-dimensional data set, this algorithm has a tendency towards cycling behavior, which is the topic of this paper. AdaBoost's cycling behavior lends itself to direct computational methods that are ineffective in the general, non-cycling case of the algorithm. From these computational properties we give a concrete correspondence between AdaBoost's cycling behavior and continued fraction dynamics. Then we explore the results of this correspondence to expound on how the algorithm comes to be in this periodic state at all. We intend this work to be a novel and self-contained explanation of the cycling dynamics of this machine learning algorithm.  ( 2 min )
    On the interplay of adversarial robustness and architecture components: patches, convolution and attention. (arXiv:2209.06953v1 [cs.CV])
    In recent years novel architecture components for image classification have been developed, starting with attention and patches used in transformers. While prior works have analyzed the influence of some aspects of architecture components on the robustness to adversarial attacks, in particular for vision transformers, the understanding of the main factors is still limited. We compare several (non)-robust classifiers with different architectures and study their properties, including the effect of adversarial training on the interpretability of the learnt features and robustness to unseen threat models. An ablation from ResNet to ConvNeXt reveals key architectural changes leading to almost $10\%$ higher $\ell_\infty$-robustness.  ( 2 min )
    Complex-Valued Autoencoders for Object Discovery. (arXiv:2204.02075v4 [cs.LG] UPDATED)
    Object-centric representations form the basis of human perception, and enable us to reason about the world and to systematically generalize to new settings. Currently, most works on unsupervised object discovery focus on slot-based approaches, which explicitly separate the latent representations of individual objects. While the result is easily interpretable, it usually requires the design of involved architectures. In contrast to this, we propose a comparatively simple approach - the Complex AutoEncoder (CAE) - that creates distributed object-centric representations. Following a coding scheme theorized to underlie object representations in biological neurons, its complex-valued activations represent two messages: their magnitudes express the presence of a feature, while the relative phase differences between neurons express which features should be bound together to create joint object representations. In contrast to previous approaches using complex-valued activations for object discovery, we present a fully unsupervised approach that is trained end-to-end - resulting in significant improvements in performance and efficiency. Further, we show that the CAE achieves competitive or better unsupervised object discovery performance on simple multi-object datasets compared to a state-of-the-art slot-based approach while being up to 100 times faster to train.
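    The coding scheme itself is easy to illustrate. Below is a toy sketch (not the CAE architecture): complex-valued features whose magnitudes encode feature presence and whose relative phases encode object binding; all numbers are made up for illustration.

        import numpy as np

        rng = np.random.default_rng(0)

        # Two objects, each tagged with a phase; four features, two per object.
        phase_obj1, phase_obj2 = 0.3, 2.1                 # arbitrary phases (rad)
        magnitudes = rng.uniform(0.5, 1.0, 4)             # feature-presence strength
        phases = np.array([phase_obj1, phase_obj1, phase_obj2, phase_obj2])

        z = magnitudes * np.exp(1j * phases)              # complex feature vector

        # Grouping features by phase recovers the object assignment.
        print(np.isclose(np.angle(z[0]), np.angle(z[1])))  # True: same object
        print(np.isclose(np.angle(z[0]), np.angle(z[2])))  # False: different objects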
    Decentralized Learning with Separable Data: Generalization and Fast Algorithms. (arXiv:2209.07116v1 [cs.LG])
    Decentralized learning offers privacy and communication efficiency when data are naturally distributed among agents communicating over an underlying graph. Motivated by overparameterized learning settings, in which models are trained to zero training loss, we study algorithmic and generalization properties of decentralized learning with gradient descent on separable data. Specifically, for decentralized gradient descent (DGD) and a variety of loss functions that asymptote to zero at infinity (including exponential and logistic losses), we derive novel finite-time generalization bounds. This complements a long line of recent work that studies the generalization performance and the implicit bias of gradient descent over separable data, but has thus far been limited to centralized learning scenarios. Notably, our generalization bounds match their centralized counterparts in order. Central to this, and of independent interest, are novel bounds on the training loss and the rate of consensus of DGD for a class of self-bounded losses. Finally, on the algorithmic front, we design improved gradient-based routines for decentralized learning with separable data and empirically demonstrate orders-of-magnitude speed-ups in terms of both training and generalization performance.
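    For reference, a minimal sketch of DGD on separable data with logistic loss follows; the doubly stochastic mixing matrix W encodes the communication graph, and the setup is illustrative rather than the paper's exact protocol.

        import numpy as np

        def sigmoid(u):
            return 1.0 / (1.0 + np.exp(-u))

        def dgd(W, Xs, ys, eta=0.1, n_iters=1000):
            """Xs[i], ys[i]: agent i's local data, labels in {-1, +1}."""
            n_agents, d = len(Xs), Xs[0].shape[1]
            theta = np.zeros((n_agents, d))
            for _ in range(n_iters):
                mixed = W @ theta                        # consensus (gossip) step
                for i in range(n_agents):
                    margins = ys[i] * (Xs[i] @ theta[i])
                    grad = -(Xs[i].T @ (ys[i] * sigmoid(-margins))) / len(ys[i])
                    theta[i] = mixed[i] - eta * grad     # local gradient step
            return theta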
    Sampling for network function learning. (arXiv:2209.07342v1 [cs.SI])
    Given a valued graph, where both the nodes and the edges of the graph are associated with one or several values, any network function for a given node must be defined in terms of that node and its connected nodes in the graph. Generally, applying the same definition to the whole graph or to any given subgraph of it would result in systematically different network functions. In this paper we consider the feasibility of a graph sampling approach to network function learning, as well as the corresponding learning methods based on the sampled graphs. This can be useful either when the edges are unknown to start with or when the graph is too large (or dynamic) to be processed entirely.
    An ensemble Multi-Agent System for non-linear classification. (arXiv:2209.06824v1 [cs.LG])
    Self-Adaptive Multi-Agent Systems (AMAS) transform machine learning problems into problems of local cooperation between agents. We present smapy, an ensemble-based AMAS implementation for mobility prediction, whose agents are provided with machine learning models in addition to their cooperation rules. With a detailed methodology, we show that it is possible to use linear models for non-linear classification on a benchmark transport mode detection dataset, provided they are integrated in a cooperative multi-agent structure. The results show a significant improvement in the performance of linear models in non-linear contexts thanks to the multi-agent approach.  ( 2 min )
    Blind and Channel-agnostic Equalization Using Adversarial Networks. (arXiv:2209.07277v1 [eess.SP])
    Due to the rapid development of autonomous driving, the Internet of Things and streaming services, modern communication systems have to cope with varying channel conditions and a steadily rising number of users and devices. These challenges, along with still-rising bandwidth demands, can only be met by intelligent network automation, which requires highly flexible and blind transceiver algorithms. To this end, we propose a novel adaptive equalization scheme, which exploits the prosperous advances in deep learning by training an equalizer with an adversarial network. The learning is based only on the statistics of the transmit signal, so it is blind with regard to the actual transmit symbols and agnostic to the channel model. The proposed approach is independent of the equalizer topology and enables the application of powerful neural-network-based equalizers. In this work, we prove this concept in simulations of different -- both linear and nonlinear -- transmission channels and demonstrate the capability of the proposed blind learning scheme to approach the performance of non-blind equalizers. Furthermore, we provide a theoretical perspective and highlight the challenges of the approach.  ( 2 min )
    Estimating large causal polytree skeletons from small samples. (arXiv:2209.07028v1 [stat.ME])
    We consider the problem of estimating the skeleton of a large causal polytree from a relatively small i.i.d. sample. This is motivated by the problem of determining causal structure when the number of variables is very large compared to the sample size, such as in gene regulatory networks. We give an algorithm that recovers the tree with high accuracy in such settings. The algorithm works under essentially no distributional or modeling assumptions other than some mild non-degeneracy conditions.  ( 2 min )
    Semi-Counterfactual Risk Minimization Via Neural Networks. (arXiv:2209.07148v1 [cs.LG])
    Counterfactual risk minimization is a framework for offline policy optimization with logged data which consists of context, action, propensity score, and reward for each sample point. In this work, we build on this framework and propose a learning method for settings where the rewards for some samples are not observed, and so the logged data consists of a subset of samples with unknown rewards and a subset of samples with known rewards. This setting arises in many application domains, including advertising and healthcare. While reward feedback is missing for some samples, it is possible to leverage the unknown-reward samples in order to minimize the risk, and we refer to this setting as semi-counterfactual risk minimization. To approach this kind of learning problem, we derive new upper bounds on the true risk under the inverse propensity score estimator. We then build upon these bounds to propose a regularized counterfactual risk minimization method, where the regularization term is based on the logged unknown-rewards dataset only; hence it is reward-independent. We also propose another algorithm based on generating pseudo-rewards for the logged unknown-rewards dataset. Experimental results with neural networks and benchmark datasets indicate that these algorithms can leverage the logged unknown-rewards dataset besides the logged known-reward dataset.  ( 2 min )
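    As context for the bounds, a minimal sketch of the underlying inverse propensity score estimator on logged data follows; variable names are illustrative, and the paper's reward-independent regularizer over the unknown-reward subset is not shown.

        import numpy as np

        def ips_risk(pi_new_probs, propensities, rewards):
            """IPS estimate of the risk (negative expected reward) of a new policy.
            pi_new_probs[t]: probability the new policy puts on the logged action;
            propensities[t]: logging policy's probability of that same action."""
            weights = pi_new_probs / propensities   # importance weights
            return -np.mean(weights * rewards)      # lower risk = better policy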
    Evolving Zero Cost Proxies For Neural Architecture Scoring. (arXiv:2209.07413v1 [cs.LG])
    Neural Architecture Search (NAS) has significantly improved productivity in the design and deployment of neural networks (NN). As NAS typically evaluates multiple models by training them partially or completely, the improved productivity comes at the cost of a significant carbon footprint. To alleviate this expensive training routine, zero-shot/cost proxies analyze an NN at initialization to generate a score, which correlates highly with its true accuracy. Zero-cost proxies are currently designed by experts conducting multiple cycles of empirical testing on possible algorithms, datasets, and neural architecture design spaces. This lowers productivity and is an unsustainable approach to zero-cost proxy design as deep learning use cases diversify in nature. Additionally, existing zero-cost proxies fail to generalize across neural architecture design spaces. In this paper, we propose a genetic programming framework to automate the discovery of zero-cost proxies for neural architecture scoring. Our methodology efficiently discovers an interpretable and generalizable zero-cost proxy that gives state-of-the-art score-accuracy correlation on all datasets and search spaces of NASBench-201 and Network Design Spaces (NDS). We believe that this research indicates a promising direction towards automatically discovering zero-cost proxies that can work across network architecture design spaces, datasets, and tasks.  ( 2 min )
    On the State of the Art in Authorship Attribution and Authorship Verification. (arXiv:2209.06869v1 [cs.CL])
    Despite decades of research on authorship attribution (AA) and authorship verification (AV), inconsistent dataset splits/filtering and mismatched evaluation methods make it difficult to assess the state of the art. In this paper, we present a survey of the fields, resolve points of confusion, introduce Valla that standardizes and benchmarks AA/AV datasets and metrics, provide a large-scale empirical evaluation, and provide apples-to-apples comparisons between existing methods. We evaluate eight promising methods on fifteen datasets (including distribution-shifted challenge sets) and introduce a new large-scale dataset based on texts archived by Project Gutenberg. Surprisingly, we find that a traditional Ngram-based model performs best on 5 (of 7) AA tasks, achieving an average macro-accuracy of $76.50\%$ (compared to $66.71\%$ for a BERT-based model). However, on the two AA datasets with the greatest number of words per author, as well as on the AV datasets, BERT-based models perform best. While AV methods are easily applied to AA, they are seldom included as baselines in AA papers. We show that through the application of hard-negative mining, AV methods are competitive alternatives to AA methods. Valla and all experiment code can be found here: https://github.com/JacobTyo/Valla  ( 2 min )
    Wasserstein $K$-means for clustering probability distributions. (arXiv:2209.06975v1 [stat.ML])
    Clustering is an important exploratory data analysis technique to group objects based on their similarity. The widely used $K$-means clustering method relies on some notion of distance to partition data into a smaller number of groups. In the Euclidean space, centroid-based and distance-based formulations of the $K$-means are equivalent. In modern machine learning applications, data often arise as probability distributions, and a natural generalization to handle measure-valued data is to use the optimal transport metric. Due to the non-negative Alexandrov curvature of the Wasserstein space, barycenters suffer from regularity and non-robustness issues. The peculiar behaviors of Wasserstein barycenters may make the centroid-based formulation fail to represent the within-cluster data points, while the more direct distance-based $K$-means approach and its semidefinite program (SDP) relaxation are capable of recovering the true cluster labels. In the special case of clustering Gaussian distributions, we show that the SDP relaxed Wasserstein $K$-means can achieve exact recovery given that the clusters are well-separated under the $2$-Wasserstein metric. Our simulation and real data examples also demonstrate that distance-based $K$-means can achieve better classification performance over the standard centroid-based $K$-means for clustering probability distributions and images.  ( 2 min )
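    To make the distance-based formulation concrete, here is a minimal sketch for one-dimensional empirical distributions, where the 2-Wasserstein distance has a closed form via sorted samples; the medoid-style update is an illustrative stand-in, and the paper's SDP relaxation is not shown.

        import numpy as np

        def w2_1d(x, y):
            """2-Wasserstein distance between two equal-size 1-D samples."""
            return np.sqrt(np.mean((np.sort(x) - np.sort(y)) ** 2))

        def wasserstein_k_medoids(samples, k, n_iters=20, seed=0):
            rng = np.random.default_rng(seed)
            n = len(samples)
            D = np.array([[w2_1d(a, b) for b in samples] for a in samples])
            medoids = rng.choice(n, size=k, replace=False)
            for _ in range(n_iters):
                labels = np.argmin(D[:, medoids], axis=1)   # assignment step
                for j in range(k):
                    members = np.where(labels == j)[0]
                    if len(members):
                        # update step: member minimizing total within-cluster distance
                        within = D[np.ix_(members, members)]
                        medoids[j] = members[np.argmin(within.sum(axis=0))]
            return labels, medoids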
    On the detrimental effect of invariances in the likelihood for variational inference. (arXiv:2209.07157v1 [cs.LG])
    Variational Bayesian posterior inference often requires simplifying approximations such as mean-field parametrisation to ensure tractability. However, prior work has associated the variational mean-field approximation for Bayesian neural networks with underfitting in the case of small datasets or large model sizes. In this work, we show that invariances in the likelihood function of over-parametrised models contribute to this phenomenon because these invariances complicate the structure of the posterior by introducing discrete and/or continuous modes which cannot be well approximated by Gaussian mean-field distributions. In particular, we show that the mean-field approximation has an additional gap in the evidence lower bound compared to a purpose-built posterior that takes into account the known invariances. Importantly, this invariance gap is not constant; it vanishes as the approximation reverts to the prior. We proceed by first considering translation invariances in a linear model with a single data point in detail. We show that, while the true posterior can be constructed from a mean-field parametrisation, this is achieved only if the objective function takes into account the invariance gap. Then, we transfer our analysis of the linear model to neural networks. Our analysis provides a framework for future work to explore solutions to the invariance problem.  ( 2 min )
    Towards Healing the Blindness of Score Matching. (arXiv:2209.07396v1 [stat.ML])
    Score-based divergences have been widely used in machine learning and statistics applications. Despite their empirical success, a blindness problem has been observed when using these for multi-modal distributions. In this work, we discuss the blindness problem and propose a new family of divergences that can mitigate the blindness problem. We illustrate our proposed divergence in the context of density estimation and report improved performance compared to traditional approaches.  ( 2 min )
    A Random Persistence Diagram Generator. (arXiv:2104.07737v4 [stat.ML] UPDATED)
    Topological data analysis (TDA) studies the shape patterns of data. Persistent homology is a widely used method in TDA that summarizes homological features of data at multiple scales and stores them in persistence diagrams (PDs). In this paper, we propose a random persistence diagram generator (RPDG) method that generates a sequence of random PDs from the ones produced by the data. RPDG is underpinned by a model based on pairwise interacting point processes, and a reversible jump Markov chain Monte Carlo (RJ-MCMC) algorithm. A first example, which is based on a synthetic dataset, demonstrates the efficacy of RPDG and provides a comparison with another method for sampling PDs. A second example demonstrates the utility of RPDG to solve a materials science problem given a real dataset of small sample size.  ( 2 min )
    Unsupervised Learning of Group Invariant and Equivariant Representations. (arXiv:2202.07559v2 [cs.LG] UPDATED)
    Equivariant neural networks, whose hidden features transform according to representations of a group G acting on the data, exhibit training efficiency and improved generalisation performance. In this work, we extend group invariant and equivariant representation learning to the field of unsupervised deep learning. We propose a general learning strategy based on an encoder-decoder framework in which the latent representation is separated into an invariant term and an equivariant group action component. The key idea is that the network learns to encode and decode data to and from a group-invariant representation by additionally learning to predict the appropriate group action needed to align input and output pose and thus solve the reconstruction task. We derive the necessary conditions on the equivariant encoder, and we present a construction valid for any G, both discrete and continuous. We describe our construction explicitly for rotations, translations and permutations. We test the validity and the robustness of our approach in a variety of experiments with diverse data types employing different network architectures.  ( 2 min )
    Learning Multi-agent Options for Tabular Reinforcement Learning using Factor Graphs. (arXiv:2201.08227v2 [cs.MA] UPDATED)
    Covering option discovery has been developed to improve the exploration of reinforcement learning in single-agent scenarios with sparse reward signals, through connecting the most distant states in the embedding space provided by the Fiedler vector of the state transition graph. However, these option discovery methods cannot be directly extended to multi-agent scenarios, since the joint state space grows exponentially with the number of agents in the system. Thus, existing research on adopting options in multi-agent scenarios still relies on single-agent option discovery and fails to directly discover the joint options that can improve the connectivity of the joint state space of agents. In this paper, we show that it is indeed possible to directly compute multi-agent options with collaborative exploratory behaviors among the agents, while still enjoying the ease of decomposition. Our key idea is to approximate the joint state space as a Kronecker graph -- the Kronecker product of individual agents' state transition graphs -- based on which we can directly estimate the Fiedler vector of the joint state space using the Laplacian spectra of individual agents' transition graphs. This decomposition enables us to efficiently construct multi-agent joint options by encouraging agents to connect the sub-goal joint states corresponding to the minimum or maximum values of the estimated joint Fiedler vector. The evaluation on multi-agent collaborative tasks shows that the proposed algorithm successfully identifies multi-agent options, and significantly outperforms prior works using single-agent options or no options, in terms of both faster exploration and higher cumulative rewards.  ( 3 min )
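    The key linear-algebraic step is easy to sketch: for two agents whose joint graph is approximated by a Kronecker product, eigenpairs of the normalized adjacency multiply, so a joint Fiedler vector can be assembled from the individual spectra. The code below is a simplified illustration of that decomposition, not the paper's algorithm.

        import numpy as np

        def normalized_adjacency_eig(A):
            d_inv_sqrt = 1.0 / np.sqrt(A.sum(axis=1))
            A_norm = d_inv_sqrt[:, None] * A * d_inv_sqrt[None, :]
            return np.linalg.eigh(A_norm)   # ascending eigenvalues, eigenvectors

        def joint_fiedler(A1, A2):
            """Estimated Fiedler vector of the Kronecker product of two graphs."""
            mu, U = normalized_adjacency_eig(A1)
            nu, V = normalized_adjacency_eig(A2)
            # Joint normalized-Laplacian eigenvalues are 1 - mu_i * nu_j, so the
            # Fiedler vector corresponds to the second-largest product mu_i * nu_j.
            prods = np.outer(mu, nu)
            i, j = np.unravel_index(np.argsort(prods.ravel())[-2], prods.shape)
            return np.kron(U[:, i], V[:, j])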
    Cold-Start Data Selection for Few-shot Language Model Fine-tuning: A Prompt-Based Uncertainty Propagation Approach. (arXiv:2209.06995v1 [cs.CL])
    We propose PATRON, a new method that uses prompt-based uncertainty estimation for data selection for pre-trained language model fine-tuning under cold-start scenarios, i.e., no initial labeled data are available. In PATRON, we design (1) a prompt-based uncertainty propagation approach to estimate the importance of data points and (2) a partition-then-rewrite (PTR) strategy to promote sample diversity when querying for annotations. Experiments on six text classification datasets show that PATRON outperforms the strongest cold-start data selection baselines by up to 6.9%. Besides, with 128 labels only, PATRON achieves 91.0% and 92.1% of the fully supervised performance based on vanilla fine-tuning and prompt-based learning respectively. Our implementation of PATRON is available at \url{https://github.com/yueyu1030/Patron}.  ( 2 min )
    A Model Drift Detection and Adaptation Framework for 5G Core Networks. (arXiv:2209.06852v1 [cs.NI])
    The advent of Fifth Generation (5G) and beyond 5G networks (5G+) has revolutionized the way network operators consider the management and orchestration of their networks. With an increased focus on intelligence and automation through core network functions such as the NWDAF, service providers are tasked with integrating machine learning models and artificial intelligence systems into their existing network operation practices. Due to the dynamic nature of next-generation networks and their supported use cases and applications, model drift is a serious concern, which can deteriorate the performance of intelligent models deployed throughout the network. The work presented in this paper introduces a model drift detection and adaptation module for 5G core networks. Using a functional prototype of a 5G core network, a drift in user behaviour is emulated, and the proposed framework is deployed and tested. The results of this work demonstrate the ability of the drift detection module to accurately characterize a drifted concept as well as the ability of the drift adaptation module to begin the necessary remediation efforts to restore system performance.  ( 2 min )
    Modifying Squint for Prediction with Expert Advice in a Changing Environment. (arXiv:2209.06826v1 [cs.LG])
    We provide a new method for online learning, specifically prediction with expert advice, in a changing environment. In a non-changing environment the Squint algorithm has been designed to always function at least as well as other known algorithms and in specific cases it functions much better. However, when using a conventional black-box algorithm to make Squint suitable for a changing environment, it loses its beneficial properties. Hence, we provide a new algorithm, Squint-CE, which is suitable for a changing environment and preserves the properties of Squint.  ( 2 min )
    Ergo, SMIRK is Safe: A Safety Case for a Machine Learning Component in a Pedestrian Automatic Emergency Brake System. (arXiv:2204.07874v2 [cs.SE] UPDATED)
    Integration of Machine Learning (ML) components in critical applications introduces novel challenges for software certification and verification. New safety standards and technical guidelines are under development to support the safety of ML-based systems, e.g., ISO 21448 SOTIF for the automotive domain and the Assurance of Machine Learning for use in Autonomous Systems (AMLAS) framework. SOTIF and AMLAS provide high-level guidance but the details must be chiseled out for each specific case. We initiated a research project with the goal to demonstrate a complete safety case for an ML component in an open automotive system. This paper reports results from an industry-academia collaboration on safety assurance of SMIRK, an ML-based pedestrian automatic emergency braking demonstrator running in an industry-grade simulator. We demonstrate an application of AMLAS on SMIRK for a minimalistic operational design domain, i.e., we share a complete safety case for its integrated ML-based component. Finally, we report lessons learned and provide both SMIRK and the safety case under an open-source licence for the research community to reuse.  ( 3 min )
    Time Series Prediction for Food sustainability. (arXiv:2209.06889v1 [cs.LG])
    With exponential growth in the human population, it is vital to conserve natural resources without compromising on producing enough food to feed everyone. Doing so can improve people's livelihoods, health, and ecosystems for the present and future generations. Sustainable development, a paradigm of the United Nations, is rooted in food, crop, livestock, forest, population, and even the emission of gases. By understanding the overall usage of natural resources in different countries in the past, it is possible to forecast the demand in each country. The proposed solution consists of implementing a machine learning system using a statistical regression model that can predict the top k products that would endure a shortage in each country in a specific period in the future. The prediction performance, in terms of absolute error and root mean square error, is promising due to the low errors obtained. This solution could help organizations and manufacturers understand the productivity and sustainability needed to satisfy the global demand.  ( 2 min )
    Vectorized Adjoint Sensitivity Method for Graph Convolutional Neural Ordinary Differential Equations. (arXiv:2209.06886v1 [cs.LG])
    This document, as the title states, is meant to provide a vectorized implementation of the adjoint dynamics calculation for Graph Convolutional Neural Ordinary Differential Equations (GCDEs). The adjoint sensitivity method is the gradient approximation method for neural ODEs that replaces backpropagation. When implemented in libraries such as PyTorch or TensorFlow, the adjoint can be calculated by autograd functions without the need for a hand-derived formula. In applications such as edge computing and memristor crossbars, however, autograd is not available, and therefore we need a vectorized derivation of the adjoint dynamics to efficiently map the system onto hardware. This document goes over the basics, then derives the vectorized adjoint dynamics for GCDEs.  ( 2 min )
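    A minimal sketch of the idea for a simplified *linear* graph ODE, dz/dt = A_hat @ z @ W, is given below; for this f, the adjoint satisfies da/dt = -A_hat.T @ a @ W.T and the weight gradient accumulates as matrix products, with no autograd. The Euler integration and all names are assumptions made for illustration, not the paper's derivation.

        import numpy as np

        def forward(z0, A_hat, W, dt, n_steps):
            """Euler-integrate dz/dt = A_hat @ z @ W, keeping the trajectory."""
            zs = [z0]
            for _ in range(n_steps):
                zs.append(zs[-1] + dt * A_hat @ zs[-1] @ W)
            return zs

        def backward(zs, dL_dzT, A_hat, W, dt):
            """Integrate the adjoint backwards; returns dL/dz(0) and dL/dW."""
            a = dL_dzT                           # adjoint at final time: dL/dz(T)
            dL_dW = np.zeros_like(W)
            for z in reversed(zs[:-1]):
                dL_dW += dt * (A_hat @ z).T @ a  # accumulate parameter gradient
                a = a + dt * A_hat.T @ a @ W.T   # adjoint step backwards in time
            return a, dL_dW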
    Particle gradient descent model for point process generation. (arXiv:2010.14928v3 [stat.ML] UPDATED)
    This paper presents a statistical model for stationary ergodic point processes, estimated from a single realization observed in a square window. With existing approaches in stochastic geometry, it is very difficult to model processes with complex geometries formed by a large number of particles. Inspired by recent works on gradient descent algorithms for sampling maximum-entropy models, we describe a model that allows for fast sampling of new configurations reproducing the statistics of the given observation. Starting from an initial random configuration, its particles are moved according to the gradient of an energy, in order to match a set of prescribed moments (functionals). Our moments are defined via a phase harmonic operator on the wavelet transform of point patterns. They allow one to capture multi-scale interactions between the particles, while controlling explicitly the number of moments by the scales of the structures to model. We present numerical experiments on point processes with various geometric structures, and assess the quality of the model by spectral and topological data analysis.  ( 3 min )
    Neural Networks Reduction via Lumping. (arXiv:2209.07475v1 [cs.LG])
    The increasing size of recently proposed Neural Networks makes it hard to implement them on embedded devices, where memory, battery and computational power are a non-trivial bottleneck. For this reason, the network compression literature has been thriving in recent years, and a large number of solutions have been published to reduce both the number of operations and the number of parameters involved in the models. Unfortunately, most of these reduction techniques are heuristic methods and usually require at least one re-training step to recover the accuracy. The need for procedures for model reduction is well known also in the fields of Verification and Performance Evaluation, where large efforts have been devoted to the definition of quotients that preserve the observable underlying behaviour. In this paper we try to bridge the gap between the most popular and very effective network reduction strategies and formal notions, such as lumpability, introduced for the verification and evaluation of Markov Chains. Elaborating on lumpability, we propose a pruning approach that reduces the number of neurons in a network without using any data or fine-tuning, while completely preserving the exact behaviour. By relaxing the constraints on the exact definition of the quotienting method, we can give a formal explanation of some of the most common reduction techniques.  ( 2 min )
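    The exactness claim is easiest to see in the base case, sketched below: two hidden neurons with identical incoming weights and bias always produce identical activations, so one can be removed and its outgoing weights added to the other's, preserving the input/output behaviour exactly. This is an illustrative simplification of the paper's more general quotienting.

        import numpy as np

        def lump_layer(W_in, b, W_out):
            """W_in: (h, d) incoming weights; b: (h,) biases; W_out: (o, h)."""
            keep, merged_out = [], []
            for i in range(W_in.shape[0]):
                for k, j in enumerate(keep):
                    if np.allclose(W_in[i], W_in[j]) and np.isclose(b[i], b[j]):
                        merged_out[k] = merged_out[k] + W_out[:, i]  # sum outgoing
                        break
                else:
                    keep.append(i)                       # no duplicate found: keep
                    merged_out.append(W_out[:, i].copy())
            return W_in[keep], b[keep], np.stack(merged_out, axis=1)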
    A Stochastic Optimization Framework for Fair Risk Minimization. (arXiv:2102.12586v3 [cs.LG] UPDATED)
    Despite the success of large-scale empirical risk minimization (ERM) at achieving high accuracy across a variety of machine learning tasks, fair ERM is hindered by the incompatibility of fairness constraints with stochastic optimization. We consider the problem of fair classification with discrete sensitive attributes and potentially large models and data sets, requiring stochastic solvers. Existing in-processing fairness algorithms are either impractical in the large-scale setting because they require large batches of data at each iteration or they are not guaranteed to converge. In this paper, we develop the first stochastic in-processing fairness algorithm with guaranteed convergence. For demographic parity, equalized odds, and equal opportunity notions of fairness, we provide slight variations of our algorithm--called FERMI--and prove that each of these variations converges in stochastic optimization with any batch size. Empirically, we show that FERMI is amenable to stochastic solvers with multiple (non-binary) sensitive attributes and non-binary targets, performing well even with minibatch sizes as small as one. Extensive experiments show that FERMI achieves the most favorable tradeoffs between fairness violation and test accuracy across all tested setups compared with state-of-the-art baselines for demographic parity, equalized odds, and equal opportunity. These benefits are especially significant with small batch sizes and for non-binary classification with a large number of sensitive attributes, making FERMI a practical fairness algorithm for large-scale problems.  ( 3 min )
    Decision making in cancer: Causal questions require causal answers. (arXiv:2209.07397v1 [cs.LG])
    Treatment decisions in cancer care are guided by treatment effect estimates from randomized controlled trials (RCTs). RCTs estimate the average effect of one treatment versus another in a certain population. However, treatments may not be equally effective for every patient in a population. Knowing the effectiveness of treatments tailored to specific patient and tumor characteristics would enable individualized treatment decisions. Getting tailored treatment effects by averaging outcomes in different patient subgroups in RCTs requires an unfeasible number of patients to have sufficient statistical power in all relevant subgroups for all possible treatments. The American Joint Committee on Cancer (AJCC) recommends that researchers develop outcome prediction models (OPMs) in an effort to individualize treatment decisions. OPMs, sometimes called risk models or prognosis models, use patient and tumor characteristics to predict a patient outcome such as overall survival. The assumption is that the predictions are useful for treatment decisions using rules such as "prescribe chemotherapy only if the OPM predicts the patient has a high risk of recurrence". Recognizing the importance of reliable predictions, the AJCC published a checklist for OPMs to ensure dependable OPM prediction accuracy in the patient population for which the OPM was designed. However, accurate outcome predictions do not imply that these predictions yield good treatment decisions. In this perspective, we show that OPMs rely on a fixed treatment policy, which implies that OPMs found to accurately predict outcomes in validation studies can still lead to patient harm when used to inform treatment decisions. We then give guidance on how to develop models that are useful for individualized treatment decisions and how to evaluate whether a model has value for decision-making.  ( 3 min )
    Towards Coupling Full-disk and Active Region-based Flare Prediction for Operational Space Weather Forecasting. (arXiv:2209.07406v1 [physics.space-ph])
    Solar flare prediction is a central problem in space weather forecasting and has captivated the attention of a wide spectrum of researchers due to recent advances in both remote sensing and machine learning and deep learning approaches. The experimental findings based on both machine and deep learning models reveal significant performance improvements for task-specific datasets. Along with building models, however, the practice of deploying such models to production environments under operational settings is a more complex and time-consuming process that is often not addressed directly in research settings. We present a set of new heuristic approaches to train and deploy an operational solar flare prediction system for $\geq$M1.0-class flares with two prediction modes: full-disk and a…  ( 3 min )
    Distributed Sparse Linear Regression with Sublinear Communication. (arXiv:2209.07230v1 [stat.ML])
    We study the problem of high-dimensional sparse linear regression in a distributed setting under both computational and communication constraints. Specifically, we consider a star topology network whereby several machines are connected to a fusion center, with whom they can exchange relatively short messages. Each machine holds noisy samples from a linear regression model with the same unknown sparse $d$-dimensional vector of regression coefficients $\theta$. The goal of the fusion center is to estimate the vector $\theta$ and its support using few computations and limited communication at each machine. In this work, we consider distributed algorithms based on Orthogonal Matching Pursuit (OMP) and theoretically study their ability to exactly recover the support of $\theta$. We prove that under certain conditions, even at low signal-to-noise-ratios where individual machines are unable to detect the support of $\theta$, distributed-OMP methods correctly recover it with total communication sublinear in $d$. In addition, we present simulations that illustrate the performance of distributed OMP-based algorithms and show that they perform similarly to more sophisticated and computationally intensive methods, and in some cases even outperform them.  ( 2 min )
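    For reference, the local building block is standard OMP, sketched below; in a distributed variant, each machine would run this on its own samples and send only the short support set to the fusion center for aggregation (e.g., by voting). This is an illustration, not the paper's exact scheme.

        import numpy as np

        def omp(X, y, sparsity):
            """Greedy support recovery: pick the atom most correlated with the
            residual, re-fit by least squares on the chosen atoms, repeat."""
            residual, support = y.copy(), []
            for _ in range(sparsity):
                corr = np.abs(X.T @ residual)
                corr[support] = -np.inf               # skip already-chosen atoms
                support.append(int(np.argmax(corr)))
                theta_s, *_ = np.linalg.lstsq(X[:, support], y, rcond=None)
                residual = y - X[:, support] @ theta_s
            return support, theta_s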
    Distribution Aware Metrics for Conditional Natural Language Generation. (arXiv:2209.07518v1 [cs.CL])
    Traditional automated metrics for evaluating conditional natural language generation use pairwise comparisons between a single generated text and the best-matching gold-standard ground truth text. When multiple ground truths are available, scores are aggregated using an average or max operation across references. While this approach works well when diversity in the ground truth data (i.e. dispersion of the distribution of conditional texts) can be ascribed to noise, such as in automated speech recognition, it does not allow for robust evaluation in the case where diversity in the ground truths represents signal for the model. In this work we argue that existing metrics are not appropriate for domains such as visual description or summarization where ground truths are semantically diverse, and where the diversity in those captions captures useful additional information about the context. We propose a novel paradigm for multi-candidate evaluation of conditional language generation models, and a new family of metrics that compare the distributions of reference and model-generated caption sets using small sample sets of each. We demonstrate the utility of our approach with a case study in visual description: where we show that existing models optimize for single-description quality over diversity, and gain some insights into how sampling methods and temperature impact description quality and diversity.  ( 3 min )
    Constrained Update Projection Approach to Safe Policy Optimization. (arXiv:2209.07089v1 [cs.LG])
    Safe reinforcement learning (RL) studies problems where an intelligent agent has to not only maximize reward but also avoid exploring unsafe areas. In this study, we propose CUP, a novel policy optimization method based on the Constrained Update Projection framework that enjoys a rigorous safety guarantee. Central to our CUP development are newly proposed surrogate functions along with a performance bound. Compared to previous safe RL methods, CUP has the following benefits: 1) it generalizes the surrogate functions to the generalized advantage estimator (GAE), leading to strong empirical performance; 2) it unifies performance bounds, providing better understanding and interpretability for some existing algorithms; 3) it provides a non-convex implementation via only first-order optimizers, which does not require any strong approximation of the convexity of the objectives. To validate our CUP method, we compared CUP against a comprehensive list of safe RL baselines on a wide range of tasks. Experiments show the effectiveness of CUP both in terms of reward and safety constraint satisfaction. We have open-sourced CUP at https://github.com/RL-boxes/Safe-RL/tree/main/CUP.  ( 2 min )
    COOL-MC: A Comprehensive Tool for Reinforcement Learning and Model Checking. (arXiv:2209.07133v1 [cs.LG])
    This paper presents COOL-MC, a tool that integrates state-of-the-art reinforcement learning (RL) and model checking. Specifically, the tool builds upon the OpenAI gym and the probabilistic model checker Storm. COOL-MC provides the following features: (1) a simulator to train RL policies in the OpenAI gym for Markov decision processes (MDPs) that are defined as input for Storm, (2) a new model builder for Storm, which uses callback functions to verify (neural network) RL policies, (3) formal abstractions that relate models and policies specified in OpenAI gym or Storm, and (4) algorithms to obtain bounds on the performance of so-called permissive policies. We describe the components and architecture of COOL-MC and demonstrate its features on multiple benchmark environments.  ( 2 min )
    Semiparametric Best Arm Identification with Contextual Information. (arXiv:2209.07330v1 [cs.LG])
    We study best-arm identification with a fixed budget and contextual (covariate) information in stochastic multi-armed bandit problems. In each round, after observing contextual information, we choose a treatment arm using past observations and current context. Our goal is to identify the best treatment arm, a treatment arm with the maximal expected reward marginalized over the contextual distribution, with a minimal probability of misidentification. First, we derive semiparametric lower bounds for this problem, where we regard the gaps between the expected rewards of the best and suboptimal treatment arms as parameters of interest, and all other parameters, such as the expected rewards conditioned on contexts, as the nuisance parameters. We then develop the "Contextual RS-AIPW strategy," which consists of the random sampling (RS) rule tracking a target allocation ratio and the recommendation rule using the augmented inverse probability weighting (AIPW) estimator. Our proposed Contextual RS-AIPW strategy is optimal because the upper bound for the probability of misidentification matches the semiparametric lower bound when the budget goes to infinity, and the gaps converge to zero.  ( 2 min )
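    For readers unfamiliar with AIPW, its standard (textbook) form for the value of arm $a$ is
    $$\hat{\mu}_a = \frac{1}{T}\sum_{t=1}^{T}\left[\frac{\mathbf{1}\{A_t=a\}\,\bigl(Y_t-\hat{f}(a,X_t)\bigr)}{w(a\mid X_t)} + \hat{f}(a,X_t)\right],$$
    where $w(a\mid X_t)$ is the sampling probability of the arm under the allocation rule and $\hat{f}$ is an estimate of the conditional expected reward; the specifics of the paper's Contextual RS-AIPW strategy (e.g., tracking the target allocation ratio) are omitted here.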
    Robust field-level inference with dark matter halos. (arXiv:2209.06843v1 [astro-ph.CO])
    We train graph neural networks on halo catalogues from Gadget N-body simulations to perform field-level likelihood-free inference of cosmological parameters. The catalogues contain $\lesssim$5,000 halos with masses $\gtrsim 10^{10}~h^{-1}M_\odot$ in a periodic volume of $(25~h^{-1}{\rm Mpc})^3$; every halo in the catalogue is characterized by several properties such as position, mass, velocity, concentration, and maximum circular velocity. Our models, built to be permutationally, translationally, and rotationally invariant, do not impose a minimum scale on which to extract information and are able to infer the values of $\Omega_{\rm m}$ and $\sigma_8$ with a mean relative error of $\sim6\%$, when using positions plus velocities and positions plus masses, respectively. More importantly, we find that our models are very robust: they can infer the value of $\Omega_{\rm m}$ and $\sigma_8$ when tested using halo catalogues from thousands of N-body simulations run with five different N-body codes: Abacus, CUBEP$^3$M, Enzo, PKDGrav3, and Ramses. Surprisingly, the model trained to infer $\Omega_{\rm m}$ also works when tested on thousands of state-of-the-art CAMELS hydrodynamic simulations run with four different codes and subgrid physics implementations. Using halo properties such as concentration and maximum circular velocity allows our models to extract more information, at the expense of breaking the robustness of the models. This may happen because the different N-body codes are not converged on the relevant scales corresponding to these parameters.  ( 3 min )
  • Open

    The Fragility of Optimized Bandit Algorithms. (arXiv:2109.13595v3 [cs.LG] CROSS LISTED)
    Much of the literature on optimal design of bandit algorithms is based on minimization of expected regret. It is well known that designs that are optimal over certain exponential families can achieve expected regret that grows logarithmically in the number of arm plays, at a rate governed by the Lai-Robbins lower bound. In this paper, we show that when one uses such optimized designs, the regret distribution of the associated algorithms necessarily has a very heavy tail, specifically, that of a truncated Cauchy distribution. Furthermore, for $p>1$, the $p$'th moment of the regret distribution grows much faster than poly-logarithmically, in particular as a power of the total number of arm plays. We show that optimized UCB bandit designs are also fragile in an additional sense, namely when the problem is even slightly mis-specified, the regret can grow much faster than the conventional theory suggests. Our arguments are based on standard change-of-measure ideas, and indicate that the most likely way that regret becomes larger than expected is when the optimal arm returns below-average rewards in the first few arm plays, thereby causing the algorithm to believe that the arm is sub-optimal. To alleviate the fragility issues exposed, we show that UCB algorithms can be modified so as to ensure a desired degree of robustness to mis-specification. In doing so, we also provide a sharp trade-off between the amount of UCB exploration and the tail exponent of the resulting regret distribution.
    On the Dissipation of Ideal Hamiltonian Monte Carlo Sampler. (arXiv:2209.07438v1 [stat.CO])
    We report on what seems to be an intriguing connection between variable integration time and partial velocity refreshment of Ideal Hamiltonian Monte Carlo samplers, both of which can be used for reducing the dissipative behavior of the dynamics. More concretely, we show that on quadratic potentials, efficiency can be improved through these means by a $\sqrt{\kappa}$ factor in Wasserstein-2 distance, compared to classical constant integration time, fully refreshed HMC.
    Stochastic Tree Ensembles for Estimating Heterogeneous Effects. (arXiv:2209.06998v1 [stat.ML])
    Determining subgroups that respond especially well (or poorly) to specific interventions (medical or policy) requires new supervised learning methods tailored specifically for causal inference. Bayesian Causal Forest (BCF) is a recent method that has been documented to perform well on data generating processes with strong confounding of the sort that is plausible in many applications. This paper develops a novel algorithm for fitting the BCF model, which is more efficient than the previously available Gibbs sampler. The new algorithm can be used to initialize independent chains of the existing Gibbs sampler leading to better posterior exploration and coverage of the associated interval estimates in simulation studies. The new algorithm is compared to related approaches via simulation studies as well as an empirical analysis.
    $\rho$-GNF : A Novel Sensitivity Analysis Approach Under Unobserved Confounders. (arXiv:2209.07111v1 [stat.ME])
    We propose a new sensitivity analysis model that combines copulas and normalizing flows for causal inference under unobserved confounding. We refer to the new model as $\rho$-GNF ($\rho$-Graphical Normalizing Flow), where $\rho{\in}[-1,+1]$ is a bounded sensitivity parameter representing the backdoor non-causal association due to unobserved confounding, modeled using the well-studied and widely used Gaussian copula. Specifically, $\rho$-GNF enables us to estimate and analyse the frontdoor causal effect or average causal effect (ACE) as a function of $\rho$. We call this the $\rho_{curve}$. The $\rho_{curve}$ enables us to specify the confounding strength required to nullify the ACE. We call this the $\rho_{value}$. Further, the $\rho_{curve}$ also enables us to provide bounds for the ACE given an interval of $\rho$ values. We illustrate the benefits of $\rho$-GNF with experiments on simulated and real-world data in terms of our empirical ACE bounds being narrower than other popular ACE bounds.
    Differentially Private Estimation of Hawkes Process. (arXiv:2209.07303v1 [cs.LG])
    Point process models are of great importance in real world applications. In certain critical applications, estimation of point process models involves large amounts of sensitive personal data from users. Privacy concerns naturally arise which have not been addressed in the existing literature. To bridge this glaring gap, we propose the first general differentially private estimation procedure for point process models. Specifically, we take the Hawkes process as an example, and introduce a rigorous definition of differential privacy for event stream data based on a discretized representation of the Hawkes process. We then propose two differentially private optimization algorithms, which can efficiently estimate Hawkes process models with the desired privacy and utility guarantees under two different settings. Experiments are provided to back up our theoretical analysis.
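    As background, a minimal sketch of the (non-private) Hawkes conditional intensity with an exponential kernel, together with the kind of binning a discretized representation uses, is given below; parameter names and the binning scheme are illustrative assumptions, not the paper's construction.

        import numpy as np

        def hawkes_intensity(t, event_times, mu, alpha, beta):
            """lambda(t) = mu + sum_{t_i < t} alpha * exp(-beta * (t - t_i))."""
            past = event_times[event_times < t]
            return mu + alpha * np.sum(np.exp(-beta * (t - past)))

        def discretize(event_times, t_max, bin_width):
            """Event counts per bin: a discretized view of the event stream."""
            bins = np.arange(0.0, t_max + bin_width, bin_width)
            counts, _ = np.histogram(event_times, bins=bins)
            return counts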
    Efficient learning of nonlinear prediction models with time-series privileged information. (arXiv:2209.07067v1 [cs.LG])
    In domains where sample sizes are limited, efficient learning algorithms are critical. Learning using privileged information (LuPI) offers increased sample efficiency by allowing prediction models access to types of information at training time which is unavailable when the models are used. In recent work, it was shown that for prediction in linear-Gaussian dynamical systems, a LuPI learner with access to intermediate time series data is never worse and often better in expectation than any unbiased classical learner. We provide new insights into this analysis and generalize it to nonlinear prediction tasks in latent dynamical systems, extending theoretical guarantees to the case where the map connecting latent variables and observations is known up to a linear transform. In addition, we propose algorithms based on random features and representation learning for the case when this map is unknown. A suite of empirical results confirm theoretical findings and show the potential of using privileged time-series information in nonlinear prediction.
    Private Stochastic Optimization in the Presence of Outliers: Optimal Rates for (Non-Smooth) Convex Losses and Extension to Non-Convex Losses. (arXiv:2209.07403v1 [cs.LG])
    We study differentially private (DP) stochastic optimization (SO) with data containing outliers and loss functions that are not Lipschitz continuous. To date, the vast majority of work on DP SO assumes that the loss is Lipschitz (i.e. stochastic gradients are uniformly bounded), and their error bounds scale with the Lipschitz parameter of the loss. While this assumption is convenient, it is often unrealistic: in many practical problems where privacy is required, data may contain outliers or be unbounded, causing some stochastic gradients to have large norm. In such cases, the Lipschitz parameter may be prohibitively large, leading to vacuous excess risk bounds. Thus, building on a recent line of work [WXDX20, KLZ22], we make the weaker assumption that stochastic gradients have bounded $k$-th moments for some $k \geq 2$. Compared with works on DP Lipschitz SO, our excess risk scales with the $k$-th moment bound instead of the Lipschitz parameter of the loss, allowing for significantly faster rates in the presence of outliers. For convex and strongly convex loss functions, we provide the first asymptotically optimal excess risk bounds (up to a logarithmic factor). Moreover, in contrast to the prior works [WXDX20, KLZ22], our bounds do not require the loss function to be differentiable/smooth. We also devise an accelerated algorithm that runs in linear time and yields improved (compared to prior works) and nearly optimal excess risk for smooth losses. Additionally, our work is the first to address non-convex non-Lipschitz loss functions satisfying the Proximal-PL inequality; this covers some classes of neural nets, among other practical models. Our Proximal-PL algorithm has nearly optimal excess risk that almost matches the strongly convex lower bound. Lastly, we provide shuffle DP variations of our algorithms, which do not require a trusted curator (e.g. for distributed learning).
    Towards Healing the Blindness of Score Matching. (arXiv:2209.07396v1 [stat.ML])
    Score-based divergences have been widely used in machine learning and statistics applications. Despite their empirical success, a blindness problem has been observed when using these for multi-modal distributions. In this work, we discuss the blindness problem and propose a new family of divergences that can mitigate the blindness problem. We illustrate our proposed divergence in the context of density estimation and report improved performance compared to traditional approaches.
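For context, the score-based objective underlying these divergences is typically the Fisher divergence (the standard definition, not this paper's new proposal):

$$ D_F(p \,\|\, q_\theta) \;=\; \tfrac{1}{2}\, \mathbb{E}_{x \sim p}\Big[ \big\| \nabla_x \log p(x) - \nabla_x \log q_\theta(x) \big\|_2^2 \Big]. $$

Because the expectation is taken under $p$ and the score is a purely local quantity, well-separated modes can have their relative weights changed with almost no effect on the objective, which is one way to see the blindness problem the paper targets.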
    Large-scale Stochastic Optimization of NDCG Surrogates for Deep Learning with Provable Convergence. (arXiv:2202.12183v4 [cs.LG] UPDATED)
    NDCG, namely Normalized Discounted Cumulative Gain, is a widely used ranking metric in information retrieval and machine learning. However, efficient and provable stochastic methods for maximizing NDCG are still lacking, especially for deep models. In this paper, we propose a principled approach to optimize NDCG and its top-$K$ variant. First, we formulate a novel compositional optimization problem for optimizing the NDCG surrogate, and a novel bilevel compositional optimization problem for optimizing the top-$K$ NDCG surrogate. Then, we develop efficient stochastic algorithms with provable convergence guarantees for the non-convex objectives. Different from existing NDCG optimization methods, the per-iteration complexity of our algorithms scales with the mini-batch size instead of the number of total items. To improve the effectiveness for deep learning, we further propose practical strategies by using initial warm-up and stop gradient operator. Experimental results on multiple datasets demonstrate that our methods outperform prior ranking approaches in terms of NDCG. To the best of our knowledge, this is the first time that stochastic algorithms are proposed to optimize NDCG with a provable convergence guarantee. Our proposed methods are implemented in the LibAUC library at https://libauc.org/.
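For reference, the exact metric being surrogated can be computed directly; below is a minimal NumPy sketch of NDCG@k (the paper's contribution is a differentiable surrogate of this, which is not shown here):

```python
import numpy as np

def dcg_at_k(relevances, k):
    """Discounted cumulative gain over the top-k items, in ranked order."""
    rel = np.asarray(relevances, dtype=float)[:k]
    discounts = np.log2(np.arange(2, rel.size + 2))  # log2(rank + 1)
    return np.sum((2.0 ** rel - 1.0) / discounts)

def ndcg_at_k(relevances_in_predicted_order, k):
    """NDCG: DCG of the predicted ranking over the DCG of the ideal ranking."""
    ideal_dcg = dcg_at_k(sorted(relevances_in_predicted_order, reverse=True), k)
    if ideal_dcg == 0:
        return 0.0
    return dcg_at_k(relevances_in_predicted_order, k) / ideal_dcg

# Example: relevance labels of items in the order the model ranked them.
print(ndcg_at_k([3, 2, 3, 0, 1], k=5))
```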
    Do Residual Neural Networks discretize Neural Ordinary Differential Equations?. (arXiv:2205.14612v2 [cs.LG] UPDATED)
    Neural Ordinary Differential Equations (Neural ODEs) are the continuous analog of Residual Neural Networks (ResNets). We investigate whether the discrete dynamics defined by a ResNet are close to the continuous one of a Neural ODE. We first quantify the distance between the ResNet's hidden state trajectory and the solution of its corresponding Neural ODE. Our bound is tight and, on the negative side, does not go to 0 with depth N if the residual functions are not smooth with depth. On the positive side, we show that this smoothness is preserved by gradient descent for a ResNet with linear residual functions and small enough initial loss. It ensures an implicit regularization towards a limit Neural ODE at rate 1 over N, uniformly with depth and optimization time. As a byproduct of our analysis, we consider the use of a memory-free discrete adjoint method to train a ResNet by recovering the activations on the fly through a backward pass of the network, and show that this method theoretically succeeds at large depth if the residual functions are Lipschitz with the input. We then show that Heun's method, a second order ODE integration scheme, allows for better gradient estimation with the adjoint method when the residual functions are smooth with depth. We experimentally validate that our adjoint method succeeds at large depth, and that Heun method needs fewer layers to succeed. We finally use the adjoint method successfully for fine-tuning very deep ResNets without memory consumption in the residual layers.
    Distributed Sparse Linear Regression with Sublinear Communication. (arXiv:2209.07230v1 [stat.ML])
    We study the problem of high-dimensional sparse linear regression in a distributed setting under both computational and communication constraints. Specifically, we consider a star topology network whereby several machines are connected to a fusion center, with whom they can exchange relatively short messages. Each machine holds noisy samples from a linear regression model with the same unknown sparse $d$-dimensional vector of regression coefficients $\theta$. The goal of the fusion center is to estimate the vector $\theta$ and its support using few computations and limited communication at each machine. In this work, we consider distributed algorithms based on Orthogonal Matching Pursuit (OMP) and theoretically study their ability to exactly recover the support of $\theta$. We prove that under certain conditions, even at low signal-to-noise-ratios where individual machines are unable to detect the support of $\theta$, distributed-OMP methods correctly recover it with total communication sublinear in $d$. In addition, we present simulations that illustrate the performance of distributed OMP-based algorithms and show that they perform similarly to more sophisticated and computationally intensive methods, and in some cases even outperform them.
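As background, here is a minimal single-machine OMP sketch, the local building block each machine would run before communicating with the fusion center (the distributed protocol itself is not sketched):

```python
import numpy as np

def omp(X, y, sparsity):
    """Orthogonal Matching Pursuit: greedily grow the support of theta."""
    n, d = X.shape
    residual = y.copy()
    support = []
    for _ in range(sparsity):
        # Pick the column most correlated with the current residual.
        j = int(np.argmax(np.abs(X.T @ residual)))
        if j not in support:
            support.append(j)
        # Re-fit least squares on the selected columns, update the residual.
        theta_s, *_ = np.linalg.lstsq(X[:, support], y, rcond=None)
        residual = y - X[:, support] @ theta_s
    theta = np.zeros(d)
    theta[support] = theta_s
    return theta, sorted(support)

# Example: recover a 3-sparse vector from noisy observations.
rng = np.random.default_rng(0)
X = rng.standard_normal((100, 50))
theta_true = np.zeros(50)
theta_true[[3, 17, 42]] = [1.0, -2.0, 1.5]
y = X @ theta_true + 0.1 * rng.standard_normal(100)
theta_hat, support = omp(X, y, sparsity=3)
print(support)  # likely [3, 17, 42]
```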
    Markov Chain Score Ascent: A Unifying Framework of Variational Inference with Markovian Gradients. (arXiv:2206.06295v3 [cs.LG] UPDATED)
Minimizing the inclusive Kullback-Leibler (KL) divergence with stochastic gradient descent (SGD) is challenging since its gradient is defined as an integral over the posterior. Recently, multiple methods have been proposed to run SGD with biased gradient estimates obtained from a Markov chain. This paper provides the first non-asymptotic convergence analysis of these methods by establishing their mixing rate and gradient variance. To do this, we demonstrate that these methods, which we collectively refer to as Markov chain score ascent (MCSA) methods, can be cast as special cases of the Markov chain gradient descent framework. Furthermore, by leveraging this new understanding, we develop a novel MCSA scheme, parallel MCSA (pMCSA), that achieves a tighter bound on the gradient variance. We demonstrate that this improved theoretical result translates to superior empirical performance.
    Stochastic first-order methods for average-reward Markov decision processes. (arXiv:2205.05800v5 [cs.LG] UPDATED)
    We study the problem of average-reward Markov decision processes (AMDPs) and develop novel first-order methods with strong theoretical guarantees for both policy evaluation and optimization. Existing on-policy evaluation methods suffer from sub-optimal convergence rates as well as failure in handling insufficiently random policies, e.g., deterministic policies, for lack of exploration. To remedy these issues, we develop a novel variance-reduced temporal difference (VRTD) method with linear function approximation for randomized policies along with sharp convergence guarantees, and an exploratory variance-reduced temporal difference (EVRTD) method for insufficiently random policies with comparable convergence guarantees. We further establish linear convergence rate on the bias of policy evaluation, which is essential for improving the overall sample complexity of policy optimization. On the other hand, compared with intensive research interest in finite sample analysis of policy gradient methods for discounted MDPs, existing studies on policy gradient methods for AMDPs mostly focus on regret bounds under restrictive assumptions on the underlying Markov processes (see, e.g., Abbasi-Yadkori et al., 2019), and they often lack guarantees on the overall sample complexities. Towards this end, we develop an average-reward variant of the stochastic policy mirror descent (SPMD) (Lan, 2022). We establish the first $\widetilde{\mathcal{O}}(\epsilon^{-2})$ sample complexity for solving AMDPs with policy gradient method under both the generative model (with unichain assumption) and Markovian noise model (with ergodic assumption). This bound can be further improved to $\widetilde{\mathcal{O}}(\epsilon^{-1})$ for solving regularized AMDPs. Our theoretical advantages are corroborated by numerical experiments.
    Low-rank Optimal Transport: Approximation, Statistics and Debiasing. (arXiv:2205.12365v2 [stat.ML] UPDATED)
    The matching principles behind optimal transport (OT) play an increasingly important role in machine learning, a trend which can be observed when OT is used to disambiguate datasets in applications (e.g. single-cell genomics) or used to improve more complex methods (e.g. balanced attention in transformers or self-supervised learning). To scale to more challenging problems, there is a growing consensus that OT requires solvers that can operate on millions, not thousands, of points. The low-rank optimal transport (LOT) approach advocated in \cite{scetbon2021lowrank} holds several promises in that regard, and was shown to complement more established entropic regularization approaches, being able to insert itself in more complex pipelines, such as quadratic OT. LOT restricts the search for low-cost couplings to those that have a low-nonnegative rank, yielding linear time algorithms in cases of interest. However, these promises can only be fulfilled if the LOT approach is seen as a legitimate contender to entropic regularization when compared on properties of interest, where the scorecard typically includes theoretical properties (statistical complexity and relation to other methods) or practical aspects (debiasing, hyperparameter tuning, initialization). We target each of these areas in this paper in order to cement the impact of low-rank approaches in computational OT.
    Fixed-Point Centrality for Networks. (arXiv:2209.07070v1 [eess.SY])
    This paper proposes a family of network centralities called fixed-point centralities. This centrality family is defined via the fixed point of permutation equivariant mappings related to the underlying network. Such a centrality notion is immediately extended to define fixed-point centralities for infinite graphs characterized by graphons. Variation bounds of such centralities with respect to the variations of the underlying graphs and graphons under mild assumptions are established. Fixed-point centralities connect with a variety of different models on networks including graph neural networks, static and dynamic games on networks, and Markov decision processes.
    Nearest Neighbor and Kernel Survival Analysis: Nonasymptotic Error Bounds and Strong Consistency Rates. (arXiv:1905.05285v2 [stat.ML] UPDATED)
    We establish the first nonasymptotic error bounds for Kaplan-Meier-based nearest neighbor and kernel survival probability estimators where feature vectors reside in metric spaces. Our bounds imply rates of strong consistency for these nonparametric estimators and, up to a log factor, match an existing lower bound for conditional CDF estimation. Our proof strategy also yields nonasymptotic guarantees for nearest neighbor and kernel variants of the Nelson-Aalen cumulative hazards estimator. We experimentally compare these methods on four datasets. We find that for the kernel survival estimator, a good choice of kernel is one learned using random survival forests.
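For context, the nearest neighbor estimator studied here applies the standard Kaplan-Meier formula to the $k$ training points closest to the query $x$:

$$ \hat S(t \mid x) \;=\; \prod_{i:\; t_{(i)} \le t} \Big( 1 - \frac{d_i}{n_i} \Big), $$

where the product runs over the distinct observed event times $t_{(i)}$ among the neighbors of $x$, $d_i$ is the number of events at $t_{(i)}$, and $n_i$ is the number of neighbors still at risk just before $t_{(i)}$; the kernel variant replaces the neighbor counts with kernel-weighted sums.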
    Distributed Online System Identification for LTI Systems Using Reverse Experience Replay. (arXiv:2207.01062v2 [cs.LG] UPDATED)
    Identification of linear time-invariant (LTI) systems plays an important role in control and reinforcement learning. Both asymptotic and finite-time offline system identification are well-studied in the literature. For online system identification, the idea of stochastic-gradient descent with reverse experience replay (SGD-RER) was recently proposed, where the data sequence is stored in several buffers and the stochastic-gradient descent (SGD) update performs backward in each buffer to break the time dependency between data points. Inspired by this work, we study distributed online system identification of LTI systems over a multi-agent network. We consider agents as identical LTI systems, and the network goal is to jointly estimate the system parameters by leveraging the communication between agents. We propose DSGD-RER, a distributed variant of the SGD-RER algorithm, and theoretically characterize the improvement of the estimation error with respect to the network size. Our numerical experiments certify the reduction of estimation error as the network size grows.
    Upper bounds on the Natarajan dimensions of some function classes. (arXiv:2209.07015v1 [stat.ML])
    The Natarajan dimension is a fundamental tool for characterizing multi-class PAC learnability, generalizing the Vapnik-Chervonenkis (VC) dimension from binary to multi-class classification problems. This note establishes upper bounds on Natarajan dimensions for certain function classes, including (i) multi-class decision tree and random forests, and (ii) multi-class neural networks with binary, linear and ReLU activations. These results may be relevant for describing the performance of certain multi-class learning algorithms.
    Limit Cycles of AdaBoost. (arXiv:2209.06928v1 [cs.LG])
The iterative weight update for the AdaBoost machine learning algorithm may be realized as a dynamical map on a probability simplex. When learning a low-dimensional data set, this algorithm has a tendency towards cycling behavior, which is the topic of this paper. AdaBoost's cycling behavior lends itself to direct computational methods that are ineffective in the general, non-cycling case of the algorithm. From these computational properties we give a concrete correspondence between AdaBoost's cycling behavior and continued fraction dynamics. Then we explore the results of this correspondence to expound on how the algorithm comes to be in this periodic state at all. We intend this work to be a novel and self-contained explanation of the cycling dynamics of this machine learning algorithm.
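For readers who want the map in front of them, the dynamical system in question is the standard AdaBoost reweighting on the simplex: with labels and weak-learner outputs in $\{-1,+1\}$, weighted error $\epsilon_t$, and $\alpha_t = \frac{1}{2}\ln\frac{1-\epsilon_t}{\epsilon_t}$,

$$ w_{t+1,i} \;=\; \frac{w_{t,i}\, \exp\!\big( -\alpha_t\, y_i\, h_t(x_i) \big)}{Z_t}, \qquad Z_t \;=\; \sum_j w_{t,j}\, \exp\!\big( -\alpha_t\, y_j\, h_t(x_j) \big), $$

so each iteration maps the probability simplex to itself, which is what makes the cycling analysis possible.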
    Langevin Autoencoders for Learning Deep Latent Variable Models. (arXiv:2209.07036v1 [cs.LG])
    Markov chain Monte Carlo (MCMC), such as Langevin dynamics, is valid for approximating intractable distributions. However, its usage is limited in the context of deep latent variable models owing to costly datapoint-wise sampling iterations and slow convergence. This paper proposes the amortized Langevin dynamics (ALD), wherein datapoint-wise MCMC iterations are entirely replaced with updates of an encoder that maps observations into latent variables. This amortization enables efficient posterior sampling without datapoint-wise iterations. Despite its efficiency, we prove that ALD is valid as an MCMC algorithm, whose Markov chain has the target posterior as a stationary distribution under mild assumptions. Based on the ALD, we also present a new deep latent variable model named the Langevin autoencoder (LAE). Interestingly, the LAE can be implemented by slightly modifying the traditional autoencoder. Using multiple synthetic datasets, we first validate that ALD can properly obtain samples from target posteriors. We also evaluate the LAE on the image generation task, and show that our LAE can outperform existing methods based on variational inference, such as the variational autoencoder, and other MCMC-based methods in terms of the test likelihood.
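For contrast with the amortized scheme, here is the classical datapoint-wise step that ALD replaces: a minimal sketch of one unadjusted Langevin update on a latent z (the function and variable names are illustrative, not from the paper):

```python
import math
import torch

def langevin_step(z, log_prob_fn, step_size):
    """One unadjusted Langevin update: half a gradient step on log p, plus noise."""
    z = z.detach().requires_grad_(True)
    grad = torch.autograd.grad(log_prob_fn(z).sum(), z)[0]
    return (z + 0.5 * step_size * grad
            + math.sqrt(step_size) * torch.randn_like(z)).detach()

# Example: run the chain on a standard Gaussian stand-in for the posterior.
z = torch.randn(64, 8)
for _ in range(100):
    z = langevin_step(z, lambda u: -0.5 * (u ** 2).sum(dim=-1), step_size=1e-2)
```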
    Shifts 2.0: Extending The Dataset of Real Distributional Shifts. (arXiv:2206.15407v2 [cs.LG] UPDATED)
    Distributional shift, or the mismatch between training and deployment data, is a significant obstacle to the usage of machine learning in high-stakes industrial applications, such as autonomous driving and medicine. This creates a need to be able to assess how robustly ML models generalize as well as the quality of their uncertainty estimates. Standard ML baseline datasets do not allow these properties to be assessed, as the training, validation and test data are often identically distributed. Recently, a range of dedicated benchmarks have appeared, featuring both distributionally matched and shifted data. Among these benchmarks, the Shifts dataset stands out in terms of the diversity of tasks as well as the data modalities it features. While most of the benchmarks are heavily dominated by 2D image classification tasks, Shifts contains tabular weather forecasting, machine translation, and vehicle motion prediction tasks. This enables the robustness properties of models to be assessed on a diverse set of industrial-scale tasks and either universal or directly applicable task-specific conclusions to be reached. In this paper, we extend the Shifts Dataset with two datasets sourced from industrial, high-risk applications of high societal importance. Specifically, we consider the tasks of segmentation of white matter Multiple Sclerosis lesions in 3D magnetic resonance brain images and the estimation of power consumption in marine cargo vessels. Both tasks feature ubiquitous distributional shifts and a strict safety requirement due to the high cost of errors. These new datasets will allow researchers to further explore robust generalization and uncertainty estimation in new situations. In this work, we provide a description of the dataset and baseline results for both tasks.
    Lossy Image Compression with Conditional Diffusion Models. (arXiv:2209.06950v1 [eess.IV])
    Diffusion models are a new class of generative models that mark a milestone in high-quality image generation while relying on solid probabilistic principles. This makes them promising candidate models for neural image compression. This paper outlines an end-to-end optimized framework based on a conditional diffusion model for image compression. Besides latent variables inherent to the diffusion process, the model introduces an additional per-instance "content" latent variable to condition the denoising process. Upon decoding, the diffusion process conditionally generates/reconstructs an image using ancestral sampling. Our experiments show that this approach outperforms one of the best-performing conventional image codecs (BPG) and one neural codec on two compression benchmarks, where we focus on rate-perception tradeoffs. Qualitatively, our approach shows fewer decompression artifacts than the classical approach.
    Particle gradient descent model for point process generation. (arXiv:2010.14928v3 [stat.ML] UPDATED)
    This paper presents a statistical model for stationary ergodic point processes, estimated from a single realization observed in a square window. With existing approaches in stochastic geometry, it is very difficult to model processes with complex geometries formed by a large number of particles. Inspired by recent works on gradient descent algorithms for sampling maximum-entropy models, we describe a model that allows for fast sampling of new configurations reproducing the statistics of the given observation. Starting from an initial random configuration, its particles are moved according to the gradient of an energy, in order to match a set of prescribed moments (functionals). Our moments are defined via a phase harmonic operator on the wavelet transform of point patterns. They allow one to capture multi-scale interactions between the particles, while controlling explicitly the number of moments by the scales of the structures to model. We present numerical experiments on point processes with various geometric structures, and assess the quality of the model by spectral and topological data analysis.
    A Geometric Perspective on Variational Autoencoders. (arXiv:2209.07370v1 [stat.ML])
This paper introduces a new interpretation of the Variational Autoencoder framework by taking a fully geometric point of view. We argue that vanilla VAE models naturally unveil a Riemannian structure in their latent space and that taking these geometrical aspects into consideration can lead to better interpolations and an improved generation procedure. The proposed sampling method consists of sampling from the uniform distribution that derives intrinsically from the learned Riemannian latent space, and we show that using this scheme can make a vanilla VAE competitive with, and even better than, more advanced versions on several benchmark datasets. Since generative models are known to be sensitive to the number of training samples, we also stress the method's robustness in the low data regime.
    Learning the conditional law: signatures and conditional GANs in filtering and prediction of diffusion processes. (arXiv:2204.00611v2 [stat.ML] UPDATED)
    We consider the filtering and prediction problem for a diffusion process. The signal and observation are modeled by stochastic differential equations (SDEs) driven by correlated Wiener processes. In classical estimation theory, measure-valued stochastic partial differential equations (SPDEs) are derived for the filtering and prediction measures. These equations can be hard to solve numerically. We provide an approximation algorithm using conditional generative adversarial networks (GANs) in combination with signatures, an object from rough path theory. The signature of a sufficiently smooth path determines the path completely. As a result, in some cases, GANs based on signatures have been shown to efficiently approximate the law of a stochastic process. For our algorithm we extend this method to sample from the conditional law, given noisy, partial observation. Our generator is constructed using neural differential equations (NDEs), relying on their universal approximator property. We show well-posedness in providing a rigorous mathematical framework. Numerical results show the efficiency of our algorithm.
    Estimating Classification Confidence Using Kernel Densities. (arXiv:2207.06529v3 [stat.ML] UPDATED)
    This paper investigates the post-hoc calibration of confidence for "exploratory" machine learning classification problems. The difficulty in these problems stems from the continuing desire to push the boundaries of which categories have enough examples to generalize from when curating datasets, and confusion regarding the validity of those categories. We argue that for such problems the "one-versus-all" approach (top-label calibration) must be used rather than the "calibrate-the-full-response-matrix" approach advocated elsewhere in the literature. We introduce and test four new algorithms designed to handle the idiosyncrasies of category-specific confidence estimation. Chief among these methods is the use of kernel density ratios for confidence calibration including a novel, bulletproof algorithm for choosing the bandwidth. We test our claims and explore the limits of calibration on a bioinformatics application (PhANNs) as well as the classic MNIST benchmark. Finally, our analysis argues that post-hoc calibration should always be performed, should be based only on the test dataset, and should be sanity-checked visually.
    Breaking the Sample Size Barrier in Model-Based Reinforcement Learning with a Generative Model. (arXiv:2005.12900v5 [cs.LG] UPDATED)
    This paper is concerned with the sample efficiency of reinforcement learning, assuming access to a generative model (or simulator). We first consider $\gamma$-discounted infinite-horizon Markov decision processes (MDPs) with state space $\mathcal{S}$ and action space $\mathcal{A}$. Despite a number of prior works tackling this problem, a complete picture of the trade-offs between sample complexity and statistical accuracy is yet to be determined. In particular, all prior results suffer from a severe sample size barrier, in the sense that their claimed statistical guarantees hold only when the sample size exceeds at least $\frac{|\mathcal{S}||\mathcal{A}|}{(1-\gamma)^2}$. The current paper overcomes this barrier by certifying the minimax optimality of two algorithms -- a perturbed model-based algorithm and a conservative model-based algorithm -- as soon as the sample size exceeds the order of $\frac{|\mathcal{S}||\mathcal{A}|}{1-\gamma}$ (modulo some log factor). Moving beyond infinite-horizon MDPs, we further study time-inhomogeneous finite-horizon MDPs, and prove that a plain model-based planning algorithm suffices to achieve minimax-optimal sample complexity given any target accuracy level. To the best of our knowledge, this work delivers the first minimax-optimal guarantees that accommodate the entire range of sample sizes (beyond which finding a meaningful policy is information theoretically infeasible).
    Information Theoretic Measures of Causal Influences during Transient Neural Events. (arXiv:2209.07508v1 [q-bio.NC])
Transient phenomena play a key role in coordinating brain activity at multiple scales; however, their underlying mechanisms remain largely unknown. A key challenge for neural data science is thus to characterize the network interactions at play during these events. Using the formalism of Structural Causal Models and their graphical representation, we investigate the theoretical and empirical properties of Information Theory based causal strength measures in the context of recurring spontaneous transient events. After showing the limitations of Transfer Entropy and Dynamic Causal Strength in such a setting, we introduce a novel measure, relative Dynamic Causal Strength, and provide theoretical and empirical support for its benefits. These methods are applied to simulated and experimentally recorded neural time series, and provide results in agreement with our current understanding of the underlying brain circuits.
    OOD Link Prediction Generalization Capabilities of Message-Passing GNNs in Larger Test Graphs. (arXiv:2205.15117v4 [cs.LG] UPDATED)
    This work provides the first theoretical study on the ability of graph Message Passing Neural Networks (gMPNNs) -- such as Graph Neural Networks (GNNs) -- to achieve counterfactually-invariant representations for inductive out-of-distribution (OOD) link prediction tasks, where deployment (test) graph sizes are larger than training graphs. We first prove non-asymptotic bounds showing that link predictors based on permutation-equivariant (structural) node embeddings obtained by gMPNNs can converge to a random guess as test graphs get larger. We then propose a theoretically-sound gMPNN that outputs structural pairwise (2-node) embeddings and prove non-asymptotic bounds showing that, as test graphs grow, these embeddings converge to embeddings of a continuous function that retains its ability to predict links OOD. Empirical results on random graphs show agreement with our theoretical results.
    Robust Transferable Feature Extractors: Learning to Defend Pre-Trained Networks Against White Box Adversaries. (arXiv:2209.06931v1 [cs.LG])
    The widespread adoption of deep neural networks in computer vision applications has brought forth a significant interest in adversarial robustness. Existing research has shown that maliciously perturbed inputs specifically tailored for a given model (i.e., adversarial examples) can be successfully transferred to another independently trained model to induce prediction errors. Moreover, this property of adversarial examples has been attributed to features derived from predictive patterns in the data distribution. Thus, we are motivated to investigate the following question: Can adversarial defenses, like adversarial examples, be successfully transferred to other independently trained models? To this end, we propose a deep learning-based pre-processing mechanism, which we refer to as a robust transferable feature extractor (RTFE). After examining theoretical motivation and implications, we experimentally show that our method can provide adversarial robustness to multiple independently pre-trained classifiers that are otherwise ineffective against an adaptive white box adversary. Furthermore, we show that RTFEs can even provide one-shot adversarial robustness to models independently trained on different datasets.
    Asymptotic Statistical Analysis of $f$-divergence GAN. (arXiv:2209.06853v1 [math.ST])
Generative Adversarial Networks (GANs) have achieved great success in data generation. However, their statistical properties are not fully understood. In this paper, we consider the statistical behavior of the general $f$-divergence formulation of GAN, which includes the Kullback--Leibler divergence that is closely related to the maximum likelihood principle. We show that for parametric generative models that are correctly specified, all $f$-divergence GANs with the same discriminator classes are asymptotically equivalent under suitable regularity conditions. Moreover, with an appropriately chosen local discriminator, they become equivalent to the maximum likelihood estimate asymptotically. For generative models that are misspecified, GANs with different $f$-divergences converge to different estimators, and thus cannot be directly compared. However, it is shown that for some commonly used $f$-divergences, the original $f$-GAN is not optimal in that one can achieve a smaller asymptotic variance when the discriminator training in the original $f$-GAN formulation is replaced by logistic regression. The resulting estimation method is referred to as Adversarial Gradient Estimation (AGE). Empirical studies are provided to support the theory and to demonstrate the advantage of AGE over the original $f$-GANs under model misspecification.  ( 2 min )
    Sample and Computationally Efficient Stochastic Kriging in High Dimensions. (arXiv:2010.06802v5 [stat.ME] UPDATED)
    Stochastic kriging has been widely employed for simulation metamodeling to predict the response surface of complex simulation models. However, its use is limited to cases where the design space is low-dimensional because, in general, the sample complexity (i.e., the number of design points required for stochastic kriging to produce an accurate prediction) grows exponentially in the dimensionality of the design space. The large sample size results in both a prohibitive sample cost for running the simulation model and a severe computational challenge due to the need to invert large covariance matrices. Based on tensor Markov kernels and sparse grid experimental designs, we develop a novel methodology that dramatically alleviates the curse of dimensionality. We show that the sample complexity of the proposed methodology grows only slightly in the dimensionality, even under model misspecification. We also develop fast algorithms that compute stochastic kriging in its exact form without any approximation schemes. We demonstrate via extensive numerical experiments that our methodology can handle problems with a design space of more than 10,000 dimensions, improving both prediction accuracy and computational efficiency by orders of magnitude relative to typical alternative methods in practice.  ( 3 min )
    Fitting an immersed submanifold to data via Sussmann's orbit theorem. (arXiv:2204.01119v3 [cs.LG] UPDATED)
    This paper describes an approach for fitting an immersed submanifold of a finite-dimensional Euclidean space to random samples. The reconstruction mapping from the ambient space to the desired submanifold is implemented as a composition of an encoder that maps each point to a tuple of (positive or negative) times and a decoder given by a composition of flows along finitely many vector fields starting from a fixed initial point. The encoder supplies the times for the flows. The encoder-decoder map is obtained by empirical risk minimization, and a high-probability bound is given on the excess risk relative to the minimum expected reconstruction error over a given class of encoder-decoder maps. The proposed approach makes fundamental use of Sussmann's orbit theorem, which guarantees that the image of the reconstruction map is indeed contained in an immersed submanifold.  ( 2 min )
    Double Doubly Robust Thompson Sampling for Generalized Linear Contextual Bandits. (arXiv:2209.06983v1 [stat.ML])
    We propose a novel contextual bandit algorithm for generalized linear rewards with an $\tilde{O}(\sqrt{\kappa^{-1} \phi T})$ regret over $T$ rounds where $\phi$ is the minimum eigenvalue of the covariance of contexts and $\kappa$ is a lower bound of the variance of rewards. In several practical cases where $\phi=O(d)$, our result is the first regret bound for generalized linear model (GLM) bandits with the order $\sqrt{d}$ without relying on the approach of Auer [2002]. We achieve this bound using a novel estimator called double doubly-robust (DDR) estimator, a subclass of doubly-robust (DR) estimator but with a tighter error bound. The approach of Auer [2002] achieves independence by discarding the observed rewards, whereas our algorithm achieves independence considering all contexts using our DDR estimator. We also provide an $O(\kappa^{-1} \phi \log (NT) \log T)$ regret bound for $N$ arms under a probabilistic margin condition. Regret bounds under the margin condition are given by Bastani and Bayati [2020] and Bastani et al. [2021] under the setting that contexts are common to all arms but coefficients are arm-specific. When contexts are different for all arms but coefficients are common, ours is the first regret bound under the margin condition for linear models or GLMs. We conduct empirical studies using synthetic data and real examples, demonstrating the effectiveness of our algorithm.  ( 3 min )
    Feature Selection integrated Deep Learning for Ultrahigh Dimensional and Highly Correlated Feature Space. (arXiv:2209.07011v1 [stat.ML])
    In recent years, deep learning has been a topic of interest in almost all disciplines due to its impressive empirical success in analyzing complex data sets, such as imaging, genetics, climate, and medical data. While most of the developments are treated as black-box machines, there is an increasing interest in interpretable, reliable, and robust deep learning models applicable to a broad class of applications. Feature-selected deep learning is proven to be promising in this regard. However, the recent developments do not address the situations of ultra-high dimensional and highly correlated feature selection in addition to the high noise level. In this article, we propose a novel screening and cleaning strategy with the aid of deep learning for the cluster-level discovery of highly correlated predictors with a controlled error rate. A thorough empirical evaluation over a wide range of simulated scenarios demonstrates the effectiveness of the proposed method by achieving high power while having a minimal number of false discoveries. Furthermore, we implemented the algorithm in the riboflavin (vitamin $B_2$) production dataset in the context of understanding the possible genetic association with riboflavin production. The gain of the proposed methodology is illustrated by achieving lower prediction error compared to other state-of-the-art methods.  ( 2 min )
    Adversarially Robust Learning: A Generic Minimax Optimal Learner and Characterization. (arXiv:2209.07369v1 [cs.LG])
    We present a minimax optimal learner for the problem of learning predictors robust to adversarial examples at test-time. Interestingly, we find that this requires new algorithmic ideas and approaches to adversarially robust learning. In particular, we show, in a strong negative sense, the suboptimality of the robust learner proposed by Montasser, Hanneke, and Srebro (2019) and a broader family of learners we identify as local learners. Our results are enabled by adopting a global perspective, specifically, through a key technical contribution: the global one-inclusion graph, which may be of independent interest, that generalizes the classical one-inclusion graph due to Haussler, Littlestone, and Warmuth (1994). Finally, as a byproduct, we identify a dimension characterizing qualitatively and quantitatively what classes of predictors $\mathcal{H}$ are robustly learnable. This resolves an open problem due to Montasser et al. (2019), and closes a (potentially) infinite gap between the established upper and lower bounds on the sample complexity of adversarially robust learning.  ( 2 min )
    Estimating large causal polytree skeletons from small samples. (arXiv:2209.07028v1 [stat.ME])
    We consider the problem of estimating the skeleton of a large causal polytree from a relatively small i.i.d. sample. This is motivated by the problem of determining causal structure when the number of variables is very large compared to the sample size, such as in gene regulatory networks. We give an algorithm that recovers the tree with high accuracy in such settings. The algorithm works under essentially no distributional or modeling assumptions other than some mild non-degeneracy conditions.  ( 2 min )
    Observable adjustments in single-index models for regularized M-estimators. (arXiv:2204.06990v2 [math.ST] UPDATED)
We consider observations $(X,y)$ from single index models with unknown link function, Gaussian covariates and a regularized M-estimator $\hat\beta$ constructed from a convex loss function and regularizer. In the regime where sample size $n$ and dimension $p$ are both increasing such that $p/n$ has a finite limit, the behavior of the empirical distribution of $\hat\beta$ and the predicted values $X\hat\beta$ has been previously characterized in a number of models: The empirical distributions are known to converge to proximal operators of the loss and penalty in a related Gaussian sequence model, which captures the interplay between ratio $p/n$, loss, regularization and the data generating process. This connection between $(\hat\beta,X\hat\beta)$ and the corresponding proximal operators requires solving fixed-point equations that typically involve unobservable quantities such as the prior distribution on the index or the link function. This paper develops a different theory to describe the empirical distribution of $\hat\beta$ and $X\hat\beta$: Approximations of $(\hat\beta,X\hat\beta)$ in terms of proximal operators are provided that only involve observable adjustments. These proposed observable adjustments are data-driven, e.g., do not require prior knowledge of the index or the link function. These new adjustments yield confidence intervals for individual components of the index, as well as estimators of the correlation of $\hat\beta$ with the index. The interplay between loss, regularization and the model is thus captured in a data-driven manner, without solving the fixed-point equations studied in previous works. The results apply to both strongly convex regularizers and unregularized M-estimation. Simulations are provided for the square and logistic loss in single index models including logistic regression and 1-bit compressed sensing with 20\% corrupted bits.  ( 3 min )
    Decision making in cancer: Causal questions require causal answers. (arXiv:2209.07397v1 [cs.LG])
Treatment decisions in cancer care are guided by treatment effect estimates from randomized controlled trials (RCTs). RCTs estimate the average effect of one treatment versus another in a certain population. However, treatments may not be equally effective for every patient in a population. Knowing the effectiveness of treatments tailored to specific patient and tumor characteristics would enable individualized treatment decisions. Getting tailored treatment effects by averaging outcomes in different patient subgroups in RCTs requires an unfeasible number of patients to have sufficient statistical power in all relevant subgroups for all possible treatments. The American Joint Committee on Cancer (AJCC) recommends that researchers develop outcome prediction models (OPMs) in an effort to individualize treatment decisions. OPMs, sometimes called risk models or prognosis models, use patient and tumor characteristics to predict a patient outcome such as overall survival. The assumption is that the predictions are useful for treatment decisions using rules such as "prescribe chemotherapy only if the OPM predicts the patient has a high risk of recurrence". Recognizing the importance of reliable predictions, the AJCC published a checklist for OPMs to ensure dependable OPM prediction accuracy in the patient population for which the OPM was designed. However, accurate outcome predictions do not imply that these predictions yield good treatment decisions. In this perspective, we show that OPMs rely on a fixed treatment policy, which implies that OPMs found to accurately predict outcomes in validation studies can still lead to patient harm when used to inform treatment decisions. We then give guidance on how to develop models that are useful for individualized treatment decisions and how to evaluate whether a model has value for decision-making.  ( 3 min )
    Statistical monitoring of models based on artificial intelligence. (arXiv:2209.07436v1 [stat.ME])
    The rapid advancement of models based on artificial intelligence demands innovative monitoring techniques which can operate in real time with low computational costs. In machine learning, especially if we consider neural network (NN) learning algorithms, and in particular deep-learning architectures, the models are often trained in a supervised manner. Consequently, the learned relationship between the input and the output must remain valid during the model's deployment. If this stationarity assumption holds, we can conclude that the NN generates accurate predictions. Otherwise, the retraining or rebuilding of the model is required. We propose to consider the latent feature representation of the data (called "embedding") generated by the NN for determining the time point when the data stream starts being nonstationary. To be precise, we monitor embeddings by applying multivariate control charts based on the calculation of the data depth and normalized ranks. The performance of the introduced method is evaluated using various NNs with different underlying data formats.  ( 2 min )
    Risk-aware linear bandits with convex loss. (arXiv:2209.07154v1 [stat.ML])
In decision-making problems such as the multi-armed bandit, an agent learns sequentially by optimizing a certain feedback. While the mean reward criterion has been extensively studied, other measures that reflect an aversion to adverse outcomes, such as mean-variance or conditional value-at-risk (CVaR), can be of interest for critical applications (healthcare, agriculture). Algorithms have been proposed for such risk-aware measures under bandit feedback without contextual information. In this work, we study contextual bandits where such risk measures can be elicited as linear functions of the contexts through the minimization of a convex loss. A typical example that fits within this framework is the expectile measure, which is obtained as the solution of an asymmetric least-squares problem. Using the method of mixtures for supermartingales, we derive confidence sequences for the estimation of such risk measures. We then propose an optimistic UCB algorithm to learn optimal risk-aware actions, with regret guarantees similar to those of generalized linear bandits. This approach requires solving a convex problem at each round of the algorithm, which we relax by allowing only approximate solutions obtained by online gradient descent, at the cost of slightly higher regret. We conclude by evaluating the resulting algorithms on numerical experiments.  ( 2 min )
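For concreteness, the expectile mentioned above is defined by an asymmetric least-squares problem (standard definition):

$$ e_\tau(X) \;=\; \arg\min_{\theta \in \mathbb{R}} \; \mathbb{E}\Big[ \big| \tau - \mathbf{1}\{X \le \theta\} \big| \, (X - \theta)^2 \Big], $$

which recovers the mean at $\tau = 1/2$ and weights upside and downside errors asymmetrically otherwise, so it fits the paper's requirement of a risk measure elicited by minimizing a convex loss.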
    Semiparametric Best Arm Identification with Contextual Information. (arXiv:2209.07330v1 [cs.LG])
    We study best-arm identification with a fixed budget and contextual (covariate) information in stochastic multi-armed bandit problems. In each round, after observing contextual information, we choose a treatment arm using past observations and current context. Our goal is to identify the best treatment arm, a treatment arm with the maximal expected reward marginalized over the contextual distribution, with a minimal probability of misidentification. First, we derive semiparametric lower bounds for this problem, where we regard the gaps between the expected rewards of the best and suboptimal treatment arms as parameters of interest, and all other parameters, such as the expected rewards conditioned on contexts, as the nuisance parameters. We then develop the "Contextual RS-AIPW strategy," which consists of the random sampling (RS) rule tracking a target allocation ratio and the recommendation rule using the augmented inverse probability weighting (AIPW) estimator. Our proposed Contextual RS-AIPW strategy is optimal because the upper bound for the probability of misidentification matches the semiparametric lower bound when the budget goes to infinity, and the gaps converge to zero.  ( 2 min )

  • Open

    [D] Using a lip-reading network as a training loss for face performance reenactment
    One thing I noticed with facial performance transfer networks is often the reenactment tends to poorly imitate the mouth movements of the source actor. The overall shape is preserved fairly well, but subtle "visemes" (visual analog of "phonemes") like those produced when pronouncing consonants seem to regress to the mean. In 2016, a project called LipNet (https://www.youtube.com/watch?v=wg3upHE8qJw) had shown impressive results in predicting dialogue from viseme inputs only (no audio inputs). Therefore, perhaps a similar network could be used to produce a loss term when training a facial reenactment network: If the lip-reading predictions of the input and output are more similar, the loss is lower. (A possible design for this classification network could be something similar to CLIP: Given a text transcript converted to the phonetic alphabet, and a short video sequence cropped to the mouth region, the classifier is trained using contrastive learning to pair the two, using the dot product between their two vectors to judge similarity. Then after the network is trained, two input video sequences can be compared with or without a text transcript.) Has something like this been attempted already? If so, does it improve the results? Thanks for your attention! submitted by /u/zergling103 [link] [comments]  ( 90 min )
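A minimal sketch of the contrastive pairing loss the post describes, assuming two hypothetical encoders that map mouth-crop clips and phonetic transcripts to embedding vectors (the names and temperature value are illustrative, not an existing implementation):

```python
import torch
import torch.nn.functional as F

def clip_style_loss(video_emb, text_emb, temperature=0.07):
    """Symmetric InfoNCE over a batch of (mouth-crop clip, phonetic transcript)
    pairs: matched pairs sit on the diagonal of the similarity matrix."""
    v = F.normalize(video_emb, dim=-1)
    t = F.normalize(text_emb, dim=-1)
    logits = v @ t.T / temperature                      # scaled cosine similarities
    targets = torch.arange(v.size(0), device=v.device)  # i-th clip matches i-th text
    return 0.5 * (F.cross_entropy(logits, targets)
                  + F.cross_entropy(logits.T, targets))
```

At reenactment-training time, a frozen video tower trained this way could then score how close the lip embeddings of the source and generated clips are (e.g., via cosine similarity), giving the loss term the post proposes.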
    [N] Large OpenCLIP released!
    A new open source CLIP model was released at LAION. Very interesting results. This will probably power next iterations of stable diffusion. https://laion.ai/blog/large-openclip/ submitted by /u/nightshadew [link] [comments]  ( 88 min )
    Anyone wanna make a think tank? [P]
Hello, I want to make a think tank that goes beyond what think tanks are usually known for. An organization that cumulatively resolves the problems of society through innovation and the active fostering of it. Moving past my lofty goals of delusional grandeur, what I do have as of now is an established company, I.S.P.A. Industries, and a website that has basic descriptions of some of the ideas I have more thought out/designed in my head. I'm hoping to find people who want to look over what I have, help me develop it, and get the same courtesy in return. We currently have 3 people involved and are looking to add a few more people to the core group who will have ownership rights to the company. If you ever had that spark of genius for a design, program, invention, etc. but then lost it because it was a bit beyond your reach in making, then we are the group for you. I know I have struggled with the various barriers that feed into that issue myself for as long as I can remember and intend to bring down those barriers. I want to make a company that solves that issue by directly providing in-house services to get ideas made and out into the market without the extortionate price tag that some companies providing similar services require to even start the process. I plan to do so by making the products this company makes from its various members the main source of revenue, so that we can foster new inventors and bring their respective products into the cumulative fold. This whole idea is based on the idea that we can bring many people from all walks of life together and then pool our cumulative resources to make some initial products. If we can manage that then we can grow from there. With that in mind feel free to DM me, leave a comment, or maybe share this post with someone you think might be interested. Let's bring our innovative talents together and do something big. submitted by /u/38931841Hz [link] [comments]  ( 90 min )
    [D] Using machine learning models to study glossolalia
    Does anyone know of any glossolalia data sets? Or perhaps papers that use modern speech models to study glossolalia? I couldn't find anything on Google Scholar but it might just be too niche a topic. Or maybe I'm using the wrong terms. submitted by /u/carmichael561 [link] [comments]  ( 88 min )
    [N] Researcher implemented a neuromorphic architecture on the FPGA-based IBM supercomputer and ran a neural net on it
I remember asking a while ago whether it makes sense to run neural nets in an architecture more closely resembling the biological brain, and apparently there are efforts to accomplish exactly that: https://www.fz-juelich.de/en/news/archive/feature-stories/faster-than-the-biological-model In particular, this architecture allows one to circumvent the von Neumann bottleneck, which manifests as latencies due to the separation of memory and processing units in the classical computer architecture, thus resembling the human brain more closely. submitted by /u/stockabuse [link] [comments]  ( 98 min )
    [P] Model for Instance Segmentation
Hey. I am new to the field. However, I need a model that can do instance segmentation, at least labeling person, dog, and cat in an image. I also need to somehow convert it for use on iOS. Could someone please suggest some models that could possibly fit? Thanks in advance! submitted by /u/anokmik [link] [comments]  ( 89 min )
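One plausible starting point (not the only one): torchvision's Mask R-CNN pre-trained on COCO already covers person, cat, and dog. A minimal inference sketch follows; converting the model for iOS (e.g., via coremltools) is a separate step that can be finicky for detection models, so treat that part as an open question:

```python
import torch
import torchvision

# Mask R-CNN pre-trained on COCO, whose label set includes person, cat, and dog.
model = torchvision.models.detection.maskrcnn_resnet50_fpn(weights="DEFAULT")
model.eval()

image = torch.rand(3, 480, 640)  # stand-in for a real RGB image scaled to [0, 1]
with torch.no_grad():
    out = model([image])[0]      # dict with 'boxes', 'labels', 'scores', 'masks'

coco_ids = {1: "person", 17: "cat", 18: "dog"}
for label, score in zip(out["labels"], out["scores"]):
    if label.item() in coco_ids and score > 0.5:
        print(coco_ids[label.item()], float(score))
```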
    [D] Transformer model and permutation invariance
Transformers are designed to work well with natural languages and sequences where order matters. Positional encoding is used to capture the relative order of a token in a sequence. Is there a way to use it on a set of strings to test if the strings have similar meaning or coherence in general? Example: {"Border Collie", "Doberman", "Pug"} should be classified as positive since all of them belong to the same category. {"Mango", "Cat", "some gibberish string", "Another GiBBerish STring", "treees"} should be classified as negative because the set is incoherent. In these kinds of datasets, the relative order matters for individual elements, but on the whole order doesn't matter. Concretely, {"Border Collie", "Doberman", "Pug"} and {"Pug", "Border Collie", "Doberman"} should map to the same class (positive); {"obrdr doliye", "mandober", "Pug"} should be classified as negative because the extra characters in the first element and the shuffling of characters in the second element result in a loss of meaning, and the set as a whole is incoherent. Is there a way to get a transformer model to work on these kinds of inputs? submitted by /u/legitimate-coffer [link] [comments]  ( 91 min )
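One common recipe matching these requirements: encode each string with an order-aware encoder (so within-string character order matters), then run a transformer over the set with no positional encodings and mean-pool, so the set-level output is permutation-invariant. A hypothetical sketch, with the string encoder omitted:

```python
import torch
import torch.nn as nn

class SetCoherenceClassifier(nn.Module):
    """Self-attention over a set of string embeddings with no positional
    encodings, then mean-pooling, so the output is permutation-invariant."""
    def __init__(self, d_model=128, nhead=4, num_layers=2):
        super().__init__()
        layer = nn.TransformerEncoderLayer(d_model, nhead, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers)
        self.head = nn.Linear(d_model, 2)  # coherent vs. incoherent

    def forward(self, element_embeddings):    # (batch, set_size, d_model)
        h = self.encoder(element_embeddings)  # no positional encoding is added
        return self.head(h.mean(dim=1))       # pooling ignores element order

# element_embeddings would come from an order-aware string encoder (e.g., a
# character-level model), so shuffled characters change the embedding itself.
model = SetCoherenceClassifier()
print(model(torch.randn(8, 3, 128)).shape)  # torch.Size([8, 2])
```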
    [N] CfP: Workshop on Behavior-driven Autonomous Driving in Unstructured Environments at IROS 2022 in Kyoto, Japan on 27th October 2022.
    Hello, We are accepting short (4+n) or long paper (8+n) contributions to our workshop, "Behavior-driven Autonomous Driving in Unstructured Environments (BADUE22)", to be held at IROS 2022 in Kyoto, Japan on 27th October 2022. We encourage the submission of early ideas, late-breaking results, position papers, or open research questions that are likely to generate interesting discussions. Work published elsewhere is allowed. Accepted papers will be presented in a poster session and selected papers as spotlight talks. All submitted contributions will go through a single blind review process. Deadline: Sept. 20, 2022 (AoE). Website: https://gamma.umd.edu/workshops/badue22/ Submit: https://cmt3.research.microsoft.com/BADUE2022/Submission/Index/ Contact organizer via email: rchandra@utexas.…  ( 92 min )
    [D] Parallelize episodes in RL
    Hi r/ML, I have an RL setup where an episode involves training a NN and obtaining performance. And since I want to use batches of episodes to make updates using Policy Gradient, I'm wondering if there is a simple codebase which allows me to train NNs in parallel with no communication (no grad averaging and splitting data as in DDP) submitted by /u/selfsupervisedbot [link] [comments]  ( 90 min )
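Since the episodes are fully independent until the policy-gradient update, plain multiprocessing may suffice; a minimal sketch with a placeholder worker (the episode body, i.e., the inner NN training, is left as a stub):

```python
import multiprocessing as mp

def run_episode(seed):
    """Hypothetical worker: trains one inner NN from `seed` and returns its score.
    The actual training loop is left as a stub."""
    reward = 0.0  # placeholder for the trained network's performance
    return {"seed": seed, "reward": reward}

if __name__ == "__main__":
    with mp.Pool(processes=8) as pool:
        batch = pool.map(run_episode, range(32))  # 32 independent episodes
    # `batch` then feeds a single policy-gradient update on the main process.
    print(len(batch))
```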
    [D] How does one choose a learning rate schedule for models that take days or weeks to train?
I'm currently using AdamW and find that an exponential decay schedule gives far better results than a fixed learning rate. I've used Optuna to do Bayesian optimization of my hyperparameters, including the learning rate schedule, and I just chose 300 epochs so trials would complete in a reasonable amount of time. However, I can train the winner above 1000 epochs and the validation loss continues to drop (around 1700 it starts to overfit). I imagine if I did another search over learning rate schedules using 2000 epochs that I'd get a different schedule, and that would continue to do better if trained even longer as well. With Optuna, I stopped using pruners (like asynchronous successive halving) because I don't think the validation loss early in the training process says much about the final performance, and they would instead be biased towards schedules with fast decays. So how do these projects that commit to a model and train it for days or weeks choose a schedule that isn't going to be 100x too cautious or end up overfitting 20% into their training budget? I'd imagine there's a method for dynamically adjusting the learning rate based on generalization error and the derivative of validation loss, i.e., if it starts overfitting, then bump up the learning rate to get out of that local minimum and try to settle into a new one. But I haven't found any papers in that direction. submitted by /u/elbiot [link] [comments]  ( 93 min )
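One pragmatic option in the direction the post asks about is validation-driven decay instead of a pre-committed schedule; a minimal PyTorch sketch with a placeholder model and loss (note this reduces the LR when validation stalls rather than raising it on overfitting, which, as far as I know, is not a standard built-in):

```python
import torch

model = torch.nn.Linear(10, 1)  # stand-in model
opt = torch.optim.AdamW(model.parameters(), lr=3e-4)
# Halve the LR whenever validation loss has not improved for 20 epochs,
# rather than committing to a fixed decay rate up front.
sched = torch.optim.lr_scheduler.ReduceLROnPlateau(opt, factor=0.5, patience=20)

for epoch in range(2000):
    val_loss = 1.0 / (epoch + 1)  # placeholder for the real validation loss
    # ... train for one epoch, then evaluate val_loss ...
    sched.step(val_loss)
```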
    [D] Pre-trained networks and batch normalization
When we take a pre-trained network, e.g., ResNet50 on ImageNet, and want to apply it to a new dataset, what we typically do is: (1) freeze the backbone, but keep the classifier trainable; (2) train until convergence; (3) unfreeze the backbone and train with a low learning rate until convergence. However, I noticed that when we freeze a network with batch normalization layers, the following parameters are still being updated because the batch normalization layers are in training mode: running_mean, running_var, num_batches_tracked (in PyTorch). I think this is okay, because this means that the pre-trained network's normalization statistics are adapting to the new data which may have a different distribution. If I understand correctly, this is what we want. We want the batch normalization statistics to be updated to the new distribution of the new dataset. If we kept the old statistics from the old dataset, our normalization might fail to produce actually normal outputs. I just want to check if this is a correct interpretation. submitted by /u/RaptorDotCpp [link] [comments]  ( 107 min )
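A minimal PyTorch sketch making the two behaviors explicit: the post's default (BN statistics adapt to the new data) versus pinning the pre-trained statistics by putting BN layers in eval mode. Both are used in practice; which is better is dataset-dependent:

```python
import torch.nn as nn
import torchvision

model = torchvision.models.resnet50(weights="DEFAULT")

for p in model.parameters():          # freeze the backbone weights
    p.requires_grad = False
model.fc = nn.Linear(model.fc.in_features, 10)  # trainable head, 10 classes assumed

def set_bn_eval(m):
    # eval mode pins the ImageNet running statistics; leaving BN in train mode
    # (the default under model.train()) re-estimates them on the new data.
    if isinstance(m, (nn.BatchNorm1d, nn.BatchNorm2d, nn.BatchNorm3d)):
        m.eval()

model.train()
model.apply(set_bn_eval)  # comment out to let BN statistics adapt, as in the post
```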
    [D] Citing blog posts
    I would like to make a point in an upcoming paper that was inspired by a blog post critiquing earlier work. Philosophically it is correct to cite the blog post but I am worried that reviewers would consider this a red flag even though the thought only exists in blog form. Is this correct? Another option would be to acknowledge the blog post author for helpful discussions if/when the paper is accepted. submitted by /u/Swimming-Pool397 [link] [comments]  ( 94 min )
    [N] AI/ML Model API Design, Numerical Stability, and More Models from Scratch! (stylepoint)
    Hi folks, stylepoint here. I have uploaded some new videos and posting the news here - maybe people find them helpful (: Also, some folks who have not seen the previous post might discover this one (: New vids: AI/ML Model API Design and Numerical Stability (follow-up) Implement - Linear Regression The first one is a follow-up video I have decided to upload in order to make things clear and update some of our model implementations. I made some further changes based on your and others' feedback and suggestions: All of the videos on the channel have timestamps now! (so you can skip chapters or just skim over the vids). I have increased the font size of the terminal so hoping the code is a lot more readable now. Increased the volume for the new video (should be better moving …  ( 91 min )
[R] SpaceRobotEnv is an open-source environment for trajectory planning of free-floating space robots.
SpaceRobotEnv is an open-source environment suite for trajectory planning of free-floating space robots. Reaching high-level planning accuracy, bimanual coordination and end-to-end control remains an open challenge for space robotics researchers. To better help the community study this problem, SpaceRobotEnv is developed with the following key features: Real Space Environment; Dynamic coupling control; Image input. URL: https://github.com/Tsinghua-Space-Robot-Learning-Group/SpaceRobotEnv. Note: our repo can be found in the OpenAI Gym Documentation now. Please see SpaceRobotEnv. Hope everyone enjoys it! submitted by /u/Shengjie_Wang [link] [comments]  ( 103 min )
    [R] Toy Models of Superposition
    https://transformer-circuits.pub/2022/toy_model/index.html It would be very convenient if the individual neurons of artificial neural networks corresponded to cleanly interpretable features of the input. For example, in an “ideal” ImageNet classifier, each neuron would fire only in the presence of a specific visual feature, such as the color red, a left-facing curve, or a dog snout. Empirically, in models we have studied, some of the neurons do cleanly map to features. But it isn't always the case that features correspond so cleanly to neurons, especially in large language models where it actually seems rare for neurons to correspond to clean features. This brings up many questions. Why is it that neurons sometimes align with features and sometimes don't? Why do some models and tasks have many of these clean neurons, while they're vanishingly rare in others? In this paper, we use toy models — small ReLU networks trained on synthetic data with sparse input features — to investigate how and when models represent more features than they have dimensions. We call this phenomenon superposition. When features are sparse, superposition allows compression beyond what a linear model would do, at the cost of "interference" that requires nonlinear filtering. submitted by /u/shitboots [link] [comments]  ( 104 min )
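For intuition, a minimal sketch of this kind of toy setup in PyTorch; the dimensions, sparsity level, and training length below are illustrative assumptions, not the paper's exact configuration:

    import torch

    # Toy superposition setup: n sparse features compressed into m < n hidden
    # dimensions, then recovered through a ReLU readout with tied weights.
    n, m, batch = 20, 5, 1024
    W = torch.nn.Parameter(torch.randn(n, m) * 0.1)
    b = torch.nn.Parameter(torch.zeros(n))
    opt = torch.optim.Adam([W, b], lr=1e-3)
    for step in range(10000):
        sparsity = 0.05  # each feature is active ~5% of the time
        x = torch.rand(batch, n) * (torch.rand(batch, n) < sparsity)
        x_hat = torch.relu(x @ W @ W.T + b)  # compress, then nonlinearly recover
        loss = ((x - x_hat) ** 2).mean()
        opt.zero_grad()
        loss.backward()
        opt.step()

With sparse enough inputs, the learned W typically packs more than m features into the m hidden dimensions -- the superposition phenomenon the paper studies.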
[Discussion] Is there a way to generate a 3D mesh file (e.g., an STL file) from 3D reconstruction models such as Gan2Shape?
The model can output multiple images from different viewpoints. How can we combine them into a 3D mesh model file? Many thanks :) submitted by /u/JC1DA [link] [comments]  ( 88 min )
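One possible route, sketched below under the assumption that per-view depth maps and camera poses can be obtained from the reconstruction model (the views list is hypothetical): fuse the views into a TSDF volume with Open3D and export the extracted mesh as an STL.

    import open3d as o3d

    volume = o3d.pipelines.integration.ScalableTSDFVolume(
        voxel_length=4.0 / 512.0,
        sdf_trunc=0.04,
        color_type=o3d.pipelines.integration.TSDFVolumeColorType.RGB8)

    intrinsic = o3d.camera.PinholeCameraIntrinsic(
        o3d.camera.PinholeCameraIntrinsicParameters.PrimeSenseDefault)

    # views: hypothetical list of (color, depth, extrinsic) per rendered
    # viewpoint, where color/depth are open3d.geometry.Image objects.
    for color, depth, extrinsic in views:
        rgbd = o3d.geometry.RGBDImage.create_from_color_and_depth(
            color, depth, convert_rgb_to_intensity=False)
        volume.integrate(rgbd, intrinsic, extrinsic)

    mesh = volume.extract_triangle_mesh()
    mesh.compute_vertex_normals()  # STL export requires normals in Open3D
    o3d.io.write_triangle_mesh("model.stl", mesh)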
  • Open

    A friggin beautiful DNA strand Nebula for Visual Stimulation
    submitted by /u/Available_Tadpole829 [link] [comments]  ( 87 min )
Google and Oxford scientists publish paper claiming AI will "likely" annihilate humankind
    submitted by /u/estasfuera [link] [comments]  ( 87 min )
    AI virtual assistant for the blind or visually impaired
Hi guys, I developed a prototype virtual AI assistant called Luna to assist visually impaired people. In the current implementation, Luna can successfully hold conversations and respond to enquiries about the following topics: a) things it can see; b) whether there is food to eat; c) whether there is anything to drink; d) threats in the scene; e) the location of the user. The link below is to my LinkedIn post, which includes a 40-second demo. https://www.linkedin.com/posts/rishil-darne-78326658_artificialintelligence-computervision-ai-activity-6976184548300623872-XU92?utm_source=share&utm_medium=member_android submitted by /u/Avatar_d2 [link] [comments]  ( 100 min )
    The Green Swan: On the Usefulness of Logic in AI
    submitted by /u/CardboardDreams [link] [comments]  ( 90 min )
    Jasper AI
Is it worth the monthly subscription? I was not impressed with the demo at all and am currently looking at AI writing options to help with my novel writing. It's more of a creative thought process for me. Let me know, thanks guys. submitted by /u/leg18 [link] [comments]  ( 87 min )
    Image Gen from Stable Diffusion, Prompt: "portrait photo of a cowboy, high resolution"
    ​ https://preview.redd.it/upfqiyhjd2o91.png?width=512&format=png&auto=webp&s=7acc95da5cf8144752c55c41cf55122b43b0685f submitted by /u/account_name4 [link] [comments]  ( 87 min )
    Loab, the Internet's Latest Urban Legend, Is Worse Than Anything
    submitted by /u/trueslicky [link] [comments]  ( 86 min )
    Any advice on learning AI?
I have a project in mind that involves AI, mostly computer vision. I still don't know much about AI, only the basics like the different types of machine learning. Any advice on what to learn and any resources that could be useful? submitted by /u/Moemen02 [link] [comments]  ( 88 min )
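As one concrete starting point for computer vision, a sketch assuming PyTorch and torchvision (one path among many): run a pretrained classifier on a single image before training anything yourself.

    import torch
    from PIL import Image
    from torchvision import models, transforms

    model = models.resnet18(pretrained=True).eval()
    preprocess = transforms.Compose([
        transforms.Resize(256),
        transforms.CenterCrop(224),
        transforms.ToTensor(),
        transforms.Normalize(mean=[0.485, 0.456, 0.406],  # ImageNet stats
                             std=[0.229, 0.224, 0.225]),
    ])
    img = preprocess(Image.open("example.jpg")).unsqueeze(0)  # hypothetical image
    with torch.no_grad():
        pred = model(img).argmax(dim=1)
    print(pred.item())  # index into the 1000 ImageNet classes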
    has anyone attempted to make a Moore's law type of metric for AI yet?
We all know and love/hate Moore's law, but has anyone attempted to put together a similar metric for AI? Maybe it would need to be domain-specific, like performance on NLP or CV tasks. Perhaps an aggregate of multiple benchmarks across multiple fields to test generality? It could even be as simple as the number of nodes in a production NN, or something. It would be interesting to follow a metric like this over the next decade, as I'm sure it would double faster than every 18 months. submitted by /u/TrainquilOasis1423 [link] [comments]  ( 93 min )
    Is anyone working on an AI that can translate legalese, or other over-explained concepts?
    I just had an idea I thought would be really useful and was wondering if anyone was working on it. I was wondering if an AI could scan a legal document (such as a EULA), and cut out all the unnecessary stuff and provide a plain explanation for what it says. I doubt this would ever replace a real lawyer, but it could be super helpful for certain stuff. Some other documents like instruction manuals are also over-explained and it could be extremely useful to have it trimmed down to something more manageable/reasonable. submitted by /u/SniperFiction [link] [comments]  ( 87 min )
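As a rough illustration of how one might prototype the idea today, a sketch using an off-the-shelf summarization model via Hugging Face transformers; the model choice is an assumption and is not tuned for legal text:

    from transformers import pipeline

    summarizer = pipeline("summarization", model="facebook/bart-large-cnn")
    eula_text = open("eula.txt").read()  # hypothetical document
    # Long documents exceed the model's input limit, so chunk them first.
    chunks = [eula_text[i:i + 3000] for i in range(0, len(eula_text), 3000)]
    summary = " ".join(
        out["summary_text"]
        for out in summarizer(chunks, max_length=120, min_length=30))
    print(summary)

A real tool would need legal-domain training and careful evaluation, since generic summarizers can drop legally significant clauses.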
    What is the difference between an algorithm and an artificial intelligence?
My understanding is that an artificial intelligence is an algorithm with more than one possible action, chosen based on state. However, this would make any algorithm with an if statement an AI, which sounds off… Can somebody clarify? submitted by /u/shsuGknGe5Bvd685hH [link] [comments]  ( 99 min )
    Half dragon man from love death and robots
    submitted by /u/xHypnotist [link] [comments]  ( 86 min )
AI-generated love story
    submitted by /u/Due-Ad9795 [link] [comments]  ( 90 min )
    Deep Learning-Powered Speech Recognition Service for Subtitling
Hi everyone, our team has built an automated speech recognition service that generates subtitle files for any video or audio file and can translate into 45+ languages. It's powered fully by deep learning and beats other implementations like YouTube captions. If you'd like to try it with your own content, it's free to create an account and use. Learn more about it here: https://www.smartmine.net/video-services/subtitling-description Try it free here: https://ai.smartmine.net/service/speech-recognition/captioning It has some great features: accurate subtitles powered by DL; speech recognition in 11 languages; DL-powered translation into 45+ languages; multiple-speaker recognition; and a subtitle editor to fine-tune results. The ASR methodology is best described by this paper: https://arxiv.org/pdf/2111.09296.pdf We re-trained the models using the Mozilla Common Voice dataset (a lot of other implementations use the LibriSpeech dataset, but it's much more limited and yields worse results). Training was performed on a cluster of 8 RTX 3090 GPUs (the 24 GB of memory is really helpful for using larger sequence lengths). We're a small company, so we would really appreciate any feedback you have! submitted by /u/aL_eX49 [link] [comments]  ( 88 min )
    Stable Diffusion Artificial Intelligence Tree Art 🌳
    submitted by /u/FreshRelaxation [link] [comments]  ( 87 min )
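For anyone curious how images like these are produced programmatically, a minimal sketch with Hugging Face diffusers; the checkpoint, prompt, and hardware below are assumptions, not the poster's actual setup:

    import torch
    from diffusers import StableDiffusionPipeline

    pipe = StableDiffusionPipeline.from_pretrained(
        "CompVis/stable-diffusion-v1-4",
        torch_dtype=torch.float16).to("cuda")  # assumes a CUDA-capable GPU
    image = pipe("a sprawling tree, intricate digital art").images[0]
    image.save("tree.png")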
    Stable Diffusion experiment AI img2img - Julie Gautier underwater dance as an action toy doll
    submitted by /u/navalguijo [link] [comments]  ( 88 min )
    AI music video showcasing human chaos
    submitted by /u/LightOfAntara [link] [comments]  ( 89 min )
    Project Ideas
Hello! In my AI subject we have to research and implement a working AI, starting by writing up the state of the art, then building the prototype and choosing which tools will be used, and then developing the AI. We are totally free to do whatever we want, provided that we can implement the AI. I don't have a good imagination/creativity, so I came here asking for ideas. It's my first subject going deep into AI, so my knowledge of development and all the learning types is pretty superficial; it has to be something that a beginner like me can "easily" develop or learn to do. We can use whatever tools we want, be it Unity, OpenAI, etc., and create whatever we want. I have some insight into ML (deep learning and reinforcement learning), since in my last subject we learned the theory but never implemented anything like it. So I appreciate all ideas on where to start and what to use! The due date is in December, but I have to write the state of the art by mid-October. submitted by /u/alfaces12 [link] [comments]  ( 93 min )
Any other AIs for upscaling?
    I know Real-ESRGAN; any others? submitted by /u/typcalthowawayacount [link] [comments]  ( 87 min )
    Research on EARLY RISK PREDICTION ON THE INTERNET
Help us!! We are a team of academic researchers interested in psychology and natural language use. We are currently gathering data from people with no psychological disorders. More information: https://erisk.irlab.org/ We would greatly appreciate it if you could fill in the attached questionnaire; it takes 2 minutes :) It is a standard inventory of questions used by psychologists. Note that the questionnaire contains a field in which the respondent has to provide his/her Reddit username. This helps us link word use (as extracted from your public Reddit submissions) with your responses to the questionnaire. Of course, we will treat the information you provide with the utmost confidentiality and privacy, and all information we extract from Reddit will be anonymised. Link to the questionnaire: https://forms.gle/PkWyB64aAu6BQTqi6 Best regards, David E. Losada, Univ. Santiago de Compostela, Spain (david.losada@usc.es); Fabio Crestani, Univ. della Svizzera Italiana, Switzerland (fabio.crestani@usi.ch); Javier Parapar, Univ. A Coruña, Spain (javierparapar@udc.es); Patricia Martin-Rodilla, Univ. A Coruña, Spain (patricia.martin.rodilla@udc.es) submitted by /u/pamroda [link] [comments]  ( 87 min )
    A picture of my father in the 70s colorised with palette fm (basic palette)
    submitted by /u/Greatgg [link] [comments]  ( 90 min )
    12th International Conference on Artificial Intelligence in Music, Sound, Art and Design (EvoMUSART)
Hello colleagues, we are organizing the 12th International Conference on Artificial Intelligence in Music, Sound, Art and Design (EvoMUSART) and we think it may be of interest to many of you. The conference will take place in Brno, Czech Republic, between 12 and 14 April 2023. If you work with artificial intelligence techniques applied to visual art, music, sound synthesis, architecture, video, poetry, design or other creative tasks, you can present your work at this conference. If not, it is also a great opportunity to catch up on research news in these fields. For more information, visit the event's webpage: https://www.evostar.org/2023/evomusart/ submitted by /u/evomusart_conference [link] [comments]  ( 88 min )
    IT worker uses Midjourney to create stunning 706-page sci-fi graphic novel
    submitted by /u/Zirius_Sadfaces [link] [comments]  ( 87 min )
How much "hard-coding" is required to understand and create AI models?
    How much actual coding is required? And how much time should I spend on learning Java if I already understand the basics? submitted by /u/Lucky_Plastic_3113 [link] [comments]  ( 87 min )
  • Open

    Meet the Omnivore: Christopher Scott Constructs Architectural Designs, Virtual Environments With NVIDIA Omniverse
    Growing up in a military family, Christopher Scott moved more than 30 times, which instilled in him “the ability to be comfortable with, and even motivated by, new environments,” he said. The post Meet the Omnivore: Christopher Scott Constructs Architectural Designs, Virtual Environments With NVIDIA Omniverse appeared first on NVIDIA Blog.  ( 6 min )
    GFN Thursday Delivers Seven New Games This Week
    TGIGFNT: thank goodness it’s GFN Thursday. Start your weekend early with seven new games joining the GeForce NOW library of over 1,400 titles. Whether it’s streaming on an older-than-the-dinosaurs PC, a Mac that normally couldn’t dream of playing PC titles, or mobile devices – it’s all possible to play your way thanks to GeForce NOW. Read article > The post GFN Thursday Delivers Seven New Games This Week appeared first on NVIDIA Blog.  ( 4 min )
  • Open

    PaLI: Scaling Language-Image Learning in 100+ Languages
    Posted by Xi Chen and Xiao Wang, Software Engineers, Google Research Advanced language models (e.g., GPT, GLaM, PaLM and T5) have demonstrated diverse capabilities and achieved impressive results across tasks and languages by scaling up their number of parameters. Vision-language (VL) models can benefit from similar scaling to address many tasks, such as image captioning, visual question answering (VQA), object recognition, and in-context optical-character-recognition (OCR). Increasing the success rates for these practical tasks is important for everyday interactions and applications. Furthermore, for a truly universal system, vision-language models should be able to operate in many languages, not just one. In “PaLI: A Jointly-Scaled Multilingual Language-Image Model”, we introduce a uni…  ( 27 min )
  • Open

    The Application of Data Science in Health Technology
    As the population ages, there is an increasing need for healthcare services. More people live longer than ever before, so they are likely…  ( 10 min )
  • Open

    Use Amazon SageMaker Data Wrangler for data preparation and Studio Labs to learn and experiment with ML
    Amazon SageMaker Studio Lab is a free machine learning (ML) development environment based on open-source JupyterLab for anyone to learn and experiment with ML using AWS ML compute resources. It’s based on the same architecture and user interface as Amazon SageMaker Studio, but with a subset of Studio capabilities. When you begin working on ML […]  ( 8 min )
  • Open

    DQN agent doesn't learn
Hello, I'm trying to create a DQN agent that learns to detect attacks/intrusions, but the agent achieves very low accuracy (20%), so it hasn't learned anything. I would be very grateful if anyone could help find exactly where the problem is and how to improve the accuracy. This is the code:

    import gym
    import json
    import random
    import time
    from collections import deque
    import numpy as np
    import pandas as pd
    import matplotlib.pyplot as plt
    from sklearn.preprocessing import LabelEncoder, OneHotEncoder
    from sklearn import preprocessing
    from sklearn.feature_selection import RFE
    from sklearn.tree import DecisionTreeClassifier
    from sklearn.model_selection import train_test_split
    from sklearn.metrics import confusion_matrix
    from tqdm.notebook import tqdm
    fr…  ( 92 min )
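Hard to tell where the problem is from the imports alone, but for reference, the core DQN update usually looks like the sketch below (PyTorch; assumes a replay buffer of (s, a, r, s', done) tensors and a separate target network -- illustrative, not a diagnosis of the code above):

    import torch
    import torch.nn.functional as F

    def dqn_update(policy_net, target_net, optimizer, batch, gamma=0.99):
        s, a, r, s_next, done = batch  # tensors sampled from the replay buffer
        # Q(s, a) for the actions actually taken
        q = policy_net(s).gather(1, a.long().unsqueeze(1)).squeeze(1)
        # Bellman target from the frozen target network
        with torch.no_grad():
            q_next = target_net(s_next).max(dim=1).values
            target = r + gamma * (1.0 - done.float()) * q_next
        loss = F.smooth_l1_loss(q, target)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        return loss.item()

Common failure modes worth checking in this kind of setting: unnormalized input features, an epsilon schedule that decays too fast, and a target network that is synced too often (or never).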
Do you ever find DRL boring?
    Hi, I am an independent researcher in deep reinforcement learning, with a passion for the ultimate goal of general AI. However, I recently heard a point of view from a professor that doing DRL is boring: it is just about ranking, getting better scores on simulations, etc. He also pointed out that agents cannot generalize well to other tasks even though they perform well on MuJoCo or Atari games. This argument actually seems vague to me. When I sit down and review the comment, I think that although many things in DRL really are tedious, such as sweeping hyperparameters or chasing performance, it is still the most promising path to that goal, which I find exciting and motivating. That is why I have come such a long way and may continue. What are your thoughts on DRL, and on this point of view? Anything you share is appreciated! submitted by /u/OutOfCharm [link] [comments]  ( 96 min )
Best Books to Learn Reinforcement Learning in 2022
    submitted by /u/Lakshmireddys [link] [comments]  ( 99 min )
  • Open

    Data-Driven Estimation of Capacity Upper Bounds. (arXiv:2205.06471v2 [cs.IT] UPDATED)
    We consider the problem of estimating an upper bound on the capacity of a memoryless channel with unknown channel law and continuous output alphabet. A novel data-driven algorithm is proposed that exploits the dual representation of capacity where the maximization over the input distribution is replaced with a minimization over a reference distribution on the channel output. To efficiently compute the required divergence maximization between the conditional channel and the reference distribution, we use a modified mutual information neural estimator that takes the channel input as an additional parameter. We numerically evaluate our approach on different memoryless channels and show empirically that the estimated upper bounds closely converge either to the channel capacity or to best-known lower bounds.  ( 2 min )
    CoditT5: Pretraining for Source Code and Natural Language Editing. (arXiv:2208.05446v2 [cs.SE] UPDATED)
    Pretrained language models have been shown to be effective in many software-related generation tasks; however, they are not well-suited for editing tasks as they are not designed to reason about edits. To address this, we propose a novel pretraining objective which explicitly models edits and use it to build CoditT5, a large language model for software-related editing tasks that is pretrained on large amounts of source code and natural language comments. We fine-tune it on various downstream editing tasks, including comment updating, bug fixing, and automated code review. By outperforming standard generation-based models, we demonstrate the generalizability of our approach and its suitability for editing tasks. We also show how a standard generation model and our edit-based model can complement one another through simple reranking strategies, with which we achieve state-of-the-art performance for the three downstream editing tasks.  ( 2 min )
    Efficient Beam Search for Initial Access Using Collaborative Filtering. (arXiv:2209.06669v1 [eess.SY])
    Beamforming-capable antenna arrays overcome the high free-space path loss at higher carrier frequencies. However, the beams must be properly aligned to ensure that the highest power is radiated towards (and received by) the user equipment (UE). While there are methods that improve upon an exhaustive search for optimal beams by some form of hierarchical search, they can be prone to return only locally optimal solutions with small beam gains. Other approaches address this problem by exploiting contextual information, e.g., the position of the UE or information from neighboring base stations (BS), but the burden of computing and communicating this additional information can be high. Methods based on machine learning so far suffer from the accompanying training, performance monitoring and deployment complexity that hinders their application at scale. This paper proposes a novel method for solving the initial beam-discovery problem. It is scalable, and easy to tune and to implement. Our algorithm is based on a recommender system that associates groups (i.e., UEs) and preferences (i.e., beams from a codebook) based on a training data set. Whenever a new UE needs to be served our algorithm returns the best beams in this user cluster. Our simulation results demonstrate the efficiency and robustness of our approach, not only in single BS setups but also in setups that require a coordination among several BSs. Our method consistently outperforms standard baseline algorithms in the given task.  ( 3 min )
    Empowering GNNs with Fine-grained Communication-Computation Pipelining on Multi-GPU Platforms. (arXiv:2209.06800v1 [cs.DC])
    The increasing size of input graphs for graph neural networks (GNNs) highlights the demand for using multi-GPU platforms. However, existing multi-GPU GNN solutions suffer from inferior performance due to imbalanced computation and inefficient communication. To this end, we propose MGG, a novel system design to accelerate GNNs on multi-GPU platforms via a GPU-centric software pipeline. MGG explores the potential of hiding remote memory access latency in GNN workloads through fine-grained computation-communication pipelining. Specifically, MGG introduces a pipeline-aware workload management strategy and a hybrid data layout design to facilitate communication-computation overlapping. MGG implements an optimized pipeline-centric kernel. It includes workload interleaving and warp-based mapping for efficient GPU kernel operation pipelining and specialized memory designs and optimizations for better data access performance. Besides, MGG incorporates lightweight analytical modeling and optimization heuristics to dynamically improve the GNN execution performance for different settings at runtime. Comprehensive experiments demonstrate that MGG outperforms state-of-the-art multi-GPU systems across various GNN settings: on average 3.65X faster than multi-GPU systems with a unified virtual memory design and on average 7.38X faster than the DGCL framework.  ( 2 min )
    Bregman Deviations of Generic Exponential Families. (arXiv:2201.07306v3 [cs.LG] UPDATED)
We revisit the method of mixtures technique, also known as the Laplace method, to study the concentration phenomenon in generic exponential families. Combining the properties of the Bregman divergence associated with the log-partition function of the family with the method of mixtures for super-martingales, we establish a generic bound controlling the Bregman divergence between the parameter of the family and a finite-sample estimate of the parameter. Our bound is time-uniform and involves a quantity extending the classical information gain to exponential families, which we call the Bregman information gain. For the practitioner, we instantiate this novel bound for several classical families, e.g., Gaussian, Bernoulli, Exponential, Weibull, Pareto, Poisson and Chi-square, yielding explicit forms of the confidence sets and the Bregman information gain. We further numerically compare the resulting confidence bounds to state-of-the-art alternatives for time-uniform concentration and show that this novel method yields competitive results. Finally, we highlight the benefit of our concentration bounds on some illustrative applications.  ( 2 min )
    Minimax risk classifiers with 0-1 loss. (arXiv:2201.06487v3 [stat.ML] UPDATED)
    Supervised classification techniques use training samples to learn a classification rule with small expected 0-1 loss (error probability). Conventional methods enable tractable learning and provide out-of-sample generalization by using surrogate losses instead of the 0-1 loss and considering specific families of rules (hypothesis classes). This paper presents minimax risk classifiers (MRCs) that minimize the worst-case 0-1 loss over general classification rules and provide tight performance guarantees at learning. We show that MRCs are strongly universally consistent using feature mappings given by characteristic kernels. The paper also proposes efficient optimization techniques for MRC learning and shows that the methods presented can provide accurate classification together with tight performance guarantees in practice.  ( 2 min )
    Online Deep Learning from Doubly-Streaming Data. (arXiv:2204.11793v4 [cs.LG] UPDATED)
This paper investigates a new online learning problem with doubly-streaming data, where the data streams are described by feature spaces that constantly evolve, with new features emerging and old features fading away. The challenges of this problem are twofold: 1) data samples ceaselessly flowing in may carry shifted patterns over time, requiring learners to update and hence adapt on the fly; 2) newly emerging features are described by very few samples, resulting in weak learners that tend to make erroneous predictions. A plausible idea to overcome the challenges is to establish a relationship between the pre- and post-evolving feature spaces, so that an online learner can leverage the knowledge learned from the old features to improve learning performance on the new features. Unfortunately, this idea does not scale up to high-dimensional media streams with complex feature interplay, which suffer from a tradeoff between onlineness (favoring shallow learners) and expressiveness (requiring deep learners). Motivated by this, we propose a novel OLD^3S paradigm, where a shared latent subspace is discovered to summarize information from the old and new feature spaces, building an intermediate feature mapping relationship. A key trait of OLD^3S is to treat the model capacity as learnable semantics, yielding optimal model depth and parameters jointly, in accordance with the complexity and non-linearity of the input data streams in an online fashion. Both theoretical analyses and empirical studies substantiate the viability and effectiveness of our proposal.  ( 3 min )
    Learned reconstruction methods with convergence guarantees. (arXiv:2206.05431v3 [cs.CV] UPDATED)
    In recent years, deep learning has achieved remarkable empirical success for image reconstruction. This has catalyzed an ongoing quest for precise characterization of correctness and reliability of data-driven methods in critical use-cases, for instance in medical imaging. Notwithstanding the excellent performance and efficacy of deep learning-based methods, concerns have been raised regarding their stability, or lack thereof, with serious practical implications. Significant advances have been made in recent years to unravel the inner workings of data-driven image recovery methods, challenging their widely perceived black-box nature. In this article, we will specify relevant notions of convergence for data-driven image reconstruction, which will form the basis of a survey of learned methods with mathematically rigorous reconstruction guarantees. An example that is highlighted is the role of ICNN, offering the possibility to combine the power of deep learning with classical convex regularization theory for devising methods that are provably convergent. This survey article is aimed at both methodological researchers seeking to advance the frontiers of our understanding of data-driven image reconstruction methods as well as practitioners, by providing an accessible description of useful convergence concepts and by placing some of the existing empirical practices on a solid mathematical foundation.  ( 3 min )
    On the Maximum Hessian Eigenvalue and Generalization. (arXiv:2206.10654v2 [cs.LG] UPDATED)
    The mechanisms by which certain training interventions, such as increasing learning rates and applying batch normalization, improve the generalization of deep networks remains a mystery. Prior works have speculated that "flatter" solutions generalize better than "sharper" solutions to unseen data, motivating several metrics for measuring flatness (particularly $\lambda_{max}$, the largest eigenvalue of the Hessian of the loss); and algorithms, such as Sharpness-Aware Minimization (SAM) [1], that directly optimize for flatness. Other works question the link between $\lambda_{max}$ and generalization. In this paper, we present findings that call $\lambda_{max}$'s influence on generalization further into question. We show that: (1) while larger learning rates reduce $\lambda_{max}$ for all batch sizes, generalization benefits sometimes vanish at larger batch sizes; (2) by scaling batch size and learning rate simultaneously, we can change $\lambda_{max}$ without affecting generalization; (3) while SAM produces smaller $\lambda_{max}$ for all batch sizes, generalization benefits (also) vanish with larger batch sizes; (4) for dropout, excessively high dropout probabilities can degrade generalization, even as they promote smaller $\lambda_{max}$; and (5) while batch-normalization does not consistently produce smaller $\lambda_{max}$, it nevertheless confers generalization benefits. While our experiments affirm the generalization benefits of large learning rates and SAM for minibatch SGD, the GD-SGD discrepancy demonstrates limits to $\lambda_{max}$'s ability to explain generalization in neural networks.  ( 3 min )
    Big Learning: A Universal Machine Learning Paradigm?. (arXiv:2207.03899v2 [cs.LG] UPDATED)
Recent breakthroughs based on big/foundation models reveal a vague avenue for AI, that is, \emph{big data, big/foundation models, big learning, $\cdots$}. Following that avenue, here we elaborate on our newly introduced big learning. Specifically, big learning exhaustively exploits the information/tasks inherent in its large-scale \emph{complete/incomplete} training data, by learning to simultaneously model many-to-all joint/conditional/marginal data distributions (thus named big learning) with one universal foundation model. We reveal that big learning is what existing foundation models are implicitly doing; accordingly, our big learning provides high-level guidance for flexible design and improvements of foundation models. Besides, big learning ($i$) is equipped with great flexibilities for complete/incomplete training data and for customizing trustworthy data tasks; ($ii$) potentially delivers all joint/conditional/marginal data capabilities after training; ($iii$) significantly reduces the training-test gap with improved model generalization; and ($iv$) potentially unifies conventional machine learning paradigms and enables their flexible cooperation, manifesting a universal learning paradigm. Preliminary experiments verified the effectiveness of the presented big learning.  ( 2 min )
    TWEET-FID: An Annotated Dataset for Multiple Foodborne Illness Detection Tasks. (arXiv:2205.10726v2 [cs.CL] UPDATED)
    Foodborne illness is a serious but preventable public health problem -- with delays in detecting the associated outbreaks resulting in productivity loss, expensive recalls, public safety hazards, and even loss of life. While social media is a promising source for identifying unreported foodborne illnesses, there is a dearth of labeled datasets for developing effective outbreak detection models. To accelerate the development of machine learning-based models for foodborne outbreak detection, we thus present TWEET-FID (TWEET-Foodborne Illness Detection), the first publicly available annotated dataset for multiple foodborne illness incident detection tasks. TWEET-FID collected from Twitter is annotated with three facets: tweet class, entity type, and slot type, with labels produced by experts as well as by crowdsource workers. We introduce several domain tasks leveraging these three facets: text relevance classification (TRC), entity mention detection (EMD), and slot filling (SF). We describe the end-to-end methodology for dataset design, creation, and labeling for supporting model development for these tasks. A comprehensive set of results for these tasks leveraging state-of-the-art single- and multi-task deep learning methods on the TWEET-FID dataset are provided. This dataset opens opportunities for future research in foodborne outbreak detection.  ( 3 min )
    Small Transformers Compute Universal Metric Embeddings. (arXiv:2209.06788v1 [cs.LG])
    We study representations of data from an arbitrary metric space $\mathcal{X}$ in the space of univariate Gaussian mixtures with a transport metric (Delon and Desolneux 2020). We derive embedding guarantees for feature maps implemented by small neural networks called \emph{probabilistic transformers}. Our guarantees are of memorization type: we prove that a probabilistic transformer of depth about $n\log(n)$ and width about $n^2$ can bi-H\"{o}lder embed any $n$-point dataset from $\mathcal{X}$ with low metric distortion, thus avoiding the curse of dimensionality. We further derive probabilistic bi-Lipschitz guarantees which trade off the amount of distortion and the probability that a randomly chosen pair of points embeds with that distortion. If $\mathcal{X}$'s geometry is sufficiently regular, we obtain stronger, bi-Lipschitz guarantees for all points in the dataset. As applications we derive neural embedding guarantees for datasets from Riemannian manifolds, metric trees, and certain types of combinatorial graphs.  ( 2 min )
    FedNest: Federated Bilevel, Minimax, and Compositional Optimization. (arXiv:2205.02215v3 [cs.LG] UPDATED)
Standard federated optimization methods successfully apply to stochastic problems with single-level structure. However, many contemporary ML problems -- including adversarial robustness, hyperparameter tuning, and actor-critic -- fall under nested bilevel programming that subsumes minimax and compositional optimization. In this work, we propose FedNest: a federated alternating stochastic gradient method to address general nested problems. We establish provable convergence rates for FedNest in the presence of heterogeneous data and introduce variations for bilevel, minimax, and compositional optimization. FedNest introduces multiple innovations including federated hypergradient computation and variance reduction to address inner-level heterogeneity. We complement our theory with experiments on hyperparameter \& hyper-representation learning and minimax optimization that demonstrate the benefits of our method in practice. Code is available at https://github.com/ucr-optml/FedNest.  ( 2 min )
    Falsification of Cyber-Physical Systems using Bayesian Optimization. (arXiv:2209.06735v1 [eess.SY])
Cyber-physical systems (CPSs) are usually complex and safety-critical; hence, it is difficult and important to guarantee that the system's requirements, i.e., specifications, are fulfilled. Simulation-based falsification of CPSs is a practical testing method that can be used to raise confidence in the correctness of the system by only requiring that the system under test can be simulated. As each simulation is typically computationally intensive, an important step is to reduce the number of simulations needed to falsify a specification. We study Bayesian optimization (BO), a sample-efficient method that learns a surrogate model describing the relationship between the parametrization of possible input signals and the evaluation of the specification. In this paper, we improve falsification using BO by, first, adopting two prominent BO methods: one fits local surrogate models, while the other exploits the user's prior knowledge. Second, we address the formulation of acquisition functions for falsification. Benchmark evaluation shows significant improvements in using local surrogate models of BO for falsifying benchmark examples that were previously hard to falsify. Using prior knowledge in the falsification process is shown to be particularly important when the simulation budget is limited. For some of the benchmark problems, the choice of acquisition function clearly affects the number of simulations needed for successful falsification.  ( 3 min )
    Improving Voice Trigger Detection with Metric Learning. (arXiv:2204.02455v2 [cs.SD] UPDATED)
    Voice trigger detection is an important task, which enables activating a voice assistant when a target user speaks a keyword phrase. A detector is typically trained on speech data independent of speaker information and used for the voice trigger detection task. However, such a speaker independent voice trigger detector typically suffers from performance degradation on speech from underrepresented groups, such as accented speakers. In this work, we propose a novel voice trigger detector that can use a small number of utterances from a target speaker to improve detection accuracy. Our proposed model employs an encoder-decoder architecture. While the encoder performs speaker independent voice trigger detection, similar to the conventional detector, the decoder predicts a personalized embedding for each utterance. A personalized voice trigger score is then obtained as a similarity score between the embeddings of enrollment utterances and a test utterance. The personalized embedding allows adapting to target speaker's speech when computing the voice trigger score, hence improving voice trigger detection accuracy. Experimental results show that the proposed approach achieves a 38% relative reduction in a false rejection rate (FRR) compared to a baseline speaker independent voice trigger model.  ( 3 min )
    Modelling Technical and Biological Effects in scRNA-seq data with Scalable GPLVMs. (arXiv:2209.06716v1 [cs.LG])
Single-cell RNA-seq datasets are growing in size and complexity, enabling the study of cellular composition changes in various biological/clinical contexts. Scalable dimensionality reduction techniques are needed to disentangle biological variation in them, while accounting for technical and biological confounders. In this work, we extend a popular approach for probabilistic non-linear dimensionality reduction, the Gaussian process latent variable model, to scale to massive single-cell datasets while explicitly accounting for technical and biological confounders. The key idea is to use an augmented kernel which preserves the factorisability of the lower bound, allowing for fast stochastic variational inference. We demonstrate its ability to reconstruct latent signatures of innate immunity recovered in Kumasaka et al. (2021) with 9x lower training time. We further analyze a COVID dataset and demonstrate, across a cohort of 130 individuals, that this framework enables data integration while capturing interpretable signatures of infection. Specifically, we explore COVID severity as a latent dimension to refine patient stratification and capture disease-specific gene expression.  ( 2 min )
    Removing the fat from your posterior samples with margarine. (arXiv:2205.12841v2 [astro-ph.IM] UPDATED)
    Bayesian workflows often require the introduction of nuisance parameters, yet for core science modelling one needs access to a marginal posterior density. In this work we use masked autoregressive flows and kernel density estimators to encapsulate the marginal posterior, allowing us to compute marginal Kullback-Leibler divergences and marginal Bayesian model dimensionalities in addition to generating samples and computing marginal log probabilities. We demonstrate this in application to topical cosmological examples of the Dark Energy Survey, and global 21cm signal experiments. In addition to the computation of marginal Bayesian statistics, this work is important for further applications in Bayesian experimental design, complex prior modelling and likelihood emulation. This technique is made publicly available in the pip-installable code margarine.  ( 2 min )
    Riemannian Langevin Algorithm for Solving Semidefinite Programs. (arXiv:2010.11176v4 [stat.ML] UPDATED)
We propose a Langevin diffusion-based algorithm for non-convex optimization and sampling on a product manifold of spheres. Under a logarithmic Sobolev inequality, we establish a guarantee for finite-iteration convergence to the Gibbs distribution in terms of Kullback--Leibler divergence. We show that with an appropriate temperature choice, the suboptimality gap to the global minimum is guaranteed to be arbitrarily small with high probability. As an application, we consider the Burer--Monteiro approach for solving a semidefinite program (SDP) with diagonal constraints, and analyze the proposed Langevin algorithm for optimizing the non-convex objective. In particular, we establish a logarithmic Sobolev inequality for the Burer--Monteiro problem when there are no spurious local minima, even in the presence of saddle points. Combining the results, we then provide a global optimality guarantee for the SDP and the Max-Cut problem. More precisely, we show that the Langevin algorithm achieves $\epsilon$ accuracy with high probability in $\widetilde{\Omega}( \epsilon^{-5} )$ iterations.
    SORNet: Spatial Object-Centric Representations for Sequential Manipulation. (arXiv:2109.03891v3 [cs.RO] UPDATED)
    Sequential manipulation tasks require a robot to perceive the state of an environment and plan a sequence of actions leading to a desired goal state. In such tasks, the ability to reason about spatial relations among object entities from raw sensor inputs is crucial in order to determine when a task has been completed and which actions can be executed. In this work, we propose SORNet (Spatial Object-Centric Representation Network), a framework for learning object-centric representations from RGB images conditioned on a set of object queries, represented as image patches called canonical object views. With only a single canonical view per object and no annotation, SORNet generalizes zero-shot to object entities whose shape and texture are both unseen during training. We evaluate SORNet on various spatial reasoning tasks such as spatial relation classification and relative direction regression in complex tabletop manipulation scenarios and show that SORNet significantly outperforms baselines including state-of-the-art representation learning techniques. We also demonstrate the application of the representation learned by SORNet on visual-servoing and task planning for sequential manipulation on a real robot.
    Natural Reweighted Wake-Sleep. (arXiv:2008.06687v4 [cs.LG] UPDATED)
Helmholtz Machines (HMs) are a class of generative models composed of two Sigmoid Belief Networks (SBNs), acting respectively as an encoder and a decoder. These models are commonly trained using a two-step optimization algorithm called Wake-Sleep (WS), and more recently by improved versions such as Reweighted Wake-Sleep (RWS) and Bidirectional Helmholtz Machines (BiHM). The locality of the connections in an SBN induces sparsity in the Fisher information matrices associated with the probabilistic models, in the form of a finely-grained block-diagonal structure. In this paper we exploit this property to efficiently train SBNs and HMs using the natural gradient. We present a novel algorithm, called Natural Reweighted Wake-Sleep (NRWS), that corresponds to the geometric adaptation of its standard version. In a similar manner, we also introduce Natural Bidirectional Helmholtz Machine (NBiHM). In contrast to previous work, we show how the natural gradient for HMs can be efficiently computed without introducing any approximation in the structure of the Fisher information matrix. Experiments performed on standard datasets from the literature show a consistent improvement of NRWS and NBiHM not only with respect to their non-geometric baselines but also with respect to state-of-the-art training algorithms for HMs. The improvement is quantified both in terms of speed of convergence and in the value of the log-likelihood reached after training.
    Discrepancy-Based Active Learning for Domain Adaptation. (arXiv:2103.03757v3 [cs.LG] UPDATED)
    The goal of the paper is to design active learning strategies which lead to domain adaptation under an assumption of Lipschitz functions. Building on previous work by Mansour et al. (2009) we adapt the concept of discrepancy distance between source and target distributions to restrict the maximization over the hypothesis class to a localized class of functions which are performing accurate labeling on the source domain. We derive generalization error bounds for such active learning strategies in terms of Rademacher average and localized discrepancy for general loss functions which satisfy a regularity condition. A practical K-medoids algorithm that can address the case of large data set is inferred from the theoretical bounds. Our numerical experiments show that the proposed algorithm is competitive against other state-of-the-art active learning techniques in the context of domain adaptation, in particular on large data sets of around one hundred thousand images.
    Variational System Identification for Nonlinear State-Space Models. (arXiv:2012.05072v3 [stat.ML] UPDATED)
    This paper considers parameter estimation for nonlinear state-space models, which is an important but challenging problem. We address this challenge by employing a variational inference (VI) approach, which is a principled method that has deep connections to maximum likelihood estimation. This VI approach ultimately provides estimates of the model as solutions to an optimisation problem, which is deterministic, tractable and can be solved using standard optimisation tools. A specialisation of this approach for systems with additive Gaussian noise is also detailed. The proposed method is examined numerically on a range of simulated and real examples focusing on the robustness to parameter initialisation; additionally, favourable comparisons are performed against state-of-the-art alternatives.
    Transformer-Based Video Front-Ends for Audio-Visual Speech Recognition for Single and Multi-Person Video. (arXiv:2201.10439v2 [cs.CV] UPDATED)
    Audio-visual automatic speech recognition (AV-ASR) extends speech recognition by introducing the video modality as an additional source of information. In this work, the information contained in the motion of the speaker's mouth is used to augment the audio features. The video modality is traditionally processed with a 3D convolutional neural network (e.g. 3D version of VGG). Recently, image transformer networks arXiv:2010.11929 demonstrated the ability to extract rich visual features for image classification tasks. Here, we propose to replace the 3D convolution with a video transformer to extract visual features. We train our baselines and the proposed model on a large scale corpus of YouTube videos. The performance of our approach is evaluated on a labeled subset of YouTube videos as well as on the LRS3-TED public corpus. Our best video-only model obtains 34.9% WER on YTDEV18 and 19.3% on LRS3-TED, a 10% and 9% relative improvements over our convolutional baseline. We achieve the state of the art performance of the audio-visual recognition on the LRS3-TED after fine-tuning our model (1.6% WER). In addition, in a series of experiments on multi-person AV-ASR, we obtained an average relative reduction of 2% WER over our convolutional video frontend.
    vec2text with Round-Trip Translations. (arXiv:2209.06792v1 [cs.CL])
    We investigate models that can generate arbitrary natural language text (e.g. all English sentences) from a bounded, convex and well-behaved control space. We call them universal vec2text models. Such models would allow making semantic decisions in the vector space (e.g. via reinforcement learning) while the natural language generation is handled by the vec2text model. We propose four desired properties: universality, diversity, fluency, and semantic structure, that such vec2text models should possess and we provide quantitative and qualitative methods to assess them. We implement a vec2text model by adding a bottleneck to a 250M parameters Transformer model and training it with an auto-encoding objective on 400M sentences (10B tokens) extracted from a massive web corpus. We propose a simple data augmentation technique based on round-trip translations and show in extensive experiments that the resulting vec2text model surprisingly leads to vector spaces that fulfill our four desired properties and that this model strongly outperforms both standard and denoising auto-encoders.
    Graph Neural Networks for Decentralized Multi-Robot Submodular Action Selection. (arXiv:2105.08601v3 [cs.RO] UPDATED)
The problem of decentralized multi-robot target tracking asks for jointly selecting actions, e.g., motion primitives, for the robots to maximize target tracking performance with local communications. One major challenge for practical implementations is to make target tracking approaches scalable for large-scale problem instances. In this work, we propose a general-purpose learning architecture toward collaborative target tracking at scale, with decentralized communications. Particularly, our learning architecture leverages a graph neural network (GNN) to capture local interactions of the robots and learns decentralized decision-making for the robots. We train the learning model by imitating an expert solution and implement the resulting model for decentralized action selection involving local observations and communications only. We demonstrate the performance of our GNN-based learning approach in a scenario of active target tracking with large networks of robots. The simulation results show that our approach nearly matches the tracking performance of the expert algorithm, yet runs several orders of magnitude faster with up to 100 robots. Moreover, it slightly outperforms a decentralized greedy algorithm while running faster (especially with more than 20 robots). The results also exhibit our approach's generalization capability in previously unseen scenarios, e.g., larger environments and larger networks of robots.
    A Robust Scientific Machine Learning for Optimization: A Novel Robustness Theorem. (arXiv:2209.06642v1 [math.OC])
Scientific machine learning (SciML) is a field of increasing interest in several different application areas. In an optimization context, SciML-based tools have enabled the development of more efficient optimization methods. However, implementing SciML tools for optimization must be rigorously evaluated and performed with caution. This work derives a robustness test that guarantees the robustness of multiobjective SciML-based optimization by showing that its results respect the universal approximator theorem. The test is applied in the framework of a novel methodology, which is evaluated on a series of benchmarks illustrating its consistency. Moreover, the results of the proposed methodology are compared with feasible regions from rigorous optimization, which requires a significantly higher computational effort. Hence, this work provides a robustness test for guaranteed robustness when applying SciML tools in multiobjective optimization, at a lower computational effort than the existing alternative.
    Can Stochastic Gradient Langevin Dynamics Provide Differential Privacy for Deep Learning?. (arXiv:2110.05057v4 [cs.LG] UPDATED)
    Bayesian learning via Stochastic Gradient Langevin Dynamics (SGLD) has been suggested for differentially private learning. While previous research provides differential privacy bounds for SGLD at the initial steps of the algorithm or when close to convergence, the question of what differential privacy guarantees can be made in between remains unanswered. This interim region is of great importance, especially for Bayesian neural networks, as it is hard to guarantee convergence to the posterior. This paper shows that using SGLD might result in unbounded privacy loss for this interim region, even when sampling from the posterior is as differentially private as desired.
    Simplicial Convolutional Filters. (arXiv:2201.11720v2 [eess.SP] UPDATED)
    We study linear filters for processing signals supported on abstract topological spaces modeled as simplicial complexes, which may be interpreted as generalizations of graphs that account for nodes, edges, triangular faces etc. To process such signals, we develop simplicial convolutional filters defined as matrix polynomials of the lower and upper Hodge Laplacians. First, we study the properties of these filters and show that they are linear and shift-invariant, as well as permutation and orientation equivariant. These filters can also be implemented in a distributed fashion with a low computational complexity, as they involve only (multiple rounds of) simplicial shifting between upper and lower adjacent simplices. Second, focusing on edge-flows, we study the frequency responses of these filters and examine how we can use the Hodge-decomposition to delineate gradient, curl and harmonic frequencies. We discuss how these frequencies correspond to the lower- and the upper-adjacent couplings and the kernel of the Hodge Laplacian, respectively, and can be tuned independently by our filter designs. Third, we study different procedures for designing simplicial convolutional filters and discuss their relative advantages. Finally, we corroborate our simplicial filters in several applications: to extract different frequency components of a simplicial signal, to denoise edge flows, and to analyze financial markets and traffic networks.
    A Simple Approach for State-Action Abstraction using a Learned MDP Homomorphism. (arXiv:2209.06356v1 [cs.LG])
Animals are able to rapidly infer, from limited experience, when sets of state-action pairs have equivalent reward and transition dynamics. On the other hand, modern reinforcement learning systems must painstakingly learn through trial and error that sets of state-action pairs are value equivalent -- requiring an often prohibitively large number of samples from their environment. MDP homomorphisms have been proposed that reduce the observed MDP of an environment to an abstract MDP, which can enable more sample-efficient policy learning. Consequently, impressive improvements in sample efficiency have been achieved when a suitable MDP homomorphism can be constructed a priori -- usually by exploiting a practitioner's knowledge of environment symmetries. We propose a novel approach to constructing a homomorphism in discrete action spaces, which uses a partial model of environment dynamics to infer which state-action pairs lead to the same state -- reducing the size of the state-action space by a factor equal to the cardinality of the action space. We call this method equivalent effect abstraction. In a gridworld setting, we demonstrate empirically that equivalent effect abstraction can improve sample efficiency in a model-free setting and planning efficiency for model-based approaches. Furthermore, we show on cartpole that our approach outperforms an existing method for learning homomorphisms, while using 33x less training data.  ( 2 min )
    Will there be a construction? Predicting road constructions based on heterogeneous spatiotemporal data. (arXiv:2209.06813v1 [cs.LG])
Road construction projects maintain transportation infrastructure. These projects range from the short-term (e.g., resurfacing or fixing potholes) to the long-term (e.g., adding a shoulder or building a bridge). Deciding what the next construction project is and when it should be scheduled is traditionally done through inspection by humans using special equipment. This approach is costly and difficult to scale. An alternative is the use of computational approaches that integrate and analyze multiple types of past and present spatiotemporal data to predict the location and time of future road constructions. This paper reports on such an approach, one that uses a deep-neural-network-based model to predict future constructions. Our model applies both convolutional and recurrent components to a heterogeneous dataset consisting of construction, weather, map and road-network data. We also report on how we addressed the lack of adequate publicly available data by building a large-scale dataset named "US-Constructions", which includes 6.2 million cases of road constructions augmented by a variety of spatiotemporal attributes and road-network features, collected in the contiguous United States (US) between 2016 and 2021. Using extensive experiments on several major cities in the US, we show the applicability of our work in accurately predicting future constructions - an average F1-score of 0.85 and an accuracy of 82.2% - results that outperform baselines. Additionally, we show how our training pipeline addresses the spatial sparsity of the data.
    Noise2SR: Learning to Denoise from Super-Resolved Single Noisy Fluorescence Image. (arXiv:2209.06411v1 [eess.IV])
Fluorescence microscopy is a key driver of discoveries in biomedical research. However, due to the limitations of microscope hardware and the characteristics of the observed samples, fluorescence microscopy images are susceptible to noise. Recently, a few self-supervised deep learning (DL) denoising methods have been proposed. However, the training efficiency and denoising performance of existing methods are relatively low for real-scene noise removal. To address this issue, this paper proposes the self-supervised image denoising method Noise2SR (N2SR), which trains a simple and effective image denoising model from a single noisy observation. Our Noise2SR denoising model is designed for training with paired noisy images of different dimensions. Benefiting from this training strategy, Noise2SR is more efficiently self-supervised and able to restore more image details from a single noisy observation. Experimental results on simulated noise and real microscopy noise removal show that Noise2SR outperforms two blind-spot based self-supervised deep learning image denoising methods. We envision that Noise2SR has the potential to improve the quality of other kinds of scientific imaging as well.
    Beyond Learning from Next Item: Sequential Recommendation via Personalized Interest Sustainability. (arXiv:2209.06644v1 [cs.IR])
    Sequential recommender systems have shown their effectiveness by capturing users' interest drift. Existing sequential models fall into two groups: user-centric and item-centric models. The user-centric models capture personalized interest drift based on each user's sequential consumption history, but do not explicitly consider whether users' interest in items sustains beyond the training time, i.e., interest sustainability. On the other hand, the item-centric models consider whether users' general interest sustains after the training time, but they are not personalized. In this work, we propose a recommender system that takes advantage of the models in both categories. Our proposed model captures personalized interest sustainability, indicating whether each user's interest in items will sustain beyond the training time or not. We first formulate a task that requires predicting which items each user will consume in the recent period of the training time based on users' consumption history. We then propose simple yet effective schemes to augment users' sparse consumption history. Extensive experiments show that the proposed model outperforms 10 baseline models on 11 real-world datasets. The code is available at https://github.com/dmhyun/PERIS.
    Rule-adhering synthetic data -- the lingua franca of learning. (arXiv:2209.06679v1 [cs.LG])
    AI-generated synthetic data makes it possible to distill the general patterns of existing data, which can then be shared safely as granular-level representative, yet novel, data samples within the original semantics. In this work we explore approaches for incorporating domain expertise into the data synthesis, so that both the statistical properties and pre-existing domain knowledge in the form of rules are represented. The resulting synthetic data generator, which can be probed for any number of new samples, can then serve as a common source of intelligence, a lingua franca of learning, consumable by humans and machines alike. We demonstrate the concept on a publicly available dataset, and evaluate its benefits via descriptive analysis as well as a downstream ML model.
    Scheduling Algorithms for Federated Learning with Minimal Energy Consumption. (arXiv:2209.06210v1 [cs.LG])
    Federated Learning (FL) has opened the opportunity for collaboratively training machine learning models on heterogeneous mobile or Edge devices while keeping local data private. With an increase in its adoption, a growing concern is related to its economic and environmental cost (as is also the case for other machine learning techniques). Unfortunately, little work has been done to optimize its energy consumption or emissions of carbon dioxide or equivalents, as energy minimization is usually left as a secondary objective. In this paper, we investigate the problem of minimizing the energy consumption of FL training on heterogeneous devices by controlling the workload distribution. We model this as the Minimal Cost FL Schedule problem, a total cost minimization problem with identical, independent, and atomic tasks that have to be assigned to heterogeneous resources with arbitrary cost functions. We propose a pseudo-polynomial optimal solution to the problem based on the previously unexplored Multiple-Choice Minimum-Cost Maximal Knapsack Packing Problem. We also provide four algorithms for scenarios where cost functions are monotonically increasing and follow the same behavior. These solutions are likewise applicable to the minimization of other kinds of costs, and to other one-dimensional data partition problems.
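    The pseudo-polynomial flavor of the problem can be conveyed with a small dynamic program that assigns T identical tasks to devices with arbitrary cost functions; this is a simplified sketch of the problem structure, not the paper's algorithm.

```python
# Minimal DP sketch: assign T identical tasks to heterogeneous devices with
# arbitrary per-device cost functions; illustrative, not the paper's method.
import math

def min_cost_schedule(T, cost_fns):
    # cost_fns[i](n) = cost for device i to process n tasks.
    # dp[t] = minimal total cost to assign exactly t tasks so far.
    dp = [0.0] + [math.inf] * T
    for cost in cost_fns:
        new_dp = [math.inf] * (T + 1)
        for t in range(T + 1):
            if math.isinf(dp[t]):
                continue
            for n in range(T - t + 1):   # tasks given to this device
                c = dp[t] + cost(n)
                if c < new_dp[t + n]:
                    new_dp[t + n] = c
        dp = new_dp
    return dp[T]

# Example: energy grows quadratically on device 0, linearly on device 1.
print(min_cost_schedule(10, [lambda n: 0.5 * n * n, lambda n: 2.0 * n]))
```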
    Optimization without Backpropagation. (arXiv:2209.06302v1 [cs.LG])
    Forward gradients have recently been introduced to bypass backpropagation in automatic differentiation, while retaining unbiased estimators of true gradients. We derive an optimality condition to obtain best-approximating forward gradients, which leads us to mathematical insights that suggest that optimization in high dimensions is challenging with forward gradients. Our extensive experiments on test functions support this claim.
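    For concreteness, a forward gradient can be computed with a single Jacobian-vector product and no backward pass; the sketch below uses PyTorch's jvp, and the empirical relative error hints at why variance grows with dimension.

```python
# Sketch of the forward-gradient estimator g = (grad_f . v) v, with v a
# standard normal direction; one JVP, no backpropagation. Unbiased, but its
# variance grows with dimension.
import torch

def forward_gradient(f, x):
    v = torch.randn_like(x)                      # random tangent direction
    _, jvp = torch.autograd.functional.jvp(f, (x,), (v,))
    return jvp * v                               # (grad_f(x) . v) v

f = lambda x: (x ** 2).sum()                     # true gradient is 2x
x = torch.randn(1000)
est = torch.stack([forward_gradient(f, x) for _ in range(100)]).mean(0)
print(torch.norm(est - 2 * x) / torch.norm(2 * x))  # relative error
```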
    Towards Better Generalization with Flexible Representation of Multi-Module Graph Neural Networks. (arXiv:2209.06589v1 [cs.LG])
    Graph neural networks (GNNs) have become compelling models designed to perform learning and inference on graph-structured data, but little work has been done on understanding their fundamental limitations in scaling to larger graphs and generalizing to out-of-distribution inputs. In this paper, we use a random graph generator that allows us to systematically investigate how graph size and structural properties affect the predictive performance of GNNs. We present specific evidence that, among the many graph properties, the mean and modality of the node degree distribution are the key features that determine whether GNNs can generalize to unseen graphs. Accordingly, we propose flexible GNNs (Flex-GNNs), using multiple node update functions and inner-loop optimization as a generalization of the single canonical nonlinear transformation over aggregated inputs, allowing the network to adapt flexibly to new graphs. The Flex-GNN framework improves generalization beyond the training set on several inference tasks.
    Efficient Unsupervised Learning for Plankton Images. (arXiv:2209.06726v1 [cs.CV])
    Monitoring plankton populations in situ is fundamental to preserving the aquatic ecosystem. Plankton microorganisms are in fact susceptible to minor environmental perturbations, which can be reflected in consequent morphological and dynamical modifications. Nowadays, the availability of advanced automatic or semi-automatic acquisition systems has allowed the production of an increasingly large amount of plankton image data. The adoption of machine learning algorithms to classify such data may be affected by the significant cost of manual annotation, due to both the huge quantity of acquired data and the large number of plankton species. To address these challenges, we propose an efficient unsupervised learning pipeline to provide accurate classification of plankton microorganisms. We build a set of image descriptors exploiting a two-step procedure. First, a Variational Autoencoder (VAE) is trained on features extracted by a pre-trained neural network. We then use the learnt latent space as an image descriptor for clustering. We compare our method with state-of-the-art unsupervised approaches, where a set of pre-defined hand-crafted features is used for clustering of plankton images. The proposed pipeline outperforms the benchmark algorithms for all the plankton datasets included in our analysis, providing better image embedding properties.
    Predicting probability distributions for cancer therapy drug selection optimization. (arXiv:2209.06211v1 [q-bio.QM])
    Large variability between cell lines poses a difficult optimization problem of drug selection for cancer therapy. Standard approaches predict a single value for this purpose, corresponding e.g. to the expected value of the distribution. This article shows the superiority of predicting entire probability distributions instead, and proposes basic tools for this purpose. We are mostly interested in the best drug in a batch to be tested; proper optimization of drug selection for such extreme statistics requires knowledge of the entire probability distributions, which for distributions of drug properties among cell lines often turn out to be bimodal, e.g. depending on a corresponding gene. Hence, as a basic prediction mechanism, we propose a mixture of two Gaussians, whose weight is predicted based on additional information.
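    A hedged sketch of the underlying idea: fit a two-component Gaussian mixture to responses across cell lines, then use the full distribution, not just its mean, to evaluate an extreme statistic. The data and the threshold are illustrative, not the paper's pipeline.

```python
# Fit a two-Gaussian mixture and compute an extreme statistic: the chance
# that the best of n candidate measurements exceeds a threshold.
import numpy as np
from scipy.stats import norm
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)
# Bimodal responses, e.g. depending on the status of a corresponding gene.
data = np.concatenate([rng.normal(-1, 0.3, 300), rng.normal(2, 0.5, 200)])

gm = GaussianMixture(n_components=2, random_state=0).fit(data.reshape(-1, 1))
w = gm.weights_
mu = gm.means_.ravel()
sd = np.sqrt(gm.covariances_.ravel())

def cdf(x):                        # mixture CDF
    return sum(wi * norm.cdf(x, mi, si) for wi, mi, si in zip(w, mu, sd))

n, threshold = 20, 2.5
print("P(best of", n, "exceeds", threshold, ") =", 1 - cdf(threshold) ** n)
```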
    Joint User and Data Detection in Grant-Free NOMA with Attention-based BiLSTM Network. (arXiv:2209.06392v1 [eess.SP])
    We consider the multi-user detection (MUD) problem in uplink grant-free non-orthogonal multiple access (NOMA), where the access point has to identify the total number and correct identity of the active Internet of Things (IoT) devices and decode their transmitted data. We assume that IoT devices use complex spreading sequences and transmit information in a random-access manner following the burst-sparsity model, where some IoT devices transmit their data in multiple adjacent time slots with a high probability, while others transmit only once during a frame. Exploiting the temporal correlation, we propose an attention-based bidirectional long short-term memory (BiLSTM) network to solve the MUD problem. The BiLSTM network creates a pattern of the device activation history using forward and reverse pass LSTMs, whereas the attention mechanism provides essential context to the device activation points. By doing so, a hierarchical pathway is followed for detecting active devices in a grant-free scenario. Then, by utilising the complex spreading sequences, blind data detection for the estimated active devices is performed. The proposed framework does not require prior knowledge of device sparsity levels and channels for performing MUD. The results show that the proposed network achieves better performance compared to existing benchmark schemes.
    Scalable Spatiotemporal Graph Neural Networks. (arXiv:2209.06520v1 [cs.LG])
    Neural forecasting of spatiotemporal time series drives both research and industrial innovation in several relevant application domains. Graph neural networks (GNNs) are often the core component of the forecasting architecture. However, in most spatiotemporal GNNs, the computational complexity scales up to a quadratic factor with the length of the sequence times the number of links in the graph, hence hindering the application of these models to large graphs and long temporal sequences. While methods to improve scalability have been proposed in the context of static graphs, few research efforts have been devoted to the spatiotemporal case. To fill this gap, we propose a scalable architecture that exploits an efficient encoding of both temporal and spatial dynamics. In particular, we use a randomized recurrent neural network to embed the history of the input time series into high-dimensional state representations encompassing multi-scale temporal dynamics. Such representations are then propagated along the spatial dimension using different powers of the graph adjacency matrix to generate node embeddings characterized by a rich pool of spatiotemporal features. The resulting node embeddings can be efficiently pre-computed in an unsupervised manner, before being fed to a feed-forward decoder that learns to map the multi-scale spatiotemporal representations to predictions. The training procedure can then be parallelized node-wise by sampling the node embeddings without breaking any dependency, thus enabling scalability to large networks. Empirical results on relevant datasets show that our approach achieves results competitive with the state of the art, while dramatically reducing the computational burden.
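    A numpy sketch of the encoder idea follows: a fixed (randomized) recurrent network embeds each node's history, and powers of the adjacency matrix propagate the embeddings spatially, so only a final decoder would need training. The sizes and the echo-state-style update are assumptions, not the paper's exact design.

```python
# Randomized recurrent temporal encoder + adjacency-power spatial
# propagation; embeddings are precomputed before any trained decoder.
import numpy as np

rng = np.random.default_rng(0)
n_nodes, t_steps, hidden = 50, 100, 32

A = (rng.random((n_nodes, n_nodes)) < 0.1).astype(float)
A /= np.maximum(A.sum(1, keepdims=True), 1)        # row-normalized adjacency
X = rng.standard_normal((t_steps, n_nodes))        # input time series

W_in = rng.standard_normal((hidden, 1)) * 0.5
W = rng.standard_normal((hidden, hidden))
W *= 0.9 / np.abs(np.linalg.eigvals(W)).max()      # echo-state stability

H = np.zeros((n_nodes, hidden))
for t in range(t_steps):                           # untrained temporal encoder
    H = np.tanh(X[t][:, None] @ W_in.T + H @ W.T)

# Powers of A give multi-scale spatial context; concatenation yields the
# multi-scale node embeddings fed to a feed-forward decoder.
Z = np.concatenate([H, A @ H, A @ A @ H], axis=1)
print(Z.shape)                                     # (50, 96)
```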
    Real2Sim2Real Transfer for Control of Cable-driven Robots via a Differentiable Physics Engine. (arXiv:2209.06261v1 [cs.RO])
    Tensegrity robots, composed of rigid rods and flexible cables, exhibit high strength-to-weight ratios and extreme deformations, enabling them to navigate unstructured terrain and even survive harsh impacts. However, they are hard to control due to their high dimensionality, complex dynamics, and coupled architecture. Physics-based simulation is one avenue for developing locomotion policies that can then be transferred to real robots, but modeling tensegrity robots is a complex task, so simulations experience a substantial sim2real gap. To address this issue, this paper describes a Real2Sim2Real strategy for tensegrity robots. This strategy is based on a differentiable physics engine that can be trained given limited data from a real robot (i.e. offline measurements and one random trajectory) and achieve a high enough accuracy to discover transferable locomotion policies. Beyond the overall pipeline, key contributions of this work include computing non-zero gradients at contact points, a loss function, and a trajectory segmentation technique, which together avoid conflicts in gradient evaluation during training. The proposed pipeline is demonstrated and evaluated on a real 3-bar tensegrity robot.  ( 2 min )
    Tuple Packing: Efficient Batching of Small Graphs in Graph Neural Networks. (arXiv:2209.06354v1 [cs.LG])
    When processing a batch of graphs in machine learning models such as Graph Neural Networks (GNNs), it is common to combine several small graphs into one overall graph to accelerate processing and reduce the overhead of padding. This is, for example, supported in the PyG library. However, the sizes of small graphs can vary substantially with respect to the number of nodes and edges, and hence the size of the combined graph can still vary considerably, especially for small batch sizes. So the costs of excessive padding and wasted compute are still incurred. This paper proposes a new approach -- tuple packing -- for generating batches that cause minimal overhead. The algorithm extends recently introduced sequence packing approaches to work on the 2D tuples of (|nodes|, |edges|). A monotone heuristic is applied to the 2D histogram of tuple values to define a priority for packing histogram bins together, with the objective of reaching a limit on the number of nodes as well as the number of edges. Experiments verify the effectiveness of the algorithm on multiple datasets.  ( 2 min )
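    To make the objective concrete, here is a simplified first-fit-decreasing sketch of 2D packing under joint node and edge limits; the paper's histogram-based heuristic is more refined, so this only illustrates the goal.

```python
# Greedy 2D packing sketch: fill packs first-fit under both a node limit
# and an edge limit. Illustrative stand-in for the histogram heuristic.
def pack_graphs(sizes, max_nodes, max_edges):
    # sizes: list of (num_nodes, num_edges) tuples, one per small graph.
    order = sorted(range(len(sizes)), key=lambda i: sizes[i], reverse=True)
    packs, loads = [], []                 # loads[k] = (nodes, edges) used
    for i in order:
        n, e = sizes[i]
        for k, (pn, pe) in enumerate(loads):
            if pn + n <= max_nodes and pe + e <= max_edges:
                packs[k].append(i)
                loads[k] = (pn + n, pe + e)
                break
        else:
            packs.append([i])
            loads.append((n, e))
    return packs

sizes = [(10, 22), (4, 6), (7, 14), (3, 3), (9, 20), (5, 8)]
print(pack_graphs(sizes, max_nodes=16, max_edges=32))
```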
    Data-Driven Machine Learning Models for a Multi-Objective Flapping Fin Unmanned Underwater Vehicle Control System. (arXiv:2209.06369v1 [cs.RO])
    Flapping-fin unmanned underwater vehicle (UUV) propulsion systems provide high maneuverability for naval tasks such as surveillance and terrain exploration. Recent work has explored the use of time-series neural network surrogate models to predict thrust from vehicle design and fin kinematics. We develop a search-based inverse model that leverages a kinematics-to-thrust neural network model for control system design. Our inverse model finds a set of fin kinematics with the multi-objective goal of reaching a target thrust and creating a smooth kinematic transition between flapping cycles. We demonstrate how a control system integrating this inverse model can make online, cycle-to-cycle adjustments to prioritize different system objectives.  ( 2 min )
    $\pi$VAE: a stochastic process prior for Bayesian deep learning with MCMC. (arXiv:2002.06873v6 [cs.LG] UPDATED)
    Stochastic processes provide a mathematically elegant way to model complex data. In theory, they provide flexible priors over function classes that can encode a wide range of interesting assumptions. In practice, however, efficient inference by optimisation or marginalisation is difficult, a problem further exacerbated with big data and high dimensional input spaces. We propose a novel variational autoencoder (VAE) called the prior encoding variational autoencoder ($\pi$VAE). The $\pi$VAE is finitely exchangeable and Kolmogorov consistent, and thus is a continuous stochastic process. We use $\pi$VAE to learn low dimensional embeddings of function classes. We show that our framework can accurately learn expressive function classes such as Gaussian processes, but also properties of functions to enable statistical inference (such as the integral of a log Gaussian process). For popular tasks, such as spatial interpolation, $\pi$VAE achieves state-of-the-art performance both in terms of accuracy and computational efficiency. Perhaps most usefully, we demonstrate that the low dimensional independently distributed latent space representation learnt provides an elegant and scalable means of performing Bayesian inference for stochastic processes within probabilistic programming languages such as Stan.  ( 3 min )
    Prediction Intervals and Confidence Regions for Symbolic Regression Models based on Likelihood Profiles. (arXiv:2209.06454v1 [cs.LG])
    Symbolic regression is a nonlinear regression method which is commonly performed by an evolutionary computation method such as genetic programming. Quantification of the uncertainty of regression models is important for the interpretation of models and for decision making. The linear approximation and so-called likelihood profiles are well-known possibilities for the calculation of confidence and prediction intervals for nonlinear regression models. These simple and effective techniques have been completely ignored so far in the genetic programming literature. In this work we describe the calculation of likelihood profiles in detail and also provide some illustrative examples with models created with three different symbolic regression algorithms on two different datasets. The examples highlight the importance of likelihood profiles for understanding the limitations of symbolic regression models and for helping the user make an informed post-prediction decision.  ( 2 min )
    PANCETTA: Phoneme Aware Neural Completion to Elicit Tongue Twisters Automatically. (arXiv:2209.06275v1 [cs.CL])
    Tongue twisters are meaningful sentences that are difficult to pronounce. The process of automatically generating tongue twisters is challenging since the generated utterance must satisfy two conditions at once: phonetic difficulty and semantic meaning. Furthermore, phonetic difficulty is itself hard to characterize and is expressed in natural tongue twisters through a heterogeneous mix of phenomena such as alliteration and homophony. In this paper, we propose PANCETTA: Phoneme Aware Neural Completion to Elicit Tongue Twisters Automatically. We leverage phoneme representations to capture the notion of phonetic difficulty, and we train language models to generate original tongue twisters on two proposed task settings. To do this, we curate a dataset called PANCETTA, consisting of existing English tongue twisters. Through automatic and human evaluation, as well as qualitative analysis, we show that PANCETTA generates novel, phonetically difficult, fluent, and semantically meaningful tongue twisters.  ( 2 min )
    A Clustering Method Based on Information Entropy Payload. (arXiv:2209.06582v1 [cs.LG])
    Existing clustering algorithms such as K-means often need to preset parameters such as the number of categories K, and such parameters may lead to a failure to output objective and consistent clustering results. This paper introduces a clustering method based on information theory, by which clusters in the clustering result have maximum average information entropy (called entropy payload in this paper). This method brings the following benefits: first, it does not need to preset any hyperparameter such as the category number or other similar thresholds; second, the clustering results have maximum information expression efficiency. It can be used in image segmentation, object classification, etc., and could be the basis of unsupervised learning.  ( 2 min )
    Designing Biological Sequences via Meta-Reinforcement Learning and Bayesian Optimization. (arXiv:2209.06259v1 [cs.LG])
    The ability to accelerate the design of biological sequences can have a substantial impact on the progress of the medical field. The problem can be framed as a global optimization problem, where the objective is an expensive black-box function, such that we can query large batches but are restricted to a low number of rounds. Bayesian Optimization is a principled method for tackling this problem. However, the astronomically large state space of biological sequences renders brute-force iteration over all possible sequences infeasible. In this paper, we propose MetaRLBO, where we train an autoregressive generative model via Meta-Reinforcement Learning to propose promising sequences for selection via Bayesian Optimization. We pose this problem as that of finding an optimal policy over a distribution of MDPs induced by sampling subsets of the data acquired in the previous rounds. Our in-silico experiments show that meta-learning over such ensembles provides robustness against reward misspecification and achieves competitive results compared to existing strong baselines.
    Classical Sequence Match is a Competitive Few-Shot One-Class Learner. (arXiv:2209.06394v1 [cs.LG])
    Nowadays, transformer-based models have gradually become the default choice for artificial intelligence pioneers, and they show superiority even in few-shot scenarios. In this paper, we revisit the classical methods and propose a new few-shot alternative. Specifically, we investigate the few-shot one-class problem, which takes a known sample as a reference to detect whether an unknown instance belongs to the same class. This problem can be studied from the perspective of sequence match. We show that with meta-learning, the classical sequence match method Compare-Aggregate significantly outperforms transformer-based ones, while requiring much less training cost. Furthermore, we perform an empirical comparison between the two kinds of sequence match approaches under simple fine-tuning and meta-learning. Meta-learning causes the transformer models' features to have high-correlation dimensions, for reasons closely related to the number of layers and heads in transformer models. Experimental code and data are available at https://github.com/hmt2014/FewOne
    Efficient low-thrust trajectory data generation based on generative adversarial network. (arXiv:2209.06427v1 [cs.LG])
    Deep learning-based techniques have been introduced into the field of trajectory optimization in recent years. Deep Neural Networks (DNNs) are trained and used as surrogates of the conventional optimization process. They can provide low-thrust (LT) transfer cost estimation and enable more complex preliminary mission designs. However, it is a challenge to efficiently obtain the required amount of trajectory data for training. We adapt a Generative Adversarial Network (GAN) to generate feasible LT trajectory data efficiently. The GAN consists of a generator and a discriminator, both of which are deep networks. The generator generates fake LT transfer features using random noise as input, while the discriminator distinguishes the generator's fake LT transfer features from real ones. The GAN is trained until the generator produces fake LT transfers that the discriminator cannot identify, indicating that the generated features follow the same distribution as the real transfer features. The generated LT transfer data have a high convergence rate and can be used to efficiently produce training data for deep learning models. The proposed approach is validated by generating feasible LT transfers in a Near-Earth Asteroid (NEA) mission scenario. The convergence rate of GAN-generated samples is 84.3%.
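    For readers unfamiliar with the adversarial setup, the following compact sketch shows a generator/discriminator pair over low-dimensional transfer features; the feature dimension, network sizes, and the random stand-in for real data are assumptions, not the paper's configuration.

```python
# Compact GAN sketch for generating transfer-feature vectors; sizes and the
# synthetic "real" data are placeholders.
import torch
import torch.nn as nn

feat_dim, noise_dim = 6, 16                 # e.g. orbital transfer features
G = nn.Sequential(nn.Linear(noise_dim, 64), nn.ReLU(), nn.Linear(64, feat_dim))
D = nn.Sequential(nn.Linear(feat_dim, 64), nn.ReLU(), nn.Linear(64, 1))
opt_g = torch.optim.Adam(G.parameters(), lr=1e-3)
opt_d = torch.optim.Adam(D.parameters(), lr=1e-3)
bce = nn.BCEWithLogitsLoss()

real = torch.randn(512, feat_dim)           # stand-in for real LT features
for _ in range(200):
    z = torch.randn(64, noise_dim)
    fake = G(z)
    # Discriminator step: separate real from generated features.
    idx = torch.randint(0, real.size(0), (64,))
    d_loss = bce(D(real[idx]), torch.ones(64, 1)) + \
             bce(D(fake.detach()), torch.zeros(64, 1))
    opt_d.zero_grad(); d_loss.backward(); opt_d.step()
    # Generator step: fool the discriminator.
    g_loss = bce(D(fake), torch.ones(64, 1))
    opt_g.zero_grad(); g_loss.backward(); opt_g.step()
```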
    Recommendation as Language Processing (RLP): A Unified Pretrain, Personalized Prompt & Predict Paradigm (P5). (arXiv:2203.13366v5 [cs.IR] UPDATED)
    For a long time, different recommendation tasks have typically required designing task-specific architectures and training objectives. As a result, it is hard to transfer the learned knowledge and representations from one task to another, thus restricting the generalization ability of existing recommendation approaches; e.g., a sequential recommendation model can hardly be applied or transferred to a review generation method. To deal with such issues, considering that language can describe almost anything and language grounding is a powerful medium to represent various problems or tasks, we present a flexible and unified text-to-text paradigm called the "Pretrain, Personalized Prompt, and Predict Paradigm" (P5) for recommendation, which unifies various recommendation tasks in a shared framework. In P5, all data such as user-item interactions, user descriptions, item metadata, and user reviews are converted to a common format -- natural language sequences. The rich information from natural language assists P5 in capturing deeper semantics for personalization and recommendation. Specifically, P5 learns different tasks with the same language modeling objective during pretraining. Thus, it serves as the foundation model for various downstream recommendation tasks, allows easy integration with other modalities, and enables instruction-based recommendation based on prompts. P5 advances recommender systems from shallow models to deep models to big models, pointing towards a universal recommendation engine. With adaptive personalized prompts for different users, P5 is able to make predictions in a zero-shot or few-shot manner and largely reduces the necessity for extensive fine-tuning. We conduct experiments on several recommendation benchmarks to show the effectiveness of P5. We release the source code at https://github.com/jeykigung/P5.
    NAAP-440 Dataset and Baseline for Network Architecture Accuracy Prediction. (arXiv:2209.06626v1 [cs.CV])
    Network architecture search (NAS) has become a common approach to developing and discovering new neural architectures for different target platforms and purposes. However, scanning the search space comprises long training processes for many candidate architectures, which is costly in terms of computational resources and time. Regression algorithms are a common tool for predicting a candidate architecture's accuracy, which can dramatically accelerate the search procedure. We aim to propose a new baseline that will support the development of regression algorithms that can predict an architecture's accuracy just from its scheme, or by training it for only a minimal number of epochs. Therefore, we introduce the NAAP-440 dataset of 440 neural architectures, which were trained on CIFAR10 using a fixed recipe. Our experiments indicate that by using off-the-shelf regression algorithms and running up to 10% of the training process, not only is it possible to predict an architecture's accuracy rather precisely, but the values predicted for the architectures also maintain their accuracy order with a minimal number of monotonicity violations. This approach may serve as a powerful tool for accelerating NAS-based studies and thus dramatically increase their efficiency. The dataset and code used in the study have been made public.  ( 2 min )
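    A hedged sketch of the baseline idea: predict final accuracy from early-epoch accuracies with an off-the-shelf regressor and check how well the predicted ranking is preserved. The synthetic data below stands in for NAAP-440 measurements.

```python
# Predict final accuracy from early-epoch accuracies, then measure rank
# preservation (few monotonicity violations = high rank correlation).
import numpy as np
from scipy.stats import kendalltau
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n_archs = 440
early = rng.uniform(0.2, 0.7, size=(n_archs, 9))      # epochs 1..9 accuracy
final = early.mean(1) * 1.2 + rng.normal(0, 0.02, n_archs)

X_tr, X_te, y_tr, y_te = train_test_split(early, final, random_state=0)
reg = GradientBoostingRegressor().fit(X_tr, y_tr)
pred = reg.predict(X_te)
# Rank preservation is what matters for NAS candidate selection.
print("Kendall tau:", kendalltau(pred, y_te).correlation)
```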
    Revisiting Neural Scaling Laws in Language and Vision. (arXiv:2209.06640v1 [cs.LG])
    The remarkable progress in deep learning in recent years is largely driven by improvements in scale, where bigger models are trained on larger datasets for longer schedules. To predict the benefit of scale empirically, we argue for a more rigorous methodology based on the extrapolation loss, instead of reporting the best-fitting (interpolating) parameters. We then present a recipe for estimating scaling law parameters reliably from learning curves. We demonstrate that it extrapolates more accurately than previous methods in a wide range of architecture families across several domains, including image classification, neural machine translation (NMT) and language modeling, in addition to tasks from the BIG-Bench evaluation benchmark. Finally, we release a benchmark dataset comprising 90 evaluation tasks to facilitate research in this domain.  ( 2 min )
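    The distinction between interpolation and extrapolation is easy to demonstrate: fit a saturating power law on smaller scales and evaluate it on held-out larger scales. The sketch below uses synthetic data and is only in the spirit of the paper's methodology, not its recipe.

```python
# Fit L(n) = c + a * n^(-b) on small scales, judge by extrapolation error
# on the largest (held-out) scales.
import numpy as np
from scipy.optimize import curve_fit

def scaling_law(n, a, b, c):
    return c + a * n ** (-b)

n = np.array([1e3, 3e3, 1e4, 3e4, 1e5, 3e5, 1e6])
loss = scaling_law(n, 12.0, 0.3, 1.5) \
    + np.random.default_rng(0).normal(0, 0.02, n.size)

# Fit only on the five smaller scales; evaluate on the two largest.
params, _ = curve_fit(scaling_law, n[:-2], loss[:-2], p0=(1.0, 0.5, 1.0))
extrapolated = scaling_law(n[-2:], *params)
print("extrapolation error:", np.abs(extrapolated - loss[-2:]))
```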
    Graph Perceiver IO: A General Architecture for Graph Structured Data. (arXiv:2209.06418v1 [cs.LG])
    Multimodal machine learning has been widely studied for the development of general intelligence. Recently, the remarkable multimodal algorithms Perceiver and Perceiver IO have shown competitive results across diverse dataset domains and tasks. However, these works have focused on heterogeneous modalities such as images, text, and speech, and there has been little research on graph-structured datasets. A graph is one of the most generalized dataset structures, and other data, including images, text, and speech, can be represented as graph-structured data. A graph has an adjacency matrix, unlike other dataset domains such as text and images, and it is not trivial to handle its topological, relational, and canonical positional information. In this study, we propose Graph Perceiver IO, a Perceiver IO for graph-structured datasets. We keep the main structure of Perceiver IO, since it already handles diverse datasets well, apart from graph-structured data. Graph Perceiver IO is a general method that can handle diverse datasets, including graph-structured data as well as text and images. Compared to graph neural networks, Graph Perceiver IO has lower complexity and can incorporate local and global information efficiently. We show that Graph Perceiver IO achieves competitive results on diverse graph-related tasks, including node classification, graph classification, and link prediction.  ( 3 min )
    Data Privacy and Trustworthy Machine Learning. (arXiv:2209.06529v1 [cs.LG])
    The privacy risks of machine learning models are a major concern when training them on sensitive and personal data. We discuss the tradeoffs between data privacy and the remaining goals of trustworthy machine learning (notably, fairness, robustness, and explainability).  ( 2 min )
    MLT-LE: predicting drug-target binding affinity with multi-task residual neural networks. (arXiv:2209.06274v1 [cs.LG])
    Assessing drug-target affinity is a critical step in the drug discovery and development process, but obtaining such data experimentally is both time-consuming and expensive. For this reason, computational methods for predicting binding strength are being widely developed. However, these methods typically use a single-task approach for prediction, thus ignoring the additional information that can be extracted from the data and used to drive the learning process. In this work, we therefore present a multi-task approach to binding strength prediction. Our results suggest that these predictions can indeed benefit from a multi-task learning approach, by utilizing added information from related tasks and multi-task induced regularization.  ( 2 min )
    Learning state correspondence of reinforcement learning tasks for knowledge transfer. (arXiv:2209.06604v1 [cs.LG])
    Deep reinforcement learning has shown the ability to achieve super-human performance in solving complex reinforcement learning (RL) tasks only from raw pixels. However, it fails to reuse knowledge from previously learnt tasks to solve new, unseen ones. Generalizing and reusing knowledge are fundamental requirements for creating a truly intelligent agent. This work proposes a general method for one-to-one transfer learning based on a generative adversarial network model tailored to the RL task.  ( 2 min )
    Distributionally Robust Offline Reinforcement Learning with Linear Function Approximation. (arXiv:2209.06620v1 [cs.LG])
    Among the reasons that hinder the application of reinforcement learning (RL) to real-world problems, two factors are critical: limited data and the mismatch between the testing environment and the training one. In this paper, we attempt to address these issues simultaneously with the problem setup of distributionally robust offline RL. In particular, we learn an RL agent with historical data obtained from the source environment and optimize it to perform well in a perturbed one. Moreover, we consider linear function approximation to apply the algorithm to large-scale problems. We prove that our algorithm can achieve a suboptimality of $O(1/\sqrt{K})$ depending on the linear function dimension $d$, which appears to be the first result with a sample complexity guarantee in this setting. Diverse experiments are conducted to demonstrate our theoretical findings, showing the superiority of our algorithm over the non-robust one.  ( 2 min )
    Personalized Emotion Detection using IoT and Machine Learning. (arXiv:2209.06464v1 [cs.LG])
    The Medical Internet of Things, a recent technological advancement in medicine, is incredibly helpful in providing real-time monitoring of health metrics. This paper presents a non-invasive IoT system that tracks patients' emotions, especially those with autism spectrum disorder. With a few affordable sensors and cloud computing services, the individual's heart rate is monitored and analyzed to study the effects of changes in sweat and heartbeats per minute for different emotions. Under normal resting conditions of the individual, the proposed system could detect the right emotion using machine learning algorithms with a performance of up to 92% accuracy. The result of the proposed approach is comparable with state-of-the-art solutions in medical IoT.  ( 2 min )
    Age of Information in Federated Learning over Wireless Networks. (arXiv:2209.06623v1 [cs.LG])
    In this paper, federated learning (FL) over wireless networks is investigated. In each communication round, a subset of devices is selected to participate in the aggregation with limited time and energy. In order to minimize the convergence time, global loss and latency are jointly considered in a Stackelberg-game-based framework. Specifically, age of information (AoI) based device selection is considered at the leader level as a global loss minimization problem, while sub-channel assignment, computational resource allocation, and power allocation are considered at the follower level as a latency minimization problem. By dividing the follower-level problem into two sub-problems, the best response of the follower is obtained by a monotonic-optimization-based resource allocation algorithm and a matching-based sub-channel assignment algorithm. By deriving the upper bound of the convergence rate, the leader-level problem is reformulated, and then a list-based device selection algorithm is proposed to achieve the Stackelberg equilibrium. Simulation results indicate that the proposed device selection scheme outperforms other schemes in terms of the global loss, and the developed algorithms can significantly decrease the time consumed by computation and communication.  ( 2 min )
    Jointly Contrastive Representation Learning on Road Network and Trajectory. (arXiv:2209.06389v1 [cs.LG])
    Road network and trajectory representation learning are essential for traffic systems, since the learned representations can be directly used in various downstream tasks (e.g., traffic speed inference and travel time estimation). However, most existing methods only contrast within the same scale, i.e., treating road network and trajectory separately, which ignores valuable inter-relations. In this paper, we aim to propose a unified framework that jointly learns road network and trajectory representations end-to-end. We design domain-specific augmentations for road-road contrast and trajectory-trajectory contrast separately, i.e., a road segment with its contextual neighbors, and a trajectory with its detour-replaced and dropped alternatives, respectively. On top of that, we further introduce a road-trajectory cross-scale contrast to bridge the two scales by maximizing the total mutual information. Unlike existing cross-scale contrastive learning methods on graphs, which only contrast a graph and its belonging nodes, the contrast between road segment and trajectory is elaborately tailored via novel positive sampling and adaptive weighting strategies. We conduct prudent experiments based on two real-world datasets with four downstream tasks, demonstrating improved performance and effectiveness. The code is available at https://github.com/mzy94/JCLRNT.  ( 2 min )
    PlaStIL: Plastic and Stable Memory-Free Class-Incremental Learning. (arXiv:2209.06606v1 [cs.CV])
    Plasticity and stability are needed in class-incremental learning in order to learn from new data while preserving past knowledge. Due to catastrophic forgetting, finding a compromise between these two properties is particularly challenging when no memory buffer is available. Mainstream methods need to store two deep models, since they integrate new classes using fine-tuning with knowledge distillation from the previous incremental state. We propose a method which has a similar number of parameters but distributes them differently, in order to find a better balance between plasticity and stability. Following an approach already deployed by transfer-based incremental methods, we freeze the feature extractor after the initial state. Classes in the oldest incremental states are trained with this frozen extractor to ensure stability. Recent classes are predicted using partially fine-tuned models in order to introduce plasticity. Our proposed plasticity layer can be incorporated into any transfer-based method designed for memory-free incremental learning, and we apply it to two such methods. Evaluation is done with three large-scale datasets. Results show that performance gains are obtained in all tested configurations compared to existing methods.  ( 2 min )
    Meta Pattern Concern Score: A Novel Metric for Customizable Evaluation of Multi-classification. (arXiv:2209.06408v1 [cs.LG])
    Classifiers have been widely implemented in practice, while how to evaluate them properly remains a problem. Two commonly used types of metrics, based respectively on the confusion matrix and on the loss function, have different advantages in flexibility and mathematical completeness, yet they struggle with different dilemmas, such as insensitivity to slight improvements or lack of customizability for different tasks. In this paper, we propose a novel metric named Meta Pattern Concern Score, based on an abstract representation of the probabilistic prediction, together with a targeted design for processing negative classes in multi-classification and for reducing the discreteness of the metric value, to achieve the advantages of both kinds of metrics while avoiding their weaknesses. Our metric provides the customizability to pick out models for specific requirements in different practices, while making sure they also remain sound under traditional metrics. Evaluation on four kinds of models and six datasets demonstrates the effectiveness and efficiency of our metric, and a case study shows it can select a model that reduces dangerous misclassifications by 0.53% while sacrificing only 0.04% of training accuracy.
    Finite Sample Guarantees for Distributed Online Parameter Estimation with Communication Costs. (arXiv:2209.06678v1 [eess.SY])
    We study the problem of estimating an unknown parameter in a distributed and online manner. Existing work on distributed online learning typically either focuses on asymptotic analysis, or provides bounds on regret. However, these results may not directly translate into bounds on the error of the learned model after a finite number of time-steps. In this paper, we propose a distributed online estimation algorithm which enables each agent in a network to improve its estimation accuracy by communicating with neighbors. We provide non-asymptotic bounds on the estimation error, leveraging the statistical properties of the underlying model. Our analysis demonstrates a trade-off between estimation error and communication costs. Further, our analysis allows us to determine a time at which the communication can be stopped (due to the costs associated with communications), while meeting a desired estimation accuracy. We also provide a numerical example to validate our results.
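    An illustrative consensus-plus-innovations update for this setting is sketched below: each agent mixes neighbors' estimates and corrects with its own noisy observation. The complete network, step sizes, and observation model are assumptions, not the paper's algorithm or bounds.

```python
# Distributed online parameter estimation sketch: communicate (consensus),
# then update with a decaying-gain innovation from local observations.
import numpy as np

rng = np.random.default_rng(0)
theta = np.array([1.0, -2.0])                      # unknown parameter
n_agents, d = 5, 2
W = np.full((n_agents, n_agents), 1.0 / n_agents)  # doubly stochastic mixing

est = np.zeros((n_agents, d))
for t in range(1, 500):
    x = rng.standard_normal((n_agents, d))         # per-agent regressors
    y = x @ theta + 0.1 * rng.standard_normal(n_agents)
    step = 1.0 / t                                 # decaying innovation gain
    innovation = (y - np.sum(x * est, axis=1))[:, None] * x
    est = W @ est + step * innovation              # communicate, then update
print("max agent error:", np.abs(est - theta).max())
```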
    Efficient multi-relational network representation using primes. (arXiv:2209.06575v1 [cs.LG])
    Multi-relational networks play an important role in today's world and are utilized to capture complex relationships between the data. Their applications span many domains such as biomedical, financial, social, etc., and because of their increasing usability, it becomes crucial to find efficient ways to deal with the added complexity of multiple layers. In this work, we propose a novel approach to represent these complex networks using a single aggregated adjacency matrix, by utilizing primes as surrogates for the relations. Due to the fundamental theorem of arithmetic, this allows for a lossless, compact representation of the whole multi-relational graph, using a single adjacency matrix. Moreover, this representation enables the fast computation of multi-hop adjacency matrices, that can be useful for a variety of downstream tasks. We present simple and complex tasks in which this representation can be useful and showcase its efficiency and performance. Finally, we also provide insights on the advantages and the open challenges that still need to be addressed and motivate future work.  ( 2 min )
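    The encoding is simple enough to sketch directly: assign a distinct prime to each relation and store, per node pair, the product of the primes of the relations connecting them; unique factorization makes the aggregated matrix lossless. The relation names and graph below are illustrative.

```python
# Prime-encoding sketch for a multi-relational graph: one aggregated
# integer adjacency matrix, with layers recovered by divisibility.
import numpy as np
from sympy import prime

relations = ["friend", "colleague", "family"]
primes = {r: prime(i + 1) for i, r in enumerate(relations)}  # 2, 3, 5

n = 4
A = np.ones((n, n), dtype=np.int64)            # 1 == no relation
A[0, 1] *= primes["friend"]
A[0, 1] *= primes["colleague"]                 # multi-relation edge: 2*3 = 6
A[2, 3] *= primes["family"]

def layer(A, relation):
    # Recover a single relation's adjacency matrix via divisibility.
    return (A % primes[relation] == 0).astype(int)

print(layer(A, "friend"))                      # edge (0, 1) present
print(layer(A, "family"))                      # edge (2, 3) present
```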
    Improving Self-Supervised Learning by Characterizing Idealized Representations. (arXiv:2209.06235v1 [cs.LG])
    Despite the empirical successes of self-supervised learning (SSL) methods, it is unclear what characteristics of their representations lead to high downstream accuracies. In this work, we characterize properties that SSL representations should ideally satisfy. Specifically, we prove necessary and sufficient conditions such that for any task invariant to given data augmentations, desired probes (e.g., linear or MLP) trained on that representation attain perfect accuracy. These requirements lead to a unifying conceptual framework for improving existing SSL methods and deriving new ones. For contrastive learning, our framework prescribes simple but significant improvements to previous methods such as using asymmetric projection heads. For non-contrastive learning, we use our framework to derive a simple and novel objective. Our resulting SSL algorithms outperform baselines on standard benchmarks, including SwAV+multicrops on linear probing of ImageNet.  ( 2 min )
    A Survey on Evolutionary Computation for Computer Vision and Image Analysis: Past, Present, and Future Trends. (arXiv:2209.06399v1 [cs.NE])
    Computer vision (CV) is a big and important field in artificial intelligence covering a wide range of applications. Image analysis is a major task in CV, aiming to extract, analyse and understand the visual content of images. However, image-related tasks are very challenging due to many factors, e.g., high variations across images, high dimensionality, domain expertise requirements, and image distortions. Evolutionary computation (EC) approaches have been widely used for image analysis with significant achievements. However, there is no comprehensive survey of existing EC approaches to image analysis. To fill this gap, this paper provides a comprehensive survey covering all essential EC approaches to important image analysis tasks, including edge detection, image segmentation, image feature analysis, image classification, object detection, and others. This survey aims to provide a better understanding of evolutionary computer vision (ECV) by discussing the contributions of different approaches and exploring how and why EC is used for CV and image analysis. The applications, challenges, issues, and trends associated with this research field are also discussed and summarised to provide further guidelines and opportunities for future research.
    SciMED: A Computational Framework For Physics-Informed Symbolic Regression with Scientist-In-The-Loop. (arXiv:2209.06257v1 [cs.LG])
    Discovering a meaningful, dimensionally homogeneous, symbolic expression that explains experimental data is a fundamental challenge in many scientific fields. We present a novel, open-source computational framework called Scientist-Machine Equation Detector (SciMED), which integrates scientific discipline wisdom in a scientist-in-the-loop approach with state-of-the-art symbolic regression (SR) methods. SciMED combines a genetic algorithm-based wrapper selection method with automatic machine learning and two levels of SR methods. We test SciMED on four configurations of the settling of a sphere with and without a non-linear aerodynamic drag force. We show that SciMED is sufficiently robust to discover the correct physically meaningful symbolic expressions from noisy data. Our results indicate better performance on these tasks than the state-of-the-art SR software package.  ( 2 min )
    Identification of Cognitive Workload during Surgical Tasks with Multimodal Deep Learning. (arXiv:2209.06208v1 [cs.LG])
    In Operating Rooms (ORs), activities are usually different from those in other typical working environments. In particular, surgeons are frequently exposed to multiple psycho-organizational constraints that may cause negative repercussions on their health and performance. This is commonly attributed to an increase in the associated Cognitive Workload (CWL) that results from dealing with unexpected and repetitive tasks, as well as large amounts of information and potentially risky cognitive overload. In this paper, a cascade of two machine learning approaches is suggested for the multimodal recognition of CWL across four different surgical tasks. First, a model based on the concept of transfer learning is used to identify whether a surgeon is experiencing any CWL. Second, a Convolutional Neural Network (CNN) uses this information to identify different types of CWL associated with each surgical task. The suggested multimodal approach considers adjacent signals from electroencephalogram (EEG), functional near-infrared spectroscopy (fNIRS) and pupil eye diameter. The concatenation of signals allows complex correlations in terms of time (temporal) and channel location (spatial). Data collection is performed by a Multi-sensing AI Environment for Surgical Task $\&$ Role Optimisation platform (MAESTRO) developed at HARMS Lab. To compare the performance of the proposed methodology, a number of state-of-the-art machine learning techniques have been implemented. The tests show that the proposed model has a precision of 93%.  ( 3 min )
    Explainable AI for clinical and remote health applications: a survey on tabular and time series data. (arXiv:2209.06528v1 [cs.LG])
    Nowadays Artificial Intelligence (AI) has become a fundamental component of healthcare applications, both clinical and remote, but the best-performing AI systems are often too complex to be self-explaining. Explainable AI (XAI) techniques are designed to unveil the reasoning behind the system's predictions and decisions, and they become even more critical when dealing with sensitive and personal health data. It is worth noting that XAI has not gathered the same attention across different research areas and data types, especially in healthcare. In particular, many clinical and remote health applications are based on tabular and time series data, respectively, and XAI is not commonly analysed on these data types, while computer vision and Natural Language Processing (NLP) are the reference applications. To provide an overview of XAI methods that are most suitable for tabular and time series data in the healthcare domain, this paper provides a review of the literature from the last 5 years, illustrating the types of explanations generated and the efforts made to evaluate their relevance and quality. Specifically, we identify clinical validation, consistency assessment, objective and standardised quality evaluation, and human-centered quality assessment as key features to ensure effective explanations for the end users. Finally, we highlight the main research challenges in the field as well as the limitations of existing XAI methods.  ( 3 min )
    TrADe Re-ID -- Live Person Re-Identification using Tracking and Anomaly Detection. (arXiv:2209.06452v1 [cs.CV])
    Person Re-Identification (Re-ID) aims to search for a person of interest (query) in a network of cameras. In the classic Re-ID setting the query is sought in a gallery containing properly cropped images of entire bodies. Recently, the live Re-ID setting was introduced to better represent the practical application context of Re-ID. It consists in searching for the query in short videos containing whole scene frames. The initial live Re-ID baseline used a pedestrian detector to build a large search gallery and a classic Re-ID model to find the query in the gallery. However, the galleries generated were too large and contained low-quality images, which decreased the live Re-ID performance. Here, we present a new live Re-ID approach called TrADe, to generate smaller, high-quality galleries. TrADe first uses a Tracking algorithm to identify sequences of images of the same individual in the gallery. Then, an Anomaly Detection model is used to select a single good representative of each tracklet. TrADe is validated on the live Re-ID version of the PRID-2011 dataset and shows significant improvements over the baseline.  ( 2 min )
    Applying wav2vec2 for Speech Recognition on Bengali Common Voices Dataset. (arXiv:2209.06581v1 [eess.AS])
    Speech is inherently continuous, where discrete words, phonemes and other units are not clearly segmented, and so speech recognition has been an active research problem for decades. In this work we have fine-tuned wav2vec 2.0 to recognize and transcribe Bengali speech -- training it on the Bengali Common Voice Speech Dataset. After training for 71 epochs on a training set consisting of 36,919 mp3 files, we achieved a training loss of 0.3172 and a WER of 0.2524 on a validation set of size 7,747. Using a 5-gram language model, the Levenshtein Distance was 2.6446 on a test set of size 7,747. Then the training set and validation set were combined, shuffled and split in an 85-15 ratio. Training for 7 more epochs on this combined dataset yielded an improved Levenshtein Distance of 2.60753 on the test set. Our model was the best-performing one, achieving a Levenshtein Distance of 6.234 on a hidden dataset, which was 1.1049 units lower than other competing submissions.  ( 2 min )
    SEEK: model extraction attack against hybrid secure inference protocols. (arXiv:2209.06373v1 [cs.CR])
    Security concerns about a machine learning model used in a prediction-as-a-service setting include the privacy of the model, the query and the result. Secure inference solutions based on homomorphic encryption (HE) and/or multiparty computation (MPC) have been developed to protect all the sensitive information. One of the most efficient types of solutions utilizes HE for linear layers and MPC for non-linear layers. However, for such hybrid protocols with semi-honest security, an adversary can malleate the intermediate features in the inference process, and extract model information more effectively than with methods against inference services in plaintext. In this paper, we propose SEEK, a general extraction method for hybrid secure inference services outputting only class labels. This method can extract each layer of the target model independently, and is not affected by the depth of the model. For ResNet-18, SEEK can extract a parameter with less than 50 queries on average, with an average error less than $0.03\%$.  ( 2 min )
    Federated Pruning: Improving Neural Network Efficiency with Federated Learning. (arXiv:2209.06359v1 [cs.LG])
    Automatic Speech Recognition models require large amounts of speech data for training, and the collection of such data often leads to privacy concerns. Federated learning has been widely used and is considered to be an effective decentralized technique for collaboratively learning a shared prediction model while keeping the data local on different clients' devices. However, the limited computation and communication resources on clients' devices present practical difficulties for large models. To overcome such challenges, we propose Federated Pruning to train a reduced model under the federated setting, while maintaining similar performance compared to the full model. Moreover, the vast amount of client data can also be leveraged to improve the pruning results compared to centralized training. We explore different pruning schemes and provide empirical evidence of the effectiveness of our methods.  ( 2 min )
    A Review and Roadmap of Deep Learning Causal Discovery in Different Variable Paradigms. (arXiv:2209.06367v1 [cs.LG])
    Understanding causality helps to structure interventions to achieve specific goals and enables predictions under interventions. With the growing importance of learning causal relationships, causal discovery has transitioned from using traditional methods to infer potential causal structures from observational data to the pattern recognition techniques of deep learning. The rapid accumulation of massive data promotes the emergence of causal discovery methods with excellent scalability. Existing surveys of causal discovery methods mainly focus on traditional methods based on constraints, scores and functional causal models (FCMs); they lack a thorough classification and elaboration of deep learning-based methods, as well as consideration of causal discovery from the perspective of variable paradigms. We therefore divide the possible causal discovery tasks into three types according to the variable paradigm and give definitions of the three tasks, define and instantiate the relevant datasets and the resulting causal model for each task, and then review the main existing causal discovery methods for each. Finally, we propose roadmaps, from different perspectives, for the current research gaps in the field of causal discovery and point out future research directions.  ( 3 min )
    Quantifying the Online Long-Term Interest in Research. (arXiv:2209.06212v1 [cs.DL])
    Research articles are being shared in increasing numbers on multiple online platforms. Although the scholarly impact of these articles has been widely studied, the online interest, measured by how long the research articles are shared online, remains unclear. Being cognizant of how long a research article is mentioned online could be valuable information for researchers. In this paper, we analyzed multiple social media platforms on which users share and/or discuss scholarly articles. We built three clusters of papers, based on the number of yearly online mentions, for papers with publication dates ranging from 1920 to 2016. Using the online social media metrics for each of these three clusters, we built machine learning models to predict the long-term online interest in research articles. We addressed the prediction task with two different approaches: regression and classification. For the regression approach, the Multi-Layer Perceptron model performed best, and for the classification approach, the tree-based models performed better than other models. We found that old articles are most evident in the contexts of economics and industry (i.e., patents). In contrast, recently published articles are most evident in research platforms (i.e., Mendeley), followed by social media platforms (i.e., Twitter).  ( 3 min )
    PINCH: An Adversarial Extraction Attack Framework for Deep Learning Models. (arXiv:2209.06300v1 [cs.CR])
    Deep Learning (DL) models increasingly power a diversity of applications. Unfortunately, this pervasiveness also makes them attractive targets for extraction attacks which can steal the architecture, parameters, and hyper-parameters of a targeted DL model. Existing extraction attack studies have observed varying levels of attack success for different DL models and datasets, yet the underlying cause(s) behind their susceptibility often remain unclear. Ascertaining such root-cause weaknesses would help facilitate secure DL systems, though this requires studying extraction attacks in a wide variety of scenarios to identify commonalities across attack success and DL characteristics. The overwhelmingly high technical effort and time required to understand, implement, and evaluate even a single attack makes it infeasible to explore the large number of unique extraction attack scenarios in existence, with current frameworks typically designed to only operate for specific attack types, datasets and hardware platforms. In this paper we present PINCH: an efficient and automated extraction attack framework capable of deploying and evaluating multiple DL models and attacks across heterogeneous hardware platforms. We demonstrate the effectiveness of PINCH by empirically evaluating a large number of previously unexplored extraction attack scenarios, as well as secondary attack staging. Our key findings show that 1) multiple characteristics affect extraction attack success spanning DL model architecture, dataset complexity, hardware, attack type, and 2) partially successful extraction attacks significantly enhance the success of further adversarial attack staging.  ( 3 min )
    Graph Contrastive Learning with Personalized Augmentation. (arXiv:2209.06560v1 [cs.LG])
Graph contrastive learning (GCL) has emerged as an effective tool for learning unsupervised representations of graphs. The key idea is to maximize the agreement between two augmented views of each graph via data augmentation. Existing GCL models mainly focus on applying identical augmentation strategies for all graphs within a given scenario. However, real-world graphs are often not monomorphic but abstractions of diverse natures. Even within the same scenario (e.g., macromolecules and online communities), different graphs might need diverse augmentations to perform effective GCL. Thus, blindly augmenting all graphs without considering their individual characteristics may undermine the performance of GCL methods. To deal with this, we propose the first principled framework, termed Graph contrastive learning with Personalized Augmentation (GPA), to advance conventional GCL by allowing each graph to choose its own suitable augmentation operations. In essence, GPA infers tailored augmentation strategies for each graph based on its topology and node attributes via a learnable augmentation selector, which is a plug-and-play module and can be trained effectively with downstream GCL models end-to-end. Extensive experiments across 11 benchmark graphs from different types and domains demonstrate the superiority of GPA against state-of-the-art competitors. Moreover, by visualizing the learned augmentation distributions across different types of datasets, we show that GPA can effectively identify the most suitable augmentations for each graph based on its characteristics.  ( 2 min )
    High-resolution semantically-consistent image-to-image translation. (arXiv:2209.06264v1 [cs.CV])
Deep learning has become one of remote sensing scientists' most efficient computer vision tools in recent years. However, the lack of training labels for remote sensing datasets means that scientists need to solve the domain adaptation problem to narrow the discrepancy between satellite image datasets. As a result, image segmentation models trained afterwards could generalize better and use an existing set of labels instead of requiring new ones. This work proposes an unsupervised domain adaptation model that preserves semantic consistency and per-pixel quality for the images during the style-transferring phase. This paper's major contribution is proposing an improved architecture of the SemI2I model, which significantly boosts the proposed model's performance and makes it competitive with the state-of-the-art CyCADA model. A second contribution is testing the CyCADA model on remote sensing multi-band datasets such as WorldView-2 and SPOT-6. Because semantic consistency and per-pixel quality are preserved during style transfer, the semantic segmentation model trained on the adapted images shows a substantial performance gain compared to the SemI2I model and reaches similar results to the state-of-the-art CyCADA model. Future development of the proposed method could include ecological domain transfer, a priori evaluation of dataset quality in terms of data distribution, or exploration of the inner architecture of the domain adaptation model.  ( 3 min )
    A Hybrid Deep Learning Model-based Remaining Useful Life Estimation for Reed Relay with Degradation Pattern Clustering. (arXiv:2209.06429v1 [cs.LG])
Reed relays serve as the fundamental component of functional testing, which closely relates to successful quality inspection of electronics. To provide accurate remaining useful life (RUL) estimation for reed relays, a hybrid deep learning network with degradation pattern clustering is proposed based on the following three considerations. First, multiple degradation behaviors are observed for reed relays, and hence a dynamic time warping-based $K$-means clustering is used to distinguish degradation patterns from each other. Second, although proper selection of features is of great significance, few studies are available to guide the selection; the proposed method recommends operational rules for easy implementation. Third, a neural network for remaining useful life estimation (RULNet) is proposed to address the weakness of the convolutional neural network (CNN) in capturing temporal information of sequential data, incorporating temporal correlation ability after the high-level feature representation of the convolutional operation. In this way, three variants of RULNet are constructed with health indicators, features with self-organizing maps, or features with curve fitting. Ultimately, the proposed hybrid model is compared with typical baseline models, including CNN and the long short-term memory network (LSTM), on a practical reed relay dataset with two distinct degradation manners. The results from both degradation cases demonstrate that the proposed method outperforms CNN and LSTM in terms of root mean squared error.  ( 3 min )
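As a rough illustration of the clustering step described above, the sketch below groups degradation trajectories with DTW-based K-means using the tslearn library. The synthetic curves and the choice of two clusters are illustrative assumptions, not the paper's reed-relay data or settings.

```python
import numpy as np
from tslearn.clustering import TimeSeriesKMeans

rng = np.random.default_rng(0)
# 20 synthetic degradation curves, 100 steps each: half linear, half accelerating
t = np.linspace(0, 1, 100)
linear = np.stack([t + 0.05 * rng.standard_normal(100) for _ in range(10)])
quadratic = np.stack([t**2 + 0.05 * rng.standard_normal(100) for _ in range(10)])
X = np.concatenate([linear, quadratic])[:, :, None]  # (n_series, length, 1)

# DTW as the distance lets curves with similar shapes but different
# degradation speeds land in the same cluster.
km = TimeSeriesKMeans(n_clusters=2, metric="dtw", random_state=0)
labels = km.fit_predict(X)
print(labels)  # one cluster index per degradation trajectory
```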
    Using Rater and System Metadata to Explain Variance in the VoiceMOS Challenge 2022 Dataset. (arXiv:2209.06358v1 [cs.SD])
Non-reference speech quality models are important for a growing number of applications. The VoiceMOS 2022 challenge provided a dataset of synthetic voice conversion and text-to-speech samples with subjective labels. This study looks at the amount of variance in subjective ratings of speech quality that can be explained by metadata, and at the distribution imbalances of the dataset. Speech quality models were constructed using wav2vec 2.0 with additional metadata features that included rater groups and system identifiers, and obtained competitive metrics, including a Spearman rank correlation coefficient (SRCC) of 0.934 and MSE of 0.088 at the system level, and 0.877 and 0.198 at the utterance level. Using data and metadata that the test restricted or blinded further improved the metrics. A metadata analysis showed that the system-level metrics do not represent the model's system-level prediction as a result of the wide variation in the number of utterances used for each system in the validation and test datasets. We conclude that, in general, conditions should have enough utterances in the test set to bound the sample mean error and be relatively balanced in utterance count between systems; otherwise, the utterance-level metrics may be more reliable and interpretable.  ( 3 min )
    BERT-based Ensemble Approaches for Hate Speech Detection. (arXiv:2209.06505v1 [cs.CL])
With the freedom of communication provided by online social media, hate speech is increasingly generated. This leads to cyber conflicts affecting social life at the individual and national levels. As a result, hateful content classification is increasingly in demand for filtering hate content before it is posted to social networks. This paper focuses on classifying hate speech in social media using multiple deep models built by integrating recent transformer-based language models, such as BERT, with neural networks. To improve classification performance, we evaluated several ensemble techniques, including soft voting, maximum value, hard voting and stacking. We used three publicly available Twitter datasets (Davidson, HatEval2019, OLID) created to identify offensive language. We fused all these datasets to generate a single dataset (the DHO dataset), which is more balanced across the different labels, to perform multi-label classification. Our experiments were conducted on the Davidson dataset and the DHO corpora. The latter gave the best overall results, especially on the macro F1 score, even though it required more resources (execution time and memory). The experiments showed good results, especially for the ensemble models, where stacking gave an F1 score of 97% on the Davidson dataset and the aggregating ensembles 77% on the DHO dataset.  ( 2 min )
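For readers unfamiliar with the soft-voting technique named above, the sketch below averages the per-class probabilities of several classifiers and takes the argmax. The three-model, three-class shapes are illustrative placeholders; in the paper the base models are BERT-based classifiers.

```python
import numpy as np

def soft_vote(prob_list):
    """prob_list: list of (n_samples, n_classes) probability arrays."""
    avg = np.mean(np.stack(prob_list), axis=0)  # average over models
    return avg.argmax(axis=1)                   # predicted class per sample

# e.g. probabilities from three models for two samples and three labels
p1 = np.array([[0.7, 0.2, 0.1], [0.3, 0.4, 0.3]])
p2 = np.array([[0.6, 0.3, 0.1], [0.2, 0.5, 0.3]])
p3 = np.array([[0.5, 0.4, 0.1], [0.1, 0.6, 0.3]])
print(soft_vote([p1, p2, p3]))  # -> [0 1]
```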
    CometKiwi: IST-Unbabel 2022 Submission for the Quality Estimation Shared Task. (arXiv:2209.06243v1 [cs.CL])
    We present the joint contribution of IST and Unbabel to the WMT 2022 Shared Task on Quality Estimation (QE). Our team participated on all three subtasks: (i) Sentence and Word-level Quality Prediction; (ii) Explainable QE; and (iii) Critical Error Detection. For all tasks we build on top of the COMET framework, connecting it with the predictor-estimator architecture of OpenKiwi, and equipping it with a word-level sequence tagger and an explanation extractor. Our results suggest that incorporating references during pretraining improves performance across several language pairs on downstream tasks, and that jointly training with sentence and word-level objectives yields a further boost. Furthermore, combining attention and gradient information proved to be the top strategy for extracting good explanations of sentence-level QE models. Overall, our submissions achieved the best results for all three tasks for almost all language pairs by a considerable margin.  ( 2 min )
    Prediction of the outcome of a Twenty-20 Cricket Match. (arXiv:2209.06346v1 [cs.LG])
Twenty20 cricket, sometimes written Twenty-20 and often abbreviated to T20, is a short form of cricket. In a Twenty20 game, the two teams of 11 players have a single innings each, which is restricted to a maximum of 20 overs. This version of cricket is especially unpredictable, which is one of the reasons it has gained popularity in recent times. In this paper we try four different approaches to predicting the results of T20 cricket matches. Specifically, we take into account: previous performance statistics of the players in the competing teams, ratings of players obtained from reputed cricket statistics websites, clustering of players with similar performance statistics, and an ELO-based approach to rating players. We compare the performance of each of these approaches using logistic regression, support vector machines, Bayes networks, decision trees and random forests.  ( 2 min )
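As a rough illustration of the ELO-based rating mentioned above, the sketch below applies the standard ELO update rule; the K-factor and starting ratings are conventional defaults, not values taken from the paper.

```python
def elo_update(r_a, r_b, score_a, k=32):
    """Update rating r_a after a contest against r_b.
    score_a is 1 for a win, 0.5 for a tie, 0 for a loss."""
    expected_a = 1.0 / (1.0 + 10 ** ((r_b - r_a) / 400.0))
    return r_a + k * (score_a - expected_a)

r_batsman, r_bowler = 1500.0, 1600.0
# the batsman "wins" the duel (e.g. scores a boundary off the bowler)
r_batsman = elo_update(r_batsman, r_bowler, score_a=1.0)
print(round(r_batsman, 1))  # rating rises more for beating a stronger opponent
```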
    TSFool: Crafting High-quality Adversarial Time Series through Multi-objective Optimization to Fool Recurrent Neural Network Classifiers. (arXiv:2209.06388v1 [cs.LG])
Deep neural network (DNN) classifiers are vulnerable to adversarial attacks. Although existing gradient-based attacks have achieved good performance on feed-forward models and image recognition tasks, their extension to time series classification with recurrent neural networks (RNNs) remains a dilemma, because the cyclical structure of the RNN prevents direct model differentiation and the visual sensitivity of time series data to perturbations challenges the traditional local optimization objective of minimizing perturbation. In this paper, an efficient and widely applicable approach called TSFool for crafting high-quality adversarial time series for RNN classifiers is proposed. We propose a novel global optimization objective named the Camouflage Coefficient to measure how well adversarial samples hide within class clusters, and accordingly redefine the high-quality adversarial attack as a multi-objective optimization problem. We also propose using intervalized weighted finite automata (IWFA) to capture deeply embedded vulnerable samples whose features diverge from the latent manifold, guiding the approximation to the optimization solution. Experiments on 22 UCR datasets confirm that TSFool is a widely effective, efficient and high-quality approach, with 93.22% less local perturbation, 32.33% better global camouflage, and a 1.12 times speedup over existing methods.  ( 3 min )
  • Open

    Balancing Statistical and Computational Precision: A General Theory and Applications to Sparse Regression. (arXiv:1609.07195v3 [stat.ME] UPDATED)
    Modern technologies are generating ever-increasing amounts of data. Making use of these data requires methods that are both statistically sound and computationally efficient. Typically, the statistical and computational aspects are treated separately. In this paper, we propose an approach to entangle these two aspects in the context of regularized estimation. Applying our approach to sparse and group-sparse regression, we show that it can improve on standard pipelines both statistically and computationally.  ( 2 min )
    Variational System Identification for Nonlinear State-Space Models. (arXiv:2012.05072v3 [stat.ML] UPDATED)
    This paper considers parameter estimation for nonlinear state-space models, which is an important but challenging problem. We address this challenge by employing a variational inference (VI) approach, which is a principled method that has deep connections to maximum likelihood estimation. This VI approach ultimately provides estimates of the model as solutions to an optimisation problem, which is deterministic, tractable and can be solved using standard optimisation tools. A specialisation of this approach for systems with additive Gaussian noise is also detailed. The proposed method is examined numerically on a range of simulated and real examples focusing on the robustness to parameter initialisation; additionally, favourable comparisons are performed against state-of-the-art alternatives.  ( 2 min )
    On the Maximum Hessian Eigenvalue and Generalization. (arXiv:2206.10654v2 [cs.LG] UPDATED)
The mechanisms by which certain training interventions, such as increasing learning rates and applying batch normalization, improve the generalization of deep networks remain a mystery. Prior works have speculated that "flatter" solutions generalize better than "sharper" solutions to unseen data, motivating several metrics for measuring flatness (particularly $\lambda_{max}$, the largest eigenvalue of the Hessian of the loss); and algorithms, such as Sharpness-Aware Minimization (SAM) [1], that directly optimize for flatness. Other works question the link between $\lambda_{max}$ and generalization. In this paper, we present findings that call $\lambda_{max}$'s influence on generalization further into question. We show that: (1) while larger learning rates reduce $\lambda_{max}$ for all batch sizes, generalization benefits sometimes vanish at larger batch sizes; (2) by scaling batch size and learning rate simultaneously, we can change $\lambda_{max}$ without affecting generalization; (3) while SAM produces smaller $\lambda_{max}$ for all batch sizes, generalization benefits (also) vanish with larger batch sizes; (4) for dropout, excessively high dropout probabilities can degrade generalization, even as they promote smaller $\lambda_{max}$; and (5) while batch-normalization does not consistently produce smaller $\lambda_{max}$, it nevertheless confers generalization benefits. While our experiments affirm the generalization benefits of large learning rates and SAM for minibatch SGD, the GD-SGD discrepancy demonstrates limits to $\lambda_{max}$'s ability to explain generalization in neural networks.  ( 3 min )
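For readers unfamiliar with how $\lambda_{max}$ is measured in practice, the sketch below estimates it by power iteration on Hessian-vector products, a standard trick that avoids forming the Hessian explicitly; the tiny model and data are placeholder assumptions, not the paper's setup.

```python
import torch

model = torch.nn.Linear(10, 1)
x, y = torch.randn(32, 10), torch.randn(32, 1)
loss = torch.nn.functional.mse_loss(model(x), y)

params = [p for p in model.parameters() if p.requires_grad]
grads = torch.autograd.grad(loss, params, create_graph=True)

v = [torch.randn_like(p) for p in params]  # random start vector
for _ in range(50):  # power iteration
    # Hessian-vector product: differentiate (grad . v) w.r.t. the parameters
    gv = sum((g * vi).sum() for g, vi in zip(grads, v))
    hv = torch.autograd.grad(gv, params, retain_graph=True)
    norm = torch.sqrt(sum((h ** 2).sum() for h in hv))
    v = [h / norm for h in hv]  # renormalize to a unit vector

# with unit v, |Hv| converges to the magnitude of the top eigenvalue
print(norm.item())
```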
    Distributionally Robust Offline Reinforcement Learning with Linear Function Approximation. (arXiv:2209.06620v1 [cs.LG])
Among the reasons that hinder the application of reinforcement learning (RL) to real-world problems, two factors are critical: limited data and the mismatch between the testing environment and the training one. In this paper, we attempt to address these issues simultaneously with the problem setup of distributionally robust offline RL. Particularly, we learn an RL agent with the historical data obtained from the source environment and optimize it to perform well in the perturbed one. Moreover, we consider linear function approximation to apply the algorithm to large-scale problems. We prove that our algorithm can achieve a suboptimality of $O(1/\sqrt{K})$ depending on the linear function dimension $d$, which appears to be the first result with a sample complexity guarantee in this setting. Diverse experiments are conducted to demonstrate our theoretical findings, showing the superiority of our algorithm over the non-robust one.  ( 2 min )
    Improving Self-Supervised Learning by Characterizing Idealized Representations. (arXiv:2209.06235v1 [cs.LG])
    Despite the empirical successes of self-supervised learning (SSL) methods, it is unclear what characteristics of their representations lead to high downstream accuracies. In this work, we characterize properties that SSL representations should ideally satisfy. Specifically, we prove necessary and sufficient conditions such that for any task invariant to given data augmentations, desired probes (e.g., linear or MLP) trained on that representation attain perfect accuracy. These requirements lead to a unifying conceptual framework for improving existing SSL methods and deriving new ones. For contrastive learning, our framework prescribes simple but significant improvements to previous methods such as using asymmetric projection heads. For non-contrastive learning, we use our framework to derive a simple and novel objective. Our resulting SSL algorithms outperform baselines on standard benchmarks, including SwAV+multicrops on linear probing of ImageNet.  ( 2 min )
    Finite Sample Guarantees for Distributed Online Parameter Estimation with Communication Costs. (arXiv:2209.06678v1 [eess.SY])
    We study the problem of estimating an unknown parameter in a distributed and online manner. Existing work on distributed online learning typically either focuses on asymptotic analysis, or provides bounds on regret. However, these results may not directly translate into bounds on the error of the learned model after a finite number of time-steps. In this paper, we propose a distributed online estimation algorithm which enables each agent in a network to improve its estimation accuracy by communicating with neighbors. We provide non-asymptotic bounds on the estimation error, leveraging the statistical properties of the underlying model. Our analysis demonstrates a trade-off between estimation error and communication costs. Further, our analysis allows us to determine a time at which the communication can be stopped (due to the costs associated with communications), while meeting a desired estimation accuracy. We also provide a numerical example to validate our results.  ( 2 min )
    Natural Reweighted Wake-Sleep. (arXiv:2008.06687v4 [cs.LG] UPDATED)
    Helmholtz Machines (HMs) are a class of generative models composed of two Sigmoid Belief Networks (SBNs), acting respectively as an encoder and a decoder. These models are commonly trained using a two-step optimization algorithm called Wake-Sleep (WS) and more recently by improved versions, such as Reweighted Wake-Sleep (RWS) and Bidirectional Helmholtz Machines (BiHM). The locality of the connections in an SBN induces sparsity in the Fisher Information Matrices associated to the probabilistic models, in the form of a finely-grained block-diagonal structure. In this paper we exploit this property to efficiently train SBNs and HMs using the natural gradient. We present a novel algorithm, called Natural Reweighted Wake-Sleep (NRWS), that corresponds to the geometric adaptation of its standard version. In a similar manner, we also introduce Natural Bidirectional Helmholtz Machine (NBiHM). Differently from previous work, we will show how for HMs the natural gradient can be efficiently computed without the need of introducing any approximation in the structure of the Fisher information matrix. The experiments performed on standard datasets from the literature show a consistent improvement of NRWS and NBiHM not only with respect to their non-geometric baselines but also with respect to state-of-the-art training algorithms for HMs. The improvement is quantified both in terms of speed of convergence as well as value of the log-likelihood reached after training.  ( 3 min )
    Minimax risk classifiers with 0-1 loss. (arXiv:2201.06487v3 [stat.ML] UPDATED)
    Supervised classification techniques use training samples to learn a classification rule with small expected 0-1 loss (error probability). Conventional methods enable tractable learning and provide out-of-sample generalization by using surrogate losses instead of the 0-1 loss and considering specific families of rules (hypothesis classes). This paper presents minimax risk classifiers (MRCs) that minimize the worst-case 0-1 loss over general classification rules and provide tight performance guarantees at learning. We show that MRCs are strongly universally consistent using feature mappings given by characteristic kernels. The paper also proposes efficient optimization techniques for MRC learning and shows that the methods presented can provide accurate classification together with tight performance guarantees in practice.  ( 2 min )
    $\pi$VAE: a stochastic process prior for Bayesian deep learning with MCMC. (arXiv:2002.06873v6 [cs.LG] UPDATED)
Stochastic processes provide a mathematically elegant way to model complex data. In theory, they provide flexible priors over function classes that can encode a wide range of interesting assumptions. In practice, however, efficient inference by optimisation or marginalisation is difficult, a problem further exacerbated with big data and high dimensional input spaces. We propose a novel variational autoencoder (VAE) called the prior encoding variational autoencoder ($\pi$VAE). The $\pi$VAE is finitely exchangeable and Kolmogorov consistent, and thus is a continuous stochastic process. We use $\pi$VAE to learn low dimensional embeddings of function classes. We show that our framework can accurately learn expressive function classes such as Gaussian processes, but also properties of functions to enable statistical inference (such as the integral of a log Gaussian process). For popular tasks, such as spatial interpolation, $\pi$VAE achieves state-of-the-art performance both in terms of accuracy and computational efficiency. Perhaps most usefully, we demonstrate that the low dimensional independently distributed latent space representation learnt provides an elegant and scalable means of performing Bayesian inference for stochastic processes within probabilistic programming languages such as Stan.  ( 3 min )
    Riemannian Langevin Algorithm for Solving Semidefinite Programs. (arXiv:2010.11176v4 [stat.ML] UPDATED)
We propose a Langevin diffusion-based algorithm for non-convex optimization and sampling on a product manifold of spheres. Under a logarithmic Sobolev inequality, we establish a guarantee for finite iteration convergence to the Gibbs distribution in terms of Kullback--Leibler divergence. We show that with an appropriate temperature choice, the suboptimality gap to the global minimum is guaranteed to be arbitrarily small with high probability. As an application, we consider the Burer--Monteiro approach for solving a semidefinite program (SDP) with diagonal constraints, and analyze the proposed Langevin algorithm for optimizing the non-convex objective. In particular, we establish a logarithmic Sobolev inequality for the Burer--Monteiro problem when there are no spurious local minima, but in the presence of saddle points. Combining the results, we then provide a global optimality guarantee for the SDP and the Max-Cut problem. More precisely, we show that the Langevin algorithm achieves $\epsilon$ accuracy with high probability in $\widetilde{\Omega}( \epsilon^{-5} )$ iterations.  ( 2 min )
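As a rough illustration of the setting, the sketch below runs a Langevin step on a product of spheres for a Burer-Monteiro-style objective $f(S) = \langle A, SS^T \rangle$, using row normalization as the retraction. The step size, temperature, and random matrix are illustrative assumptions, not the paper's analyzed parameters.

```python
import numpy as np

rng = np.random.default_rng(0)
n, k = 20, 4
A = rng.standard_normal((n, n)); A = (A + A.T) / 2  # symmetric "adjacency"
S = rng.standard_normal((n, k))
S /= np.linalg.norm(S, axis=1, keepdims=True)       # rows on unit spheres

eta, beta = 1e-2, 100.0  # step size and inverse temperature
for _ in range(1000):
    grad = 2 * A @ S                                 # Euclidean gradient of <A, S S^T>
    noise = np.sqrt(2 * eta / beta) * rng.standard_normal(S.shape)
    S = S - eta * grad + noise                       # Langevin update
    S /= np.linalg.norm(S, axis=1, keepdims=True)    # retract rows to the spheres

print(np.sum(A * (S @ S.T)))  # objective value after sampling/optimization
```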
    Learning Value-at-Risk and Expected Shortfall. (arXiv:2209.06476v1 [q-fin.CP])
    We propose a non-asymptotic convergence analysis of a two-step approach to learn a conditional value-at-risk (VaR) and expected shortfall (ES) in a nonparametric setting using Rademacher and Vapnik-Chervonenkis bounds. Our approach for the VaR is extended to the problem of learning at once multiple VaRs corresponding to different quantile levels. This results in efficient learning schemes based on neural network quantile and least-squares regressions. An a posteriori Monte Carlo (non-nested) procedure is introduced to estimate distances to the ground-truth VaR and ES without access to the latter. This is illustrated using numerical experiments in a Gaussian toy-model and a financial case-study where the objective is to learn a dynamic initial margin.  ( 2 min )
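As a rough illustration of the quantile-regression building block behind neural VaR learning, the sketch below trains a small network with the pinball loss, whose minimizer is the conditional quantile. The network, data, and quantile level are placeholder assumptions, not the paper's setup.

```python
import torch

def pinball_loss(pred, target, alpha):
    """Pinball (quantile) loss at level alpha."""
    diff = target - pred
    return torch.mean(torch.maximum(alpha * diff, (alpha - 1) * diff))

net = torch.nn.Sequential(torch.nn.Linear(5, 32), torch.nn.ReLU(),
                          torch.nn.Linear(32, 1))
opt = torch.optim.Adam(net.parameters(), lr=1e-3)
x, y = torch.randn(256, 5), torch.randn(256, 1)

for _ in range(200):  # regress the 99% conditional quantile (a VaR level)
    opt.zero_grad()
    loss = pinball_loss(net(x), y, alpha=0.99)
    loss.backward()
    opt.step()
```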
    Small Transformers Compute Universal Metric Embeddings. (arXiv:2209.06788v1 [cs.LG])
We study representations of data from an arbitrary metric space $\mathcal{X}$ in the space of univariate Gaussian mixtures with a transport metric (Delon and Desolneux 2020). We derive embedding guarantees for feature maps implemented by small neural networks called probabilistic transformers. Our guarantees are of memorization type: we prove that a probabilistic transformer of depth about $n\log(n)$ and width about $n^2$ can bi-Hölder embed any $n$-point dataset from $\mathcal{X}$ with low metric distortion, thus avoiding the curse of dimensionality. We further derive probabilistic bi-Lipschitz guarantees which trade off the amount of distortion and the probability that a randomly chosen pair of points embeds with that distortion. If $\mathcal{X}$'s geometry is sufficiently regular, we obtain stronger, bi-Lipschitz guarantees for all points in the dataset. As applications we derive neural embedding guarantees for datasets from Riemannian manifolds, metric trees, and certain types of combinatorial graphs.  ( 2 min )
    Modelling Technical and Biological Effects in scRNA-seq data with Scalable GPLVMs. (arXiv:2209.06716v1 [cs.LG])
Single-cell RNA-seq datasets are growing in size and complexity, enabling the study of cellular composition changes in various biological/clinical contexts. Scalable dimensionality reduction techniques are needed to disentangle biological variation in them, while accounting for technical and biological confounders. In this work, we extend a popular approach for probabilistic non-linear dimensionality reduction, the Gaussian process latent variable model, to scale to massive single-cell datasets while explicitly accounting for technical and biological confounders. The key idea is to use an augmented kernel which preserves the factorisability of the lower bound, allowing for fast stochastic variational inference. We demonstrate its ability to reconstruct latent signatures of innate immunity recovered in Kumasaka et al. (2021) with 9x lower training time. We further analyze a COVID dataset and demonstrate, across a cohort of 130 individuals, that this framework enables data integration while capturing interpretable signatures of infection. Specifically, we explore COVID severity as a latent dimension to refine patient stratification and capture disease-specific gene expression.  ( 2 min )
    Spectral embedding and the latent geometry of multipartite networks. (arXiv:2202.03945v2 [stat.ME] UPDATED)
    Spectral embedding finds vector representations of the nodes of a network, based on the eigenvectors of its adjacency or Laplacian matrix, and has found applications throughout the sciences. Many such networks are multipartite, meaning their nodes can be divided into groups and nodes of the same group are never connected. When the network is multipartite, this paper demonstrates that the node representations obtained via spectral embedding live near group-specific low-dimensional subspaces of a higher-dimensional ambient space. For this reason we propose a follow-on step after spectral embedding, to recover node representations in their intrinsic rather than ambient dimension, proving uniform consistency under a low-rank, inhomogeneous random graph model. Our method naturally generalizes bipartite spectral embedding, in which node representations are obtained by singular value decomposition of the biadjacency or bi-Laplacian matrix.  ( 2 min )
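As a rough illustration of the bipartite case mentioned at the end, the sketch below embeds the two node groups via a truncated SVD of the biadjacency matrix; the random graph and embedding dimension are illustrative assumptions, not the paper's model.

```python
import numpy as np
from scipy.sparse.linalg import svds

rng = np.random.default_rng(0)
B = (rng.random((100, 80)) < 0.05).astype(float)  # biadjacency (left x right)

d = 5                                             # embedding dimension
U, s, Vt = svds(B, k=d)                           # truncated SVD
X_left = U * np.sqrt(s)      # representations of the 100 left nodes
X_right = Vt.T * np.sqrt(s)  # representations of the 80 right nodes
print(X_left.shape, X_right.shape)
```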
    Model-based recursive partitioning for discrete event times. (arXiv:2209.06592v1 [stat.ME])
Model-based recursive partitioning (MOB) is a semi-parametric statistical approach allowing the identification of subgroups that can be combined with a broad range of outcome measures, including continuous time-to-event outcomes. When time is measured on a discrete scale, methods and models need to account for this discreteness, as otherwise subgroups might be spurious and effects biased. The test underlying the splitting criterion of MOB, the M-fluctuation test, assumes independent observations. However, for fitting discrete time-to-event models the data matrix has to be modified, resulting in an augmented data matrix that violates the independence assumption. We propose MOB for discrete survival data (MOB-dS), which controls the type I error rate of the test used for data splitting and therefore the rate of identifying subgroups when none are present. MOB-dS uses a permutation approach accounting for dependencies in the augmented time-to-event data to obtain the distribution under the null hypothesis of no subgroups being present. Through simulations we investigate the type I error rates of the new MOB-dS and the standard MOB for different patterns of survival curves and event rates. We find that the type I error rate of the test is well controlled for MOB-dS, but observe considerable inflation of the error rate for MOB. To illustrate the proposed methods, MOB-dS is applied to data on unemployment duration.  ( 2 min )

  • Open

    [D] Classifier for mixture of components
Given a dataset, features (binary values per feature) and a set of multiple labels (single label per entity), I can train a model to classify incoming data to a single label. But let's say I want to classify incoming data that is a mixture of two labels. Imagine we took the binary features of one sample and just performed an "OR" operation with a second sample, and we want to find the two most likely labels that were mixed. On first glance, this might seem like a multi-label classification problem. However, the training data are single-label entities. Will training a multi-label model on single-label entities give me the ability to classify mixtures? An alternative, brute-force method might be to transform my single-label entities into all possible pair-wise combinations and train on the pair-wise mixture dataset (see the sketch below). Are there existing solutions for this? Any input would be greatly appreciated. submitted by /u/daemonk [link] [comments]  ( 89 min )
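A minimal sketch of the brute-force idea in the post, under purely illustrative data and model choices: OR together pairs of single-label samples, give each pair a two-hot label, train a multi-label classifier, and take the two highest-probability labels at inference.

```python
import numpy as np
from itertools import combinations
from sklearn.linear_model import LogisticRegression
from sklearn.multiclass import OneVsRestClassifier

rng = np.random.default_rng(0)
n, d, n_labels = 60, 30, 4
X = (rng.random((n, d)) < 0.3).astype(int)   # binary features
y = rng.integers(0, n_labels, n)             # single label per entity

pair_X, pair_Y = [], []
for i, j in combinations(range(n), 2):
    pair_X.append(X[i] | X[j])               # OR-mixture of two samples
    two_hot = np.zeros(n_labels); two_hot[y[i]] = two_hot[y[j]] = 1
    pair_Y.append(two_hot)
pair_X, pair_Y = np.array(pair_X), np.array(pair_Y)

clf = OneVsRestClassifier(LogisticRegression(max_iter=1000)).fit(pair_X, pair_Y)
probs = clf.predict_proba(pair_X[:1])
print(np.argsort(probs[0])[-2:])             # two most likely mixed labels
```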
    [N] Discord Talk: Data Labeling and Versioning for Production Retraining
    Data-centric AI doesn't just stop with cleaning and preparing data for model training - there are rich insights to be gleaned from production data. By analyzing, segmenting, and selectively re-labeling your production inference data, you can generate datasets for future model retraining. This talk will show you how you can use human-in-the-loop oversight to generate high-quality, labeled datasets from your prediction data for future model retraining. Tune in on Sept 22nd at 12:30 PM EDT. submitted by /u/modzykirsten [link] [comments]  ( 89 min )
    [P] How to handle large chunks of missing data for a subset of variables?
I have a scenario where I will be modeling many different parts on many different machines using IoT sensor data. That alone is going to be a large number of models. However, an added feature (that I was not told about) will be a list of questions asked of the users on top of the fully automated sensor data. The idea is to raise an alert when a part might be malfunctioning, then ask the user a list of questions that cannot be answered via existing sensors, and give them an updated score. My question is: is there a way to incorporate this large chunk of missing data coming from the questions into the models (there will be a fair number of test cases where no questions have been asked), or will I have to build at least two models for each part (i.e., a model without the questions answered and a model with questions answered)? I know that I cannot simply use the questions as-is in the model because there would be a HUGE correlation between the question variables. I can probably impute those missing question values using a number of techniques, but I've never done that and am not sure how this will affect the model's output. Thanks in advance for any advice/recommendations! submitted by /u/the_planes_walker [link] [comments]  ( 89 min )
    Fitness function in Meta heuristics [Research]
[R] I am working on a research project using the BAT algorithm with feed-forward neural networks. There are many fitness functions available for these algorithms, and I want to know how to select one for a given piece of work; that is, what makes a fitness function suitable for a problem? I have done all of the other work for my project; the only confusion is about the fitness function. I have read papers by other researchers using metaheuristic algorithms, but they don't even discuss the fitness function used. submitted by /u/Horseman099 [link] [comments]  ( 88 min )
    [P] Looking for state of the art clustering algorithms
Good day, may I ask if anyone knows the current state of the art for clustering algorithms? While searching I only found variations of DBSCAN and Gaussian Mixture Models. Are these two still relevant, or are there newer approaches used in industry? I have some driving data to work with and I'm currently planning to do a comparison of various clustering algorithms with SVMs, PCA, ANNs and more. Thank you :) submitted by /u/aswd1908 [link] [comments]  ( 89 min )
    [D] websites for projects volounteering
    Hi everyone, I'm finishing my MSc in Data Science, specialising in NLP and Social Network Analysis. I'd like to improve my skills and enrich my CV with a bunch of extra projects, preferably in NLP. Could you suggest me some sites where I can apply? Thank you. submitted by /u/Similar-Year4215 [link] [comments]  ( 88 min )
    [P] Pretty Jupyter 2.0: An Easy-To-Use Python Package For Beautiful Html Reports From Jupyter Notebooks
Pretty Jupyter is an easy-to-use package that creates beautifully styled and dynamic html webpages from Jupyter notebooks. Its repo is available here: https://github.com/JanPalasek/pretty-jupyter . Check out the demo and compare it with the default Jupyter output. You can also try Pretty Jupyter online without needing to install it. Main Features Visually appealing styles. Table of Contents can be automatically generated. Using Python variables in Markdown. Tabsets for hiding section content behind clickable tabs. Code Folding: Show/Hide code to filter out unnecessary content. Themes: Selection from a wide variety of available themes. Wide range of configuration options with sensible defaults. Unobtrusive syntax that works well in notebook environments. Everything is integrated via JavaScript in the output webpage, so no Python kernel running in the background is needed. What's new in 2.0? Wide range of notebook-level metadata to customize the default output. For example, we can turn off inputs. Cell-level metadata to override various settings at the cell level (e.g. we can remove individual cells from the output). For example, an input can be turned off in one cell while generally being turned on in the notebook. Overriding all notebook-level metadata from the command line. Wide range of integrated themes. Easier syntax. New design. New wide range of examples. submitted by /u/Jan2579 [link] [comments]  ( 102 min )
    [P] Deep Learning-Powered Speech Recognition Service for Subtitling
    Hi everyone, Our team has built an automated Speech Recognition service that generates subtitle files for any video or audio file and can translate into 45+ languages. It's powered fully by Deep Learning and beats other implementations like YouTube captions. If you'd like to try it with your own content, it's free to create an account and use. Learn more about it here: https://www.smartmine.net/video-services/subtitling-description Try it free here: https://ai.smartmine.net/service/speech-recognition/captioning It has some great features like: Accurate subtitles powered by DL Speech recognition in 11 languages DL-powered translation into 45+ languages Multiple speaker recognition Subtitle editor to fine-tune results We're a small company, so we would really appreciate any feedback you have! submitted by /u/aL_eX49 [link] [comments]  ( 90 min )
    [P] Flask - YOLOv5
Hello, at the moment I am implementing YOLOv5 on a Flask server, which is working. But now I want to use the detected objects in a function to process them and send the result in real time to a React frontend. For this I am currently using SSE, but it is very slow, so I would like a faster solution. It would be nice if someone could help me. submitted by /u/stoemb- [link] [comments]  ( 89 min )
    [D] Convolutional Neural Networks terminology and definitions for medicine (neurorad)
I am about to conduct a systematic review, in short: convolutional neural networks (CNNs) used for neuroradiology (CT imaging) and their relevance/prospects for clinical use. Developing a scientific, reproducible search protocol for finding relevant papers on CNNs applied to CT imaging has turned out to be tricky. How do you semantically and scientifically define "CNN architecture" when the variety of models, (sometimes overlapping) terms and 'nicknames' for different solutions is vast and ever-expanding? The search requires unambiguous definitions. "Convolut*/ConvNet/CNN" as search terms/keywords for "convolutional (NNs)" >> I am afraid this is not a wide enough scope, as convolution operations are used in various ML models, and spelling out "convolutional (neural network)" is often not done but considered obvious. As I am not familiar with all existing backbone convolution networks and their names (ShuffleNet, Inception, GoogLeNet etc.), it is impossible to systematically add them all to the search protocol. Inclusion and exclusion criteria based on algorithm details are hard: consulting full texts for details, convolution/CNN/ConvNet is not necessarily explicitly stated; a quick Ctrl+F for "conv" doesn't catch a thing. However, the algorithm used may be mentioned by its name, e.g. AlexNet, ResNet, some of which are often (?) at least partly convolution-based. The AI tree gets a bit messy at the level of CNNs: RNNs, ResNets, AEs, GANs, TransfLearn... Methods and terminology overlap and evolve constantly. Where do I draw the line for CNNs? I cannot afford to consult the code; I must rely on text descriptions. The problem is the number of papers, which makes it almost impossible to go through them in enough detail to understand the model architecture. Automation for keyword hunting is also hard, as I don't have a set of all possible CNN keywords (a rough sketch of such a search is given below). I am a medic and my abilities in understanding ML terminology are clearly limited. Any ideas? submitted by /u/Ok-Professional-6788 [link] [comments]  ( 94 min )
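A minimal sketch of one way to automate the keyword screening the post asks about: flag abstracts matching any of a (necessarily incomplete) list of convolution-related terms and well-known architecture names. The term list is an assumption to be extended, not an exhaustive CNN vocabulary.

```python
import re

CNN_TERMS = [
    r"convolut\w*", r"\bconvnet\b", r"\bcnn\b",
    r"\bresnet\b", r"\balexnet\b", r"\bvgg\b", r"\bu-?net\b",
    r"\binception\b", r"\bgooglenet\b", r"\bdensenet\b", r"\befficientnet\b",
]
PATTERN = re.compile("|".join(CNN_TERMS), re.IGNORECASE)

def flag_abstract(text):
    """Return the set of CNN-related terms found in an abstract."""
    return set(m.group(0).lower() for m in PATTERN.finditer(text))

print(flag_abstract("We train a 3D U-Net with residual convolutions on CT."))
```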
    [N] THUNET: 10%+ higher compression rate than zip, THUNET models support saving in 7z format.
What is THUNET? A deep learning net/framework named "TsingHua University NET", short for "THUNET", is for non-commercial, educational, and scientific purposes for the deep learning community. How to build a neural network with THUNET? Next, I will explain how to use THUNET to save and load a model. Models are saved in 7z format, thus gaining a 10%+ higher compression rate than the zip format (see "Zip / 7zip Compression Differences"). Tutorial 3: Model Saving and Loading. Model serialization makes use of the 7z format instead of the legacy zip format for a higher compression rate. From the Wikipedia comparison of zip and 7z: in 2011, TopTenReviews found that the 7z compression was at least 17% better than ZIP,[15] and 7-Zip's own site has since 2002 reported that while compression ratio results …  ( 92 min )
    [D] [R] How to have a single output that is based on the ‘best of 2 worlds’?
Hi everyone, I am currently doing research on this idea, and was wondering if anyone has any suggestions or advice that I could look into further. Imagine that I have 2 ground truths available; let's call them y1 and y2. y1 is very good at doing A, whereas y2 is better at doing B. y1 is bad at B, and y2 is bad at A. Does anyone have any idea how I can have a model that gives me a single output that makes use of both y1 and y2, such that the final output is good at both A and B? All the variables are continuous and can also be considered as images. I was thinking along the lines of multi-task learning, or combining loss functions into one single loss, but I am still stumped on the best way forward. Some advice would be beneficial. Thanks submitted by /u/plsendfast [link] [comments]  ( 90 min )
    [P] TabTransformer: Deep Dive w. Blog, Package & Notebook
    Hi! I've written an article which explores TabTransformer, in particular how the multi-headed attention can be applied to the categorical variables. Here's the link if you're interested in finding out more. The main idea is quite simple - to use Transformer blocks to contextualise categorical embeddings. These embeddings are then concatenated with numerical features and passed to MLP to make a prediction. How well does it work? It turns out that the addition of Transformer blocks with multi-headed attention layers can significantly improve the MLP's performance (but it still underperforms compared to GBDTs). TabTransformer To simplify the usage of the model, I've started working on TabTransformerTF package, but it's still WIP so feel free to report issues when you find some. Also, to showcase the use of the package, I've participated in the last month's tabular playground and got into top 35% with this notebook. Not great, but also not terrible given that the dataset was quite small and messy. submitted by /u/blessedorcursed [link] [comments]  ( 89 min )
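A minimal PyTorch sketch of the idea described in the post: Transformer blocks contextualise the categorical embeddings, which are then flattened, concatenated with the numerical features, and passed to an MLP. The layer sizes are illustrative assumptions; the post's TabTransformerTF package has its own API.

```python
import torch
import torch.nn as nn

class TabTransformerSketch(nn.Module):
    def __init__(self, cardinalities, n_num, d=32, n_heads=4, n_layers=2):
        super().__init__()
        self.embeds = nn.ModuleList(nn.Embedding(c, d) for c in cardinalities)
        layer = nn.TransformerEncoderLayer(d_model=d, nhead=n_heads,
                                           batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=n_layers)
        self.mlp = nn.Sequential(
            nn.Linear(d * len(cardinalities) + n_num, 64),
            nn.ReLU(),
            nn.Linear(64, 1),
        )

    def forward(self, x_cat, x_num):
        # one token per categorical column -> (batch, n_cat, d)
        tokens = torch.stack([e(x_cat[:, i]) for i, e in enumerate(self.embeds)], 1)
        ctx = self.encoder(tokens)                        # contextualised embeddings
        flat = ctx.flatten(1)                             # (batch, n_cat * d)
        return self.mlp(torch.cat([flat, x_num], dim=1))  # prediction logit

model = TabTransformerSketch(cardinalities=[10, 5, 7], n_num=4)
out = model(torch.randint(0, 5, (8, 3)), torch.randn(8, 4))
print(out.shape)  # torch.Size([8, 1])
```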
    [Project] Deep Learning Docker Image
Hey all, I am sharing a deep learning docker image that can save you time setting up a new deep learning system. It has PyTorch, TensorFlow, Pandas, NumPy, SciPy, Matplotlib, and many more common packages installed and configured. Please visit: matifali/DockerDL: Deep Learning Docker Image (github.com) Feel free to create issues or PRs if you face problems or have valuable feedback. submitted by /u/tech_geeky [link] [comments]  ( 88 min )
    [D] NeurIPS 2022 Paper Acceptance Result
NeurIPS 2022 paper acceptance results are supposed to be released at 1pm (PDT) on September 14. I thought I'd create a discussion thread for us to count down and discuss any celebrations/issues/complaints/feedback or anything else. There is so much noise in the reviews every year. Some good work that the authors are proud of might get rejected because of the noisy system, given that NeurIPS has grown so large these years. We should keep in mind that the work is still valuable no matter what the final result is. ---------------- PS: More than 150 people are looking at this thread at 1:30pm PDT (on September 14). That's quite a lot. submitted by /u/zy415 [link] [comments]  ( 109 min )
    [D] Who here are convinced that they have a really good setup that keeps track of their ML experiments?
    Who here are convinced that they have a really good setup that keeps track of their ML experiments? What is your setup (what tools / practices etc.)? Flex on me -- I just genuinely want to know all the possibilities of good ML experiment tracking practices. submitted by /u/glai9665 [link] [comments]  ( 97 min )
    [R] Proceedings from the "Conference on Causal Learning and Reasoning" (PMLR page)
    Came across the proceedings from this conference, and I thought the papers therein would be of interesting to people here. One that stood out to me was Partial Identification with Noisy Covariates: A Robust Optimization Approach. submitted by /u/bikeskata [link] [comments]  ( 88 min )
    [D] Identify similar observations in a large data set
I have a large time series dataset where some people get flagged manually for being at risk of an outcome and others automatically by the system. However, these two methods end up missing a large number of people. I want to identify similar rows in the data using the information from the rows of manually and automatically flagged people. It's not a fraud detection problem, but I was thinking of looking at those methods for ideas since it could be somewhat similar. I'm thinking that I need to build a model, train it on the labeled rows, and then use it to predict the unlabeled rows in the data. References and suggestions would be greatly appreciated; I'm hoping to just be pointed in the right direction. submitted by /u/SomaDomaBoma [link] [comments]  ( 89 min )
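A minimal sketch of the approach the post suggests, under illustrative assumptions about the data: treat flagged rows as positives, train a classifier on the labeled portion, then score the unlabeled rows and surface the highest-risk ones for review. A PU-learning method could refine this, since unflagged rows are not guaranteed negatives.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)
X_labeled = rng.standard_normal((500, 12))
y_labeled = (X_labeled[:, 0] + rng.standard_normal(500) > 1).astype(int)  # flags
X_unlabeled = rng.standard_normal((2000, 12))

clf = RandomForestClassifier(n_estimators=200, random_state=0)
clf.fit(X_labeled, y_labeled)

risk = clf.predict_proba(X_unlabeled)[:, 1]  # P(at risk) per unlabeled row
candidates = np.argsort(risk)[::-1][:50]     # top 50 rows to review first
print(risk[candidates[:5]])
```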
  • Open

    I used NoodleSoup’s prompt generator and got some… trippy results
    submitted by /u/Available_Tadpole829 [link] [comments]  ( 87 min )
    AI Dream 79 - Wild new Project! Freestyle 1
    submitted by /u/LordPewPew777 [link] [comments]  ( 87 min )
    AI that edits text based on prompt?
Is there an AI that can change pre-existing text based on a theme/prompt? A random example: if I gave it a paragraph from Lord of the Rings but wanted Frodo to be Donald Duck, can GPT-3 do this? Or any others? submitted by /u/Specialist_Village_5 [link] [comments]  ( 87 min )
It took me a lot of time and effort to create this clip. In spite of the numerous failures and technical errors, I still achieved the result that I wanted so much. I hope you enjoy it!
    submitted by /u/nalr00n [link] [comments]  ( 90 min )
    Amazing bed time story video with ai
    submitted by /u/Due-Ad9795 [link] [comments]  ( 86 min )
    Numbuh 841 XD
    submitted by /u/VIRUS-AOTOXIN [link] [comments]  ( 86 min )
    Futurisric Astronaut - Dall-E 2
    submitted by /u/Babylon_6 [link] [comments]  ( 86 min )
    4 Benefits of Using Artificial Intelligence in Schools
For years, teachers have struggled to help every student with their individualized educational needs. This gets even tougher in a class of twenty, thirty, or forty students in which every student has to pass the same tests, irrespective of each student's personalized needs. That said, the schoolrooms of today have not changed much in the past 50 years. Students sit in a room together and complete the same lessons—typically using the same textbooks—no matter their learning skills or expertise in a particular subject. Some students get left behind. Others are left unchallenged and bored by this one-size-fits-all approach. AI could change all this now. AI today is helping teachers create smart content programs and intelligent tutoring systems that help students learn in a more customized way. Students will be able to learn new things in a much better way with advanced tutor apps. Such apps can become education mentors for students and provide engaging learning opportunities. AI can now track the performance of an individual student based on previous grades, participation, and performance, and help a student realize their maximum potential. The rest of this article covers some ways in which AI is making education smarter, cheaper, and more accessible to all: AI as a tutor AI can automate grading AI can provide cognitive insights in classrooms AI can improve the education system Read more.... https://owlcation.com/academia/4-Amazing-Benefits-of-Artificial-Intelligence-in-Schools submitted by /u/IcyCartoonist1955 [link] [comments]  ( 89 min )
    AI is getting scary good
    submitted by /u/BigboiJoJi [link] [comments]  ( 89 min )
    How Can AI Change Your Financial Future?
    submitted by /u/sowmyasirisetty [link] [comments]  ( 87 min )
    What're some other AIs?
    So I've seen upscaling (video or image) AIs, AIs that increase FPS, AIs that transfer facial or body movements, AIs that transfer clothes. What're some other "useful" AI? submitted by /u/Got70Types0fMalware [link] [comments]  ( 90 min )
    Is anyone familiar with Real-ESRGAN?
How do I increase or decrease the scale ratio? How do I activate TTA? How does tile size work? How do I convert batches of images? submitted by /u/Got70Types0fMalware [link] [comments]  ( 87 min )
    Rick Sanchez by Tilda Swinton [xpost /r/dreamcasting]
    submitted by /u/dream_casting [link] [comments]  ( 89 min )
    Bruce Willis as Mark Zuckerberg [crosspost /r/dreamcasting]
    submitted by /u/dream_casting [link] [comments]  ( 87 min )
    This will give you nightmares for months!
    submitted by /u/Available_Tadpole829 [link] [comments]  ( 91 min )
  • Open

    The Transformer Attention Mechanism
Before the introduction of the Transformer model, attention for neural machine translation was implemented with RNN-based encoder-decoder architectures. The Transformer model revolutionized the implementation of attention by dispensing with recurrence and convolutions and, alternatively, relying solely on a self-attention mechanism. We will first be focusing on the Transformer attention mechanism in […] The post The Transformer Attention Mechanism appeared first on Machine Learning Mastery.
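As a rough illustration of the mechanism the post introduces, the sketch below implements scaled dot-product attention, Attention(Q, K, V) = softmax(QK^T / sqrt(d_k))V, in NumPy; the shapes are illustrative placeholders.

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)                 # query-key similarities
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)  # row-wise softmax
    return weights @ V                              # weighted sum of values

rng = np.random.default_rng(0)
Q = rng.standard_normal((5, 8))   # 5 query positions, d_k = 8
K = rng.standard_normal((5, 8))
V = rng.standard_normal((5, 16))  # d_v = 16
print(scaled_dot_product_attention(Q, K, V).shape)  # (5, 16)
```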
  • Open

    Recomendations of framework/library for MARL
I'm new to MARL and I'm looking for some open-source implementations that I could use in a project. I have some previous experience in single-agent RL, mainly with SB3 and gym, but have just now started reading some MARL papers. I'm mainly looking for a good balance between performance, good documentation and ease of use. So far, I've taken a look at Mava and RLlib. Mava seems like a very complete option, though I'm not at all familiar with the API, and maybe something simpler could also do the trick. As for the environment library, I was considering PettingZoo, since it has a very similar API to gym. Thought I might as well ask here first, as people can suggest other options for me to investigate or even give me some pros and cons they have learned from past experience. submitted by /u/Ok_Signature_4944 [link] [comments]  ( 88 min )
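A minimal sketch of PettingZoo's agent-by-agent interaction loop, which mirrors the gym-like API mentioned in the post. It is written against the older 4-tuple `last()` return; newer PettingZoo releases split `done` into termination/truncation, so check the installed version's docs.

```python
from pettingzoo.mpe import simple_spread_v2

env = simple_spread_v2.env()
env.reset()
for agent in env.agent_iter():
    obs, reward, done, info = env.last()  # data for the current agent
    action = None if done else env.action_space(agent).sample()
    env.step(action)                      # step only the current agent
env.close()
```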
    "Foundations of Deep Reinforcement Learning" - an Interview with Pieter Abbeel, PhD
    submitted by /u/Open_Data_Science [link] [comments]  ( 87 min )
    SpaceRobotEnv is an open-sourced environments for trajectory planning of free-floating space robots.
SpaceRobotEnv is an open-sourced set of environments for trajectory planning of free-floating space robots. Reaching high-level planning accuracy, bimanual coordination and end-to-end control remains an open challenge for space robotics researchers. To better help the community study this problem, SpaceRobotEnv was developed with the following key features: real space environment; dynamic coupling control; image input. URL: https://github.com/Tsinghua-Space-Robot-Learning-Group/SpaceRobotEnv. Note: our repo can now be found in the OpenAI Gym documentation; please see SpaceRobotEnv. Hope everyone enjoys it! submitted by /u/Shengjie_Wang [link] [comments]  ( 100 min )
    How does action evaluation work in PPO?
I'm looking at this code (https://github.com/marlbenchmark/on-policy/blob/c662b377694bd0311b760ff1501384a424b90b24/onpolicy/algorithms/r_mappo/algorithm/r_actor_critic.py#L72) and there's something I don't understand. As far as I understand, a PPO update consists of taking a batch of previous observations, actions, etc., doing action evaluation, and producing three outputs: (1) a value, (2) action log-probs, and (3) action entropies. Say the observation is an image. What I don't understand is how this works if the input consists of multiple images stacked together. The agent receives all these images and... how does this make sense? I'm sure someone can help me understand. Thanks! submitted by /u/No_Possibility_7588 [link] [comments]  ( 88 min )
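A minimal sketch of what an evaluate_actions step does in a PPO update and how stacked frames fit in: the k stacked images become k*C input channels to the CNN, so a batch of stacked observations is an ordinary 4D tensor. The architecture and shapes are illustrative assumptions, not the linked repo's code.

```python
import torch
import torch.nn as nn
from torch.distributions import Categorical

class ActorCritic(nn.Module):
    def __init__(self, in_channels, n_actions):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(in_channels, 32, 8, stride=4), nn.ReLU(),
            nn.Conv2d(32, 64, 4, stride=2), nn.ReLU(),
            nn.Flatten(),
        )
        with torch.no_grad():
            n_feat = self.body(torch.zeros(1, in_channels, 84, 84)).shape[1]
        self.pi = nn.Linear(n_feat, n_actions)  # policy head
        self.v = nn.Linear(n_feat, 1)           # value head

    def evaluate_actions(self, obs, actions):
        h = self.body(obs)
        dist = Categorical(logits=self.pi(h))
        # the three outputs the post asks about:
        return self.v(h), dist.log_prob(actions), dist.entropy()

# 4 stacked grayscale frames -> 4 channels; batch of 16 old observations
model = ActorCritic(in_channels=4, n_actions=6)
obs = torch.randn(16, 4, 84, 84)
old_actions = torch.randint(0, 6, (16,))
values, log_probs, entropy = model.evaluate_actions(obs, old_actions)
print(values.shape, log_probs.shape, entropy.shape)
```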
    Reward like sin/cos
Is it possible to define a reward when my value should change like sin/cos? For example, I know the value should behave like a sine wave (rise and then fall), but I cannot specify exact numbers; it should look like a sine wave on a graph, with a minimum, then a maximum, then a minimum, and so on. I know how to reward reaching an exact number, but how do I check whether my value behaves like sin/cos? Is there a standard reward solution for this type of task? P.S. Are there any resources on reward design ideas? submitted by /u/IndependenceCivil576 [link] [comments]  ( 89 min )
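One possible reward for "behave like a sine wave", sketched under assumed frequency and window settings: fit a sinusoid basis to the recent value history by least squares and reward the fit quality (R^2), which is high for oscillation at the target frequency regardless of amplitude or phase.

```python
import numpy as np

def sine_likeness_reward(values, freq=1.0, dt=0.05):
    """values: recent history of the signal (1D array)."""
    t = np.arange(len(values)) * dt
    # basis: sin and cos (free phase/amplitude) plus a constant offset
    basis = np.column_stack([np.sin(2 * np.pi * freq * t),
                             np.cos(2 * np.pi * freq * t),
                             np.ones_like(t)])
    coef, *_ = np.linalg.lstsq(basis, values, rcond=None)
    if np.hypot(coef[0], coef[1]) < 1e-3:
        return 0.0  # a flat signal fits trivially; don't reward it
    residual = values - basis @ coef
    ss_res = np.sum(residual ** 2)
    ss_tot = np.sum((values - values.mean()) ** 2) + 1e-8
    return 1.0 - ss_res / ss_tot  # R^2; near 1 means very sine-like

t = np.arange(100) * 0.05
print(sine_likeness_reward(np.sin(2 * np.pi * t) + 0.1 * np.random.randn(100)))
```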
  • Open

    Announcing Visual Conversation Builder for Amazon Lex
    Amazon Lex is a service for building conversational interfaces using voice and text. Amazon Lex provides high-quality speech recognition and language understanding capabilities. With Amazon Lex, you can add sophisticated, natural language bots to new and existing applications. Amazon Lex reduces multi-platform development efforts, allowing you to easily publish your speech or text chatbots to […]  ( 8 min )
  • Open

    New method for comparing neural networks exposes how artificial intelligence works
    submitted by /u/keghn [link] [comments]  ( 86 min )
    Visualizing Convolutional Neural Networks - Layer by Layer
    submitted by /u/Mor1Din_ [link] [comments]  ( 87 min )
    Neural search plugin for Open/Elasticsearch
    Hey everyone, what are your thoughts on this repo? It's essentially a neural search frontend for Opensearch https://github.com/marqo-ai/marqo Seems quite a cool way to leverage the new kNN functionality painlessly. submitted by /u/everythingserverless [link] [comments]  ( 87 min )
    Marqo: A Tensor Search Framework
    I wanted to implement semantic search for a project I was developing, and I found this really useful open-source repo (https://github.com/marqo-ai/marqo). They've made it super easy to implement Google-like intelligent search capabilities but on your own dataset. It made me wonder that in today's world where everything is machine learning based, a lot of applications still use basic keyword-matching search. Let me know what you think about transitioning to intelligent search! submitted by /u/aryanagarwal09 [link] [comments]  ( 87 min )
  • Open

    The Development of AI Art Over the Years + This AI Writing Assistant Joins the List of Platforms…
    Let’s call it Ai-rt Continue reading on Becoming Human: Artificial Intelligence Magazine »  ( 9 min )
  • Open

    CCF: Bringing efficiency and usability to a decentralized trust model
    Online trust has come a long way since the time of centralized databases, where information was concentrated in one location and the security and validation of that information relied on a core set of people and systems. While convenient, this model of centralized management and oversight had a number of drawbacks. Trust depended on how […] The post CCF: Bringing efficiency and usability to a decentralized trust model appeared first on Microsoft Research.  ( 12 min )
    Microsoft Research Summit 2022: What’s Next for Technology and Humanity?
    Today, we are experiencing waves of breakthroughs in computing that are transforming just about every aspect of our lives. Artificial intelligence is changing the way we develop and create. Human language technologies are revolutionizing the workflows of healthcare professionals. Deep learning is accelerating our ability to understand and predict natural phenomena, from atomic to galactic […] The post Microsoft Research Summit 2022: What’s Next for Technology and Humanity? appeared first on Microsoft Research.  ( 7 min )
  • Open

    Reinventing the Wheel: Gatik’s Apeksha Kumavat Accelerates Autonomous Delivery for Wal-Mart and More
    As consumers expect faster, cheaper deliveries, companies are turning to AI to rethink how they move goods. Foremost among these new systems are “hub-and-spoke,” or middle-mile, operations, where companies place distribution centers closer to retail operations for quicker access to inventory. However, faster delivery is just part of the equation. These systems must also be Read article > The post Reinventing the Wheel: Gatik’s Apeksha Kumavat Accelerates Autonomous Delivery for Wal-Mart and More appeared first on NVIDIA Blog.  ( 4 min )
  • Open

    Deep Reinforcement Learning for Cryptocurrency Trading: Practical Approach to Address Backtest Overfitting. (arXiv:2209.05559v1 [q-fin.ST])
    Designing profitable and reliable trading strategies is challenging in the highly volatile cryptocurrency market. Existing works applied deep reinforcement learning methods and optimistically reported increased profits in backtesting, which may suffer from the false positive issue due to overfitting. In this paper, we propose a practical approach to address backtest overfitting for cryptocurrency trading using deep reinforcement learning. First, we formulate the detection of backtest overfitting as a hypothesis test. Then, we train the DRL agents, estimate the probability of overfitting, and reject the overfitted agents, increasing the chance of good trading performance. Finally, on 10 cryptocurrencies over a testing period from 05/01/2022 to 06/27/2022 (during which the crypto market crashed two times), we show that the less overfitted deep reinforcement learning agents have a higher Sharpe ratio than that of more over-fitted agents, an equal weight strategy, and the S&P DBM Index (market benchmark), offering confidence in possible deployment to a real market.  ( 2 min )
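    As a hedged illustration of the hypothesis-test idea above, the sketch below estimates a probability of backtest overfitting in the style of combinatorially symmetric cross-validation (Bailey et al.): pick the best agent on each in-sample half of the data and check how often it falls below the median out-of-sample. The estimator, block count, and Sharpe proxy are illustrative choices, not necessarily the paper's exact procedure.

```python
import itertools
import numpy as np

def estimate_pbo(returns, n_blocks=8):
    """returns: (T, n_agents) per-period returns of each candidate DRL agent."""
    blocks = np.array_split(np.arange(len(returns)), n_blocks)
    sharpe = lambda r: r.mean(axis=0) / (r.std(axis=0) + 1e-12)
    below_median = []
    # every way of choosing half the blocks as "in-sample"
    for in_idx in itertools.combinations(range(n_blocks), n_blocks // 2):
        ins = np.concatenate([blocks[i] for i in in_idx])
        out = np.concatenate([blocks[i] for i in range(n_blocks) if i not in in_idx])
        best = np.argmax(sharpe(returns[ins]))            # agent chosen in-sample
        ranks = sharpe(returns[out]).argsort().argsort()  # out-of-sample ranks
        below_median.append(ranks[best] < returns.shape[1] / 2)
    return np.mean(below_median)  # ~0.5 means selection is no better than chance

rng = np.random.default_rng(0)
print(estimate_pbo(rng.normal(0.0, 0.01, size=(240, 10))))  # pure noise -> ~0.5
```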
    Checklist Models for Improved Output Fluency in Piano Fingering Prediction. (arXiv:2209.05622v1 [cs.LG])
    In this work we present a new approach for the task of predicting fingerings for piano music. While prior neural approaches have often treated this as a sequence tagging problem with independent predictions, we put forward a checklist system, trained via reinforcement learning, that maintains a representation of recent predictions in addition to a hidden state, allowing it to learn soft constraints on output structure. We also demonstrate that by modifying input representations -- which in prior work using neural models have often taken the form of one-hot encodings over individual keys on the piano -- to encode relative position on the keyboard to the prior note instead, we can achieve much better performance. Additionally, we reassess the use of raw per-note labeling precision as an evaluation metric, noting that it does not adequately measure the fluency, i.e. human playability, of a model's output. To this end, we compare methods across several statistics which track the frequency of adjacent finger predictions that, while independently reasonable, would be physically challenging to perform in sequence, and implement a reinforcement learning strategy to minimize these as part of our training loss. Finally, through human expert evaluation, we demonstrate significant gains in performability directly attributable to improvements with respect to these metrics.
    Fast Server Learning Rate Tuning for Coded Federated Dropout. (arXiv:2201.11036v3 [cs.LG] UPDATED)
    In cross-device Federated Learning (FL), clients with low computational power train a common machine learning model by exchanging parameters via updates instead of potentially private data. Federated Dropout (FD) is a technique that improves the communication efficiency of an FL session by selecting a \emph{subset} of model parameters to be updated in each training round. However, compared to standard FL, FD produces considerably lower accuracy and faces a longer convergence time. In this paper, we leverage \textit{coding theory} to enhance FD by allowing different sub-models to be used at each client. We also show that by carefully tuning the server learning rate hyper-parameter, we can achieve higher training speed while also achieving up to the same final accuracy as the no dropout case. For the EMNIST dataset, our mechanism achieves 99.6\% of the final accuracy of the no dropout case while requiring $2.43\times$ less bandwidth to achieve this level of accuracy.
    Distribution Compression in Near-linear Time. (arXiv:2111.07941v5 [stat.ML] UPDATED)
    In distribution compression, one aims to accurately summarize a probability distribution $\mathbb{P}$ using a small number of representative points. Near-optimal thinning procedures achieve this goal by sampling $n$ points from a Markov chain and identifying $\sqrt{n}$ points with $\widetilde{\mathcal{O}}(1/\sqrt{n})$ discrepancy to $\mathbb{P}$. Unfortunately, these algorithms suffer from quadratic or super-quadratic runtime in the sample size $n$. To address this deficiency, we introduce Compress++, a simple meta-procedure for speeding up any thinning algorithm while suffering at most a factor of $4$ in error. When combined with the quadratic-time kernel halving and kernel thinning algorithms of Dwivedi and Mackey (2021), Compress++ delivers $\sqrt{n}$ points with $\mathcal{O}(\sqrt{\log n/n})$ integration error and better-than-Monte-Carlo maximum mean discrepancy in $\mathcal{O}(n \log^3 n)$ time and $\mathcal{O}( \sqrt{n} \log^2 n )$ space. Moreover, Compress++ enjoys the same near-linear runtime given any quadratic-time input and reduces the runtime of super-quadratic algorithms by a square-root factor. In our benchmarks with high-dimensional Monte Carlo samples and Markov chains targeting challenging differential equation posteriors, Compress++ matches or nearly matches the accuracy of its input algorithm in orders of magnitude less time.
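    For intuition, here is a minimal sketch of the Compress recursion that underlies Compress++, with the kernel-halving subroutine of Dwivedi and Mackey (2021) replaced by a random-halving placeholder; the real algorithm also adds an oversampling parameter g and a final thinning step, both omitted here.

```python
import numpy as np

def halve_random(x, rng):
    # Placeholder for kernel halving (Dwivedi & Mackey, 2021), which picks the
    # half with small kernel MMD; here we simply keep a uniformly random half.
    idx = rng.permutation(len(x))[: len(x) // 2]
    return x[np.sort(idx)]

def compress(x, rng):
    # Compress recursion: maps n input points to sqrt(n) output points
    # (n is assumed to be a power of 4 so the sizes stay exact).
    n = len(x)
    if n <= 2:
        return x
    if n == 4:
        return halve_random(x, rng)
    quarters = np.array_split(x, 4)
    kept = np.concatenate([compress(q, rng) for q in quarters])  # 4*sqrt(n/4) = 2*sqrt(n)
    return halve_random(kept, rng)                               # -> sqrt(n)

rng = np.random.default_rng(0)
coreset = compress(rng.normal(size=(1024, 2)), rng)
print(len(coreset))  # 32 == sqrt(1024)
```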
    Identifying magnetic antiskyrmions while they form with convolutional neural networks. (arXiv:2205.11535v2 [cond-mat.str-el] UPDATED)
    Chiral magnets have attracted a large amount of research interest in recent years because they support a variety of topological defects, such as skyrmions and bimerons, and allow for their observation and manipulation through several techniques. They also have a wide range of applications in the field of spintronics, particularly in developing new technologies for memory storage devices. However, the vast amount of data generated in these experimental and theoretical studies requires adequate tools, among which machine learning is crucial. We use a Convolutional Neural Network (CNN) to identify the relevant features in the thermodynamical phases of chiral magnets, including (anti-)skyrmions, bimerons, and helical and ferromagnetic states. We use a flexible multi-label classification framework that can correctly classify states in which different features and phases are mixed. We then train the CNN to predict the features of the final state from snapshots of intermediate states of a lattice Monte Carlo simulation. The trained model allows identifying the different phases reliably and early in the formation process. Thus, the CNN can significantly speed up the large-scale simulations for 3D materials that have been the bottleneck for quantitative studies so far. Moreover, this approach can be applied to the identification of mixed states and emerging features in real-world images of chiral magnets.
    Functional Optimization Reinforcement Learning for Real-Time Bidding. (arXiv:2206.13939v3 [cs.AI] UPDATED)
    Real-time bidding is the new paradigm of programmatic advertising. An advertiser wants to make the intelligent choice of utilizing a \textbf{Demand-Side Platform} to improve the performance of their ad campaigns. Existing approaches struggle to provide a satisfactory solution for bidding optimization due to stochastic bidding behavior. In this paper, we propose a multi-agent reinforcement learning architecture for RTB with functional optimization. We designed a four-agent bidding environment: three Lagrange-multiplier-based functional optimization agents and one baseline agent (without any attribute of functional optimization). First, numerous attributes have been assigned to each agent, including biased or unbiased win probability, Lagrange multiplier, and click-through rate. In order to evaluate the proposed RTB strategy's performance, we demonstrate the results on ten sequential simulated auction campaigns. The results show that agents with functional actions and rewards had the most significant average winning rate and winning surplus, given biased and unbiased winning information respectively. The experimental evaluations show that our approach significantly improves the campaign's efficacy and profitability.
    Optimization of the Shape of a Hydrokinetic Turbine's Draft Tube and Hub Assembly Using Design-by-Morphing with Bayesian Optimization. (arXiv:2207.11451v3 [cs.CG] UPDATED)
    Finding the optimal design of a hydrodynamic or aerodynamic surface is often impossible due to the expense of evaluating the cost functions (say, with computational fluid dynamics) needed to determine the performances of the flows that the surface controls. In addition, inherent limitations of the design space itself due to imposed geometric constraints, conventional parameterization methods, and user bias can restrict {\it all} of the designs within a chosen design space regardless of whether traditional optimization methods or newer, data-driven design algorithms with machine learning are used to search the design space. We present a 2-pronged attack to address these difficulties: we propose (1) a methodology to create the design space using morphing that we call {\it Design-by-Morphing} (DbM); and (2) an optimization algorithm to search that space that uses a novel Bayesian Optimization (BO) strategy that we call {\it Mixed variable, Multi-Objective Bayesian Optimization} (MixMOBO). We apply this shape optimization strategy to maximize the power output of a hydrokinetic turbine. Applying these two strategies in tandem, we demonstrate that we can create a novel, geometrically-unconstrained, design space of a draft tube and hub shape and then optimize them simultaneously with a {\it minimum} number of cost function calls. Our framework is versatile and can be applied to the shape optimization of a variety of fluid problems.
    A Capsule Network for Hierarchical Multi-Label Image Classification. (arXiv:2209.05723v1 [cs.CV])
    Image classification is one of the most important areas in computer vision. Hierarchical multi-label classification applies when a multi-class image classification problem is arranged into smaller ones based upon a hierarchy or taxonomy. Thus, hierarchical classification models generally provide multiple class predictions on each instance, whereby these are expected to reflect the structure of image classes as related to one another. In this paper, we propose a multi-label capsule network (ML-CapsNet) for hierarchical classification. Our ML-CapsNet predicts multiple image classes based on a hierarchical class-label tree structure. To this end, we present a loss function that takes into account the multi-label predictions of the network. As a result, the training approach for our ML-CapsNet uses a coarse-to-fine paradigm while maintaining consistency with the structure in the classification levels in the label-hierarchy. We also perform experiments using widely available datasets and compare the model with alternatives elsewhere in the literature. In our experiments, our ML-CapsNet yields a margin of improvement with respect to these alternative methods.
    Borch: A Deep Universal Probabilistic Programming Language. (arXiv:2209.06168v1 [cs.AI])
    Ever since the Multilayered Perceptron was first introduced, the connectionist community has struggled with the concept of uncertainty and how this could be represented in these types of models. This past decade has seen a lot of effort in trying to join the principled approach of probabilistic modeling with the scalable nature of deep neural networks. While the theoretical benefits of this consolidation are clear, there are also several important practical aspects of these endeavors; namely to force the models we create to represent, learn, and report uncertainty in every prediction that is made. Many of these efforts have been based on extending existing frameworks with additional structures. We present Borch, a scalable deep universal probabilistic programming language, built on top of PyTorch. The code is available for download and use in our repository https://gitlab.com/desupervised/borch.
    Domain Invariant Adversarial Learning. (arXiv:2104.00322v4 [cs.LG] UPDATED)
    The phenomenon of adversarial examples illustrates one of the most basic vulnerabilities of deep neural networks. Among the variety of techniques introduced to surmount this inherent weakness, adversarial training has emerged as the most effective strategy for learning robust models. Typically, this is achieved by balancing robust and natural objectives. In this work, we aim to further optimize the trade-off between robust and standard accuracy by enforcing a domain-invariant feature representation. We present a new adversarial training method, Domain Invariant Adversarial Learning (DIAL), which learns a feature representation that is both robust and domain invariant. DIAL uses a variant of Domain Adversarial Neural Network (DANN) on the natural domain and its corresponding adversarial domain. In the case where the source domain consists of natural examples and the target domain is the adversarially perturbed examples, our method learns a feature representation constrained not to discriminate between the natural and adversarial examples, and can therefore achieve a more robust representation. DIAL is a generic and modular technique that can be easily incorporated into any adversarial training method. Our experiments indicate that incorporating DIAL in the adversarial training process improves both robustness and standard accuracy.
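    A minimal PyTorch sketch of the DIAL recipe, assuming a DANN-style domain classifier behind a gradient-reversal layer and a single-step FGSM attack to generate the adversarial domain; the architecture, attack, and loss weighting are illustrative stand-ins rather than the paper's configuration.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class GradReverse(torch.autograd.Function):
    @staticmethod
    def forward(ctx, x, lamb):
        ctx.lamb = lamb
        return x.view_as(x)

    @staticmethod
    def backward(ctx, grad_out):
        return -ctx.lamb * grad_out, None  # reverse gradients into the encoder

encoder = nn.Sequential(nn.Flatten(), nn.Linear(784, 128), nn.ReLU())
task_head = nn.Linear(128, 10)      # class prediction
domain_head = nn.Linear(128, 2)     # natural (0) vs adversarial (1)
params = (list(encoder.parameters()) + list(task_head.parameters())
          + list(domain_head.parameters()))
opt = torch.optim.SGD(params, lr=0.1)

def fgsm(x, y, eps=0.1):
    x = x.clone().requires_grad_(True)
    loss = F.cross_entropy(task_head(encoder(x)), y)
    grad, = torch.autograd.grad(loss, x)
    return (x + eps * grad.sign()).detach()

def dial_step(x, y, lamb=0.1):
    x_adv = fgsm(x, y)
    feats = encoder(torch.cat([x, x_adv]))
    labels = torch.cat([y, y])
    domains = torch.cat([torch.zeros(len(x)), torch.ones(len(x))]).long()
    task_loss = F.cross_entropy(task_head(feats), labels)
    # domain head tries to tell natural from adversarial; the reversed gradient
    # pushes the encoder toward a domain-invariant representation
    dom_loss = F.cross_entropy(domain_head(GradReverse.apply(feats, lamb)), domains)
    opt.zero_grad()
    (task_loss + dom_loss).backward()
    opt.step()

dial_step(torch.randn(8, 1, 28, 28), torch.randint(0, 10, (8,)))
```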
    Data efficient reinforcement learning and adaptive optimal perimeter control of network traffic dynamics. (arXiv:2209.05726v1 [eess.SY])
    Existing data-driven and feedback traffic control strategies do not consider the heterogeneity of real-time data measurements. Besides, traditional reinforcement learning (RL) methods for traffic control usually converge slowly for lacking data efficiency. Moreover, conventional optimal perimeter control schemes require exact knowledge of the system dynamics and thus would be fragile to endogenous uncertainties. To handle these challenges, this work proposes an integral reinforcement learning (IRL) based approach to learning the macroscopic traffic dynamics for adaptive optimal perimeter control. This work makes the following primary contributions to the transportation literature: (a) A continuous-time control is developed with discrete gain updates to adapt to the discrete-time sensor data. (b) To reduce the sampling complexity and use the available data more efficiently, the experience replay (ER) technique is introduced to the IRL algorithm. (c) The proposed method relaxes the requirement on model calibration in a "model-free" manner that enables robustness against modeling uncertainty and enhances the real-time performance via a data-driven RL algorithm. (d) The convergence of the IRL-based algorithms and the stability of the controlled traffic dynamics are proven via the Lyapunov theory. The optimal control law is parameterized and then approximated by neural networks (NN), which moderates the computational complexity. Both state and input constraints are considered while no model linearization is required. Numerical examples and simulation experiments are presented to verify the effectiveness and efficiency of the proposed method.
    BR-SNIS: Bias Reduced Self-Normalized Importance Sampling. (arXiv:2207.06364v2 [stat.ML] UPDATED)
    Importance Sampling (IS) is a method for approximating expectations under a target distribution using independent samples from a proposal distribution and the associated importance weights. In many applications, the target distribution is known only up to a normalization constant, in which case self-normalized IS (SNIS) can be used. While the use of self-normalization can have a positive effect on the dispersion of the estimator, it introduces bias. In this work, we propose a new method, BR-SNIS, whose complexity is essentially the same as that of SNIS and which significantly reduces bias without increasing the variance. This method is a wrapper in the sense that it uses the same proposal samples and importance weights as SNIS, but makes clever use of iterated sampling--importance resampling (ISIR) to form a bias-reduced version of the estimator. We furnish the proposed algorithm with rigorous theoretical results, including new bias, variance and high-probability bounds, and these are illustrated by numerical examples.
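    For context, the sketch below shows the baseline SNIS estimator whose bias BR-SNIS is designed to reduce (BR-SNIS itself wraps these same samples and weights in iterated sampling-importance resampling, which is not reproduced here); the target, proposal, and sample sizes are arbitrary.

```python
import numpy as np

def snis(f, log_p_tilde, sample_q, log_q, n, rng):
    x = sample_q(n, rng)
    logw = log_p_tilde(x) - log_q(x)
    w = np.exp(logw - logw.max())
    w /= w.sum()                  # self-normalization: the source of the O(1/n) bias
    return np.sum(w * f(x))

log_p_tilde = lambda x: -0.5 * (x - 1.0) ** 2   # target N(1,1), up to a constant
log_q = lambda x: -0.25 * x ** 2                # proposal N(0,2), up to a constant
sample_q = lambda n, rng: rng.normal(0.0, np.sqrt(2.0), n)

rng = np.random.default_rng(0)
est = [snis(lambda x: x, log_p_tilde, sample_q, log_q, 50, rng) for _ in range(2000)]
print(np.mean(est))  # biased away from the true mean 1.0 at this small n
```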
    Generate novel and robust samples from data: accessible sharing without privacy concerns. (arXiv:2209.06113v1 [cs.LG])
    Generating new samples from data sets can reduce the need for extra expensive operations and additional invasive procedures, and mitigate privacy issues. These novel samples, which are statistically robust, can be used as a temporary and intermediate replacement when privacy is a concern. This method can enable better data sharing practices without problems relating to identification issues or biases that are flaws for an adversarial attack.
    Tac2Pose: Tactile Object Pose Estimation from the First Touch. (arXiv:2204.11701v2 [cs.CV] UPDATED)
    In this paper, we present Tac2Pose, an object-specific approach to tactile pose estimation from the first touch for known objects. Given the object geometry, we learn a tailored perception model in simulation that estimates a probability distribution over possible object poses given a tactile observation. To do so, we simulate the contact shapes that a dense set of object poses would produce on the sensor. Then, given a new contact shape obtained from the sensor, we match it against the pre-computed set using an object-specific embedding learned using contrastive learning. We obtain contact shapes from the sensor with an object-agnostic calibration step that maps RGB tactile observations to binary contact shapes. This mapping, which can be reused across object and sensor instances, is the only step trained with real sensor data. This results in a perception model that localizes objects from the first real tactile observation. Importantly, it produces pose distributions and can incorporate additional pose constraints coming from other perception systems, contacts, or priors. We provide quantitative results for 20 objects. Tac2Pose provides high accuracy pose estimations from distinctive tactile observations while regressing meaningful pose distributions to account for those contact shapes that could result from different object poses. We also test Tac2Pose on object models reconstructed from a 3D scanner, to evaluate the robustness to uncertainty in the object model. Finally, we demonstrate the advantages of Tac2Pose compared with three baseline methods for tactile pose estimation: directly regressing the object pose with a neural network, matching an observed contact to a set of possible contacts using a standard classification neural network, and direct pixel comparison of an observed contact with a set of possible contacts. Website: this http URL
    The Mori-Zwanzig formulation of deep learning. (arXiv:2209.05544v1 [cs.LG])
    We develop a new formulation of deep learning based on the Mori-Zwanzig (MZ) formalism of irreversible statistical mechanics. The new formulation is built upon the well-known duality between deep neural networks and discrete stochastic dynamical systems, and it allows us to directly propagate quantities of interest (conditional expectations and probability density functions) forward and backward through the network by means of exact linear operator equations. Such new equations can be used as a starting point to develop new effective parameterizations of deep neural networks, and provide a new framework to study deep-learning via operator theoretic methods. The proposed MZ formulation of deep learning naturally introduces a new concept, i.e., the memory of the neural network, which plays a fundamental role in low-dimensional modeling and parameterization. By using the theory of contraction mappings, we develop sufficient conditions for the memory of the neural network to decay with the number of layers. This allows us to rigorously transform deep networks into shallow ones, e.g., by reducing the number of neurons per layer (using projection operators), or by reducing the total number of layers (using the decaying property of the memory operator).
    Tractable hierarchies of convex relaxations for polynomial optimization on the nonnegative orthant. (arXiv:2209.06175v1 [math.OC])
    We consider polynomial optimization problems (POP) on a semialgebraic set contained in the nonnegative orthant (every POP on a compact set can be put in this format by a simple translation of the origin). Such a POP can be converted to an equivalent POP by squaring each variable. Using even symmetry and the concept of factor width, we propose a hierarchy of semidefinite relaxations based on the extension of P\'olya's Positivstellensatz by Dickinson-Povh. As its distinguishing and crucial feature, the maximal matrix size of each resulting semidefinite relaxation can be chosen arbitrarily and in addition, we prove that the sequence of values returned by the new hierarchy converges to the optimal value of the original POP at the rate $O(\varepsilon^{-c})$ if the semialgebraic set has nonempty interior. When applied to (i) robustness certification of multi-layer neural networks and (ii) computation of positive maximal singular values, our method based on P\'olya's Positivstellensatz provides better bounds and runs several hundred times faster than the standard Moment-SOS hierarchy.
    Black-box Ownership Verification for Dataset Protection via Backdoor Watermarking. (arXiv:2209.06015v1 [cs.CR])
    Deep learning, especially deep neural networks (DNNs), has been widely and successfully adopted in many critical applications for its high effectiveness and efficiency. The rapid development of DNNs has benefited from the existence of some high-quality datasets ($e.g.$, ImageNet), which allow researchers and developers to easily verify the performance of their methods. Currently, almost all existing released datasets require that they can only be adopted for academic or educational purposes rather than commercial purposes without permission. However, there is still no good way to ensure that. In this paper, we formulate the protection of released datasets as verifying whether they are adopted for training a (suspicious) third-party model, where defenders can only query the model while having no information about its parameters and training details. Based on this formulation, we propose to embed external patterns via backdoor watermarking for the ownership verification to protect them. Our method contains two main parts, including dataset watermarking and dataset verification. Specifically, we exploit poison-only backdoor attacks ($e.g.$, BadNets) for dataset watermarking and design a hypothesis-test-guided method for dataset verification. Experiments on multiple benchmark datasets of different tasks are conducted, which verify the effectiveness of our method. The code for reproducing main experiments is available at \url{https://github.com/THUYimingLi/DVBW}.
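    An illustrative sketch of the two parts described above, with hypothetical shapes and thresholds: (1) watermark a dataset with a BadNets-style trigger patch and target label; (2) verify ownership by testing whether a suspicious model classifies triggered inputs as the target far more often than clean ones. The paper's exact trigger design and hypothesis test may differ.

```python
import numpy as np
from scipy import stats

def watermark(images, labels, rate=0.05, target=0, rng=None):
    """images: (N, H, W) in [0,1]; stamp a 3x3 white patch, relabel to `target`."""
    rng = rng or np.random.default_rng(0)
    imgs, labs = images.copy(), labels.copy()
    idx = rng.choice(len(imgs), int(rate * len(imgs)), replace=False)
    imgs[idx, -3:, -3:] = 1.0   # trigger patch in the bottom-right corner
    labs[idx] = target
    return imgs, labs

def verify(predict, clean_images, target=0, alpha=0.01):
    """One-sided test: if triggered inputs land on `target` far more often than
    clean ones, that is evidence the model trained on the watermarked data."""
    trig = clean_images.copy()
    trig[:, -3:, -3:] = 1.0
    p_clean = (predict(clean_images) == target).mean()
    hits = int((predict(trig) == target).sum())
    # binomial test against the clean-input base rate
    pval = stats.binomtest(hits, len(trig), p=max(p_clean, 1e-3),
                           alternative='greater').pvalue
    return pval < alpha, pval
```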
    Test-Time Adaptation with Principal Component Analysis. (arXiv:2209.05779v1 [cs.LG])
    Machine Learning models are prone to fail when test data are different from training data, a situation often encountered in real applications known as distribution shift. While still valid, the training-time knowledge becomes less effective, requiring a test-time adaptation to maintain high performance. Following approaches that assume batch-norm layer and use their statistics for adaptation, we propose a Test-Time Adaptation with Principal Component Analysis (TTAwPCA), which presumes a fitted PCA and adapts at test time a spectral filter based on the singular values of the PCA for robustness to corruptions. TTAwPCA combines three components: the output of a given layer is decomposed using a Principal Component Analysis (PCA), filtered by a penalization of its singular values, and reconstructed with the PCA inverse transform. This generic enhancement adds fewer parameters than current methods. Experiments on CIFAR-10-C and CIFAR-100-C demonstrate the effectiveness and limits of our method using a unique filter of 2000 parameters.
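    A rough sketch of the TTAwPCA mechanism with hypothetical shapes: fit a PCA on clean features of one layer, then at test time project features, rescale each principal component with a per-component gain (the spectral filter the method adapts at test time), and reconstruct. Here the gains are fixed for illustration rather than adapted.

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
clean_feats = rng.normal(size=(1000, 64))     # layer activations on clean data

pca = PCA(n_components=64).fit(clean_feats)   # fitted once, before deployment

def ttawpca(feats, gains):
    z = pca.transform(feats)                  # decompose
    return pca.inverse_transform(z * gains)   # filter, then reconstruct

gains = np.ones(64)
gains[32:] = 0.5                              # damp low-variance (noisy) components
corrupted = clean_feats + rng.normal(0, 0.5, size=clean_feats.shape)
filtered = ttawpca(corrupted, gains)
```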
    Data Augmentation in Temporal and Polar Domains for Event-Based Learning. (arXiv:2207.11659v2 [cs.CV] UPDATED)
    Event cameras are inherently suitable for spiking neural networks (SNNs) and have great potential in challenging scenes due to the advantages of bionics, asynchrony, high dynamic range, and no motion blur. However, novel data augmentations designed for event properties are required to process the unconventional output of these cameras in order to unlock their potential. In this paper, we explore the extraordinary influence of brightness variations due to event properties. Along the way, two novel data augmentation methods, \emph{EventInvert} and \emph{EventDrift} (EventID), are proposed to simulate two basic transformations of this influence. Specifically, EventID inverts or drifts events in the stream through transformations in temporal and polar domains, thereby generating samples affected by brightness variances. Extensive experiments are carried out on the CIFAR10-DVS, N-Caltech101, and N-CARS datasets. It turns out that this simulation improves generalization by increasing the robustness of models against brightness variations. In addition, EventID is broadly effective, surpassing previous state-of-the-art performances. For example, the spiking neural network model with EventID achieves a state-of-the-art accuracy of 83.501% on the CIFAR10-DVS dataset.
    Online Continual Learning via the Meta-learning Update with Multi-scale Knowledge Distillation and Data Augmentation. (arXiv:2209.06107v1 [cs.LG])
    Continual learning aims to rapidly and continually learn the current task from a sequence of tasks. Compared to other kinds of methods, the methods based on experience replay have shown great advantages to overcome catastrophic forgetting. One common limitation of this method is the data imbalance between the previous and current tasks, which would further aggravate forgetting. Moreover, how to effectively address the stability-plasticity dilemma in this setting is also an urgent problem to be solved. In this paper, we overcome these challenges by proposing a novel framework called Meta-learning update via Multi-scale Knowledge Distillation and Data Augmentation (MMKDDA). Specifically, we apply multiscale knowledge distillation to grasp the evolution of long-range and short-range spatial relationships at different feature levels to alleviate the problem of data imbalance. Besides, our method mixes the samples from the episodic memory and current task in the online continual training procedure, thus alleviating the side influence due to the change of probability distribution. Moreover, we optimize our model via the meta-learning update resorting to the number of tasks seen previously, which is helpful to keep a better balance between stability and plasticity. Finally, our experimental evaluation on four benchmark datasets shows the effectiveness of the proposed MMKDDA framework against other popular baselines, and ablation studies are also conducted to further analyze the role of each component in our framework.
    Comparative analysis of segmentation and generative models for fingerprint retrieval task. (arXiv:2209.06172v1 [cs.CV])
    Biometric Authentication like Fingerprints has become an integral part of the modern technology for authentication and verification of users. It is pervasive in more ways than most of us are aware of. However, these fingerprint images deteriorate in quality if the fingers are dirty, wet, injured or when sensors malfunction. Therefore, extricating the original fingerprint by removing the noise and inpainting it to restructure the image is crucial for its authentication. Hence, this paper proposes a deep learning approach to address these issues using Generative (GAN) and Segmentation models. Qualitative and quantitative comparisons have been made between pix2pixGAN and cycleGAN (generative models) as well as U-Net (a segmentation model). To train the models, we meticulously created our own dataset, NFD (Noisy Fingerprint Dataset), with different backgrounds and scratches in some images to make it more realistic and robust. In our research, the U-Net model performed better than the GAN networks.
    CovidMis20: COVID-19 Misinformation Detection System on Twitter Tweets using Deep Learning Models. (arXiv:2209.05667v1 [cs.LG])
    Online news and information sources are convenient and accessible ways to learn about current issues. For instance, more than 300 million people engage with posts on Twitter globally, which provides the possibility to disseminate misleading information. There are numerous cases where violent crimes have been committed due to fake news. This research presents the CovidMis20 dataset (COVID-19 Misinformation 2020 dataset), which consists of 1,375,592 tweets collected from February to July 2020. CovidMis20 can be automatically updated to fetch the latest news and is publicly available at: https://github.com/everythingguy/CovidMis20. This research was conducted using Bi-LSTM deep learning and an ensemble CNN+Bi-GRU for fake news detection. The results showed that, with testing accuracy of 92.23% and 90.56%, respectively, the ensemble CNN+Bi-GRU model consistently provided higher accuracy than the Bi-LSTM model.
    Model-based Reinforcement Learning with Multi-step Plan Value Estimation. (arXiv:2209.05530v1 [cs.LG])
    A promising way to improve the sample efficiency of reinforcement learning is model-based methods, in which many explorations and evaluations can happen in the learned models to save real-world samples. However, when the learned model has a non-negligible model error, sequential steps in the model are hard to be accurately evaluated, limiting the model's utilization. This paper proposes to alleviate this issue by introducing multi-step plans to replace multi-step actions for model-based RL. We employ the multi-step plan value estimation, which evaluates the expected discounted return after executing a sequence of action plans at a given state, and updates the policy by directly computing the multi-step policy gradient via plan value estimation. The new model-based reinforcement learning algorithm MPPVE (Model-based Planning Policy Learning with Multi-step Plan Value Estimation) shows a better utilization of the learned model and achieves a better sample efficiency than state-of-the-art model-based RL approaches.
    Concept Drift Monitoring and Diagnostics of Supervised Learning Models via Score Vectors. (arXiv:2012.06916v2 [stat.ML] UPDATED)
    Supervised learning models are one of the most fundamental classes of models. Viewing supervised learning from a probabilistic perspective, the set of training data to which the model is fitted is usually assumed to follow a stationary distribution. However, this stationarity assumption is often violated in a phenomenon called concept drift, which refers to changes over time in the predictive relationship between covariates $\mathbf{X}$ and a response variable $Y$ and can render trained models suboptimal or obsolete. We develop a comprehensive and computationally efficient framework for detecting, monitoring, and diagnosing concept drift. Specifically, we monitor the Fisher score vector, defined as the gradient of the log-likelihood for the fitted model, using a form of multivariate exponentially weighted moving average, which monitors for general changes in the mean of a random vector. In spite of the substantial performance advantages that we demonstrate over popular error-based methods, a score-based approach has not been previously considered for concept drift monitoring. Advantages of the proposed score-based framework include applicability to any parametric model, more powerful detection of changes as shown in theory and experiments, and inherent diagnostic capabilities for helping to identify the nature of the changes.
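    A minimal numpy sketch of the monitoring scheme described above: for a fitted logistic regression, the Fisher score of observation (x, y) is x * (y - p(x)); under no drift its mean is zero, so a multivariate EWMA of the scores with a T^2-style statistic flags changes in the X -> Y relationship. Coefficients, smoothing constant, and thresholds are ad hoc illustrations.

```python
import numpy as np

rng = np.random.default_rng(0)
d, n_ref = 3, 5000
beta = np.array([1.0, -2.0, 0.5])   # fitted (reference) coefficients

def simulate(n, beta_true):
    X = rng.normal(size=(n, d))
    p = 1 / (1 + np.exp(-X @ beta_true))
    return X, rng.binomial(1, p)

def scores(X, y, beta_hat):
    p = 1 / (1 + np.exp(-X @ beta_hat))
    return X * (y - p)[:, None]          # per-observation score vectors

# covariance of the score under the reference (no-drift) regime
S_ref = scores(*simulate(n_ref, beta), beta)
Sigma_inv = np.linalg.inv(np.cov(S_ref.T))

lam, z, t2 = 0.05, np.zeros(d), []
X_new, y_new = simulate(2000, np.array([1.0, -2.0, 2.0]))   # drifted coefficient
for s in scores(X_new, y_new, beta):
    z = lam * s + (1 - lam) * z          # MEWMA update
    t2.append(z @ Sigma_inv @ z * (2 - lam) / lam)
print(max(t2))                           # grows once the drift takes effect
```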
    MDM: Molecular Diffusion Model for 3D Molecule Generation. (arXiv:2209.05710v1 [cs.LG])
    Molecule generation, especially generating 3D molecular geometries from scratch (i.e., 3D \textit{de novo} generation), has become a fundamental task in drug designs. Existing diffusion-based 3D molecule generation methods could suffer from unsatisfactory performances, especially when generating large molecules. At the same time, the generated molecules lack enough diversity. This paper proposes a novel diffusion model to address those two challenges. First, interatomic relations are not captured in molecules' 3D point cloud representations. Thus, it is difficult for existing generative models to capture the potential interatomic forces and abundant local constraints. To tackle this challenge, we propose to augment the potential interatomic forces and further involve dual equivariant encoders to encode interatomic forces of different strengths. Second, existing diffusion-based models essentially shift elements in geometry along the gradient of data density. Such a process lacks enough exploration in the intermediate steps of the Langevin dynamics. To address this issue, we introduce a distributional controlling variable in each diffusion/reverse step to enforce thorough explorations and further improve generation diversity. Extensive experiments on multiple benchmarks demonstrate that the proposed model significantly outperforms existing methods for both unconditional and conditional generation tasks. We also conduct case studies to help understand the physicochemical properties of the generated molecules.
    Robin: A Novel Online Suicidal Text Corpus of Substantial Breadth and Scale. (arXiv:2209.05707v1 [cs.CL])
    Suicide is a major public health crisis. With more than 20,000,000 suicide attempts each year, the early detection of suicidal intent has the potential to save hundreds of thousands of lives. Traditional mental health screening methods are time-consuming, costly, and often inaccessible to disadvantaged populations; online detection of suicidal intent using machine learning offers a viable alternative. Here we present Robin, the largest non-keyword generated suicidal corpus to date, consisting of over 1.1 million online forum postings. In addition to its unprecedented size, Robin is specially constructed to include various categories of suicidal text, such as suicide bereavement and flippant references, better enabling models trained on Robin to learn the subtle nuances of text expressing suicidal ideation. Experimental results achieve state-of-the-art performance for the classification of suicidal text, both with traditional methods like logistic regression (F1=0.85), as well as with large-scale pre-trained language models like BERT (F1=0.92). Finally, we release the Robin dataset publicly as a machine learning resource with the potential to drive the next generation of suicidal sentiment research.
    Graph Neural Networks for Molecules. (arXiv:2209.05582v1 [cs.LG])
    Graph neural networks (GNNs), which are capable of learning representations from graphical data, are naturally suitable for modeling molecular systems. This review introduces GNNs and their various applications for small organic molecules. GNNs rely on message-passing operations, a generic yet powerful framework, to update node features iteratively. Many studies design GNN architectures to effectively learn topological information of 2D molecule graphs as well as geometric information of 3D molecular systems. GNNs have been implemented in a wide variety of molecular applications, including molecular property prediction, molecular scoring and docking, molecular optimization and de novo generation, molecular dynamics simulation, etc. The review also summarizes the recent development of self-supervised learning for molecules with GNNs.
    A Scalable Recommendation Engine for New Users and Items. (arXiv:2209.06128v1 [cs.IR])
    In many digital contexts such as online news and e-tailing with many new users and items, recommendation systems face several challenges: i) how to make initial recommendations to users with little or no response history (i.e., cold-start problem), ii) how to learn user preferences on items (test and learn), and iii) how to scale across many users and items with myriad demographics and attributes. While many recommendation systems accommodate aspects of these challenges, few if any address all. This paper introduces a Collaborative Filtering (CF) Multi-armed Bandit (B) with Attributes (A) recommendation system (CFB-A) to jointly accommodate all of these considerations. Empirical applications including an offline test on MovieLens data, synthetic data simulations, and an online grocery experiment indicate the CFB-A leads to substantial improvement on cumulative average rewards (e.g., total money or time spent, clicks, purchased quantities, average ratings, etc.) relative to the most powerful extant baseline methods.
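    The CFB-A system combines collaborative filtering, bandit exploration, and user/item attributes. As a much-simplified illustration of the bandit-with-attributes ingredient (the collaborative filtering component and the paper's exact model are omitted), here is a plain LinUCB that scores items from attribute vectors, which addresses cold-start items and test-and-learn exploration; all dimensions are hypothetical.

```python
import numpy as np

class LinUCB:
    def __init__(self, dim, alpha=1.0):
        self.A = np.eye(dim)      # ridge Gram matrix
        self.b = np.zeros(dim)
        self.alpha = alpha

    def choose(self, contexts):   # contexts: (n_items, dim) attribute vectors
        theta = np.linalg.solve(self.A, self.b)
        A_inv = np.linalg.inv(self.A)
        # upper confidence bound: predicted reward plus exploration bonus
        ucb = contexts @ theta + self.alpha * np.sqrt(
            np.einsum('id,dk,ik->i', contexts, A_inv, contexts))
        return int(np.argmax(ucb))

    def update(self, x, reward):
        self.A += np.outer(x, x)
        self.b += reward * x

rng = np.random.default_rng(0)
bandit, theta_true = LinUCB(dim=8), rng.normal(size=8)
for _ in range(500):
    items = rng.normal(size=(20, 8))   # brand-new items: no history required
    a = bandit.choose(items)
    bandit.update(items[a], items[a] @ theta_true + rng.normal())
```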
    The Discovery of Dynamics via Linear Multistep Methods and Deep Learning: Error Estimation. (arXiv:2103.11488v2 [math.NA] UPDATED)
    Identifying hidden dynamics from observed data is a significant and challenging task in a wide range of applications. Recently, the combination of linear multistep methods (LMMs) and deep learning has been successfully employed to discover dynamics, whereas a complete convergence analysis of this approach is still under development. In this work, we consider the deep network-based LMMs for the discovery of dynamics. We put forward error estimates for these methods using the approximation property of deep networks. It indicates, for certain families of LMMs, that the $\ell^2$ grid error is bounded by the sum of $O(h^p)$ and the network approximation error, where $h$ is the time step size and $p$ is the local truncation error order. Numerical results of several physically relevant examples are provided to demonstrate our theory.
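    A hedged sketch of the LMM-plus-network approach analyzed above: parameterize the unknown vector field f with a small network and minimize the residual of a linear multistep scheme (two-step Adams-Bashforth here) along an observed trajectory. The dynamics, step size, and architecture are illustrative.

```python
import torch
import torch.nn as nn

h = 0.01
t = torch.arange(0, 8, h)
# observed trajectory of dx/dt = (x2, -x1), i.e. simple harmonic motion
x = torch.stack([torch.sin(t), torch.cos(t)], dim=1)

f = nn.Sequential(nn.Linear(2, 64), nn.Tanh(), nn.Linear(64, 2))
opt = torch.optim.Adam(f.parameters(), lr=1e-3)

for step in range(2000):
    fx = f(x)
    # AB2 residual: x_{n+2} - x_{n+1} - h*(3/2 f(x_{n+1}) - 1/2 f(x_n)) = 0
    res = x[2:] - x[1:-1] - h * (1.5 * fx[1:-1] - 0.5 * fx[:-2])
    loss = (res ** 2).mean()
    opt.zero_grad(); loss.backward(); opt.step()

print(f(torch.tensor([[0.0, 1.0]])))  # should approach the true field (1, 0)
```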
    Design Guidelines for Inclusive Speaker Verification Evaluation Datasets. (arXiv:2204.02281v2 [eess.AS] UPDATED)
    Speaker verification (SV) provides billions of voice-enabled devices with access control, and ensures the security of voice-driven technologies. As a type of biometrics, it is necessary that SV is unbiased, with consistent and reliable performance across speakers irrespective of their demographic, social and economic attributes. Current SV evaluation practices are insufficient for evaluating bias: they are over-simplified and aggregate users, not representative of real-life usage scenarios, and consequences of errors are not accounted for. This paper proposes design guidelines for constructing SV evaluation datasets that address these short-comings. We propose a schema for grading the difficulty of utterance pairs, and present an algorithm for generating inclusive SV datasets. We empirically validate our proposed method in a set of experiments on the VoxCeleb1 dataset. Our results confirm that the count of utterance pairs/speaker, and the difficulty grading of utterance pairs have a significant effect on evaluation performance and variability. Our work contributes to the development of SV evaluation practices that are inclusive and fair.
    Defense against Privacy Leakage in Federated Learning. (arXiv:2209.05724v1 [cs.LG])
    Federated Learning (FL) provides a promising distributed learning paradigm, since it seeks to protect users' privacy by not sharing their private training data. Recent research has demonstrated, however, that FL is susceptible to model inversion attacks, which can reconstruct users' private data by eavesdropping on shared gradients. Existing defense solutions cannot survive stronger attacks and exhibit a poor trade-off between privacy and performance. In this paper, we present a straightforward yet effective defense strategy based on obfuscating the gradients of sensitive data with concealing data. Specifically, we alter a few samples within a mini-batch to mimic the sensitive data at the gradient levels. Using a gradient projection technique, our method seeks to obscure sensitive data without sacrificing FL performance. Our extensive evaluations demonstrate that, compared to other defenses, our technique offers the highest level of protection while preserving FL performance. Our source code is located in the repository.
    Towards Understanding the Overfitting Phenomenon of Deep Click-Through Rate Prediction Models. (arXiv:2209.06053v1 [cs.IR])
    Deep learning techniques have been applied widely in industrial recommendation systems. However, far less attention has been paid to the overfitting problem of models in recommendation systems, which, on the contrary, is recognized as a critical issue for deep neural networks. In the context of Click-Through Rate (CTR) prediction, we observe an interesting one-epoch overfitting problem: the model performance exhibits a dramatic degradation at the beginning of the second epoch. Such a phenomenon has been witnessed widely in real-world applications of CTR models. Thereby, the best performance is usually achieved by training with only one epoch. To understand the underlying factors behind the one-epoch phenomenon, we conduct extensive experiments on the production data set collected from the display advertising system of Alibaba. The results show that the model structure, the optimization algorithm with a fast convergence rate, and the feature sparsity are closely related to the one-epoch phenomenon. We also provide a likely hypothesis for explaining such a phenomenon and conduct a set of proof-of-concept experiments. We hope this work can shed light on future research on training more epochs for better performance.
    Implicit Bias of Linear Equivariant Networks. (arXiv:2110.06084v3 [cs.LG] UPDATED)
    Group equivariant convolutional neural networks (G-CNNs) are generalizations of convolutional neural networks (CNNs) which excel in a wide range of technical applications by explicitly encoding symmetries, such as rotations and permutations, in their architectures. Although the success of G-CNNs is driven by their \emph{explicit} symmetry bias, a recent line of work has proposed that the \emph{implicit} bias of training algorithms on particular architectures is key to understanding generalization for overparameterized neural nets. In this context, we show that $L$-layer full-width linear G-CNNs trained via gradient descent for binary classification converge to solutions with low-rank Fourier matrix coefficients, regularized by the $2/L$-Schatten matrix norm. Our work strictly generalizes previous analysis on the implicit bias of linear CNNs to linear G-CNNs over all finite groups, including the challenging setting of non-commutative groups (such as permutations), as well as band-limited G-CNNs over infinite groups. We validate our theorems via experiments on a variety of groups, and empirically explore more realistic nonlinear networks, which locally capture similar regularization patterns. Finally, we provide intuitive interpretations of our Fourier space implicit regularization results in real space via uncertainty principles.
    Automatically Score Tissue Images Like a Pathologist by Transfer Learning. (arXiv:2209.05954v1 [cs.LG])
    Cancer is the second leading cause of death in the world. Diagnosing cancer early on can save many lives. Pathologists have to look at tissue microarray (TMA) images manually to identify tumors, which can be time-consuming, inconsistent and subjective. Existing algorithms that automatically detect tumors have either not achieved the accuracy level of a pathologist or require substantial human involvements. A major challenge is that TMA images with different shapes, sizes, and locations can have the same score. Learning staining patterns in TMA images requires a huge number of images, which are severely limited due to privacy concerns and regulations in medical organizations. TMA images from different cancer types may have common characteristics that could provide valuable information, but using them directly harms the accuracy. Transfer learning is adopted to increase the training sample size by extracting knowledge from tissue images from different cancer types. Transfer learning has made it possible for the algorithm to break the critical accuracy barrier. The proposed algorithm reports an accuracy of 75.9% on breast cancer TMA images from the Stanford Tissue Microarray Database, achieving the 75% accuracy level of pathologists. This will allow pathologists to confidently use automatic algorithms to assist them in recognizing tumors consistently with a higher accuracy in real time.
    Skip Training for Multi-Agent Reinforcement Learning Controller for Industrial Wave Energy Converters. (arXiv:2209.05656v1 [cs.LG])
    Recent Wave Energy Converters (WEC) are equipped with multiple legs and generators to maximize energy generation. Traditional controllers have shown limitations to capture complex wave patterns and the controllers must efficiently maximize the energy capture. This paper introduces a Multi-Agent Reinforcement Learning controller (MARL), which outperforms the traditionally used spring damper controller. Our initial studies show that the complex nature of problems makes it hard for training to converge. Hence, we propose a novel skip training approach which enables the MARL training to overcome performance saturation and converge to more optimum controllers compared to default MARL training, boosting power generation. We also present another novel hybrid training initialization (STHTI) approach, where the individual agents of the MARL controllers can be initially trained against the baseline Spring Damper (SD) controller individually and then be trained one agent at a time or all together in future iterations to accelerate convergence. We achieved double-digit gains in energy efficiency over the baseline Spring Damper controller with the proposed MARL controllers using the Asynchronous Advantage Actor-Critic (A3C) algorithm.
    APTx: better activation function than MISH, SWISH, and ReLU's variants used in deep learning. (arXiv:2209.06119v1 [cs.LG])
    Activation functions introduce non-linearity in deep neural networks. This nonlinearity helps the neural networks learn faster and more efficiently from the dataset. In deep learning, many activation functions are developed and used based on the type of problem statement. ReLU's variants, SWISH, and MISH are go-to activation functions. The MISH function is considered to have similar or even better performance than SWISH, and much better than ReLU. In this paper, we propose an activation function named APTx which behaves similarly to MISH, but requires fewer mathematical operations to compute. The lower computational requirements of APTx speed up model training, and thus also reduce the hardware requirements for the deep learning model.
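    If APTx matches the commonly cited parameterization APTx(x) = (alpha + tanh(beta*x)) * gamma*x (treated here as an assumption rather than a quotation of the paper), the saving over MISH is visible directly: one tanh and two multiplications versus a tanh of a softplus.

```python
import torch
import torch.nn.functional as F

def mish(x):
    return x * torch.tanh(F.softplus(x))        # x * tanh(ln(1 + e^x))

def aptx(x, alpha=1.0, beta=1.0, gamma=0.5):
    # assumed form; alpha=1, beta=1, gamma=0.5 gives a MISH-like shape
    return (alpha + torch.tanh(beta * x)) * gamma * x

x = torch.linspace(-4, 4, 9)
print(mish(x))
print(aptx(x))  # similar overall shape, computed with fewer operations
```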
    Universal Online Convex Optimization with Minimax Optimal Second-Order Dynamic Regret. (arXiv:1907.00497v3 [math.OC] UPDATED)
    We introduce an online convex optimization algorithm which utilizes projected subgradient descent with optimal adaptive learning rates. Our method provides second-order minimax-optimal dynamic regret guarantee (i.e. dependent on the sum of squared subgradient norms) for a sequence of general convex functions, which may not have strong convexity, smoothness, exp-concavity or even Lipschitz-continuity. The regret guarantee is against any comparator decision sequence with bounded path variation (i.e. sum of the distances between successive decisions). We generate the lower bound of the worst-case second-order dynamic regret by incorporating actual subgradient norms. We show that this lower bound matches with our regret guarantee within a constant factor, which makes our algorithm minimax optimal. We also derive the extension for learning in each decision coordinate individually. We demonstrate how to best preserve our regret guarantee in a truly online manner, when the bound on path variation of the comparator sequence grows in time or the feedback regarding such bound arrives partially as time goes on. We further build on our algorithm to eliminate the need of any knowledge on the comparator path variation, and provide minimax optimal second-order regret guarantees with no a priori information. Our approach can compete against all comparator sequences simultaneously (universally) in a minimax optimal manner, i.e. each regret guarantee depends on the respective comparator path variation. We discuss modifications to our approach which address complexity reductions for time, computation and memory. We further improve our results by making the regret guarantees also dependent on comparator sets' diameters in addition to the respective path variations.
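    A hedged sketch of the core primitive named above: projected subgradient descent with an adaptive learning rate scaled by accumulated squared subgradient norms. The paper's exact rate schedule and dynamic-regret machinery are more elaborate; this only illustrates the building block, with an l2-ball decision set and arbitrary losses.

```python
import numpy as np

def project_ball(x, radius=1.0):      # projection onto an l2-ball decision set
    n = np.linalg.norm(x)
    return x if n <= radius else x * (radius / n)

def online_psgd(grads, diameter=2.0):
    """grads: callables g_t(x) returning a subgradient of the t-th loss at x."""
    x, acc, xs = np.zeros(2), 0.0, []
    for g_t in grads:
        g = g_t(x)
        acc += np.dot(g, g)
        eta = diameter / (np.sqrt(acc) + 1e-12)   # adapts to observed norms
        x = project_ball(x - eta * g)
        xs.append(x)
    return xs

# losses f_t(x) = |x - z_t|_1 with slowly moving minimizers z_t
zs = [np.array([np.sin(t / 50), np.cos(t / 50)]) * 0.5 for t in range(500)]
xs = online_psgd((lambda x, z=z: np.sign(x - z)) for z in zs)
print(np.mean([np.abs(x - z).sum() for x, z in zip(xs, zs)]))  # stays moderate as x tracks z_t
```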
    Simulation and application of COVID-19 compartment model using physic-informed neural network. (arXiv:2208.02433v2 [q-bio.QM] UPDATED)
    COVID-19 pandemic has had a disruptive and irreversible impact globally, yet traditional epidemiological modeling approaches such as the susceptible-infected-recovered (SIR) model have exhibited limited effectiveness in forecasting of the up-to-date pandemic situation. In this work, susceptible-vaccinated-exposed-infected-dead-recovered (SVEIDR) model and its variants -- aged and vaccination-structured SVEIDR models -- are introduced to encode the effect of social contact for different age groups and vaccination status. Then, we implement the physics-informed neural network (PiNN) on both simulated and real-world data. The PiNN model enables robust analysis of the dynamic spread, prediction, and parameter optimization of the COVID-19 compartmental models. The models exhibit relative root mean square error (RRMSE) of <4\% for all components and provide incubation, death, and recovery rates of $\gamma= 0.0224$, $\lambda=0.0002$, and $\rho=0.0082$, respectively, for the first 310 days of the epidemic in the US. To further improve the model performance, temporally varying parameters can be included, such as vaccination, transmission, and incubation rates. Our implementation highlights PiNN as a reliable candidate approach for forecasting real-world data and can be applied to other compartmental model variants of interest.
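    A hedged PiNN sketch for the simplest compartmental model (plain SIR rather than the paper's SVEIDR): a network maps time t to (S, I, R), and the loss penalizes the ODE residuals computed with autograd plus the initial condition. The rates, horizon, and loss weights are illustrative.

```python
import torch
import torch.nn as nn

beta, gamma = 0.3, 0.1                      # transmission and recovery rates
net = nn.Sequential(nn.Linear(1, 64), nn.Tanh(), nn.Linear(64, 64), nn.Tanh(),
                    nn.Linear(64, 3))
opt = torch.optim.Adam(net.parameters(), lr=1e-3)
s0 = torch.tensor([[0.99, 0.01, 0.0]])      # initial (S, I, R)

for step in range(3000):
    t = (torch.rand(128, 1) * 100.0).requires_grad_(True)   # collocation times
    S, I, R = net(t).unbind(dim=1)
    dS, dI, dR = [torch.autograd.grad(u.sum(), t, create_graph=True)[0].squeeze(1)
                  for u in (S, I, R)]
    phys = ((dS + beta * S * I) ** 2 +              # dS/dt = -beta*S*I
            (dI - beta * S * I + gamma * I) ** 2 +  # dI/dt =  beta*S*I - gamma*I
            (dR - gamma * I) ** 2).mean()           # dR/dt =  gamma*I
    init = ((net(torch.zeros(1, 1)) - s0) ** 2).mean()
    loss = phys + 100.0 * init
    opt.zero_grad(); loss.backward(); opt.step()
```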
    A Neural Network-based SAT-Resilient Obfuscation Towards Enhanced Logic Locking. (arXiv:2209.05799v1 [cs.CR])
    Logic obfuscation is introduced as a pivotal defense against multiple hardware threats on Integrated Circuits (ICs), including reverse engineering (RE) and intellectual property (IP) theft. The effectiveness of logic obfuscation is challenged by the recently introduced Boolean satisfiability (SAT) attack and its variants. A plethora of countermeasures has also been proposed to thwart the SAT attack. Irrespective of the implemented defense against SAT attacks, large power, performance, and area overheads are indispensable. In contrast, we propose a cognitive solution: a neural network-based unSAT clause translator, SATConda, that incurs a minimal area and power overhead while preserving the original functionality with impenetrable security. SATConda is incubated with an unSAT clause generator that translates the existing conjunctive normal form (CNF) through minimal perturbations such as the inclusion of a pair of inverters or buffers or adding a new lightweight unSAT block depending on the provided CNF. For efficient unSAT clause generation, SATConda is equipped with a multi-layer neural network that first learns the dependencies of features (literals and clauses), followed by a long-short-term-memory (LSTM) network to validate and backpropagate the SAT-hardness for better learning and translation. Our proposed SATConda is evaluated on ISCAS85 and ISCAS89 benchmarks and is seen to successfully defend against multiple state-of-the-art SAT attacks devised for hardware RE. In addition, we also evaluate our proposed SATConda's empirical performance against the MiniSAT, Lingeling and Glucose SAT solvers that form the base for numerous existing deobfuscation SAT attacks.  ( 3 min )
    User recommendation system based on MIND dataset. (arXiv:2209.06131v1 [cs.IR])
    Nowadays, news recommendation is a very significant way for researchers and other individuals to pursue their interests, because it provides short solutions to satisfy their demands. Because there are so many pieces of information on the internet, news recommendation systems allow us to filter content and deliver it to the user in proportion to his desires and interests. RSs have three techniques: content-based filtering, collaborative filtering, and hybrid filtering. We use the MIND dataset, which was collected in 2019, with our system; this dataset poses a big challenge because it contains a lot of ambiguity and requires complex text processing. In this paper, we present our proposed recommendation system. At the core of our system, we use the GloVe algorithm for word embeddings and representation. Besides, a Multi-head Attention Layer calculates the attention of words to generate a list of recommended news. Finally, we achieve results better than some other related works: AUC 71.211, MRR 35.72, nDCG@5 38.05, and nDCG@10 44.45.
    Gradient Episodic Memory for Continual Learning. (arXiv:1706.08840v6 [cs.LG] UPDATED)
    One major obstacle towards AI is the poor ability of models to solve new problems quicker, and without forgetting previously acquired knowledge. To better understand this issue, we study the problem of continual learning, where the model observes, once and one by one, examples concerning a sequence of tasks. First, we propose a set of metrics to evaluate models learning over a continuum of data. These metrics characterize models not only by their test accuracy, but also in terms of their ability to transfer knowledge across tasks. Second, we propose a model for continual learning, called Gradient Episodic Memory (GEM) that alleviates forgetting, while allowing beneficial transfer of knowledge to previous tasks. Our experiments on variants of the MNIST and CIFAR-100 datasets demonstrate the strong performance of GEM when compared to the state-of-the-art.  ( 2 min )
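    A hedged sketch of the gradient-projection idea in GEM: the full method solves a small quadratic program against one constraint per previous task, while the one-constraint special case below (project the new-task gradient whenever it conflicts with the episodic-memory gradient) shows the same mechanism in its simplest form, close to the later A-GEM variant.

```python
import torch

def project_gradient(g, g_mem):
    """Return g unchanged if it does not increase the memory loss; otherwise
    project it onto the half-space where the memory loss does not increase."""
    dot = torch.dot(g, g_mem)
    if dot >= 0:
        return g
    return g - (dot / torch.dot(g_mem, g_mem)) * g_mem

g = torch.tensor([1.0, -2.0])        # gradient on the current task
g_mem = torch.tensor([1.0, 1.0])     # gradient on replayed memory examples
g_proj = project_gradient(g, g_mem)
print(g_proj, torch.dot(g_proj, g_mem))  # projected gradient no longer conflicts
```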
    MedDistant19: Towards an Accurate Benchmark for Broad-Coverage Biomedical Relation Extraction. (arXiv:2204.04779v2 [cs.CL] UPDATED)
    Relation extraction in the biomedical domain is challenging due to the lack of labeled data and high annotation costs, which require domain experts. Distant supervision is commonly used to tackle the scarcity of annotated data by automatically pairing knowledge graph relationships with raw texts. Such a pipeline is prone to noise and has added challenges to scale for covering a large number of biomedical concepts. We investigated existing broad-coverage distantly supervised biomedical relation extraction benchmarks and found a significant overlap between training and test relationships ranging from 26% to 86%. Furthermore, we noticed several inconsistencies in the data construction process of these benchmarks, and where there is no train-test leakage, the focus is on interactions between narrower entity types. This work presents a more accurate benchmark MedDistant19 for broad-coverage distantly supervised biomedical relation extraction that addresses these shortcomings and is obtained by aligning the MEDLINE abstracts with the widely used SNOMED Clinical Terms knowledge base. Lacking thorough evaluation with domain-specific language models, we also conduct experiments validating general domain relation extraction findings to biomedical relation extraction.
    A Survey on Machine Learning Techniques for Source Code Analysis. (arXiv:2110.09610v2 [cs.SE] UPDATED)
    The advancements in machine learning techniques have encouraged researchers to apply these techniques to a myriad of software engineering tasks that use source code analysis, such as testing and vulnerability detection. Such a large number of studies hinders the community from understanding the current research landscape. This paper aims to summarize the current knowledge in applied machine learning for source code analysis. We review studies belonging to twelve categories of software engineering tasks and corresponding machine learning techniques, tools, and datasets that have been applied to solve them. To do so, we conducted an extensive literature search and identified 479 primary studies published between 2011 and 2021. We summarize our observations and findings with the help of the identified studies. Our findings suggest that the use of machine learning techniques for source code analysis tasks is consistently increasing. We synthesize commonly used steps and the overall workflow for each task and summarize machine learning techniques employed. We identify a comprehensive list of available datasets and tools usable in this context. Finally, the paper discusses perceived challenges in this area, including the availability of standard datasets, reproducibility and replicability, and hardware resources.
    Exploiting Digital Surface Models for Inferring Super-Resolution for Remotely Sensed Images. (arXiv:2205.04056v2 [cs.CV] UPDATED)
    Despite the plethora of successful Super-Resolution Reconstruction (SRR) models applied to natural images, their application to remote sensing imagery tends to produce poor results. Remote sensing imagery is often more complicated than natural images, with peculiarities such as lower resolution, noise, and large textured surfaces. As a result, applying non-specialized SRR models to remote sensing imagery results in artifacts and poor reconstructions. To address these problems, this paper proposes an architecture inspired by previous research work, introducing a novel approach for forcing an SRR model to output realistic remote sensing images: instead of relying on feature-space similarities as a perceptual loss, the model considers pixel-level information inferred from the normalized Digital Surface Model (nDSM) of the image. This strategy allows the application of better-informed updates during the training of the model, sourced from a task (elevation map inference) that is closely related to remote sensing. Nonetheless, the nDSM auxiliary information is not required at inference time, so the model infers a super-resolved image without any additional data besides its low-resolution input. We assess our model on two remotely sensed datasets of different spatial resolutions that also contain the DSM pairs of the images: the DFC2018 dataset and the dataset containing the national Lidar fly-by of Luxembourg. Based on visual inspection, the inferred super-resolution images exhibit markedly superior quality. In particular, the results for the high-resolution DFC2018 dataset are realistic and almost indistinguishable from the ground truth images.
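    One plausible reading of such a training objective is a pixel loss plus an elevation-consistency term computed by a frozen image-to-nDSM predictor. The sketch below illustrates that idea under stated assumptions; `elevation_net` and the weight `lam` are hypothetical stand-ins, not the paper's exact loss.

    ```python
    import torch
    import torch.nn.functional as F

    def srr_loss(sr, hr, elevation_net, lam=0.1):
        """Pixel loss plus elevation consistency: the super-resolved image
        should imply the same nDSM as the ground truth. elevation_net is a
        hypothetical pretrained image-to-nDSM predictor, kept frozen."""
        pixel_loss = F.l1_loss(sr, hr)
        with torch.no_grad():
            ndsm_hr = elevation_net(hr)       # target elevation map
        ndsm_sr = elevation_net(sr)           # gradients flow into the SRR model
        return pixel_loss + lam * F.l1_loss(ndsm_sr, ndsm_hr)
    ```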
    Automatic Debiased Machine Learning for Dynamic Treatment Effects and General Nested Functionals. (arXiv:2203.13887v4 [econ.EM] UPDATED)
    We extend the idea of automated debiased machine learning to the dynamic treatment regime and more generally to nested functionals. We show that the multiply robust formula for the dynamic treatment regime with discrete treatments can be re-stated in terms of a recursive Riesz representer characterization of nested mean regressions. We then apply a recursive Riesz representer estimation learning algorithm that estimates de-biasing corrections without the need to characterize what the correction terms look like, such as, for instance, products of inverse probability weighting terms, as is done in prior work on doubly robust estimation in the dynamic regime. Our approach defines a sequence of loss minimization problems, whose minimizers are the multipliers of the de-biasing correction, hence circumventing the need for solving auxiliary propensity models and directly optimizing for the mean squared error of the target de-biasing correction. We provide further applications of our approach to the estimation of dynamic discrete choice models and the estimation of long-term effects with surrogates.
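    For context, in the static automatic debiasing literature the Riesz representer $a_0$, defined by $E[m(W; g)] = E[a_0(W)\,g(W)]$ for all $g$ in the function class, is estimated by direct loss minimization. A standard form of that single-stage loss is shown below; the paper's contribution is, roughly, applying such losses recursively across the periods of the dynamic regime (this display is background, not the paper's exact recursion):

    $$\hat{a} \in \arg\min_{a \in \mathcal{A}} \; \frac{1}{n}\sum_{i=1}^{n}\left( a(W_i)^2 - 2\, m(W_i; a) \right).$$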
    Enhanced Membership Inference Attacks against Machine Learning Models. (arXiv:2111.09679v4 [cs.LG] UPDATED)
    How much does a machine learning algorithm leak about its training data, and why? Membership inference attacks are used as an auditing tool to quantify this leakage. In this paper, we present a comprehensive hypothesis testing framework that enables us not only to formally express the prior work in a consistent way, but also to design new membership inference attacks that use reference models to achieve a significantly higher power (true positive rate) for any (false positive rate) error. More importantly, we explain why different attacks perform differently. We present a template for indistinguishability games, and provide an interpretation of attack success rate across different instances of the game. We discuss various uncertainties of attackers that arise from the formulation of the problem, and show how our approach tries to minimize the attack uncertainty to the one bit secret about the presence or absence of a data point in the training set. We perform a differential analysis between all types of attacks, explain the gap between them, and show what causes data points to be vulnerable to an attack (as the reasons vary due to different granularities of memorization, from overfitting to conditional memorization). Our auditing framework is openly accessible as part of the Privacy Meter software tool.
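    As a concrete illustration of the reference-model idea (a generic sketch in the spirit of hypothesis-testing attacks, not the paper's exact test): train reference models without the candidate point, fit the resulting loss distribution, and flag the point as a member when the target model's loss on it is anomalously small.

    ```python
    import numpy as np
    from scipy.stats import norm

    def membership_score(target_loss, reference_losses):
        """Fit a Gaussian to losses of reference models trained WITHOUT the
        candidate point; if the target model's loss is anomalously small
        under this "out" distribution, the point was likely a training
        member. A simplified sketch, not the paper's attack."""
        mu = np.mean(reference_losses)
        sigma = np.std(reference_losses) + 1e-12
        return norm.cdf(target_loss, loc=mu, scale=sigma)  # small => member

    # e.g. membership_score(0.02, [0.9, 1.1, 0.8, 1.3]) is close to 0
    ```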
    Characterizing Graph Datasets for Node Classification: Beyond Homophily-Heterophily Dichotomy. (arXiv:2209.06177v1 [cs.SI])
    Homophily is a graph property describing the tendency of edges to connect similar nodes; the opposite is called heterophily. While homophily is natural for many real-world networks, there are also networks without this property. It is often believed that standard message-passing graph neural networks (GNNs) do not perform well on non-homophilous graphs, and thus such datasets need special attention. While a lot of effort has been put into developing graph representation learning methods for heterophilous graphs, there is no universally agreed upon measure of homophily. Several metrics for measuring homophily have been used in the literature; however, we show that all of them have critical drawbacks preventing comparison of homophily levels between different datasets. We formalize desirable properties for a proper homophily measure and show how existing literature on the properties of classification performance metrics can be linked to our problem. In doing so, we find a measure that we call adjusted homophily that satisfies more desirable properties than existing homophily measures. Interestingly, this measure is related to two classification performance metrics: Cohen's Kappa and Matthews correlation coefficient. Then, we go beyond the homophily-heterophily dichotomy and propose a new property that we call label informativeness (LI) that characterizes how much information a neighbor's label provides about a node's label. We theoretically show that LI is comparable across datasets with different numbers of classes and class size balance. Through a series of experiments we show that LI is a better predictor of the performance of GNNs on a dataset than homophily. We show that LI explains why GNNs can sometimes perform well on heterophilous datasets - a phenomenon recently observed in the literature.
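    A small sketch of the two edge-level quantities involved, with the class-imbalance correction written as I read the paper's adjusted homophily (degree-weighted class shares as the baseline); treat the exact normalization as an assumption to be checked against the paper.

    ```python
    import numpy as np

    def edge_and_adjusted_homophily(edges, labels):
        """edges: (E, 2) int array of an undirected graph (each edge once);
        labels: integer class per node, ids 0..n-1. Edge homophily is the
        fraction of edges joining same-class endpoints; the adjusted variant
        subtracts a degree-weighted class-share baseline."""
        edges, labels = np.asarray(edges), np.asarray(labels)
        h_edge = (labels[edges[:, 0]] == labels[edges[:, 1]]).mean()
        deg = np.bincount(edges.ravel(), minlength=len(labels))
        shares = np.array([deg[labels == k].sum() for k in np.unique(labels)])
        shares = shares / (2 * len(edges))      # degree-weighted class shares
        baseline = (shares ** 2).sum()
        return h_edge, (h_edge - baseline) / (1 - baseline)
    ```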
    Bayesian Pseudo Labels: Expectation Maximization for Robust and Efficient Semi-Supervised Segmentation. (arXiv:2208.04435v3 [cs.CV] UPDATED)
    This paper concerns pseudo labelling in segmentation. Our contribution is fourfold. Firstly, we present a new formulation of pseudo-labelling as an Expectation-Maximization (EM) algorithm for clear statistical interpretation. Secondly, we propose a semi-supervised medical image segmentation method purely based on the original pseudo labelling, namely SegPL. We demonstrate SegPL is a competitive approach against state-of-the-art consistency regularisation based methods on semi-supervised segmentation on a 2D multi-class MRI brain tumour segmentation task and a 3D binary CT lung vessel segmentation task. The simplicity of SegPL allows for a lower computational cost compared to prior methods. Thirdly, we demonstrate that the effectiveness of SegPL may originate from its robustness against out-of-distribution noise and adversarial attacks. Lastly, under the EM framework, we introduce a probabilistic generalisation of SegPL via variational inference, which learns a dynamic threshold for pseudo labelling during training. We show that SegPL with variational inference can perform uncertainty estimation on par with the gold-standard method Deep Ensemble.
    Learning to Solve Multiple-TSP with Time Window and Rejections via Deep Reinforcement Learning. (arXiv:2209.06094v1 [cs.LG])
    We propose a manager-worker framework based on deep reinforcement learning to tackle a hard yet nontrivial variant of the Travelling Salesman Problem (TSP), i.e., multiple-vehicle TSP with time window and rejections (mTSPTWR), where customers who cannot be served before the deadline are subject to rejections. Particularly, in the proposed framework, a manager agent learns to divide mTSPTWR into sub-routing tasks by assigning customers to each vehicle via a Graph Isomorphism Network (GIN) based policy network. A worker agent learns to solve sub-routing tasks by minimizing the cost in terms of both tour length and rejection rate for each vehicle, the maximum of which is then fed back to the manager agent to learn better assignments. Experimental results demonstrate that the proposed framework outperforms strong baselines in terms of higher solution quality and shorter computation time. More importantly, the trained agents also achieve competitive performance for solving unseen larger instances.
    Nash Convergence of Mean-Based Learning Algorithms in First Price Auctions. (arXiv:2110.03906v3 [cs.GT] UPDATED)
    Understanding the convergence properties of learning dynamics in repeated auctions is a timely and important question in the area of learning in auctions, with numerous applications in, e.g., online advertising markets. This work focuses on repeated first price auctions where bidders with fixed values for the item learn to bid using mean-based algorithms -- a large class of online learning algorithms that include popular no-regret algorithms such as Multiplicative Weights Update and Follow the Perturbed Leader. We completely characterize the learning dynamics of mean-based algorithms, in terms of convergence to a Nash equilibrium of the auction, in two senses: (1) time-average: the fraction of rounds where bidders play a Nash equilibrium approaches 1 in the limit; (2) last-iterate: the mixed strategy profile of bidders approaches a Nash equilibrium in the limit. Specifically, the results depend on the number of bidders with the highest value: if the number is at least three, the bidding dynamics almost surely converges to a Nash equilibrium of the auction, both in time-average and in last-iterate; if the number is two, the bidding dynamics almost surely converges to a Nash equilibrium in time-average but not necessarily in last-iterate; if the number is one, the bidding dynamics may not converge to a Nash equilibrium in time-average nor in last-iterate. Our discovery opens up new possibilities in the study of convergence dynamics of learning algorithms.
    Addressing overfitting in spectral clustering via a non-parametric bootstrap. (arXiv:2209.05812v1 [stat.ML])
    Finite mixture modelling is a popular method in the field of clustering and is beneficial largely due to its soft cluster membership probabilities. However, the most common algorithm for fitting finite mixture models, the EM algorithm, falls victim to a number of issues. We address these issues that plague clustering using finite mixture models, including convergence to solutions corresponding to local maxima and algorithm speed concerns in high dimensional cases. This is done by developing two novel algorithms that incorporate a spectral decomposition of the data matrix and a non-parametric bootstrap sampling scheme. Simulations show the validity of our algorithms and demonstrate not only their flexibility but also their ability to avoid solutions corresponding to local maxima, when compared to other (bootstrapped) clustering algorithms for estimating finite mixture models. Our novel algorithms typically have more consistent convergence criteria as well as a significant increase in speed over other bootstrapped algorithms that fit finite mixture models.  ( 2 min )
    Uncertainty Quantification of the 4th kind; optimal posterior accuracy-uncertainty tradeoff with the minimum enclosing ball. (arXiv:2108.10517v2 [stat.ME] UPDATED)
    There are essentially three kinds of approaches to Uncertainty Quantification (UQ): (A) robust optimization, (B) Bayesian, and (C) decision theory. Although (A) is robust, it is unfavorable with respect to accuracy and data assimilation. (B) requires a prior; it is generally brittle, and posterior estimation can be slow. Although (C) leads to the identification of an optimal prior, its approximation suffers from the curse of dimensionality, and the notion of risk is one that is averaged with respect to the distribution of the data. We introduce a 4th kind which is a hybrid between (A), (B), (C), and hypothesis testing. It can be summarized as, after observing a sample $x$, (1) defining a likelihood region through the relative likelihood and (2) playing a minmax game in that region to define optimal estimators and their risk. The resulting method has several desirable properties: (a) an optimal prior is identified after measuring the data, and the notion of risk is a posterior one; (b) the determination of the optimal estimate and its risk can be reduced to computing the minimum enclosing ball of the image of the likelihood region under the quantity of interest map (which is fast and not subject to the curse of dimensionality). The method is characterized by a parameter in $[0,1]$ acting as an assumed lower bound on the rarity of the observed data (the relative likelihood). When that parameter is near $1$, the method produces a posterior distribution concentrated around a maximum likelihood estimate with tight but low confidence UQ estimates. When that parameter is near $0$, the method produces a maximal risk posterior distribution with high confidence UQ estimates. In addition to navigating the accuracy-uncertainty tradeoff, the proposed method addresses the brittleness of Bayesian inference by navigating the robustness-accuracy tradeoff associated with data assimilation.
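    In symbols, the two steps can plausibly be written as follows, with $\alpha \in [0,1]$ the rarity parameter and $\mathcal{R}$ a posterior risk (a hedged paraphrase, not the paper's exact notation):

    $$\Theta_\alpha(x) = \left\{ \theta : \frac{p_\theta(x)}{\sup_{\theta'} p_{\theta'}(x)} \ge \alpha \right\}, \qquad \min_{\hat{q}} \; \max_{\theta \in \Theta_\alpha(x)} \; \mathcal{R}(\hat{q}, \theta).$$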
    Unsupervised Selective Labeling for More Effective Semi-Supervised Learning. (arXiv:2110.03006v3 [cs.LG] UPDATED)
    Given an unlabeled dataset and an annotation budget, we study how to selectively label a fixed number of instances so that semi-supervised learning (SSL) on such a partially labeled dataset is most effective. We focus on selecting the right data to label, in addition to SSL's usual propagation of labels from labeled data to the remaining unlabeled data. This instance selection task is challenging, as without any labeled data we do not know what the objective of learning should be. Intuitively, no matter what the downstream task is, instances to be labeled must be representative and diverse: the former would facilitate label propagation to unlabeled data, whereas the latter would ensure coverage of the entire dataset. We capture this idea by selecting cluster prototypes, either in a pretrained feature space, or along with feature optimization, both without labels. Our unsupervised selective labeling consistently improves SSL methods over state-of-the-art active learning given labeled data, by 8 to 25 times in label efficiency. For example, it boosts FixMatch by 10% (14%) in accuracy on CIFAR-10 (ImageNet-1K) with 0.08% (0.2%) labeled data, demonstrating that small computation spent on selecting what data to label brings significant gain especially under a low annotation budget. Our work sets a new standard for practical and efficient SSL.
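    A minimal sketch of the prototype-selection step in a pretrained feature space (the paper's full method also supports joint feature optimization): cluster into as many groups as the budget allows and label the sample nearest each centroid, which makes the selection both representative and diverse.

    ```python
    import numpy as np
    from sklearn.cluster import KMeans

    def select_labeling_prototypes(features, budget):
        """Pick `budget` representative-and-diverse instances to label:
        cluster pretrained features into `budget` groups and take the
        sample nearest each centroid."""
        km = KMeans(n_clusters=budget, n_init=10).fit(features)
        chosen = []
        for k in range(budget):
            idx = np.where(km.labels_ == k)[0]
            d = np.linalg.norm(features[idx] - km.cluster_centers_[k], axis=1)
            chosen.append(idx[np.argmin(d)])
        return np.array(chosen)
    ```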
    Unsupervised representational learning with recognition-parametrised probabilistic models. (arXiv:2209.05661v1 [cs.LG])
    We introduce a new approach to probabilistic unsupervised learning based on the recognition-parametrised model (RPM): a normalised semi-parametric hypothesis class for joint distributions over observed and latent variables. Under the key assumption that observations are conditionally independent given the latents, RPMs directly encode the "recognition" process, parametrising both the prior distribution on the latents and their conditional distributions given observations. This recognition model is paired with non-parametric descriptions of the marginal distribution of each observed variable. Thus, the focus is on learning a good latent representation that captures dependence between the measurements. The RPM permits exact maximum likelihood learning in settings with discrete latents and a tractable prior, even when the mapping between continuous observations and the latents is expressed through a flexible model such as a neural network. We develop effective approximations for the case of continuous latent variables with tractable priors. Unlike the approximations necessary in dual-parametrised models such as Helmholtz machines and variational autoencoders, these RPM approximations introduce only minor bias, which may often vanish asymptotically. Furthermore, where the prior on latents is intractable the RPM may be combined effectively with standard probabilistic techniques such as variational Bayes. We demonstrate the model in high dimensional data settings, including a form of weakly supervised learning on MNIST digits and the discovery of latent maps from sensory observations. The RPM provides an effective way to discover, represent and reason probabilistically about the latent structure underlying observational data, functions which are critical to both animal and artificial intelligence.
    Class-Level Logit Perturbation. (arXiv:2209.05668v1 [cs.LG])
    Features, logits, and labels are the three primary kinds of data a sample yields as it passes through a deep neural network. Feature perturbation and label perturbation have received increasing attention in recent years. They have been proven to be useful in various deep learning approaches. For example, (adversarial) feature perturbation can improve the robustness or even generalization capability of learned models. However, limited studies have explicitly explored the perturbation of logit vectors. This work discusses several existing methods related to class-level logit perturbation. A unified viewpoint between positive/negative data augmentation and loss variations incurred by logit perturbation is established. A theoretical analysis is provided to illuminate why class-level logit perturbation is useful. Accordingly, new methodologies are proposed to explicitly learn to perturb logits for both single-label and multi-label classification tasks. Extensive experiments on benchmark image classification data sets and their long-tail versions indicate the competitive performance of our learning method. As it only perturbs logits, it can be used as a plug-in to fuse with any existing classification algorithms. All the codes are available at https://github.com/limengyang1992/lpl.
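    To make the plug-in nature concrete, here is a minimal sketch of class-level logit perturbation: a learnable per-class offset added to the logits before the loss. This illustrates the interface only, not the paper's exact positive/negative augmentation schedule.

    ```python
    import torch
    import torch.nn as nn

    class ClassLogitPerturbation(nn.Module):
        """Plug-in adding a learnable per-class offset to logits before
        the loss; a minimal sketch of the class-level idea."""
        def __init__(self, num_classes):
            super().__init__()
            self.delta = nn.Parameter(torch.zeros(num_classes))

        def forward(self, logits):          # logits: (batch, num_classes)
            return logits + self.delta      # broadcast class-wise offsets

    # usage: loss = torch.nn.functional.cross_entropy(perturb(model(x)), y)
    ```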
    Distortion Audio Effects: Learning How to Recover the Clean Signal. (arXiv:2202.01664v3 [eess.AS] UPDATED)
    Given the recent advances in music source separation and automatic mixing, removing audio effects in music tracks is a meaningful step toward developing an automated remixing system. This paper focuses on removing distortion audio effects applied to guitar tracks in music production. We explore whether effect removal can be solved by neural networks designed for source separation and audio effect modeling. Our approach proves particularly effective for effects that mix the processed and clean signals. The models achieve better quality and significantly faster inference compared to state-of-the-art solutions based on sparse optimization. We demonstrate that the models are suitable not only for declipping but also for other types of distortion effects. By discussing the results, we stress the usefulness of multiple evaluation metrics to assess different aspects of reconstruction in distortion effect removal.  ( 2 min )
    SENDER: SEmi-Nonlinear Deep Efficient Reconstructor for Extraction Canonical, Meta, and Sub Functional Connectivity in the Human Brain. (arXiv:2209.05627v1 [cs.LG])
    Deep linear and nonlinear learning methods have already become vital machine learning tools for investigating hierarchical features such as functional connectivity in the human brain via functional Magnetic Resonance signals; however, they have four major shortcomings: 1) For deep linear learning methods, although the identified hierarchy of functional connectivity is easily explainable, it is challenging to reveal more hierarchical functional connectivity; 2) For deep nonlinear learning methods, although a non-fully connected architecture reduces the complexity of neural network structures, which are then easier to optimize and less vulnerable to overfitting, the functional connectivity hierarchy is difficult to explain; 3) Importantly, it is challenging for deep linear/nonlinear methods to detect meta and sub-functional connectivity even in the shallow layers; 4) Like most conventional deep nonlinear methods, such as deep neural networks, the hyperparameters must be tuned manually, which is time-consuming. Thus, in this work, we propose a novel deep hybrid learning method named SEmi-Nonlinear Deep Efficient Reconstruction (SENDER) to overcome the aforementioned shortcomings: 1) SENDER utilizes a multiple-layer stacked structure for the linear learning methods to detect the canonical functional connectivity; 2) SENDER implements a non-fully connected architecture for the nonlinear learning methods to reveal the meta-functional connectivity through shallow and deeper layers; 3) SENDER incorporates the proposed background components to extract the sub-functional connectivity; 4) SENDER adopts a novel rank reduction operator to implement the hyperparameter tuning automatically. To further validate the effectiveness, we compared SENDER with four peer methodologies using real functional Magnetic Resonance Imaging data for the human brain.
    Learning-augmented count-min sketches via Bayesian nonparametrics. (arXiv:2102.04462v3 [stat.ML] UPDATED)
    The count-min sketch (CMS) is a time and memory efficient randomized data structure that provides estimates of tokens' frequencies in a data stream of tokens, i.e. point queries, based on random hashed data. A learning-augmented version of the CMS, referred to as CMS-DP, has been proposed by Cai, Mitzenmacher and Adams (NeurIPS 2018), and it relies on Bayesian nonparametric (BNP) modeling of the data stream of tokens via a Dirichlet process (DP) prior, with estimates of a point query being obtained as suitable mean functionals of the posterior distribution of the point query, given the hashed data. While the CMS-DP has proved to improve on some aspects of CMS, it has the major drawback of arising from a "constructive" proof that builds upon arguments tailored to the DP prior, namely arguments that are not usable for other nonparametric priors. In this paper, we present a "Bayesian" proof of the CMS-DP that has the main advantage of building upon arguments that are usable, in principle, within a broad class of nonparametric priors arising from normalized completely random measures. This result leads to the development of a novel learning-augmented CMS under power-law data streams, referred to as CMS-PYP, which relies on BNP modeling of the data stream of tokens via a Pitman-Yor process (PYP) prior. Under this more general framework, we apply the arguments of the "Bayesian" proof of the CMS-DP, suitably adapted to the PYP prior, in order to compute the posterior distribution of a point query, given the hashed data. Applications to synthetic data and real textual data show that the CMS-PYP outperforms the CMS and the CMS-DP in estimating low-frequency tokens, which are known to be of critical interest in textual data, and it is competitive with respect to a variation of the CMS designed for low-frequency tokens. An extension of our BNP approach to more general queries is also discussed.  ( 3 min )
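    For readers unfamiliar with the underlying data structure, here is a minimal classical count-min sketch; the learning-augmented variants above replace its frequentist minimum-counter estimate with a posterior mean under a DP or PYP prior.

    ```python
    import random

    class CountMinSketch:
        """Classical CMS: d hash rows of width w; a point query returns the
        minimum counter over rows (an upward-biased frequency estimate)."""
        def __init__(self, width=2000, depth=5, seed=0):
            rng = random.Random(seed)
            self.width, self.depth = width, depth
            self.seeds = [rng.randrange(1 << 30) for _ in range(depth)]
            self.table = [[0] * width for _ in range(depth)]

        def _h(self, token, row):
            return hash((self.seeds[row], token)) % self.width

        def add(self, token):
            for r in range(self.depth):
                self.table[r][self._h(token, r)] += 1

        def query(self, token):
            return min(self.table[r][self._h(token, r)]
                       for r in range(self.depth))
    ```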
    Patching Weak Convolutional Neural Network Models through Modularization and Composition. (arXiv:2209.06116v1 [cs.LG])
    Despite great success in many applications, deep neural networks are not always robust in practice. For instance, a convolutional neural network (CNN) model for classification tasks often performs unsatisfactorily in classifying some particular classes of objects. In this work, we are concerned with patching the weak part of a CNN model instead of improving it through the costly retraining of the entire model. Inspired by the fundamental concepts of modularization and composition in software engineering, we propose a compressed modularization approach, CNNSplitter, which decomposes a strong CNN model for $N$-class classification into $N$ smaller CNN modules. Each module is a sub-model containing a part of the convolution kernels of the strong model. To patch a weak CNN model that performs unsatisfactorily on a target class (TC), we compose the weak CNN model with the corresponding module obtained from a strong CNN model. The ability of the weak CNN model to recognize the TC can thus be improved through patching. Moreover, the ability to recognize non-TCs is also improved, as samples misclassified as the TC can be correctly classified as non-TCs. Experimental results with two representative CNNs on three widely-used datasets show that the average improvements on the TC in terms of precision and recall are 12.54% and 2.14%, respectively. Moreover, patching improves the accuracy on non-TCs by 1.18%. The results demonstrate that CNNSplitter can patch a weak CNN model through modularization and composition, thus providing a new solution for developing robust CNN models.
    Mathematical Framework for Online Social Media Regulation. (arXiv:2209.05550v1 [cs.LG])
    Social media platforms (SMPs) leverage algorithmic filtering (AF) as a means of selecting the content that constitutes a user's feed with the aim of maximizing their rewards. Selectively choosing the contents to be shown on the user's feed may yield a certain extent of influence, either minor or major, on the user's decision-making, compared to what it would have been under a natural/fair content selection. As we have witnessed over the past decade, algorithmic filtering can cause detrimental side effects, ranging from biasing individual decisions to shaping those of society as a whole, for example, diverting users' attention from whether to get the COVID-19 vaccine or inducing the public to choose a presidential candidate. The government's constant attempts to regulate the adverse effects of AF are often complicated, due to bureaucracy, legal affairs, and financial considerations. On the other hand, SMPs seek to monitor their own algorithmic activities to avoid being fined for exceeding the allowable threshold. In this paper, we mathematically formalize this framework and utilize it to construct a data-driven statistical algorithm that prevents the AF from deflecting users' beliefs over time, along with sample and complexity guarantees. We show that our algorithm is robust against potential adversarial users. This state-of-the-art algorithm can be used either by authorities acting as external regulators or by SMPs for self-regulation.
    Temporal Feedback Convolutional Recurrent Neural Networks for Speech Command Recognition. (arXiv:1911.01803v2 [eess.AS] UPDATED)
    End-to-end learning models using raw waveforms as input have shown superior performances in many audio recognition tasks. However, most model architectures are based on convolutional neural networks (CNN), which were mainly developed for visual recognition tasks. In this paper, we propose an extension of squeeze-and-excitation networks (SENets) which adds temporal feedback control from the top-layer features to channel-wise feature activations in lower layers using a recurrent module. This is analogous to the adaptive gain control mechanism of outer hair cells in the human auditory system. We apply the proposed model to speech command recognition and show that it slightly outperforms the SENets and other CNN-based models. We also investigate the details of the performance improvement by conducting failure analysis and visualizing the channel-wise feature scaling induced by the temporal feedback.
    Boosting Sensitivity of Large-scale Online Experimentation via Dropout Buyer Imputation. (arXiv:2209.06125v1 [cs.LG])
    Metrics provide strong evidence to support hypotheses in online experimentation and hence reduce debates in the decision-making process. In this work, we introduce the concept of dropout buyers and categorize users with incomplete metric values into two groups: visitors and dropout buyers. For the analysis of incomplete metrics, we propose a cluster-based k-nearest neighbors imputation method. Our proposed imputation method considers both the experiment-specific features and users' activities along their shopping paths, allowing different imputation values for different users. To facilitate efficient imputation in the large-scale data sets of online experimentation, the proposed method uses a combination of stratification and clustering. The performance of the proposed method was compared to several conventional methods in a past experiment at eBay.
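    A hedged sketch of the cluster-then-kNN idea: group users by fully observed characteristics, then impute missing metric values from neighbors within the same group. It assumes some columns are fully observed for clustering and that every feature is observed at least once per cluster; the paper's stratification details differ.

    ```python
    import numpy as np
    from sklearn.cluster import KMeans
    from sklearn.impute import KNNImputer

    def clustered_knn_impute(X, n_clusters=10, k=5):
        """Impute missing metric values with kNN run within clusters, so
        neighbors share experiment and shopping-path characteristics."""
        observed = ~np.isnan(X).any(axis=0)     # fully observed columns
        cluster_ids = KMeans(n_clusters=n_clusters,
                             n_init=10).fit_predict(X[:, observed])
        X_out = X.copy()
        for c in range(n_clusters):
            rows = cluster_ids == c
            X_out[rows] = KNNImputer(n_neighbors=k).fit_transform(X[rows])
        return X_out
    ```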
    Proxy Convexity: A Unified Framework for the Analysis of Neural Networks Trained by Gradient Descent. (arXiv:2106.13792v3 [cs.LG] UPDATED)
    Although the optimization objectives for learning neural networks are highly non-convex, gradient-based methods have been wildly successful at learning neural networks in practice. This juxtaposition has led to a number of recent studies on provable guarantees for neural networks trained by gradient descent. Unfortunately, the techniques in these works are often highly specific to the particular setup in each problem, making it difficult to generalize across different settings. To address this drawback in the literature, we propose a unified non-convex optimization framework for the analysis of neural network training. We introduce the notions of proxy convexity and proxy Polyak-Lojasiewicz (PL) inequalities, which are satisfied if the original objective function induces a proxy objective function that is implicitly minimized when using gradient methods. We show that gradient descent on objectives satisfying proxy convexity or the proxy PL inequality leads to efficient guarantees for proxy objective functions. We further show that many existing guarantees for neural networks trained by gradient descent can be unified through proxy convexity and proxy PL inequalities.
    Investigating the Predictive Reproducibility of Federated Graph Neural Networks using Medical Datasets. (arXiv:2209.06032v1 [cs.LG])
    Graph neural networks (GNNs) have achieved extraordinary enhancements in various areas, including medical imaging and network neuroscience, where they have displayed high accuracy in diagnosing challenging neurological disorders such as autism. In the face of medical data scarcity and high privacy requirements, training such data-hungry models remains challenging. Federated learning brings an efficient solution to this issue by allowing models to be trained on multiple datasets, collected independently by different hospitals, in a fully data-preserving manner. Although both state-of-the-art GNNs and federated learning techniques focus on boosting classification accuracy, they overlook a critical unsolved problem: investigating the reproducibility of the most discriminative biomarkers (i.e., features) selected by GNN models within a federated learning paradigm. Quantifying the reproducibility of a predictive medical model against perturbations of training and testing data distributions presents one of the biggest hurdles to overcome in developing translational clinical applications. To the best of our knowledge, this is the first work investigating the reproducibility of federated GNN models with application to classifying medical imaging and brain connectivity datasets. We evaluated our framework using various GNN models trained on medical imaging and connectomic datasets. More importantly, we showed that federated learning boosts both the accuracy and reproducibility of GNN models in such medical learning tasks. Our source code is available at https://github.com/basiralab/reproducibleFedGNN.
    Binaural Signal Representations for Joint Sound Event Detection and Acoustic Scene Classification. (arXiv:2209.05900v1 [cs.SD])
    Sound event detection (SED) and acoustic scene classification (ASC) are two widely researched audio tasks that constitute an important part of research on acoustic scene analysis. Considering the shared information between sound events and acoustic scenes, performing both tasks jointly is a natural part of a complex machine listening system. In this paper, we investigate the usefulness of several spatial audio features in training a joint deep neural network (DNN) model performing SED and ASC. Experiments are performed on two different datasets containing binaural recordings and synchronous sound event and acoustic scene labels to analyse the differences between performing SED and ASC separately or jointly. The presented results show that the use of specific binaural features, mainly the Generalized Cross Correlation with Phase Transform (GCC-phat) and sines and cosines of phase differences, results in a better performing model in both separate and joint tasks, as compared with baseline methods based on logmel energies only.
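    For reference, GCC-PHAT whitens the cross-spectrum of the two binaural channels to unit magnitude so that only phase, i.e. inter-channel time-difference information, remains. A standard sketch of the computation:

    ```python
    import numpy as np

    def gcc_phat(x1, x2, n_fft=1024):
        """Generalized Cross Correlation with Phase Transform between two
        channels: normalize the cross-spectrum magnitude, then transform
        back to get a correlation over time lags."""
        X1, X2 = np.fft.rfft(x1, n_fft), np.fft.rfft(x2, n_fft)
        cross = X1 * np.conj(X2)
        cross /= np.abs(cross) + 1e-12        # phase transform (whitening)
        return np.fft.irfft(cross, n_fft)     # correlation over lags
    ```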
    Riemannian data-dependent randomized smoothing for neural networks certification. (arXiv:2206.10235v2 [cs.LG] UPDATED)
    Certification of neural networks is an important and challenging problem that has been attracting the attention of the machine learning community for several years. In this paper, we focus on randomized smoothing (RS), which is considered the state-of-the-art method for obtaining certifiably robust neural networks. In particular, a recently introduced data-dependent RS technique called ANCER can be used to certify ellipses with orthogonal axes near each input data point of the neural network. In this work, we remark that ANCER is not invariant under rotation of input data and propose a new rotationally-invariant formulation of it which can certify ellipses without constraints on their axes. Our approach, called Riemannian Data Dependent Randomized Smoothing (RDDRS), relies on information geometry techniques on the manifold of covariance matrices and can certify larger regions than ANCER based on our experiments on the MNIST dataset.
    Sparse deep neural networks for modeling aluminum electrolysis dynamics. (arXiv:2209.05832v1 [physics.chem-ph])
    Artificial neural networks have a broad array of applications today due to their high degree of flexibility and ability to model nonlinear functions from data. However, the trustworthiness of neural networks is limited due to their black-box nature, their poor ability to generalize from small datasets, and their inconsistent convergence during training. Aluminum electrolysis is a complex nonlinear process with many interrelated sub-processes. Artificial neural networks can potentially be well suited for modeling the aluminum electrolysis process, but the safety-critical nature of this process requires trustworthy models. In this work, sparse neural networks are trained to model the system dynamics of an aluminum electrolysis simulator. The sparse model structure has a significantly reduced model complexity compared to a corresponding dense neural network. We argue that this makes the model more interpretable. Furthermore, the empirical study shows that the sparse models generalize better from small training sets than dense neural networks. Moreover, training an ensemble of sparse neural networks with different parameter initializations shows that the models converge to similar model structures with similar learned input features.  ( 2 min )
    Self-Supervised Coordinate Projection Network for Sparse-View Computed Tomography. (arXiv:2209.05483v1 [eess.IV])
    In the present work, we propose a Self-supervised COordinate Projection nEtwork (SCOPE) to reconstruct an artifact-free CT image from a single sparse-view (SV) sinogram by solving the inverse tomography imaging problem. Compared with recent related works that solve similar problems using an implicit neural representation (INR) network, our essential contribution is an effective and simple re-projection strategy that pushes the tomography image reconstruction quality above that of supervised deep learning CT reconstruction works. The proposed strategy is inspired by the simple relationship between linear algebra and inverse problems. To solve the under-determined linear equation system, we first introduce INR to constrain the solution space via an image continuity prior and achieve a rough solution. Secondly, we propose to generate a dense-view sinogram that improves the rank of the linear equation system and produces a more stable CT image solution space. Our experimental results demonstrate that the re-projection strategy significantly improves the image reconstruction quality (at least +3 dB in terms of PSNR). Besides, we integrate the recent hash encoding into our SCOPE model, which greatly accelerates model training. Finally, we evaluate SCOPE on parallel and fan X-ray beam SVCT reconstruction tasks. Experimental results indicate that the proposed SCOPE model outperforms two latest INR-based methods and two well-known supervised DL methods, quantitatively and qualitatively.
    Intrusion Detection Systems Using Support Vector Machines on the KDDCUP'99 and NSL-KDD Datasets: A Comprehensive Survey. (arXiv:2209.05579v1 [cs.CR])
    With the growing rates of cyber-attacks and cyber espionage, the need for better and more powerful intrusion detection systems (IDS) is even more warranted nowadays. The basic task of an IDS is to act as the first line of defense in detecting attacks on the internet. As intrusion tactics become more sophisticated and more difficult to detect, researchers have started to apply novel machine learning (ML) techniques to effectively detect intruders and hence preserve internet users' information and overall trust in the entire internet network security. Over the last decade, there has been an explosion of research on intrusion detection techniques based on ML and deep learning (DL) architectures on various cyber security-based datasets such as DARPA, KDDCUP'99, NSL-KDD, CAIDA, CTU-13, and UNSW-NB15. In this research, we review contemporary literature and provide a comprehensive survey of different types of intrusion detection techniques that apply Support Vector Machine (SVM) algorithms as a classifier. We focus only on studies that have been evaluated on the two most widely used datasets in cybersecurity, namely the KDDCUP'99 and NSL-KDD datasets. We provide a summary of each method, identifying the role of the SVM classifier and all other algorithms involved in the studies. Furthermore, we present a critical review of each method, in tabular form, highlighting the performance measures, strengths, and limitations of each of the methods surveyed.  ( 3 min )
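    A minimal sketch of the kind of SVM classifier the surveyed studies build; in practice X would hold numeric NSL-KDD or KDDCUP'99 features (durations, byte counts, rates, ...) and y the attack/normal labels, but the data below is synthetic for illustration.

    ```python
    import numpy as np
    from sklearn.pipeline import make_pipeline
    from sklearn.preprocessing import StandardScaler
    from sklearn.svm import SVC

    # Synthetic stand-in for scaled intrusion-detection features and labels
    X = np.random.rand(200, 10)
    y = np.random.randint(0, 2, 200)

    clf = make_pipeline(StandardScaler(),
                        SVC(kernel="rbf", C=1.0, gamma="scale"))
    clf.fit(X[:150], y[:150])
    print("held-out accuracy:", clf.score(X[150:], y[150:]))
    ```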
    Driving Safety Prediction and Safe Route Mapping Using In-vehicle and Roadside Data. (arXiv:2209.05604v1 [cs.HC])
    Risk assessment of roadways is commonly practiced based on historical crash data. Information on driver behaviors and real-time traffic situations is sometimes missing. In this paper, the Safe Route Mapping (SRM) model, a methodology for developing dynamic risk heat maps of roadways, is extended to consider driver behaviors when making predictions. An Android App is designed to gather drivers' information and upload it to a server. On the server, facial recognition extracts drivers' data, such as facial landmarks, gaze directions, and emotions. The driver's drowsiness and distraction are detected, and driving performance is evaluated. Meanwhile, dynamic traffic information is captured by a roadside camera and uploaded to the same server. A longitudinal-scanline-based arterial traffic video analytics is applied to recognize vehicles from the video to build speed and trajectory profiles. Based on these data, a LightGBM model is introduced to predict conflict indices for drivers in the next one or two seconds. Then, multiple data sources, including historical crash counts and predicted traffic conflict indicators, are combined using a Fuzzy logic model to calculate risk scores for road segments. The proposed SRM model is illustrated using data collected from an actual traffic intersection and a driving simulation platform. The prediction results show that the model is accurate, and the added driver behavior features will improve the model's performance. Finally, risk heat maps are generated for visualization purposes. The authorities can use the dynamic heat map to designate safe corridors and dispatch law enforcement and drivers for early warning and trip planning.
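    A hedged sketch of the conflict-prediction step: a LightGBM classifier mapping fused driver-state and traffic features to a short-horizon conflict probability. The feature layout and horizon here are hypothetical illustrations, not the paper's exact schema, and the data is synthetic.

    ```python
    import numpy as np
    import lightgbm as lgb

    # Hypothetical features: driver state (drowsiness, distraction scores)
    # plus roadside traffic features (speeds, headways); labels mark whether
    # a conflict occurs within the next 1-2 seconds.
    X = np.random.rand(500, 8)
    y = (np.random.rand(500) < 0.1).astype(int)

    model = lgb.LGBMClassifier(n_estimators=200, learning_rate=0.05)
    model.fit(X[:400], y[:400])
    conflict_prob = model.predict_proba(X[400:])[:, 1]  # risk scores to fuse
    ```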
    Document Image Binarization in JPEG Compressed Domain using Dual Discriminator Generative Adversarial Networks. (arXiv:2209.05921v1 [cs.CV])
    Image binarization techniques are popularly used in the enhancement of noisy and/or degraded images, catering to different Document Image Analysis (DIA) applications like word spotting, document retrieval, and OCR. Most of the existing techniques focus on feeding pixel images into Convolutional Neural Networks to accomplish document binarization, which may not produce effective results when working with compressed images that need to be processed without full decompression. Therefore, in this research paper, the idea of document image binarization directly using the JPEG compressed stream of document images is proposed, employing Dual Discriminator Generative Adversarial Networks (DD-GANs). Here the two discriminator networks, Global and Local, work on different image ratios and use focal loss as the generator loss. The proposed model has been thoroughly tested with different versions of the DIBCO dataset, with challenges like holes, erased or smudged ink, dust, and misplaced fibres. The model proved to be highly robust and efficient in terms of both time and space complexities, and also achieved state-of-the-art performance in the JPEG compressed domain.
    Semantic Data Augmentation based Distance Metric Learning for Domain Generalization. (arXiv:2208.02803v3 [cs.LG] UPDATED)
    Domain generalization (DG) aims to learn a model on one or more different but related source domains that can generalize to an unseen target domain. Existing DG methods try to promote the diversity of source domains to improve the model's generalization ability, but they may have to introduce auxiliary networks or strikingly high computational costs. In contrast, this work applies implicit semantic augmentation in feature space to capture the diversity of source domains. Concretely, an additional loss function of distance metric learning (DML) is included to optimize the local geometry of the data distribution. Besides, the logits from the cross entropy loss with infinite augmentations are adopted as input features for the DML loss in lieu of the deep features. We also provide a theoretical analysis to show that the logits can approximate the distances defined on original features well. Further, we provide an in-depth analysis of the mechanism and rationale behind our approach, which gives us a better understanding of why leveraging logits in lieu of features can help domain generalization. The proposed DML loss with the implicit augmentation is incorporated into a recent DG method, the Fourier Augmented Co-Teacher (FACT) framework. Meanwhile, our method can also be easily plugged into various DG methods. Extensive experiments on three benchmarks (Digits-DG, PACS and Office-Home) demonstrate that the proposed method achieves state-of-the-art performance.
    On clustering uncertain and structured data with Wasserstein barycenters and a geodesic criterion for the number of clusters. (arXiv:1912.11801v3 [stat.ML] UPDATED)
    In this work, clustering schemes for uncertain and structured data are considered, relying on the notion of Wasserstein barycenters, accompanied by appropriate clustering indices based on the intrinsic geometry of the Wasserstein space where the clustering task is performed. Such clustering approaches are highly appreciated in many fields where the observational/experimental error is significant (e.g. astronomy, biology, remote sensing, etc.) or where the data are more complex in nature and traditional learning algorithms are not applicable or effective (e.g. network data, interval data, high frequency records, matrix data, etc.). Under this perspective, each observation is identified with an appropriate probability measure, and the proposed clustering schemes rely on discrimination criteria that utilize the geometric structure of the space of probability measures through core techniques from optimal transport theory. The advantages and capabilities of the proposed approach and the performance of the geodesic criterion are illustrated through a simulation study and implementations in two real-world applications: (a) clustering eurozone countries according to their observed government bond yield curves and (b) classifying the areas of a satellite image into certain land-use categories, a standard task in remote sensing.
    SHED: A Newton-type algorithm for federated learning based on incremental Hessian eigenvector sharing. (arXiv:2202.05800v2 [cs.LG] UPDATED)
    There is a growing interest in the distributed optimization framework that goes under the name of Federated Learning (FL). In particular, much attention is being turned to FL scenarios where the network is strongly heterogeneous in terms of communication resources (e.g., bandwidth) and data distribution. In these cases, communication between local machines (agents) and the central server (Master) is a main consideration. In this work, we present SHED, an original communication-constrained Newton-type (NT) algorithm designed to accelerate FL in such heterogeneous scenarios. SHED is by design robust to non-i.i.d. data distributions, handles heterogeneity of agents' communication resources (CRs), only requires sporadic Hessian computations, and achieves super-linear convergence. This is possible thanks to an incremental strategy, based on eigendecomposition of the local Hessian matrices, which exploits (possibly) outdated second-order information. The proposed solution is thoroughly validated on real datasets by assessing (i) the number of communication rounds required for convergence, (ii) the overall amount of data transmitted and (iii) the number of local Hessian computations. For all these metrics, the proposed approach shows superior performance against state-of-the-art techniques like GIANT and FedNL.
    Tailoring Molecules for Protein Pockets: a Transformer-based Generative Solution for Structured-based Drug Design. (arXiv:2209.06158v1 [q-bio.BM])
    Structure-based drug design is drawing growing attention in computer-aided drug discovery. Compared with the virtual screening approach, where a pre-defined library of compounds is computationally screened, de novo drug design based on the structure of a target protein can provide novel drug candidates. In this paper, we present a generative solution named TamGent (Target-aware molecule generator with Transformer) that can directly generate candidate drugs from scratch for a given target, overcoming the limits imposed by existing compound libraries. Following the Transformer framework (a state-of-the-art framework in deep learning), we design a variant of the Transformer encoder to process 3D geometric information of targets and pre-train the Transformer decoder on 10 million compounds from PubChem for candidate drug generation. Systematic evaluation of candidate compounds generated for targets from DrugBank shows that both binding affinity and druggability are largely improved. TamGent outperforms previous baselines in terms of both effectiveness and efficiency. The method is further verified by generating candidate compounds for the SARS-CoV-2 main protease and the oncogenic mutant KRAS G12C. The results show that our method not only re-discovers previously verified drug molecules, but also generates novel molecules with better docking scores, expanding the compound pool and potentially leading to the discovery of novel drugs.
    Large Language Models and the Reverse Turing Test. (arXiv:2207.14382v6 [cs.CL] UPDATED)
    Large Language Models (LLMs) have been transformative. They are pre-trained foundational models that are self-supervised and can be adapted with fine-tuning to a wide range of natural language tasks, each of which previously would have required a separate network model. This is one step closer to the extraordinary versatility of human language. GPT-3 and, more recently, LaMDA can carry on dialogs with humans on many topics after minimal priming with a few examples. However, there has been a wide range of reactions on whether these LLMs understand what they are saying or exhibit signs of intelligence. This high variance is exhibited in three interviews with LLMs reaching wildly different conclusions. A new possibility was uncovered that could explain this divergence. What appears to be intelligence in LLMs may in fact be a mirror that reflects the intelligence of the interviewer, a remarkable twist that could be considered a Reverse Turing Test. If so, then by studying interviews we may be learning more about the intelligence and beliefs of the interviewer than about the intelligence of the LLMs. As LLMs become more capable they may transform the way we interact with machines and how they interact with each other. LLMs can talk the talk, but can they walk the walk?
    Hierarchical Conversational Preference Elicitation with Bandit Feedback. (arXiv:2209.06129v1 [cs.IR])
    The recent advances in conversational recommendation provide a promising way to efficiently elicit users' preferences via conversational interactions. To achieve this, the recommender system conducts conversations with users, asking about their preferences for different items or item categories. Most existing conversational recommender systems for cold-start users utilize a multi-armed bandit framework to learn users' preferences in an online manner. However, they rely on a pre-defined conversation frequency for asking about item categories instead of individual items, which may incur excessive conversational interactions that hurt user experience. To enable more flexible questioning about key-terms, we formulate a new conversational bandit problem that allows the recommender system to choose either a key-term or an item to recommend at each round and explicitly models the rewards of these actions. This motivates us to handle a new exploration-exploitation (EE) trade-off between key-term asking and item recommendation, which requires us to accurately model the relationship between key-term and item rewards. We conduct a survey and analyze a real-world dataset to find that, unlike assumptions made in prior works, key-term rewards are mainly affected by the rewards of representative items. We propose two bandit algorithms, Hier-UCB and Hier-LinUCB, that leverage this observed relationship and the hierarchical structure between key-terms and items to efficiently learn which items to recommend. We theoretically prove that our algorithm can reduce the regret bound's dependency on the total number of items compared to previous work. We validate our proposed algorithms and regret bound on both synthetic and real-world data.
    MoDi: Unconditional Motion Synthesis from Diverse Data. (arXiv:2206.08010v2 [cs.GR] UPDATED)
    The emergence of neural networks has revolutionized the field of motion synthesis. Yet, learning to unconditionally synthesize motions from a given distribution remains a challenging task, especially when the motions are highly diverse. In this work, we present MoDi - a generative model trained in a completely unsupervised setting on an extremely diverse, unstructured and unlabeled motion dataset. During inference, MoDi can synthesize high-quality, diverse motions that lie in a well-behaved and highly semantic latent space. We show that despite the lack of any structure in the dataset, the latent space can be semantically clustered, facilitating various applications including semantic editing, crowd simulation and motion interpolation. Our qualitative and quantitative experiments show that our framework achieves state-of-the-art synthesis quality that can follow the distribution of highly diverse motion datasets. Code and trained models are available at https://sigal-raab.github.io/MoDi.
    Analysing the Predictivity of Features to Characterise the Search Space. (arXiv:2209.06114v1 [cs.LG])
    Exploring search spaces is one of the most unpredictable challenges that has attracted the interest of researchers for decades. One way to handle unpredictability is to characterise the search spaces and take actions accordingly. A well-characterised search space can assist in mapping problem states to a set of operators for generating new problem states. In this paper, a landscape-analysis-based set of features is analysed using well-known machine learning approaches to determine the optimal feature set. However, in order to deal with problem complexity and to induce commonality for transferring experience across domains, the selection of the most representative features remains crucial. The proposed approach analyses the predictivity of a set of features in order to determine the best categorization.
    A new Reinforcement Learning framework to discover natural flavor molecules. (arXiv:2209.05859v1 [cs.LG])
    Flavor is the focal point of the flavor industry, which follows social tendencies and behaviors. The research and development of new flavoring agents and molecules is essential in this field. At the same time, the development of natural flavors plays a critical role in modern society. In light of this, the present work proposes a novel framework based on scientific machine learning to tackle an emerging problem in flavor engineering and industry. This work brings an innovative methodology to design new natural flavor molecules. The molecules are evaluated regarding their synthetic accessibility, the number of atoms, and their likeness to a natural or pseudo-natural product.
    DOMINO: Domain-aware Model Calibration in Medical Image Segmentation. (arXiv:2209.06077v1 [eess.IV])
    Model calibration measures the agreement between the predicted probability estimates and the true correctness likelihood. Proper model calibration is vital for high-risk applications. Unfortunately, modern deep neural networks are poorly calibrated, compromising trustworthiness and reliability. Medical image segmentation particularly suffers from this due to the natural uncertainty of tissue boundaries. This is exacerbated by loss functions that favor overconfidence in the majority classes. We address these challenges with DOMINO, a domain-aware model calibration method that leverages the semantic confusability and hierarchical similarity between class labels. Our experiments demonstrate that our DOMINO-calibrated deep neural networks outperform non-calibrated models and state-of-the-art morphometric methods in head image segmentation. Our results show that our method can consistently achieve better calibration, higher accuracy, and faster inference times than these methods, especially on rarer classes. This performance is attributed to our domain-aware regularization to inform semantic model calibration. These findings show the importance of semantic ties between class labels in building confidence in deep learning models. The framework has the potential to improve the trustworthiness and reliability of generic medical image segmentation models. The code for this article is available at: https://github.com/lab-smile/DOMINO.
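    For readers who want the calibration metric itself pinned down, below is the standard expected calibration error (the usual confidence-binning definition of the calibration gap; this is background, not DOMINO's regularizer).

    ```python
    import numpy as np

    def expected_calibration_error(probs, labels, n_bins=15):
        """Standard ECE: bin predictions by confidence and average the
        weighted gap between each bin's accuracy and mean confidence."""
        conf = probs.max(axis=1)
        pred = probs.argmax(axis=1)
        bins = np.linspace(0.0, 1.0, n_bins + 1)
        ece = 0.0
        for lo, hi in zip(bins[:-1], bins[1:]):
            m = (conf > lo) & (conf <= hi)
            if m.any():
                gap = abs((pred[m] == labels[m]).mean() - conf[m].mean())
                ece += m.mean() * gap
        return ece
    ```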
    Information Compression and Performance Evaluation of Tic-Tac-Toe's Evaluation Function Using Singular Value Decomposition. (arXiv:2207.02449v3 [cs.LG] UPDATED)
    We approximated the evaluation function for the game Tic-Tac-Toe by singular value decomposition (SVD) and investigated the effect of approximation accuracy on winning rate. We first prepared the perfect evaluation function of Tic-Tac-Toe and performed low-rank approximation by considering the evaluation function as a ninth-order tensor. We found that we can reduce the amount of information in the evaluation function by 70% without significantly degrading the performance. Approximation accuracy and winning rate were strongly correlated but not perfectly proportional. We also investigated how the decomposition method of the evaluation function affects the performance. We considered two decomposition methods: simple SVD regarding the evaluation function as a matrix, and the Tucker decomposition by higher-order SVD (HOSVD). At the same compression ratio, the strategy with the approximated evaluation function obtained by HOSVD exhibited a significantly higher winning rate than that obtained by SVD. These results suggest that SVD can effectively compress board game strategies and that an optimal compression method, which depends on the game, exists.
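    A minimal sketch of the simple-SVD setting: flatten the evaluation tensor into a matrix, keep only the leading singular values, and track how much spectral energy is retained (the HOSVD/Tucker variant decomposes the ninth-order tensor directly instead; the reshape dimensions below are one illustrative choice).

    ```python
    import numpy as np

    def low_rank_value_table(V, rank):
        """Keep only the top-`rank` singular values of a flattened
        evaluation function; also return the retained energy fraction."""
        U, s, Vt = np.linalg.svd(V, full_matrices=False)
        approx = U[:, :rank] @ np.diag(s[:rank]) @ Vt[:rank]
        kept = (s[:rank] ** 2).sum() / (s ** 2).sum()
        return approx, kept

    # e.g. reshape a 3^9-entry evaluation tensor to (27, 729), keep rank 8
    V = np.random.rand(27, 729)       # stand-in for the true value table
    V8, energy = low_rank_value_table(V, 8)
    ```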
    Cut-and-Paste Object Insertion by Enabling Deep Image Prior for Reshading. (arXiv:2010.05907v2 [cs.CV] UPDATED)
    We show how to insert an object from one image into another and get realistic results in the hard case, where the shading of the inserted object clashes with the shading of the scene. Rendering objects using an illumination model of the scene doesn't work, because doing so requires a geometric and material model of the object, which is hard to recover from a single image. In this paper, we introduce a method that corrects shading inconsistencies of the inserted object without requiring a geometric and physical model or an environment map. Our method uses a deep image prior (DIP), trained to produce reshaded renderings of inserted objects via consistent image decomposition inferential losses. The resulting image from DIP aims to have (a) an albedo similar to the cut-and-paste albedo, (b) a shading field similar to that of the target scene, and (c) a shading that is consistent with the cut-and-paste surface normals. The result is a simple procedure that produces convincing shading of the inserted object. We show the efficacy of our method both qualitatively and quantitatively on several objects with complex surface properties, as well as on a dataset of spherical lampshades for quantitative evaluation. Our method significantly outperforms an Image Harmonization (IH) baseline for all these objects. It also outperforms the cut-and-paste and IH baselines in a user study with over 100 users.
    Directed mixed membership stochastic blockmodel. (arXiv:2101.02307v3 [stat.ML] UPDATED)
    The mixed membership problem for undirected networks has been well studied in network analysis in recent years. However, the more general case of mixed membership for directed networks, in which nodes can belong to multiple communities, remains a challenge. Here, we propose an interpretable and identifiable model: the directed mixed membership stochastic blockmodel (DiMMSB) for directed mixed membership networks. DiMMSB allows the row nodes and column nodes of the adjacency matrix to differ, and these nodes may have distinct community structures in a directed network. We also develop an efficient spectral algorithm called DiSP, designed based on the simplex structures inherent in the left and right singular vectors of the population adjacency matrix, to estimate the mixed memberships of both row nodes and column nodes in a directed network. We show that DiSP is asymptotically consistent under mild conditions by providing error bounds for the inferred membership vectors of each row node and each column node using delicate spectral analysis. Numerical results on computer-generated directed mixed membership networks support our theoretical findings and show that our DiSP outperforms its competitor in both error rates and run-time. Applications of DiSP to real-world directed networks demonstrate the advantages of DiSP in studying the asymmetric structure of directed networks.
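    For intuition, the following heavily simplified sketch co-clusters the row and column nodes of a directed adjacency matrix via its truncated SVD. It is not DiSP itself: DiSP recovers mixed memberships by exploiting the simplex structure of the singular vectors, whereas this sketch assigns pure memberships with k-means, and the planted-block data is synthetic.

    ```python
    import numpy as np
    from sklearn.cluster import KMeans

    # Simplified spectral co-clustering sketch (illustrative only; real DiSP
    # uses simplex corner-finding to obtain *mixed* memberships).
    rng = np.random.default_rng(0)
    A = (rng.random((60, 80)) < 0.05).astype(float)
    A[:30, :40] = (rng.random((30, 40)) < 0.3).astype(float)  # planted dense block

    k = 2
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    row_labels = KMeans(n_clusters=k, n_init=10).fit_predict(U[:, :k])   # row communities
    col_labels = KMeans(n_clusters=k, n_init=10).fit_predict(Vt[:k].T)   # column communities
    print("row community sizes:", np.bincount(row_labels))
    print("col community sizes:", np.bincount(col_labels))
    ```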
    SongDriver: Real-time Music Accompaniment Generation without Logical Latency nor Exposure Bias. (arXiv:2209.06054v1 [cs.SD])
    Real-time music accompaniment generation has a wide range of applications in the music industry, such as music education and live performances. However, automatic real-time music accompaniment generation is still understudied and often faces a trade-off between logical latency and exposure bias. In this paper, we propose SongDriver, a real-time music accompaniment generation system with neither logical latency nor exposure bias. Specifically, SongDriver divides one accompaniment generation task into two phases: 1) the arrangement phase, where a Transformer model first arranges chords for input melodies in real-time and caches the chords for the next phase instead of playing them out, and 2) the prediction phase, where a CRF model generates playable multi-track accompaniments for the coming melodies based on the previously cached chords. With this two-phase strategy, SongDriver directly generates the accompaniment for the upcoming melody, achieving zero logical latency. Furthermore, when predicting chords for a timestep, SongDriver refers to the cached chords from the first phase rather than its previous predictions, which avoids the exposure bias problem. Since the input length is often constrained under real-time conditions, another potential problem is the loss of long-term sequential information. To make up for this disadvantage, we extract four musical features from a long-term music piece before the current time step as global information. In the experiment, we train SongDriver on several open-source datasets and an original aiSong Dataset built from Chinese-style modern pop music scores. The results show that SongDriver outperforms existing state-of-the-art (SOTA) models on both objective and subjective metrics, while significantly reducing the physical latency.
    Normalizing Flows for Interventional Density Estimation. (arXiv:2209.06203v1 [cs.LG])
    Existing machine learning methods for causal inference usually estimate quantities expressed via the mean of potential outcomes (e.g., average treatment effect). However, such quantities do not capture the full information about the distribution of potential outcomes. In this work, we estimate the density of potential outcomes after interventions from observational data. Specifically, we propose a novel, fully-parametric deep learning method for this purpose, called Interventional Normalizing Flows. Our Interventional Normalizing Flows offer a properly normalized density estimator. For this, we introduce an iterative training of two normalizing flows, namely (i) a teacher flow for estimation of nuisance parameters and (ii) a student flow for parametric estimation of the density of potential outcomes. For efficient and doubly-robust estimation of the student flow parameters, we develop a custom tractable optimization objective based on a one-step bias correction. Across various experiments, we demonstrate that our Interventional Normalizing Flows are expressive and highly effective, and scale well with both sample size and high-dimensional confounding. To the best of our knowledge, our Interventional Normalizing Flows are the first fully-parametric, deep learning method for density estimation of potential outcomes.
    Visualizing Image Content to Explain Novel Image Discovery. (arXiv:1908.05006v2 [cs.LG] UPDATED)
    The initial analysis of any large data set can be divided into two phases: (1) the identification of common trends or patterns and (2) the identification of anomalies or outliers that deviate from those trends. We focus on the goal of detecting observations with novel content, which can alert us to artifacts in the data set or, potentially, the discovery of previously unknown phenomena. To aid in interpreting and diagnosing the novel aspect of these selected observations, we recommend the use of novelty detection methods that generate explanations. In the context of large image data sets, these explanations should highlight what aspect of a given image is new (color, shape, texture, content) in a human-comprehensible form. We propose DEMUD-VIS, the first method for providing visual explanations of novel image content by employing a convolutional neural network (CNN) to extract image features, a method that uses reconstruction error to detect novel content, and an up-convolutional network to convert CNN feature representations back into image space. We demonstrate this approach on diverse images from ImageNet, freshwater streams, and the surface of Mars.
    FedHAP: Fast Federated Learning for LEO Constellations using Collaborative HAPs. (arXiv:2205.07216v3 [cs.LG] UPDATED)
    Low Earth Orbit (LEO) satellite constellations have seen a surge in deployment over the past few years by virtue of their ability to provide broadband Internet access as well as to collect vast amounts of Earth observational data that can be utilised to develop AI on a global scale. As traditional machine learning (ML) approaches, which train a model by downloading satellite data to a ground station (GS), are not practical, Federated Learning (FL) offers a potential solution. However, existing FL approaches cannot be readily used because of excessively prolonged training time and unreliable satellite-GS communication channels. In this paper, we propose FedHAP, which introduces high-altitude platforms (HAPs) as distributed parameter servers (PSs) into FL for Satcom (or more concretely, LEO constellations) to achieve fast and efficient model training. FedHAP consists of three components: 1) a layered communication topology, 2) a model propagation algorithm, and 3) a model aggregation algorithm. Our extensive simulations demonstrate that FedHAP significantly accelerates FL model convergence compared to state-of-the-art baselines, cutting the training time from several days down to a few hours while achieving higher accuracy.
    Challenges and Pitfalls of Bayesian Unlearning. (arXiv:2207.03227v2 [cs.LG] UPDATED)
    Machine unlearning refers to the task of removing a subset of training data, thereby removing its contributions to a trained model. Approximate unlearning is one class of methods for this task which avoids the need to retrain the model from scratch on the retained data. Bayes' rule can be used to cast approximate unlearning as an inference problem where the objective is to obtain the updated posterior by dividing out the likelihood of the deleted data. However, this has its own set of challenges, as one often does not have access to the exact posterior of the model parameters. In this work we examine the use of the Laplace approximation and variational inference to obtain the updated posterior. With a neural network trained for a regression task as the guiding example, we draw insights on the applicability of Bayesian unlearning in practical scenarios.
    BayesLDM: A Domain-Specific Language for Probabilistic Modeling of Longitudinal Data. (arXiv:2209.05581v1 [cs.LG])
    In this paper we present BayesLDM, a system for Bayesian longitudinal data modeling consisting of a high-level modeling language with specific features for modeling complex multivariate time series data coupled with a compiler that can produce optimized probabilistic program code for performing inference in the specified model. BayesLDM supports modeling of Bayesian network models with a specific focus on the efficient, declarative specification of dynamic Bayesian Networks (DBNs). The BayesLDM compiler combines a model specification with inspection of available data and outputs code for performing Bayesian inference for unknown model parameters while simultaneously handling missing data. These capabilities have the potential to significantly accelerate iterative modeling workflows in domains that involve the analysis of complex longitudinal data by abstracting away the process of producing computationally efficient probabilistic inference code. We describe the BayesLDM system components, evaluate the efficiency of representation and inference optimizations and provide an illustrative example of the application of the system to analyzing heterogeneous and partially observed mobile health data.
    Understanding Time Variations of DNN Inference in Autonomous Driving. (arXiv:2209.05487v1 [cs.LG])
    Deep neural networks (DNNs) are widely used in autonomous driving due to their high accuracy in perception, decision, and control. In safety-critical systems like autonomous driving, executing tasks like sensing and perception in real-time is vital to the vehicle's safety, which requires the application's execution time to be predictable. However, non-negligible time variations are observed in DNN inference. Current DNN inference studies either ignore the time variation issue or rely on the scheduler to handle it. None of the current work explains the root causes of DNN inference time variations. Understanding the time variations of DNN inference thus becomes a fundamental challenge in real-time scheduling for autonomous driving. In this work, we analyze the time variation in DNN inference at a fine granularity from six perspectives: data, I/O, model, runtime, hardware, and the end-to-end perception system. Six insights are derived that help explain the time variations of DNN inference.
    General Greedy De-bias Learning. (arXiv:2112.10572v4 [cs.LG] UPDATED)
    Neural networks often make predictions by relying on spurious correlations in the datasets rather than the intrinsic properties of the task of interest, facing sharp degradation on out-of-distribution (OOD) test data. Existing de-bias learning frameworks try to capture specific dataset bias through annotations, but they fail to handle complicated OOD scenarios. Others implicitly identify the dataset bias with specially designed low-capability biased models or losses, but they degrade when the training and testing data are from the same distribution. In this paper, we propose a General Greedy De-bias learning framework (GGD), which greedily trains the biased models and the base model. The base model is encouraged to focus on examples that are hard to solve with biased models, thus remaining robust against spurious correlations in the test stage. GGD largely improves models' OOD generalization ability on various tasks, but sometimes over-estimates the bias level and degrades on the in-distribution test. We further re-analyze the ensemble process of GGD and introduce Curriculum Regularization, inspired by curriculum learning, which achieves a good trade-off between in-distribution and out-of-distribution performance. Extensive experiments on image classification, adversarial question answering, and visual question answering demonstrate the effectiveness of our method. GGD can learn a more robust base model both under the setting of task-specific biased models with prior knowledge and that of self-ensemble biased models without prior knowledge.
    On topological data analysis for SHM; an introduction to persistent homology. (arXiv:2209.06155v1 [math.AT])
    This paper discusses a method of quantifying the 'shape' of data via a methodology called topological data analysis. The main tool within topological data analysis is persistent homology; this is a means of measuring the shape of data, from the homology of a simplicial complex, calculated over a range of values. The required background theory and a method of computing persistent homology are presented here, with applications specific to structural health monitoring. These results allow for topological inference and the ability to deduce features in higher-dimensional data that might otherwise be overlooked. A simplicial complex is constructed from the data for a given distance parameter. This complex encodes information about the local proximity of data points. A singular homology value can be calculated from this simplicial complex. Extending this idea, the distance parameter is varied over a range of values, and the homology is calculated over this range. The persistent homology is a representation of how the homological features of the data persist over this interval. The result is characteristic of the data. A method that allows for the comparison of the persistent homology for different data sets is also discussed.
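    As a concrete illustration of the pipeline described above (build a filtration over a range of distance parameters, then record how homological features persist), here is a short sketch using the ripser package on a synthetic point cloud; the package choice and the noisy-circle data are assumptions, not tied to the SHM case studies in the paper.

    ```python
    import numpy as np
    from ripser import ripser  # pip install ripser

    # Illustrative sketch: persistent homology of a noisy circle. Points sampled
    # near a circle should yield one long-lived H1 (loop) feature.
    rng = np.random.default_rng(1)
    theta = rng.uniform(0, 2 * np.pi, 100)
    X = np.c_[np.cos(theta), np.sin(theta)] + 0.05 * rng.normal(size=(100, 2))

    # Vietoris-Rips filtration over a range of distance parameters.
    dgms = ripser(X, maxdim=1)["dgms"]

    # Each diagram is an array of (birth, death) pairs; persistence = death - birth.
    h1 = dgms[1]
    persistence = h1[:, 1] - h1[:, 0]
    print("most persistent H1 feature (birth, death):", h1[np.argmax(persistence)])
    ```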
    Parameter-Efficient Mixture-of-Experts Architecture for Pre-trained Language Models. (arXiv:2203.01104v2 [cs.CL] UPDATED)
    Recently, the Mixture-of-Experts (MoE) architecture has achieved remarkable success in increasing the model capacity of large-scale language models. However, MoE requires incorporating significantly more parameters than the base model being extended. In this paper, we propose building a parameter-efficient MoE architecture by sharing information across experts. We adopt the matrix product operator (MPO, a tensor decomposition from quantum many-body physics) to reconstruct the parameter matrix in the expert layer, and increase model capacity for pre-trained language models by sharing the parameters of the central tensor (containing the core information) among different experts while enabling specificity through the auxiliary tensors (complementing the central tensor) of different experts. To address the unbalanced optimization issue, we further design a gradient mask strategy for the MPO-based MoE architecture. Extensive experiments based on T5 and GPT-2 show improved performance and efficiency of the pre-trained language model (a 27.2x reduction in total parameters while achieving superior performance compared with Switch Transformers). Our code is publicly available at \url{https://github.com/RUCAIBox/MPO/MPOE}.
    Calibrated Forecasts: The Minimax Proof. (arXiv:2209.05863v1 [econ.TH])
    A formal write-up of the simple proof (1995) of the existence of calibrated forecasts by the minimax theorem, which moreover shows that $N^3$ periods suffice to guarantee a $1/N$ calibration error.
    A Meta-level Analysis of Online Anomaly Detectors. (arXiv:2209.05899v1 [cs.LG])
    Real-time detection of anomalies in streaming data is receiving increasing attention as it allows us to raise alerts, predict faults, and detect intrusions or threats across industries. Yet, little attention has been given to comparing the effectiveness and efficiency of anomaly detectors for streaming data (i.e., of online algorithms). In this paper, we present a qualitative, synthetic overview of major online detectors from different algorithmic families (i.e., distance, density, tree or projection-based) and highlight their main ideas for constructing, updating and testing detection models. Then, we provide a thorough analysis of the results of a quantitative experimental evaluation of online detection algorithms along with their offline counterparts. The behavior of the detectors is correlated with the characteristics of different datasets (i.e., meta-features), thereby providing a meta-level analysis of their performance. Our study addresses several missing insights from the literature such as (a) how reliable detectors are against a random classifier and what dataset characteristics make them perform randomly; (b) to what extent online detectors approximate the performance of their offline counterparts; (c) which sketch strategy and update primitives of detectors are best for detecting anomalies visible only within a feature subspace of a dataset; (d) what the tradeoffs are between the effectiveness and the efficiency of detectors belonging to different algorithmic families; and (e) which specific characteristics of datasets yield an online algorithm to outperform all others.
    Application of the Multi-label Residual Convolutional Neural Network text classifier using Content-Based Routing process. (arXiv:2110.15801v2 [cs.CL] UPDATED)
    In this article, we present an NLP application for classifying text using a content-based router. The ultimate goal throughout this article is to predict the event described by a legal ad from the plain text of the ad. This is a purely supervised problem that involves the use of NLP techniques and conventional modeling methodologies, employing a Multi-label Residual Convolutional Neural Network for text classification. We explain the approach put in place to solve this ad-classification problem, the difficulties encountered, and the experimental results.
    Predicting Brain Multigraph Population From a Single Graph Template for Boosting One-Shot Classification. (arXiv:2209.06005v1 [q-bio.NC])
    A central challenge in training one-shot learning models is the limited representativeness of the available shots of the data space. Particularly in the field of network neuroscience, where the brain is represented as a graph, such models may lead to low performance when classifying brain states (e.g., typical vs. autistic). To cope with this, most of the existing works involve a data augmentation step to increase the size of the training set, its diversity and its representativeness. Though effective, such augmentation methods are limited to generating samples with the same size as the input shots (e.g., generating brain connectivity matrices from a single shot matrix). To the best of our knowledge, the problem of generating brain multigraphs capturing multiple types of connectivity between pairs of nodes (i.e., anatomical regions) from a single brain graph remains unsolved. In this paper, we propose the first hybrid graph neural network (GNN) architecture for this problem, namely the Multigraph Generator Network, or briefly MultigraphGNet, comprising two subnetworks: (1) a many-to-one GNN which integrates an input population of brain multigraphs into a single template graph, namely a connectional brain template (CBT), and (2) a reverse one-to-many U-Net network which takes the learned CBT in each training step and outputs the reconstructed input multigraph population. Both networks are trained in an end-to-end way using a cyclic loss. Experimental results demonstrate that our MultigraphGNet boosts the performance of an independent classifier when trained on the augmented brain multigraphs, in comparison with training on a single CBT from each class. We hope that our framework can shed some light on future research in multigraph augmentation from a single graph. Our MultigraphGNet source code is available at https://github.com/basiralab/MultigraphGNet.
    Federated Meta-Learning for Traffic Steering in O-RAN. (arXiv:2209.05874v1 [cs.NI])
    The vision of 5G lies in providing high data rates, low latency (to enable near-real-time applications), significantly increased base station capacity, and near-perfect quality of service (QoS) for users, compared to LTE networks. In order to provide such services, 5G systems will support various combinations of access technologies such as LTE, NR, NR-U and Wi-Fi. Each radio access technology (RAT) provides different types of access, and these should be allocated and managed optimally among the users. Besides resource management, 5G systems will also support a dual connectivity service. The orchestration of the network therefore becomes a more difficult problem for system managers with respect to legacy access technologies. In this paper, we propose an algorithm for RAT allocation based on federated meta-learning (FML), which enables RAN intelligent controllers (RICs) to adapt more quickly to dynamically changing environments. We have designed a simulation environment which contains LTE and 5G NR service technologies. In the simulation, our objective is to fulfil UE demands within the transmission deadline, thereby providing higher QoS values. We compared our proposed algorithm with a single RL agent, the Reptile algorithm, and a rule-based heuristic method. Simulation results show that the proposed FML method achieves higher caching rates at the first deployment round, by 21% and 12% respectively. Moreover, the proposed approach adapts to new tasks and environments the fastest among the compared methods.
    Meta-learning Causal Discovery. (arXiv:2209.05598v1 [cs.LG])
    Causal discovery (CD) from time-varying data is important in neuroscience, medicine, and machine learning. Techniques for CD include randomized experiments, which are generally unbiased but expensive, as well as algorithms such as regression, matching, and Granger causality, which are only correct under strong assumptions made by human designers. However, as in other areas of machine learning, human-designed heuristics are usually not quite right and are usually outperformed by data-driven approaches. Here we test whether we can improve causal discovery in a data-driven way. We take a system with a large number of causal components (transistors), the MOS 6502 processor, and meta-learn the causal discovery procedure represented as a neural network. We find that this procedure far outperforms human-designed causal discovery procedures, such as mutual information and Granger causality. We argue that the causality field should consider, where possible, a supervised approach, in which CD procedures are learned from large datasets with known causal relations, instead of being designed by a human specialist. Our findings promise a new approach toward CD in neural and medical data and for the broader machine learning community.
    Deep Learning Training on Multi-Instance GPUs. (arXiv:2209.06018v1 [cs.LG])
    Deep learning training is an expensive process that extensively uses GPUs, but not all model training saturates modern powerful GPUs. Multi-Instance GPU (MIG) is a new technology introduced by NVIDIA that can partition a GPU to better fit workloads that don't require all the memory and compute resources of a full GPU. In this paper, we examine the performance of a MIG-enabled A100 GPU under deep learning workloads of three sizes, focusing on image recognition training with ResNet models. We investigate the behavior of these workloads when running in isolation on a variety of MIG instances allowed by the GPU, in addition to running them in parallel on homogeneous instances co-located on the same GPU. Our results demonstrate that employing MIG can significantly improve the utilization of the GPU when the workload is too small to utilize the whole GPU in isolation. By training multiple small models in parallel, more work can be performed by the GPU per unit of time, despite the increase in time-per-epoch, leading to $\sim$3 times the throughput. In contrast, for medium and large-sized workloads, which already utilize the whole GPU well on their own, MIG only provides marginal performance improvements. Nevertheless, we observe that training models in parallel using separate MIG partitions does not exhibit interference, underlining the value of having a functionality like MIG on modern GPUs.
    Adversarial Coreset Selection for Efficient Robust Training. (arXiv:2209.05785v1 [cs.LG])
    Neural networks are vulnerable to adversarial attacks: adding well-crafted, imperceptible perturbations to their input can modify their output. Adversarial training is one of the most effective approaches to training robust models against such attacks. Unfortunately, this method is much slower than vanilla training of neural networks since it needs to construct adversarial examples for the entire training data at every iteration. By leveraging the theory of coreset selection, we show how selecting a small subset of training data provides a principled approach to reducing the time complexity of robust training. To this end, we first provide convergence guarantees for adversarial coreset selection. In particular, we show that the convergence bound is directly related to how well our coresets can approximate the gradient computed over the entire training data. Motivated by our theoretical analysis, we propose using this gradient approximation error as our adversarial coreset selection objective to reduce the training set size effectively. Once built, we run adversarial training over this subset of the training data. Unlike existing methods, our approach can be adapted to a wide variety of training objectives, including TRADES, $\ell_p$-PGD, and Perceptual Adversarial Training. We conduct extensive experiments to demonstrate that our approach speeds up adversarial training by 2-3 times while experiencing a slight degradation in the clean and robust accuracy.
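    The core selection principle, choosing a subset whose gradients approximate the full-data gradient, can be sketched in a few lines. The greedy rule and the stand-in per-example gradients below are simplifying assumptions for illustration, not the paper's exact adversarial coreset algorithm.

    ```python
    import numpy as np

    # Minimal sketch of gradient-matching coreset selection. Given per-example
    # gradient vectors G (n x d), greedily pick k examples whose mean gradient
    # best approximates the full mean gradient.
    rng = np.random.default_rng(0)
    G = rng.normal(size=(500, 16))            # stand-in per-example gradients
    full_grad = G.mean(axis=0)

    selected, k = [], 20
    for _ in range(k):
        best_i, best_err = None, np.inf
        for i in range(len(G)):
            if i in selected:
                continue
            cand = G[selected + [i]].mean(axis=0)
            err = np.linalg.norm(full_grad - cand)   # gradient approximation error
            if err < best_err:
                best_i, best_err = i, err
        selected.append(best_i)

    print(f"coreset of {k} examples, final gradient error {best_err:.4f}")
    ```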
    Pre-training Transformers on Indian Legal Text. (arXiv:2209.06049v1 [cs.CL])
    Natural Language Processing in the legal domain has benefited hugely from the emergence of Transformer-based Pre-trained Language Models (PLMs) pre-trained on legal text. There exist PLMs trained over European and US legal text, most notably LegalBERT. However, with the rapidly increasing volume of NLP applications on Indian legal documents, and the distinguishing characteristics of Indian legal text, it has become necessary to pre-train LMs over Indian legal text as well. In this work, we introduce transformer-based PLMs pre-trained over a large corpus of Indian legal documents. We also apply these PLMs to several benchmark legal NLP tasks on Indian legal documents, namely Legal Statute Identification from facts, semantic segmentation of court judgements, and court judgement prediction. Our experiments demonstrate the utility of the India-specific PLMs developed in this work.
    A Distributed Acoustic Sensor System for Intelligent Transportation using Deep Learning. (arXiv:2209.05978v1 [cs.LG])
    Intelligent transport systems (ITS) are pivotal in the development of sustainable and green urban living. ITS is data-driven and enabled by a profusion of sensors ranging from pneumatic tubes to smart cameras. This work explores a novel data source for traffic analysis based on optical fibre-based distributed acoustic sensors (DAS). Detecting the type of vehicle and estimating the occupancy of vehicles are prime concerns in ITS. The first is motivated by the need for tracking, controlling, and forecasting traffic flow. The second targets the regulation of high-occupancy vehicle lanes in an attempt to reduce emissions and congestion. These tasks are often conducted by individuals inspecting vehicles or through the use of emerging computer vision technologies. The former is neither scalable nor efficient, whereas the latter is intrusive to passengers' privacy. To this end, we propose a deep learning technique for analysing DAS signals that addresses this challenge through continuous sensing and without exposing personal information. Our method achieves 92% accuracy in vehicle classification and 92-97% in occupancy detection on DAS data collected under controlled conditions.
    4G 5G Cell-level Multi-indicator Forecasting based on Dense-MLP. (arXiv:2209.05989v1 [cs.NI])
    With the development of 4G/5G, the rapid growth of traffic has caused a large number of cell indicators to exceed the warning threshold, and network quality has deteriorated. It is necessary for operators to resolve congestion in advance and effectively, to guarantee the quality of the user experience. Cell-level multi-indicator forecasting is the foundational task for proactive complex network optimization. In this paper, we propose a 4G/5G cell-level multi-indicator forecasting method based on a dense multi-layer perceptron (MLP) neural network, which adds additional fully-connected layers between non-adjacent layers of an MLP network. The model forecasted the following week's traffic indicators of 13,000 cells from the six-month historical indicators of 65,000 cells in the 4G/5G network, and achieved the highest weighted MAPE score (0.2484) on the China Mobile problem statement in the ITU-T AI/ML in 5G Challenge 2021. Furthermore, the proposed model has been integrated into the AsiaInfo 4G/5G energy-saving system and deployed in Jiangsu Province, China.
    GIFT: Graph-guIded Feature Transfer for Cold-Start Video Click-Through Rate Prediction. (arXiv:2202.11525v2 [cs.IR] UPDATED)
    Short video has witnessed rapid growth in the past few years on e-commerce platforms like Taobao. To ensure the freshness of the content, platforms need to release a large number of new videos every day, making conventional click-through rate (CTR) prediction methods suffer from the item cold-start problem. In this paper, we propose GIFT, an efficient Graph-guIded Feature Transfer system, to take full advantage of the rich information in warmed-up videos to compensate for the cold-start ones. Specifically, we establish a heterogeneous graph that contains physical and semantic linkages to guide the feature transfer process from warmed-up videos to cold-start ones. The physical linkages represent explicit relationships, while the semantic linkages measure the proximity of the multi-modal representations of two videos. We carefully design the feature transfer function to be aware of the different types of transferred features (e.g., id representations and historical statistics) coming from different metapaths on the graph. We conduct extensive experiments on a large real-world dataset, and the results show that our GIFT system outperforms SOTA methods significantly and brings a 6.82% lift in CTR on the homepage of the Taobao App.
    Weight-based Channel-model Matrix Framework: a reasonable solution for EEG-based cross-dataset emotion recognition. (arXiv:2209.05849v1 [eess.SP])
    Cross-dataset emotion recognition, an extremely challenging task in the field of EEG-based affective computing, is influenced by many factors, which cause universal models to yield unsatisfactory results. Given the lack of research on decoding EEG information, we first analyse the impact of different types of EEG information (individual, session, emotion, trial) on emotion recognition, via sample-space visualization, quantification of sample aggregation phenomena, and energy-pattern analysis on five public datasets. Based on these phenomena and patterns, we provide processing methods for, and interpretability analyses of, the various EEG differences. Through the analysis of emotional feature distribution patterns, we identify the Individual Emotional Feature Distribution Difference (IEFDD). After analysing the limitations of the traditional modeling approach under IEFDD, we propose the Weight-based Channel-model Matrix Framework (WCMF). To characterize emotional feature distribution patterns reasonably, four weight extraction methods are designed, of which the correction T-test (CT) weight extraction method performs best. Finally, the performance of WCMF is validated on cross-dataset tasks in two kinds of experiments that simulate different practical scenarios; the results show that WCMF delivers more stable and better emotion recognition ability.
    A Molecular Multimodal Foundation Model Associating Molecule Graphs with Natural Language. (arXiv:2209.05481v1 [cs.LG])
    Although artificial intelligence (AI) has made significant progress in understanding molecules in a wide range of fields, existing models generally acquire a single cognitive ability from a single molecular modality. Since the hierarchy of molecular knowledge is profound, even humans learn from different modalities, including both intuitive diagrams and professional texts, to assist their understanding. Inspired by this, we propose a molecular multimodal foundation model which is pretrained from molecular graphs and their semantically related textual data (crawled from published Scientific Citation Index papers) via contrastive learning. This AI model represents a critical attempt to directly bridge molecular graphs and natural language. Importantly, by capturing the specific and complementary information of the two modalities, our proposed model can better grasp molecular expertise. Experimental results show that our model not only exhibits promising performance in cross-modal tasks such as cross-modal retrieval and molecule captioning, but also enhances molecular property prediction and possesses the capability to generate meaningful molecular graphs from natural language descriptions. We believe that our model would have a broad impact on AI-empowered fields across disciplines such as biology, chemistry, materials, environment, and medicine, among others.
    Variational Causal Inference. (arXiv:2209.05935v1 [stat.ML])
    Estimating an individual's potential outcomes under counterfactual treatments is a challenging task for traditional causal inference and supervised learning approaches when the outcome is high-dimensional (e.g. gene expressions, impulse responses, human faces) and covariates are relatively limited. In this case, to construct one's outcome under a counterfactual treatment, it is crucial to leverage individual information contained in its observed factual outcome on top of the covariates. We propose a deep variational Bayesian framework that rigorously integrates two main sources of information for outcome construction under a counterfactual treatment: one source is the individual features embedded in the high-dimensional factual outcome; the other source is the response distribution of similar subjects (subjects with the same covariates) that factually received this treatment of interest.
    Learning to Visually Navigate in Photorealistic Environments Without any Supervision. (arXiv:2004.04954v1 [cs.CV] CROSS LISTED)
    Learning to navigate in a realistic setting where an agent must rely solely on visual inputs is a challenging task, in part because the lack of position information makes it difficult to provide supervision during training. In this paper, we introduce a novel approach for learning to navigate from image inputs without external supervision or reward. Our approach consists of three stages: learning a good representation of first-person views, then learning to explore using memory, and finally learning to navigate by setting its own goals. The model is trained with intrinsic rewards only so that it can be applied to any environment with image observations. We show the benefits of our approach by training an agent to navigate challenging photo-realistic environments from the Gibson dataset with RGB inputs only.
    Sample Complexity Bounds for Learning High-dimensional Simplices in Noisy Regimes. (arXiv:2209.05953v1 [stat.ML])
    In this paper, we propose a sample complexity bound for learning a simplex from noisy samples. A dataset of size $n$ is given which includes i.i.d. samples drawn from a uniform distribution over an unknown arbitrary simplex in $\mathbb{R}^K$, where the samples are assumed to be corrupted by an additive Gaussian noise of arbitrary magnitude. We propose a strategy which outputs a simplex having, with high probability, a total variation distance of $\epsilon + O\left(\mathrm{SNR}^{-1}\right)$ from the true simplex, for any $\epsilon>0$. We prove that to come this close to the true simplex, it is sufficient to have $n\ge\tilde{O}\left(K^2/\epsilon^2\right)$ samples. Here, SNR stands for the signal-to-noise ratio, which can be viewed as the ratio of the diameter of the simplex to the standard deviation of the noise. Our proofs are based on recent advancements in sample compression techniques, which have already shown promise in deriving tight bounds for density estimation in high-dimensional Gaussian mixture models.
    One-shot Network Pruning at Initialization with Discriminative Image Patches. (arXiv:2209.05683v1 [cs.CV])
    One-shot Network Pruning at Initialization (OPaI) is an effective method to decrease network pruning costs. Recently, there has been a growing belief that data is unnecessary in OPaI. However, through ablation experiments on two representative OPaI methods, SNIP and GraSP, we reach the opposite conclusion. Specifically, we find that informative data is crucial to enhancing pruning performance. In this paper, we propose two novel methods, Discriminative One-shot Network Pruning (DOP) and Super Stitching, to prune the network using high-level, visually discriminative image patches. Our contributions are as follows. (1) Extensive experiments reveal that OPaI is data-dependent. (2) Super Stitching performs significantly better than the original OPaI methods on the ImageNet benchmark, especially for highly compressed models.
    Adversarial Inter-Group Link Injection Degrades the Fairness of Graph Neural Networks. (arXiv:2209.05957v1 [cs.LG])
    We present evidence for the existence and effectiveness of adversarial attacks on graph neural networks (GNNs) that aim to degrade fairness. These attacks can disadvantage a particular subgroup of nodes in GNN-based node classification, where nodes of the underlying network have sensitive attributes, such as race or gender. We conduct qualitative and experimental analyses explaining how adversarial link injection impairs the fairness of GNN predictions. For example, an attacker can compromise the fairness of GNN-based node classification by injecting adversarial links between nodes belonging to opposite subgroups and opposite class labels. Our experiments on empirical datasets demonstrate that adversarial fairness attacks can significantly degrade the fairness of GNN predictions (attacks are effective) with a low perturbation rate (attacks are efficient) and without a significant drop in accuracy (attacks are deceptive). This work demonstrates the vulnerability of GNN models to adversarial fairness attacks. We hope our findings raise awareness about this issue in our community and lay a foundation for the future development of GNN models that are more robust to such attacks.
    Generalization Bounds for Deep Transfer Learning Using Majority Predictor Accuracy. (arXiv:2209.05709v1 [cs.LG])
    We analyze new generalization bounds for deep learning models trained by transfer learning from a source to a target task. Our bounds utilize a quantity called the majority predictor accuracy, which can be computed efficiently from data. We show that our theory is useful in practice since it implies that the majority predictor accuracy can be used as a transferability measure, a fact that is also validated by our experiments.  ( 2 min )
    Exploiting Expert Knowledge for Assigning Firms to Industries: A Novel Deep Learning Method. (arXiv:2209.05943v1 [cs.LG])
    Industry assignment, which assigns firms to industries according to a predefined Industry Classification System (ICS), is fundamental to a large number of critical business practices, ranging from operations and strategic decision making by firms to economic analyses by government agencies. Three types of expert knowledge are essential to effective industry assignment: definition-based knowledge (i.e., expert definitions of each industry), structure-based knowledge (i.e., structural relationships among industries as specified in an ICS), and assignment-based knowledge (i.e., prior firm-industry assignments performed by domain experts). Existing industry assignment methods utilize only assignment-based knowledge to learn a model that classifies unassigned firms to industries, and overlook definition-based and structure-based knowledge. Moreover, these methods only consider which industry a firm has been assigned to, but ignore the time-specificity of assignment-based knowledge, i.e., when the assignment occurs. To address the limitations of existing methods, we propose a novel deep learning-based method that not only seamlessly integrates the three types of knowledge for industry assignment but also takes the time-specificity of assignment-based knowledge into account. Methodologically, our method features two innovations: dynamic industry representation and hierarchical assignment. The former represents an industry as a sequence of time-specific vectors by integrating the three types of knowledge through our proposed temporal and spatial aggregation mechanisms. The latter takes industry and firm representations as inputs, computes the probability of assigning a firm to different industries, and assigns the firm to the industry with the highest probability.  ( 3 min )
    Meta-Gradients in Non-Stationary Environments. (arXiv:2209.06159v1 [cs.LG])
    Meta-gradient methods (Xu et al., 2018; Zahavy et al., 2020) offer a promising solution to the problem of hyperparameter selection and adaptation in non-stationary reinforcement learning problems. However, the properties of meta-gradients in such environments have not been systematically studied. In this work, we bring new clarity to meta-gradients in non-stationary environments. Concretely, we ask: (i) how much information should be given to the learned optimizers, so as to enable faster adaptation and generalization over a lifetime, (ii) what meta-optimizer functions are learned in this process, and (iii) whether meta-gradient methods provide a bigger advantage in highly non-stationary environments. To study the effect of information provided to the meta-optimizer, as in recent works (Flennerhag et al., 2021; Almeida et al., 2021), we replace the tuned meta-parameters of fixed update rules with learned meta-parameter functions of selected context features. The context features carry information about agent performance and changes in the environment and hence can inform learned meta-parameter schedules. We find that adding more contextual information is generally beneficial, leading to faster adaptation of meta-parameter values and increased performance over a lifetime. We support these results with a qualitative analysis of resulting meta-parameter schedules and learned functions of context features. Lastly, we find that without context, meta-gradients do not provide a consistent advantage over the baseline in highly non-stationary environments. Our findings suggest that contextualizing meta-gradients can play a pivotal role in extracting high performance from meta-gradients in non-stationary settings.  ( 3 min )
    Concept-Based Explanations for Tabular Data. (arXiv:2209.05690v1 [cs.LG])
    The interpretability of machine learning models has been an essential area of research for the safe deployment of machine learning systems. One particular approach is to attribute model decisions to high-level concepts that humans can understand. However, such concept-based explainability for Deep Neural Networks (DNNs) has been studied mostly in the image domain. In this paper, we extend TCAV, the concept attribution approach, to tabular learning by proposing how to define concepts over tabular data. On a synthetic dataset with ground-truth concept explanations and a real-world dataset, we show the validity of our method in generating interpretability results that match human-level intuitions. On top of this, we propose a notion of fairness based on TCAV that quantifies which layer of a DNN has learned representations that lead to biased predictions of the model. We also empirically demonstrate the relation of TCAV-based fairness to a group fairness notion, Demographic Parity.
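    A minimal TCAV-style computation on tabular data might look like the sketch below: fit a linear classifier separating concept examples from random examples in a layer's activation space, take its normal vector as the concept activation vector (CAV), and measure the directional derivative of the model output along the CAV. The toy network, the concept split, and the finite-difference sensitivity are all illustrative assumptions, not the paper's construction.

    ```python
    import numpy as np
    from sklearn.linear_model import LogisticRegression

    rng = np.random.default_rng(0)
    W1, b1 = rng.normal(size=(8, 16)), np.zeros(16)   # toy hidden layer
    w2, b2 = rng.normal(size=16), 0.0                 # toy (linear) output head

    def layer(x):            # hidden activations for a batch of inputs
        return np.maximum(x @ W1 + b1, 0.0)

    def head(h):             # scalar model output from activations
        return h @ w2 + b2

    # "Concept" examples vs. random examples, both in input (tabular) space.
    X_concept = rng.normal(loc=1.0, size=(100, 8))
    X_random = rng.normal(loc=0.0, size=(100, 8))

    # 1) CAV: normal vector of a linear classifier in activation space.
    H = np.vstack([layer(X_concept), layer(X_random)])
    y = np.r_[np.ones(100), np.zeros(100)]
    cav = LogisticRegression(max_iter=1000).fit(H, y).coef_[0]
    cav /= np.linalg.norm(cav)

    # 2) Concept sensitivity: directional derivative of the output along the
    #    CAV via finite differences (constant here since the head is linear;
    #    a real network's head would make it input-dependent).
    X_test = rng.normal(size=(50, 8))
    h = layer(X_test)
    eps = 1e-4
    sens = (head(h + eps * cav) - head(h)) / eps
    print("TCAV score (fraction of positive sensitivities):", np.mean(sens > 0))
    ```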
    Leveraging Language Foundation Models for Human Mobility Forecasting. (arXiv:2209.05479v1 [cs.LG])
    In this paper, we propose a novel pipeline that leverages language foundation models for temporal sequential pattern mining, such as for human mobility forecasting tasks. For example, in the task of predicting Place-of-Interest (POI) customer flows, typically the number of visits is extracted from historical logs, and only the numerical data are used to predict visitor flows. In this research, we perform the forecasting task directly on the natural language input that includes all kinds of information such as numerical values and contextual semantic information. Specific prompts are introduced to transform numerical temporal sequences into sentences so that existing language models can be directly applied. We design an AuxMobLCast pipeline for predicting the number of visitors in each POI, integrating an auxiliary POI category classification task with the encoder-decoder architecture. This research provides empirical evidence of the effectiveness of the proposed AuxMobLCast pipeline to discover sequential patterns in mobility forecasting tasks. The results, evaluated on three real-world datasets, demonstrate that pre-trained language foundation models also have good performance in forecasting temporal sequences. This study could provide visionary insights and lead to new research directions for predicting human mobility.
    Variance Reduction based Experience Replay for Policy Optimization. (arXiv:2110.08902v2 [cs.LG] UPDATED)
    For reinforcement learning on complex stochastic systems where many factors dynamically impact the output trajectories, it is desirable to effectively leverage the information from historical samples collected in previous iterations to accelerate policy optimization. Classical experience replay allows agents to remember by reusing historical observations. However, the uniform reuse strategy that treats all observations equally overlooks the relative importance of different samples. To overcome this limitation, we propose a general variance reduction based experience replay (VRER) framework that can selectively reuse the most relevant samples to improve policy gradient estimation. This selective mechanism can adaptively put more weight on past samples that are more likely to be generated by the current target distribution. Our theoretical and empirical studies show that the proposed VRER can accelerate the learning of optimal policy and enhance the performance of state-of-the-art policy optimization approaches.
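    The selective reuse mechanism can be sketched with likelihood ratios: historical samples are reweighted by how probable they are under the current policy relative to the behaviour policy that generated them, and samples with extreme ratios are dropped. The Gaussian policy, the cutoff, and the toy return signal below are simplifying assumptions, not the paper's algorithm.

    ```python
    import numpy as np

    rng = np.random.default_rng(0)

    def log_gauss(a, mu, sigma=1.0):
        return -0.5 * ((a - mu) / sigma) ** 2 - np.log(sigma * np.sqrt(2 * np.pi))

    mu_old, mu_new = 0.0, 0.4                 # behaviour vs current policy means
    actions = rng.normal(mu_old, 1.0, 1000)   # historical samples
    returns = -(actions - 1.0) ** 2           # toy return signal

    # Likelihood ratio of each historical action under current vs behaviour policy.
    w = np.exp(log_gauss(actions, mu_new) - log_gauss(actions, mu_old))
    keep = w < 2.0                            # drop samples with extreme ratios

    # Reweighted REINFORCE-style gradient estimate for the Gaussian mean.
    score = actions - mu_new                  # d/dmu of log N(a; mu, 1)
    grad = np.mean(w[keep] * score[keep] * returns[keep])
    print("reweighted policy-gradient estimate:", grad)
    ```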
    Blurring Diffusion Models. (arXiv:2209.05557v1 [cs.LG])
    Recently, Rissanen et al. (2022) presented a new type of diffusion process for generative modeling based on heat dissipation, or blurring, as an alternative to isotropic Gaussian diffusion. Here, we show that blurring can equivalently be defined through a Gaussian diffusion process with non-isotropic noise. In making this connection, we bridge the gap between inverse heat dissipation and denoising diffusion, and we shed light on the inductive bias that results from this modeling choice. Finally, we propose a generalized class of diffusion models that offers the best of both standard Gaussian denoising diffusion and inverse heat dissipation, which we call Blurring Diffusion Models.
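    The claimed equivalence can be sketched directly: a heat-dissipation (blurring) step is diagonal in the DCT basis, so the forward process can be written as per-frequency scaling plus Gaussian noise, i.e., a Gaussian diffusion with non-isotropic statistics. The schedule and constants below are illustrative assumptions.

    ```python
    import numpy as np
    from scipy.fft import dctn, idctn

    rng = np.random.default_rng(0)
    x = rng.normal(size=(32, 32))                     # stand-in image

    n = x.shape[0]
    k = np.pi * np.arange(n) / n
    lam = k[:, None] ** 2 + k[None, :] ** 2           # heat-equation eigenvalues

    def forward(x, t, sigma=0.1):
        u = dctn(x, norm="ortho")                     # to frequency space
        u_blur = np.exp(-lam * t) * u                 # deterministic blurring step
        noise = sigma * rng.normal(size=u.shape)      # Gaussian noise per frequency
        return idctn(u_blur + noise, norm="ortho")    # back to pixel space

    x_t = forward(x, t=1.0)
    print("energy before/after:", np.sum(x**2), np.sum(x_t**2))
    ```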
    Sample Complexity of an Adversarial Attack on UCB-based Best-arm Identification Policy. (arXiv:2209.05692v1 [cs.LG])
    In this work I study the problem of adversarial perturbations to rewards in a multi-armed bandit (MAB) setting. Specifically, I focus on an adversarial attack on a UCB-type best-arm identification policy applied to a stochastic MAB. The UCB attack presented in [1] results in pulling a target arm K very often. I use the attack model of [1] to derive the sample complexity required for selecting target arm K as the best arm. I prove that the stopping condition of the UCB-based best-arm identification algorithm given in [2] can be achieved by the target arm K in T rounds, where T depends only on the total number of arms and the $\sigma$ parameter of the $\sigma^2$-sub-Gaussian random rewards of the arms.
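    The qualitative effect of such an attack is easy to reproduce in simulation. The sketch below runs UCB1 on a three-armed bandit while an adversary subtracts a fixed penalty from the observed rewards of non-target arms; this corruption rule is an illustrative assumption standing in for the attack of [1], not its exact construction.

    ```python
    import numpy as np

    rng = np.random.default_rng(0)
    means = np.array([0.9, 0.8, 0.5])   # true means; target arm (index 2) is worst
    target, T = 2, 5000

    counts = np.zeros(3)
    sums = np.zeros(3)
    for t in range(T):
        if t < 3:
            arm = t                                   # pull each arm once
        else:
            ucb = sums / counts + np.sqrt(2 * np.log(t) / counts)
            arm = int(np.argmax(ucb))
        r = means[arm] + 0.1 * rng.normal()
        if arm != target:
            r -= 0.6                                  # adversarial corruption
        counts[arm] += 1
        sums[arm] += r

    print("pull counts:", counts)   # target arm dominates despite lowest true mean
    ```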
    LSNN Method For Scalar Nonlinear HCLs: Discrete Divergence Operator. (arXiv:2110.10895v2 [math.NA] UPDATED)
    The least-squares neural network (LSNN) method was introduced for solving scalar linear and nonlinear hyperbolic conservation laws (HCLs) in [6, 5]. The method is based on an equivalent least-squares (LS) formulation and employs ReLU neural networks as approximating functions, which are especially suitable for approximating discontinuous functions with unknown interface locations. In the design of the LSNN method for HCLs, the numerical approximation of the differential operator plays a critical role, and standard numerical or automatic differentiation along coordinate directions usually results in a failing NN-based method. To overcome this difficulty, this paper rewrites HCLs in their divergence form in space and time and introduces a new discrete divergence operator. Theoretically, the accuracy of the discrete divergence operator is estimated even if the solution is discontinuous. Numerically, the resulting LSNN method with the new discrete divergence operator is tested on several benchmark problems with both convex and non-convex fluxes; the method is capable of computing the correct physical solution for problems with rarefaction waves and capturing the shock of the underlying problem without oscillation or smearing.
    Genie: A new, fast, and outlier-resistant hierarchical clustering algorithm. (arXiv:2209.05757v1 [cs.LG])
    The time needed to apply a hierarchical clustering algorithm is most often dominated by the number of computations of a pairwise dissimilarity measure. For larger data sets, such a constraint puts the use of all the classical linkage criteria, except single linkage, at a disadvantage. However, it is known that the single linkage clustering algorithm is very sensitive to outliers, produces highly skewed dendrograms, and therefore usually does not reflect the true underlying data structure -- unless the clusters are well-separated. To overcome its limitations, we propose a new hierarchical clustering linkage criterion called Genie. Namely, our algorithm links two clusters in such a way that a chosen economic inequity measure (e.g., the Gini or Bonferroni index) of the cluster sizes does not drastically increase above a given threshold. The presented benchmarks indicate the high practical usefulness of the introduced method: it most often outperforms the Ward or average linkage in terms of clustering quality while retaining the speed of single linkage. The Genie algorithm is easily parallelizable and thus may be run on multiple threads to speed up its execution even further. Its memory overhead is small: there is no need to precompute the complete distance matrix to obtain a desired clustering. It can be applied to arbitrary spaces equipped with a dissimilarity measure, e.g., to real vectors, DNA or protein sequences, images, rankings, informetric data, etc. A reference implementation of the algorithm has been included in the open source 'genie' package for R. See also https://genieclust.gagolewski.com for a newer implementation (genieclust), available for both R and Python.
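    A usage sketch for the genieclust package referenced above is given below; the parameter names follow the package's documented, scikit-learn-like interface, though the specific values are arbitrary choices for illustration.

    ```python
    import numpy as np
    import genieclust  # pip install genieclust

    rng = np.random.default_rng(0)
    X = np.vstack([rng.normal(0, 0.3, size=(100, 2)),
                   rng.normal(3, 0.3, size=(100, 2))])

    # gini_threshold bounds how unbalanced the cluster sizes may become before
    # the Genie correction kicks in (smaller = more balanced clusters).
    g = genieclust.Genie(n_clusters=2, gini_threshold=0.3)
    labels = g.fit_predict(X)
    print("cluster sizes:", np.bincount(labels))
    ```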
    Investigating Bias with a Synthetic Data Generator: Empirical Evidence and Philosophical Interpretation. (arXiv:2209.05889v1 [stat.ML])
    Machine learning applications are becoming increasingly pervasive in our society. Since these decision-making systems rely on data-driven learning, the risk is that they will systematically spread the bias embedded in the data. In this paper, we propose to analyze biases by introducing a framework for generating synthetic data with specific types of bias and their combinations. We delve into the nature of these biases, discussing their relationship to moral and justice frameworks. Finally, we exploit our proposed synthetic data generator to perform experiments on different scenarios with various bias combinations, and we analyze the impact of biases on performance and fairness metrics in both non-mitigated and mitigated machine learning models.
    Just Noticeable Difference Modeling for Face Recognition System. (arXiv:2209.05856v1 [cs.CV])
    High-quality face images are required to guarantee the stability and reliability of automatic face recognition (FR) systems in surveillance and security scenarios. However, a massive amount of face data is usually compressed before being analyzed due to limitations on transmission or storage. The compressed images may lose the powerful identity information, resulting in degraded performance of the FR system. Herein, we make the first attempt to study the just noticeable difference (JND) for the FR system, which can be defined as the maximum distortion that the FR system cannot notice. More specifically, we establish a JND dataset including 3530 original images and 137,670 compressed images generated by advanced reference encoding/decoding software based on the Versatile Video Coding (VVC) standard (VTM-15.0). Subsequently, we develop a novel JND prediction model to directly infer JND images for the FR system. In particular, in order to maximize redundancy removal without impairing robust identity information, we apply an encoder with multiple feature extraction and attention-based feature decomposition modules to progressively decompose face features into two uncorrelated components, i.e., identity and residual features, via self-supervised learning. The residual feature is then fed into the decoder to generate the residual map. Finally, the predicted JND map is obtained by subtracting the residual map from the original image. Experimental results demonstrate that the proposed model achieves higher accuracy of JND map prediction compared with state-of-the-art JND models, and is capable of saving more bits while maintaining the performance of the FR system compared with VTM-15.0.
    How to See Hidden Patterns in Metamaterials with Interpretable Machine Learning. (arXiv:2111.05949v2 [cs.LG] UPDATED)
    Machine learning models can assist with metamaterials design by approximating computationally expensive simulators or solving inverse design problems. However, past work has usually relied on black box deep neural networks, whose reasoning processes are opaque and require enormous datasets that are expensive to obtain. In this work, we develop two novel machine learning approaches to metamaterials discovery that have neither of these disadvantages. These approaches, called shape-frequency features and unit-cell templates, can discover 2D metamaterials with user-specified frequency band gaps. Our approaches provide logical rule-based conditions on metamaterial unit-cells that allow for interpretable reasoning processes, and generalize well across design spaces of different resolutions. The templates also provide design flexibility where users can almost freely design the fine resolution features of a unit-cell without affecting the user's desired band gap.
    A deep variational free energy approach to dense hydrogen. (arXiv:2209.06095v1 [cond-mat.str-el])
    We present a deep generative model-based variational free energy approach to the equations of state of dense hydrogen. We employ a normalizing flow network to model the proton Boltzmann distribution and a fermionic neural network to model the electron wavefunction at given proton positions. By jointly optimizing the two neural networks, we reach a variational free energy comparable to that of previous coupled electron-ion Monte Carlo calculations. Our results suggest that hydrogen under planetary conditions is even denser than indicated by previous Monte Carlo and ab initio molecular dynamics data, and further from the empirical chemical model predictions. Obtaining reliable equations of state of dense hydrogen, and in particular direct access to entropy and free energy, opens new opportunities in planetary modeling and high-pressure physics research.
    Certified Defences Against Adversarial Patch Attacks on Semantic Segmentation. (arXiv:2209.05980v1 [cs.CV])
    Adversarial patch attacks are an emerging security threat for real-world deep learning applications. We present Demasked Smoothing, the first approach (to our knowledge) to certify the robustness of semantic segmentation models against this threat model. Previous work on certifiably defending against patch attacks has mostly focused on the image classification task and often required changes to the model architecture and additional training, which is undesirable and computationally expensive. In Demasked Smoothing, any segmentation model can be applied without particular training, fine-tuning, or restriction of the architecture. Using different masking strategies, Demasked Smoothing can be applied both for certified detection and certified recovery. In extensive experiments we show that Demasked Smoothing can on average certify 64% of the pixel predictions for a 1% patch in the detection task and 48% against a 0.5% patch for the recovery task on the ADE20K dataset.
    A Tale of HodgeRank and Spectral Method: Target Attack Against Rank Aggregation Is the Fixed Point of Adversarial Game. (arXiv:2209.05742v1 [cs.LG])
    Rank aggregation with pairwise comparisons has shown promising results in elections, sports competitions, recommendations, and information retrieval. However, little attention has been paid to the security issue of such algorithms, in contrast to the numerous research works on their computational and statistical characteristics. Driven by huge profits, a potential adversary has strong motivation and incentives to manipulate the ranking list. Meanwhile, the intrinsic vulnerability of rank aggregation methods is not well studied in the literature. To fully understand the possible risks, in this paper we focus on the purposeful adversary who desires to designate the aggregated results by modifying the pairwise data. From the perspective of dynamical systems, the attack behavior with a target ranking list is a fixed point of the composition of the adversary and the victim. To perform the targeted attack, we formulate the interaction between the adversary and the victim as a game-theoretic framework consisting of two continuous operators, for which a Nash equilibrium is established. Then two procedures against HodgeRank and RankCentrality are constructed to produce the modification of the original data. Furthermore, we prove that the victims will produce the target ranking list once the adversary has complete information. Notably, the proposed methods allow the adversary to hold only incomplete information or imperfect feedback and still perform the purposeful attack. The effectiveness of the suggested target attack strategies is demonstrated by a series of toy simulations and several real-world data experiments. These experimental results show that the proposed methods can achieve the attacker's goal in the sense that the leading candidate of the perturbed ranking list is the one designated by the adversary.
    Leveraging Large Language Models for Robot 3D Scene Understanding. (arXiv:2209.05629v1 [cs.RO])
    Semantic 3D scene understanding is a problem of critical importance in robotics. While significant advances have been made in spatial perception, robots are still far from having the common-sense knowledge about household objects and locations of an average human. We thus investigate the use of large language models to impart common sense for scene understanding. Specifically, we introduce three paradigms for leveraging language for classifying rooms in indoor environments based on their contained objects: (i) a zero-shot approach, (ii) a feed-forward classifier approach, and (iii) a contrastive classifier approach. These methods operate on 3D scene graphs produced by modern spatial perception systems. We then analyze each approach, demonstrating notable zero-shot generalization and transfer capabilities stemming from their use of language. Finally, we show these approaches also apply to inferring building labels from contained rooms and demonstrate our zero-shot approach on a real environment. All code can be found at https://github.com/MIT-SPARK/llm_scene_understanding.  ( 2 min )
    Cocktail Party Attack: Breaking Aggregation-Based Privacy in Federated Learning using Independent Component Analysis. (arXiv:2209.05578v1 [cs.LG])
    Federated learning (FL) aims to perform privacy-preserving machine learning on distributed data held by multiple data owners. To this end, FL requires the data owners to perform training locally and share the gradient updates (instead of the private inputs) with the central server, which are then securely aggregated over multiple data owners. Although aggregation by itself does not provably offer privacy protection, prior work showed that it may suffice if the batch size is sufficiently large. In this paper, we propose the Cocktail Party Attack (CPA) that, contrary to prior belief, is able to recover the private inputs from gradients aggregated over a very large batch size. CPA leverages the crucial insight that the aggregate gradient of a fully connected layer is a linear combination of its inputs, which leads us to frame gradient inversion as a blind source separation (BSS) problem (informally called the cocktail party problem). We adapt independent component analysis (ICA)--a classic solution to the BSS problem--to recover private inputs for fully-connected and convolutional networks, and show that CPA significantly outperforms prior gradient inversion attacks, scales to ImageNet-sized inputs, and works on large batch sizes of up to 1024.  ( 2 min )
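    As a concrete illustration of the linear-mixture insight above, the toy sketch below builds an aggregated fully-connected weight gradient from synthetic data and unmixes it with scikit-learn's off-the-shelf FastICA. All sizes and names are illustrative; the paper's adapted ICA procedure is not reproduced here.

    import numpy as np
    from sklearn.decomposition import FastICA

    rng = np.random.default_rng(0)
    batch, d_in, d_out = 8, 64, 32

    X = rng.laplace(size=(batch, d_in))      # private inputs (non-Gaussianity helps ICA)
    delta = rng.normal(size=(batch, d_out))  # per-sample upstream gradients

    # Aggregated FC weight gradient: sum_i delta_i x_i^T, shape (d_out, d_in).
    # Each of its rows is a linear mixture of the private inputs.
    G = delta.T @ X

    # Treat the d_in axis as the signal axis and the d_out mixtures as channels.
    ica = FastICA(n_components=batch, random_state=0)
    S = ica.fit_transform(G.T).T             # candidate inputs, up to scale/permutation
    print(S.shape)                           # (8, 64)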
    It's Not Fairness, and It's Not Fair: The Failure of Distributional Equality and the Promise of Relational Equality in Complete-Information Hiring Games. (arXiv:2209.05602v1 [cs.CY])
    Existing efforts to formulate computational definitions of fairness have largely focused on distributional notions of equality, where equality is defined by the resources or decisions given to individuals in the system. Yet existing discrimination and injustice are often the result of unequal social relations, rather than an unequal distribution of resources. Here, we show how optimizing for existing computational and economic definitions of fairness and equality fails to prevent unequal social relations. To do this, we provide an example of a self-confirming equilibrium in a simple hiring market that is relationally unequal but satisfies existing distributional notions of fairness. In doing so, we introduce a notion of blatant relational unfairness for complete-information games, and discuss how this definition helps initiate a new approach to incorporating relational equality into computational systems.  ( 2 min )
    TEDL: A Two-stage Evidential Deep Learning Method for Classification Uncertainty Quantification. (arXiv:2209.05522v1 [cs.LG])
    In this paper, we propose TEDL, a two-stage learning approach to quantify uncertainty for deep learning models in classification tasks, inspired by our findings in experimenting with Evidential Deep Learning (EDL), a recently proposed uncertainty quantification approach based on the Dempster-Shafer theory. More specifically, we observe that EDL tends to yield inferior AUC compared with models learnt by cross-entropy loss and is highly sensitive in training. Such sensitivity is likely to cause unreliable uncertainty estimation, making it risky for practical applications. To mitigate both limitations, we propose a simple yet effective two-stage learning approach based on our analysis of the likely causes of such sensitivity, with the first stage learning from cross-entropy loss, followed by a second stage learning from EDL loss. We also re-formulate the EDL loss by replacing ReLU with ELU to avoid the Dying ReLU issue. Extensive experiments are carried out on training corpora of varied sizes collected from a large-scale commercial search engine, demonstrating that the proposed two-stage learning framework can increase AUC significantly and greatly improve training robustness.  ( 2 min )
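    The abstract does not spell out the re-formulated loss, so the sketch below is one plausible reading: the standard EDL expected-MSE loss of Sensoy et al., with evidence computed as ELU(logits) + 1, a hypothetical choice that keeps the evidence non-negative while avoiding dead units.

    import torch
    import torch.nn.functional as F

    def edl_mse_loss(logits, y_onehot):
        evidence = F.elu(logits) + 1.0   # ELU keeps gradients alive where ReLU would die
        alpha = evidence + 1.0           # Dirichlet concentration parameters
        S = alpha.sum(dim=1, keepdim=True)
        p = alpha / S                    # expected class probabilities
        err = ((y_onehot - p) ** 2).sum(dim=1)
        var = (p * (1.0 - p) / (S + 1.0)).sum(dim=1)
        return (err + var).mean()

    logits = torch.randn(4, 3, requires_grad=True)
    y = F.one_hot(torch.tensor([0, 2, 1, 0]), num_classes=3).float()
    edl_mse_loss(logits, y).backward()   # stage-2 criterion; stage 1 uses F.cross_entropy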
    R\'{e}nyi Divergence Deep Mutual Learning. (arXiv:2209.05732v1 [cs.LG])
    This paper revisits an incredibly simple yet exceedingly effective computing paradigm, Deep Mutual Learning (DML). We observe that its effectiveness correlates highly with its excellent generalization quality. In the paper, we interpret the performance improvement of DML from a novel perspective: it is roughly an approximate Bayesian posterior sampling procedure. This also establishes the foundation for applying the R\'{e}nyi divergence to improve the original DML, as it brings in variance control of the prior (in the context of DML). Therefore, we propose R\'{e}nyi Divergence Deep Mutual Learning (RDML). Our empirical results demonstrate the advantage of the marriage of DML and the R\'{e}nyi divergence. The flexible control imposed by the R\'{e}nyi divergence is able to further improve DML to learn better generalized models.  ( 2 min )
    Automatically Assessing Students Performance with Smartphone Data. (arXiv:2209.05596v1 [cs.HC])
    As the number of smart devices that surround us increases, so do the opportunities to create smart socially-aware systems. In this context, mobile devices can be used to collect data about students and to better understand how their day-to-day routines can influence their academic performance. Moreover, the Covid-19 pandemic led to new challenges and difficulties, also for students, with considerable impact on their lifestyle. In this paper we present a dataset collected using a smartphone application (ISABELA), which includes passive data (e.g., activity and location) as well as self-reported data from questionnaires. We present several tests with different machine learning models, in order to classify students' performance. These tests were carried out using different time windows, showing that weekly time windows lead to better prediction and classification results than monthly time windows. Furthermore, it is shown that the created models can predict student performance even with data collected from different contexts, namely before and during the Covid-19 pandemic. SVMs, XGBoost and AdaBoost-SAMME with Random Forest were found to be the best algorithms, showing an accuracy greater than 78%. Additionally, we propose a pipeline that applies a decision-level median voting algorithm over students' historic predictions to further improve performance. With this pipeline, model performance increases further, with some models achieving an accuracy greater than 90%.  ( 3 min )
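    The abstract does not give the exact form of the median voting step; a minimal sketch under our own assumptions (per-window class-probability predictions for one student, median taken across windows, then argmax) could look like this:

    import numpy as np

    def median_vote(window_probs):
        # window_probs: (n_windows, n_classes) predicted probabilities for one student
        return int(np.argmax(np.median(window_probs, axis=0)))

    weekly = np.array([[0.2, 0.8], [0.4, 0.6], [0.7, 0.3]])  # toy weekly predictions
    print(median_vote(weekly))  # class with the highest median probability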
    Active Learning and Approximate Model Calibration for Automated Visual Inspection in Manufacturing. (arXiv:2209.05486v1 [cs.LG])
    Quality control is a crucial activity performed by manufacturing enterprises to ensure that their products meet quality standards and avoid potential damage to the brand's reputation. The decreased cost of sensors and connectivity has enabled an increasing digitalization of manufacturing. In addition, artificial intelligence enables higher degrees of automation, reducing overall costs and time required for defect inspection. This research compares three active learning approaches (with single and multiple oracles) for visual inspection. We propose a novel approach to the probability calibration of classification models and two new metrics to assess calibration performance without the need for ground truth. We performed experiments on real-world data provided by Philips Consumer Lifestyle BV. Our results show that the explored active learning settings can reduce the data labeling effort by three to four percent without detriment to the overall quality goals, considering a threshold of p=0.95. Furthermore, we show that the proposed metrics successfully capture relevant information that is otherwise available to existing metrics only through ground truth data. Therefore, the proposed metrics can be used to estimate the quality of a model's probability calibration without committing to a labeling effort to obtain ground truth data.  ( 3 min )
    Deep Neural Networks as Complex Networks. (arXiv:2209.05488v1 [cs.LG])
    Deep Neural Networks are, from a physical perspective, graphs whose `links` and `vertices` iteratively process data and solve tasks sub-optimally. We use Complex Network Theory (CNT) to represent Deep Neural Networks (DNNs) as directed weighted graphs: within this framework, we introduce metrics to study DNNs as dynamical systems, with a granularity that spans from weights to layers, including neurons. CNT discriminates networks that differ in the number of parameters and neurons, the type of hidden layers and activations, and the objective task. We further show that our metrics discriminate low- vs. high-performing networks. CNT is a comprehensive method to reason about DNNs and a complementary approach to explaining a model's behavior that is physically grounded in network theory and goes beyond the well-studied input-output relation.  ( 2 min )
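    As a rough illustration of the representation (our own minimal construction; the paper's CNT metrics are richer), an MLP's weight matrices can be mapped to a directed weighted graph and summarized with simple per-neuron statistics such as node strength:

    import numpy as np
    import networkx as nx

    rng = np.random.default_rng(0)
    weights = [rng.normal(size=(4, 3)), rng.normal(size=(3, 2))]  # toy 4-3-2 MLP

    G = nx.DiGraph()
    offset = 0
    for W in weights:
        n_in, n_out = W.shape
        for i in range(n_in):
            for j in range(n_out):
                G.add_edge(offset + i, offset + n_in + j, weight=float(W[i, j]))
        offset += n_in

    # Node strength: total absolute outgoing weight per neuron.
    strength = {v: sum(abs(d["weight"]) for _, _, d in G.out_edges(v, data=True))
                for v in G.nodes}
    print(max(strength, key=strength.get))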
    Bending the Future: Autoregressive Modeling of Temporal Knowledge Graphs in Curvature-Variable Hyperbolic Spaces. (arXiv:2209.05635v1 [cs.LG])
    Recently there has been increasing scholarly interest in time-varying knowledge graphs, or temporal knowledge graphs (TKGs). Previous research suggests diverse approaches to TKG reasoning that use historical information. However, less attention has been given to the hierarchies within such information at different timestamps. Given that a TKG is a sequence of knowledge graphs based on time, the chronology in the sequence derives hierarchies between the graphs. Furthermore, each knowledge graph has its own hierarchical level, which may differ from one graph to another. To address these hierarchical characteristics in TKGs, we propose HyperVC, which utilizes hyperbolic space to encode the hierarchies better than Euclidean space. The chronological hierarchies between knowledge graphs at different timestamps are represented by embedding the knowledge graphs as vectors in a common hyperbolic space. Additionally, diverse hierarchical levels of knowledge graphs are represented by adjusting the curvatures of the hyperbolic embeddings of their entities and relations. Experiments on four benchmark datasets show substantial improvements, especially on the datasets with higher hierarchical levels.  ( 2 min )
    CustOmics: A versatile deep-learning based strategy for multi-omics integration. (arXiv:2209.05485v1 [q-bio.GN])
    Recent advances in high-throughput sequencing technologies have enabled the extraction of multiple features that depict patient samples at diverse and complementary molecular levels. The generation of such data has led to new challenges in computational biology regarding the integration of high-dimensional and heterogeneous datasets that capture the interrelationships between multiple genes and their functions. Thanks to their versatility and ability to learn synthetic latent representations of complex data, deep learning methods offer promising perspectives for integrating multi-omics data. These methods have led to the conception of many original architectures that are primarily based on autoencoder models. However, due to the difficulty of the task, the integration strategy is fundamental to taking full advantage of each source's particularities without losing global trends. This paper presents a novel strategy to build a customizable autoencoder model that adapts to the dataset used, in the case of high-dimensional multi-source integration. We assess the impact of integration strategies on the latent representation and combine the best strategies to propose a new method, CustOmics (https://github.com/HakimBenkirane/CustOmics). We focus here on the integration of data from multiple omics sources and demonstrate the performance of the proposed method on test cases for several tasks such as classification and survival analysis.  ( 2 min )
  • Open

    Visualizing Image Content to Explain Novel Image Discovery. (arXiv:1908.05006v2 [cs.LG] UPDATED)
    The initial analysis of any large data set can be divided into two phases: (1) the identification of common trends or patterns and (2) the identification of anomalies or outliers that deviate from those trends. We focus on the goal of detecting observations with novel content, which can alert us to artifacts in the data set or, potentially, the discovery of previously unknown phenomena. To aid in interpreting and diagnosing the novel aspect of these selected observations, we recommend the use of novelty detection methods that generate explanations. In the context of large image data sets, these explanations should highlight what aspect of a given image is new (color, shape, texture, content) in a human-comprehensible form. We propose DEMUD-VIS, the first method for providing visual explanations of novel image content by employing a convolutional neural network (CNN) to extract image features, a method that uses reconstruction error to detect novel content, and an up-convolutional network to convert CNN feature representations back into image space. We demonstrate this approach on diverse images from ImageNet, freshwater streams, and the surface of Mars.
    Proxy Convexity: A Unified Framework for the Analysis of Neural Networks Trained by Gradient Descent. (arXiv:2106.13792v3 [cs.LG] UPDATED)
    Although the optimization objectives for learning neural networks are highly non-convex, gradient-based methods have been wildly successful at learning neural networks in practice. This juxtaposition has led to a number of recent studies on provable guarantees for neural networks trained by gradient descent. Unfortunately, the techniques in these works are often highly specific to the particular setup in each problem, making it difficult to generalize across different settings. To address this drawback in the literature, we propose a unified non-convex optimization framework for the analysis of neural network training. We introduce the notions of proxy convexity and proxy Polyak-Lojasiewicz (PL) inequalities, which are satisfied if the original objective function induces a proxy objective function that is implicitly minimized when using gradient methods. We show that gradient descent on objectives satisfying proxy convexity or the proxy PL inequality leads to efficient guarantees for proxy objective functions. We further show that many existing guarantees for neural networks trained by gradient descent can be unified through proxy convexity and proxy PL inequalities.
    An Optimal Transport Formulation of Bayes' Law for Nonlinear Filtering Algorithms. (arXiv:2203.11869v2 [math.OC] UPDATED)
    This paper presents a variational representation of Bayes' law using optimal transportation theory. The variational representation is in terms of the optimal transportation between the joint distribution of the (state, observation) and their independent coupling. By imposing certain structure on the transport map, the solution to the variational problem is used to construct a Brenier-type map that transports the prior distribution to the posterior distribution for any value of the observation signal. The new formulation is used to derive the optimal transport form of the Ensemble Kalman filter (EnKF) for the discrete-time filtering problem and to propose a novel extension of the EnKF to the non-Gaussian setting utilizing input convex neural networks. Finally, the proposed methodology is used to derive the optimal transport form of the feedback particle filter (FPF) in the continuous-time limit, which constitutes its first variational construction without explicitly using the nonlinear filtering equation or Bayes' law.
    Enhanced Membership Inference Attacks against Machine Learning Models. (arXiv:2111.09679v4 [cs.LG] UPDATED)
    How much does a machine learning algorithm leak about its training data, and why? Membership inference attacks are used as an auditing tool to quantify this leakage. In this paper, we present a comprehensive \textit{hypothesis testing framework} that enables us not only to formally express the prior work in a consistent way, but also to design new membership inference attacks that use reference models to achieve a significantly higher power (true positive rate) for any (false positive rate) error. More importantly, we explain \textit{why} different attacks perform differently. We present a template for indistinguishability games, and provide an interpretation of attack success rate across different instances of the game. We discuss various uncertainties of attackers that arise from the formulation of the problem, and show how our approach tries to minimize the attack uncertainty to the one bit secret about the presence or absence of a data point in the training set. We perform a \textit{differential analysis} between all types of attacks, explain the gap between them, and show what causes data points to be vulnerable to an attack (as the reasons vary due to different granularities of memorization, from overfitting to conditional memorization). Our auditing framework is openly accessible as part of the \textit{Privacy Meter} software tool.
    Interpreting and predicting the economy flows: A time-varying parameter global vector autoregressive integrated the machine learning model. (arXiv:2209.05998v1 [econ.EM])
    The paper proposes a time-varying parameter global vector autoregressive (TVP-GVAR) framework for predicting and analysing developed-region economic variables. We aim to provide an easily accessible approach for economic application settings, where a variety of machine learning models can be incorporated for out-of-sample prediction. We adopt a LASSO-type technique for numerically efficient model selection based on mean squared errors (MSEs). We show convincing in-sample performance of our proposed model on all economic variables and relatively high-precision out-of-sample predictions with different-frequency economic inputs. Furthermore, the time-varying orthogonal impulse responses provide novel insights into the connectedness of economic variables at critical time points across developed regions. We also derive the corresponding asymptotic bands (confidence intervals) for the orthogonal impulse response functions under standard assumptions.
    Calibrated Forecasts: The Minimax Proof. (arXiv:2209.05863v1 [econ.TH])
    A formal write-up of the simple proof (1995) of the existence of calibrated forecasts by the minimax theorem, which moreover shows that N^3 periods suffice to guarantee a 1/N calibration error.
    Sample Complexity Bounds for Learning High-dimensional Simplices in Noisy Regimes. (arXiv:2209.05953v1 [stat.ML])
    In this paper, we propose a sample complexity bound for learning a simplex from noisy samples. A dataset of size $n$ is given which includes i.i.d. samples drawn from a uniform distribution over an unknown arbitrary simplex in $\mathbb{R}^K$, where samples are assumed to be corrupted by an additive Gaussian noise of an arbitrary magnitude. We propose a strategy which outputs a simplex having, with high probability, a total variation distance of $\epsilon + O\left(\mathrm{SNR}^{-1}\right)$ from the true simplex, for any $\epsilon>0$. We prove that to arrive this close to the true simplex, it is sufficient to have $n\ge\tilde{O}\left(K^2/\epsilon^2\right)$ samples. Here, SNR stands for the signal-to-noise ratio, which can be viewed as the ratio of the diameter of the simplex to the standard deviation of the noise. Our proofs are based on recent advancements in sample compression techniques, which have already shown promise in deriving tight bounds for density estimation in high-dimensional Gaussian mixture models.
    Directed mixed membership stochastic blockmodel. (arXiv:2101.02307v3 [stat.ML] UPDATED)
    The mixed membership problem for undirected networks has been well studied in network analysis in recent years. However, the more general case of mixed membership for directed networks, in which nodes can belong to multiple communities, remains a challenge. Here, we propose an interpretable and identifiable model: the directed mixed membership stochastic blockmodel (DiMMSB) for directed mixed membership networks. DiMMSB allows the row nodes and column nodes of the adjacency matrix to be different, and these nodes may have distinct community structures in a directed network. We also develop an efficient spectral algorithm called DiSP, designed around the simplex structures inherent in the left and right singular vectors of the population adjacency matrix, to estimate the mixed memberships for both row nodes and column nodes in a directed network. We show that DiSP is asymptotically consistent under mild conditions by providing error bounds for the inferred membership vectors of each row node and each column node using delicate spectral analysis. Numerical results on computer-generated directed mixed membership networks support our theoretical findings and show that DiSP outperforms its competitor in both error rates and run-time. Applications of DiSP to real-world directed networks demonstrate its advantages in studying the asymmetric structure of directed networks.
    Mathematical Framework for Online Social Media Regulation. (arXiv:2209.05550v1 [cs.LG])
    Social media platforms (SMPs) leverage algorithmic filtering (AF) as a means of selecting the content that constitutes a user's feed with the aim of maximizing their rewards. Selectively choosing the contents to be shown on the user's feed may yield a certain extent of influence, either minor or major, on the user's decision-making, compared to what it would have been under a natural/fair content selection. As we have witnessed over the past decade, algorithmic filtering can cause detrimental side effects, ranging from biasing individual decisions to shaping those of society as a whole, for example, diverting users' attention from whether to get the COVID-19 vaccine or inducing the public to choose a presidential candidate. The government's constant attempts to regulate the adverse effects of AF are often complicated, due to bureaucracy, legal affairs, and financial considerations. On the other hand, SMPs seek to monitor their own algorithmic activities to avoid being fined for exceeding the allowable threshold. In this paper, we mathematically formalize this framework and utilize it to construct a data-driven statistical algorithm that prevents the AF from deflecting users' beliefs over time, along with sample and complexity guarantees. We show that our algorithm is robust against potential adversarial users. This state-of-the-art algorithm can be used either by authorities acting as external regulators or by SMPs for self-regulation.
    Addressing overfitting in spectral clustering via a non-parametric bootstrap. (arXiv:2209.05812v1 [stat.ML])
    Finite mixture modelling is a popular method in the field of clustering and is beneficial largely due to its soft cluster membership probabilities. However, the most common algorithm for fitting finite mixture models, the EM algorithm, suffers from a number of issues. We address the issues that plague clustering using finite mixture models, including convergence to solutions corresponding to local maxima and algorithm speed concerns in high-dimensional cases. We do so by developing two novel algorithms that incorporate a spectral decomposition of the data matrix and a non-parametric bootstrap sampling scheme. Simulations show the validity of our algorithms and demonstrate not only their flexibility but also their ability to avoid solutions corresponding to local maxima, when compared to other (bootstrapped) clustering algorithms for estimating finite mixture models. Our novel algorithms typically exhibit a more consistent convergence criterion as well as a significant increase in speed over other bootstrapped algorithms that fit finite mixture models.
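    The two named ingredients can be illustrated with off-the-shelf tools; the sketch below (our own combination, not the paper's algorithms) applies a truncated SVD of the data matrix followed by non-parametric bootstrap refits of a Gaussian mixture:

    import numpy as np
    from sklearn.decomposition import TruncatedSVD
    from sklearn.mixture import GaussianMixture

    rng = np.random.default_rng(0)
    X = np.vstack([rng.normal(0, 1, (100, 20)), rng.normal(3, 1, (100, 20))])

    Z = TruncatedSVD(n_components=2, random_state=0).fit_transform(X)  # spectral step

    means = []
    for _ in range(20):                            # non-parametric bootstrap
        idx = rng.integers(0, len(Z), len(Z))      # resample rows with replacement
        gm = GaussianMixture(n_components=2, random_state=0).fit(Z[idx])
        means.append(np.sort(gm.means_, axis=0))   # crude component alignment
    print(np.mean(means, axis=0))                  # bootstrap-averaged component means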
    Challenges and Pitfalls of Bayesian Unlearning. (arXiv:2207.03227v2 [cs.LG] UPDATED)
    Machine unlearning refers to the task of removing a subset of training data, thereby removing its contributions to a trained model. Approximate unlearning methods are one class of approaches for this task that avoid the need to retrain the model from scratch on the retained data. Bayes' rule can be used to cast approximate unlearning as an inference problem where the objective is to obtain the updated posterior by dividing out the likelihood of deleted data. However, this has its own set of challenges, as one often doesn't have access to the exact posterior of the model parameters. In this work we examine the use of the Laplace approximation and variational inference to obtain the updated posterior. With a neural network trained for a regression task as the guiding example, we draw insights on the applicability of Bayesian unlearning in practical scenarios.
    Unsupervised representational learning with recognition-parametrised probabilistic models. (arXiv:2209.05661v1 [cs.LG])
    We introduce a new approach to probabilistic unsupervised learning based on the recognition-parametrised model (RPM): a normalised semi-parametric hypothesis class for joint distributions over observed and latent variables. Under the key assumption that observations are conditionally independent given the latents, RPMs directly encode the "recognition" process, parametrising both the prior distribution on the latents and their conditional distributions given observations. This recognition model is paired with non-parametric descriptions of the marginal distribution of each observed variable. Thus, the focus is on learning a good latent representation that captures dependence between the measurements. The RPM permits exact maximum likelihood learning in settings with discrete latents and a tractable prior, even when the mapping between continuous observations and the latents is expressed through a flexible model such as a neural network. We develop effective approximations for the case of continuous latent variables with tractable priors. Unlike the approximations necessary in dual-parametrised models such as Helmholtz machines and variational autoencoders, these RPM approximations introduce only minor bias, which may often vanish asymptotically. Furthermore, where the prior on latents is intractable the RPM may be combined effectively with standard probabilistic techniques such as variational Bayes. We demonstrate the model in high dimensional data settings, including a form of weakly supervised learning on MNIST digits and the discovery of latent maps from sensory observations. The RPM provides an effective way to discover, represent and reason probabilistically about the latent structure underlying observational data, functions which are critical to both animal and artificial intelligence.
    Uncovering Regions of Maximum Dissimilarity on Random Process Data. (arXiv:2209.05569v1 [stat.ME])
    The comparison of local characteristics of two random processes can shed light on periods of time or space at which the processes differ the most. This paper proposes a method that learns about regions with a certain volume, where the marginal attributes of two processes are less similar. The proposed methods are devised in full generality for the setting where the data of interest are themselves stochastic processes, and thus the proposed method can be used for pointing out the regions of maximum dissimilarity with a certain volume, in the contexts of functional data, time series, and point processes. The parameter functions underlying both stochastic processes of interest are modeled via a basis representation, and Bayesian inference is conducted via an integrated nested Laplace approximation. The numerical studies validate the proposed methods, and we showcase their application with case studies on criminology, finance, and medicine.
    Distribution Compression in Near-linear Time. (arXiv:2111.07941v5 [stat.ML] UPDATED)
    In distribution compression, one aims to accurately summarize a probability distribution $\mathbb{P}$ using a small number of representative points. Near-optimal thinning procedures achieve this goal by sampling $n$ points from a Markov chain and identifying $\sqrt{n}$ points with $\widetilde{\mathcal{O}}(1/\sqrt{n})$ discrepancy to $\mathbb{P}$. Unfortunately, these algorithms suffer from quadratic or super-quadratic runtime in the sample size $n$. To address this deficiency, we introduce Compress++, a simple meta-procedure for speeding up any thinning algorithm while suffering at most a factor of $4$ in error. When combined with the quadratic-time kernel halving and kernel thinning algorithms of Dwivedi and Mackey (2021), Compress++ delivers $\sqrt{n}$ points with $\mathcal{O}(\sqrt{\log n/n})$ integration error and better-than-Monte-Carlo maximum mean discrepancy in $\mathcal{O}(n \log^3 n)$ time and $\mathcal{O}( \sqrt{n} \log^2 n )$ space. Moreover, Compress++ enjoys the same near-linear runtime given any quadratic-time input and reduces the runtime of super-quadratic algorithms by a square-root factor. In our benchmarks with high-dimensional Monte Carlo samples and Markov chains targeting challenging differential equation posteriors, Compress++ matches or nearly matches the accuracy of its input algorithm in orders of magnitude less time.
    Variational Causal Inference. (arXiv:2209.05935v1 [stat.ML])
    Estimating an individual's potential outcomes under counterfactual treatments is a challenging task for traditional causal inference and supervised learning approaches when the outcome is high-dimensional (e.g. gene expressions, impulse responses, human faces) and covariates are relatively limited. In this case, to construct one's outcome under a counterfactual treatment, it is crucial to leverage individual information contained in its observed factual outcome on top of the covariates. We propose a deep variational Bayesian framework that rigorously integrates two main sources of information for outcome construction under a counterfactual treatment: one source is the individual features embedded in the high-dimensional factual outcome; the other source is the response distribution of similar subjects (subjects with the same covariates) that factually received this treatment of interest.
    Automatic Debiased Machine Learning for Dynamic Treatment Effects and General Nested Functionals. (arXiv:2203.13887v4 [econ.EM] UPDATED)
    We extend the idea of automated debiased machine learning to the dynamic treatment regime and, more generally, to nested functionals. We show that the multiply robust formula for the dynamic treatment regime with discrete treatments can be re-stated in terms of a recursive Riesz representer characterization of nested mean regressions. We then apply a recursive Riesz representer estimation algorithm that learns de-biasing corrections without the need to characterize what the correction terms look like, such as, for instance, products of inverse probability weighting terms, as is done in prior work on doubly robust estimation in the dynamic regime. Our approach defines a sequence of loss minimization problems whose minimizers are the multipliers of the de-biasing correction, hence circumventing the need for solving auxiliary propensity models and directly optimizing for the mean squared error of the target de-biasing correction. We provide further applications of our approach to estimation of dynamic discrete choice models and estimation of long-term effects with surrogates.
    The Mori-Zwanzig formulation of deep learning. (arXiv:2209.05544v1 [cs.LG])
    We develop a new formulation of deep learning based on the Mori-Zwanzig (MZ) formalism of irreversible statistical mechanics. The new formulation is built upon the well-known duality between deep neural networks and discrete stochastic dynamical systems, and it allows us to directly propagate quantities of interest (conditional expectations and probability density functions) forward and backward through the network by means of exact linear operator equations. Such new equations can be used as a starting point to develop new effective parameterizations of deep neural networks, and provide a new framework to study deep-learning via operator theoretic methods. The proposed MZ formulation of deep learning naturally introduces a new concept, i.e., the memory of the neural network, which plays a fundamental role in low-dimensional modeling and parameterization. By using the theory of contraction mappings, we develop sufficient conditions for the memory of the neural network to decay with the number of layers. This allows us to rigorously transform deep networks into shallow ones, e.g., by reducing the number of neurons per layer (using projection operators), or by reducing the total number of layers (using the decaying property of the memory operator).
    Concept Drift Monitoring and Diagnostics of Supervised Learning Models via Score Vectors. (arXiv:2012.06916v2 [stat.ML] UPDATED)
    Supervised learning models are one of the most fundamental classes of models. Viewing supervised learning from a probabilistic perspective, the set of training data to which the model is fitted is usually assumed to follow a stationary distribution. However, this stationarity assumption is often violated in a phenomenon called concept drift, which refers to changes over time in the predictive relationship between covariates $\mathbf{X}$ and a response variable $Y$ and can render trained models suboptimal or obsolete. We develop a comprehensive and computationally efficient framework for detecting, monitoring, and diagnosing concept drift. Specifically, we monitor the Fisher score vector, defined as the gradient of the log-likelihood for the fitted model, using a form of multivariate exponentially weighted moving average, which monitors for general changes in the mean of a random vector. In spite of the substantial performance advantages that we demonstrate over popular error-based methods, a score-based approach has not been previously considered for concept drift monitoring. Advantages of the proposed score-based framework include applicability to any parametric model, more powerful detection of changes as shown in theory and experiments, and inherent diagnostic capabilities for helping to identify the nature of the changes.
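    A toy version of the idea, assuming a logistic-regression model (whose per-sample score is x(y - p)) and reducing the paper's MEWMA chart to a plain EWMA with an ad hoc norm threshold:

    import numpy as np

    rng = np.random.default_rng(0)
    w = np.array([1.0, -1.0])                 # parameters of the fitted model

    def score(x, y):
        p = 1.0 / (1.0 + np.exp(-x @ w))
        return x * (y - p)                    # gradient of the log-likelihood at w

    ewma, lam, threshold = np.zeros(2), 0.1, 0.35   # threshold is ad hoc, for illustration
    for t in range(300):
        x = rng.normal(size=2)
        y = float(rng.random() < 1.0 / (1.0 + np.exp(-x @ w)))
        if t >= 150:
            y = 1.0 - y                       # inject concept drift: labels flip
        ewma = (1 - lam) * ewma + lam * score(x, y)
        if np.linalg.norm(ewma) > threshold:
            print("drift flagged at t =", t)
            break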
    The power of private likelihood-ratio tests for goodness-of-fit in frequency tables. (arXiv:2109.09630v2 [math.ST] UPDATED)
    Privacy-protecting data analysis investigates statistical methods under privacy constraints. This is a rising challenge in modern statistics, as the achievement of confidentiality guarantees, which typically occurs through suitable perturbations of the data, may determine a loss in the statistical utility of the data. In this paper, we consider privacy-protecting tests for goodness-of-fit in frequency tables, this being arguably the most common form of releasing data, and present a rigorous analysis of the large sample behaviour of a private likelihood-ratio (LR) test. Under the framework of $(\varepsilon,\delta)$-differential privacy for perturbed data, our main contribution is the power analysis of the private LR test, which characterizes the trade-off between confidentiality, measured via the differential privacy parameters $(\varepsilon,\delta)$, and statistical utility, measured via the power of the test. This is obtained through a Bahadur-Rao large deviation expansion for the power of the private LR test, bringing out a critical quantity, as a function of the sample size, the dimension of the table and $(\varepsilon,\delta)$, that determines a loss in the power of the test. Such a result is then applied to characterize the impact of the sample size and the dimension of the table, in connection with the parameters $(\varepsilon,\delta)$, on the loss of the power of the private LR test. In particular, we determine the (sample) cost of $(\varepsilon,\delta)$-differential privacy in the private LR test, namely the additional sample size that is required to recover the power of the Multinomial LR test in the absence of perturbation. Our power analysis relies on a non-standard large deviation analysis for the LR, as well as the development of a novel (sharp) large deviation principle for sums of i.i.d. random vectors, which is of independent interest.
    Blurring Diffusion Models. (arXiv:2209.05557v1 [cs.LG])
    Recently, Rissanen et al., (2022) have presented a new type of diffusion process for generative modeling based on heat dissipation, or blurring, as an alternative to isotropic Gaussian diffusion. Here, we show that blurring can equivalently be defined through a Gaussian diffusion process with non-isotropic noise. In making this connection, we bridge the gap between inverse heat dissipation and denoising diffusion, and we shed light on the inductive bias that results from this modeling choice. Finally, we propose a generalized class of diffusion models that offers the best of both standard Gaussian denoising diffusion and inverse heat dissipation, which we call Blurring Diffusion Models.
    BR-SNIS: Bias Reduced Self-Normalized Importance Sampling. (arXiv:2207.06364v2 [stat.ML] UPDATED)
    Importance Sampling (IS) is a method for approximating expectations under a target distribution using independent samples from a proposal distribution and the associated importance weights. In many applications, the target distribution is known only up to a normalization constant, in which case self-normalized IS (SNIS) can be used. While the use of self-normalization can have a positive effect on the dispersion of the estimator, it introduces bias. In this work, we propose a new method, BR-SNIS, whose complexity is essentially the same as that of SNIS and which significantly reduces bias without increasing the variance. This method is a wrapper in the sense that it uses the same proposal samples and importance weights as SNIS, but makes clever use of iterated sampling--importance resampling (ISIR) to form a bias-reduced version of the estimator. We furnish the proposed algorithm with rigorous theoretical results, including new bias, variance and high-probability bounds, and these are illustrated by numerical examples.
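    For reference, plain SNIS (the estimator whose samples and weights BR-SNIS reuses; the ISIR recombination itself is not reproduced here) looks as follows:

    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(0)
    x = rng.normal(0.0, 2.0, size=10_000)            # proposal N(0, 2^2)
    log_w = stats.norm.logpdf(x, 1.0, 1.0) - stats.norm.logpdf(x, 0.0, 2.0)
    w = np.exp(log_w - log_w.max())                  # stabilized unnormalized weights
    snis = np.sum(w * x**2) / np.sum(w)              # E[X^2] under the target N(1, 1)
    print(snis)                                      # close to the true value 2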
    On clustering uncertain and structured data with Wasserstein barycenters and a geodesic criterion for the number of clusters. (arXiv:1912.11801v3 [stat.ML] UPDATED)
    In this work, clustering schemes for uncertain and structured data are considered, relying on the notion of Wasserstein barycenters, accompanied by appropriate clustering indices based on the intrinsic geometry of the Wasserstein space where the clustering task is performed. Such clustering approaches are highly appreciated in many fields where the observational/experimental error is significant (e.g. astronomy, biology, remote sensing, etc.) or where the data are of a more complex nature and traditional learning algorithms are not applicable or effective (e.g. network data, interval data, high-frequency records, matrix data, etc.). Under this perspective, each observation is identified by an appropriate probability measure, and the proposed clustering schemes rely on discrimination criteria that utilize the geometric structure of the space of probability measures through core techniques from optimal transport theory. The advantages and capabilities of the proposed approach and the performance of the geodesic criterion are illustrated through a simulation study and the implementation in two real-world applications: (a) clustering eurozone countries according to their observed government bond yield curves and (b) classifying the areas of a satellite image into certain land-use categories, a standard task in remote sensing.
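    A heavily simplified 1-D stand-in for this pipeline (the paper's barycenter-based schemes and geodesic criterion are not reproduced): treat each observation as an empirical sample, compute pairwise 1-D Wasserstein distances with SciPy, and cluster on the precomputed distance matrix.

    import numpy as np
    from scipy.stats import wasserstein_distance
    from sklearn.cluster import AgglomerativeClustering

    rng = np.random.default_rng(0)
    samples = [rng.normal(0, 1, 200) for _ in range(5)] + \
              [rng.normal(4, 1, 200) for _ in range(5)]

    n = len(samples)
    D = np.zeros((n, n))
    for i in range(n):
        for j in range(i + 1, n):
            D[i, j] = D[j, i] = wasserstein_distance(samples[i], samples[j])

    labels = AgglomerativeClustering(n_clusters=2, metric="precomputed",
                                     linkage="average").fit_predict(D)
    print(labels)  # the two generating distributions should separate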
    Feature Grouping and Sparse Principal Component Analysis with Truncated Regularization. (arXiv:2106.13685v2 [stat.ME] UPDATED)
    In this paper, we consider a new variant of principal component analysis (PCA), aiming to capture the grouping and/or sparse structures of factor loadings simultaneously. To achieve these goals, we employ a non-convex truncated regularization with naturally adjustable sparsity and grouping effects, and propose Feature Grouping and Sparse Principal Component Analysis (FGSPCA). The proposed FGSPCA method encourages factor loadings with similar values to collapse into disjoint homogeneous groups for feature grouping, or into a special zero-valued group for feature selection, which in turn helps reduce model complexity and increase model interpretability. Usually, existing structured PCA methods require prior knowledge to construct the regularization term. However, the proposed FGSPCA can simultaneously capture the grouping and/or sparse structures of factor loadings without any prior information. To solve the resulting non-convex optimization problem, we propose an alternating algorithm that incorporates difference-of-convex programming, the augmented Lagrange method and the coordinate descent method. Experimental results demonstrate the promising performance and efficiency of the new method on both synthetic and real-world datasets. An R implementation of FGSPCA can be found on GitHub: https://github.com/higeeks/FGSPCA.
    Genie: A new, fast, and outlier-resistant hierarchical clustering algorithm. (arXiv:2209.05757v1 [cs.LG])
    The time needed to apply a hierarchical clustering algorithm is most often dominated by the number of computations of a pairwise dissimilarity measure. Such a constraint, for larger data sets, puts at a disadvantage the use of all the classical linkage criteria but the single linkage one. However, it is known that the single linkage clustering algorithm is very sensitive to outliers, produces highly skewed dendrograms, and therefore usually does not reflect the true underlying data structure -- unless the clusters are well-separated. To overcome its limitations, we propose a new hierarchical clustering linkage criterion called Genie. Namely, our algorithm links two clusters in such a way that a chosen economic inequity measure (e.g., the Gini- or Bonferroni-index) of the cluster sizes does not drastically increase above a given threshold. The presented benchmarks indicate a high practical usefulness of the introduced method: it most often outperforms the Ward or average linkage in terms of the clustering quality while retaining the single linkage's speed. The Genie algorithm is easily parallelizable and thus may be run on multiple threads to speed up its execution even further. Its memory overhead is small: there is no need to precompute the complete distance matrix to perform the computations in order to obtain a desired clustering. It can be applied on arbitrary spaces equipped with a dissimilarity measure, e.g., on real vectors, DNA or protein sequences, images, rankings, informetric data, etc. A reference implementation of the algorithm has been included in the open source 'genie' package for R. See also https://genieclust.gagolewski.com for a new implementation (genieclust) -- available for both R and Python.
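    A minimal usage example of the genieclust implementation referenced above; parameter names follow the genieclust documentation (gini_threshold being the inequity cap described in the abstract), though exact defaults may differ by version:

    import numpy as np
    import genieclust

    rng = np.random.default_rng(0)
    X = np.vstack([rng.normal(0, 1, (50, 2)), rng.normal(5, 1, (50, 2))])

    g = genieclust.Genie(n_clusters=2, gini_threshold=0.3)
    labels = g.fit_predict(X)
    print(labels[:5], labels[-5:])  # the two blobs should receive distinct labels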
    Learning-augmented count-min sketches via Bayesian nonparametrics. (arXiv:2102.04462v3 [stat.ML] UPDATED)
    The count-min sketch (CMS) is a time and memory efficient randomized data structure that provides estimates of tokens' frequencies in a data stream of tokens, i.e. point queries, based on random hashed data. A learning-augmented version of the CMS, referred to as CMS-DP, has been proposed by Cai, Mitzenmacher and Adams (\textit{NeurIPS} 2018), and it relies on Bayesian nonparametric (BNP) modeling of the data stream of tokens via a Dirichlet process (DP) prior, with estimates of a point query being obtained as suitable mean functionals of the posterior distribution of the point query, given the hashed data. While the CMS-DP has proved to improve on some aspects of CMS, it has the major drawback of arising from a ``constructive" proof that builds upon arguments tailored to the DP prior, namely arguments that are not usable for other nonparametric priors. In this paper, we present a ``Bayesian" proof of the CMS-DP that has the main advantage of building upon arguments that are usable, in principle, within a broad class of nonparametric priors arising from normalized completely random measures. This result leads to develop a novel learning-augmented CMS under power-law data streams, referred to as CMS-PYP, which relies on BNP modeling of the data stream of tokens via a Pitman-Yor process (PYP) prior. Under this more general framework, we apply the arguments of the ``Bayesian" proof of the CMS-DP, suitably adapted to the PYP prior, in order to compute the posterior distribution of a point query, given the hashed data. Applications to synthetic data and real textual data show that the CMS-PYP outperforms the CMS and the CMS-DP in estimating low-frequency tokens, which are known to be of critical interest in textual data, and it is competitive with respect to a variation of the CMS designed for low-frequency tokens. An extension of our BNP approach to more general queries is also discussed.
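    For readers unfamiliar with the underlying data structure, here is a textbook count-min sketch; the BNP-augmented estimators (CMS-DP, CMS-PYP) replace the plain min-query below with posterior mean functionals given the hashed counts.

    import hashlib

    class CountMinSketch:
        def __init__(self, width=2048, depth=5):
            self.width, self.depth = width, depth
            self.table = [[0] * width for _ in range(depth)]

        def _index(self, token, row):
            h = hashlib.blake2b(f"{row}:{token}".encode(), digest_size=8)
            return int.from_bytes(h.digest(), "big") % self.width

        def update(self, token, count=1):
            for row in range(self.depth):
                self.table[row][self._index(token, row)] += count

        def query(self, token):  # never underestimates the true frequency
            return min(self.table[row][self._index(token, row)]
                       for row in range(self.depth))

    cms = CountMinSketch()
    for tok in ["a", "b", "a", "c", "a"]:
        cms.update(tok)
    print(cms.query("a"))  # 3, barring hash collisions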

  • Open

    Weekly China AI News: Shenzhen Passes China's 1st AI Regulation; Alibaba Welcomes Former Didi, JD AI Masterminds; Tencent's Wheel-Legged Robot Develops New Acrobatic Skills
    submitted by /u/trcytony  ( 87 min )
    Extracting data from text using GPT-3
    submitted by /u/juliarmg  ( 87 min )
    Concepts, abstractions, and analogies: Three abilities AI is missing
    submitted by /u/bendee983  ( 94 min )
    Elon Musk: SpaceX has had “promising conversations” with Apple.
    https://digesttime.com/2022/09/11/elon-musk-spacex-has-had-promising-conversations-with-apple/ submitted by /u/Theauntgate  ( 86 min )
    Leaked design for MODOK in Ant-Man and the Wasp: Quantumania has surfaced + More info
    submitted by /u/BitOddInnit  ( 90 min )
    Understanding consciousness is more important than ever at the dawn of this AI age
    submitted by /u/Csai  ( 87 min )
    I asked GPT-3 to write something captivating you could not stop reading, here are the results
    submitted by /u/SupPandaHugger  ( 87 min )
    Any suggestions on tutorials for how to make a virtual assistant app?
    submitted by /u/Logical_Train_5787  ( 86 min )
    New Google AI Generates Video of Beautiful Scenery From 1 Photo | Intelligent Quantum Sensor | AI Powered X-Ray Detects Cancer
    submitted by /u/kenickh  ( 87 min )
    The Follower: AI art project tracks down influencers
    submitted by /u/Zirius_Sadfaces  ( 87 min )
    What is Artificial General Intelligence (AGI)?
    The idea of AGI is still just a theory. It is defined as AI that can think and reason like a human in a wide range of areas, such as language processing, image processing, computational reasoning, and so on. We are still a long way from making a system with AGI. To think like humans, an AGI system would need to be made up of thousands of Artificial Narrow Intelligence systems that work together and talk to each other. Even with the most advanced computing systems and infrastructures, like Fujitsu's K or IBM's Watson, it has taken 40 minutes to simulate a single second of neuronal activity. This shows both how complicated and interconnected the human brain is and how hard it will be to build an AGI with the tools we have now. submitted by /u/Ishan220699  ( 88 min )
    What makes AI Important for Mobile Apps?
    The major limitation in defining AI as simply “building machines that are intelligent” is that it doesn't actually explain what AI is and what makes a machine intelligent. AI is an interdisciplinary science with multiple approaches, but advancements in machine learning and deep learning are creating a paradigm shift in virtually every sector of the tech industry. Here is a short video for you all: https://www.youtube.com/watch?v=1IPHRmTGmmk&t submitted by /u/PenKindly5950  ( 87 min )
    BERT Tokenizers NuGet Package for C#
    submitted by /u/RubiksCodeNMZ  ( 86 min )
    Stable Diffusion Weekly AI Art Video 4K 30 FPS 9.13.22 Amazing Cadence D...
    submitted by /u/prfitofthesngularity  ( 87 min )
    Getting the most out of Stable Diffusion 👇🏽 See comments
    submitted by /u/mdfnb  ( 92 min )
    QUEEN ELIZABETH II ENDS A CULTURAL EPOCH: FALL PREVIEW
    submitted by /u/Artlever  ( 92 min )
    What have been the most impactful uses of artificial intelligence so far?
    submitted by /u/ibexVR  ( 93 min )
    [P] PaddleSeg: An easy-to-use image segmentation library with awesome pre-trained model zoo
    Hi all, I am glad to share an open-source repository, PaddleSeg, which provides the ability to design, train and deploy segmentation models. Code: https://github.com/PaddlePaddle/PaddleSeg
    Features:
    - Supports several tasks: semantic segmentation, interactive segmentation, panoptic segmentation, image matting, etc.
    - Provides 40+ semantic segmentation models and 140+ high-quality pre-trained models
    - Provides an efficient interactive segmentation tool (EISeg) for annotating images
    - Releases a variety of human matting and portrait segmentation models for practical application without training
    - Supports 3D medical image segmentation
    - Supports Linux, Windows, macOS and other systems
    Hope more people can benefit from the PaddleSeg project. https://i.redd.it/hxjle8x2cjn91.gif https://i.redd.it/qh083525cjn91.gif submitted by /u/Effective_Tax_2096 [link] [comments]  ( 87 min )
  • Open

    [D] A particular satirical graphic from a Neurips presentation
    About 8 months ago, I was browsing Neurips presentations and saw a graphic depicting a theorist saying some complicated stuff, and another stick figure with crossed eyes yelling "ADD MOAR LAYERS," and in the background, there was a graph labeled "performance" going up. Kind of like a Stonks meme. This might be a bit weird, but does anyone remember a graphic like that on a past presentation? I've been trying to find it myself but have been relatively unsuccessful; I believe it was a Neurips presentation made by some Deepmind folks, but I'm not sure what year it was. submitted by /u/Permagnanate [link] [comments]  ( 101 min )
    Git Re-Basin: Merging Models modulo Permutation Symmetries
    submitted by /u/89237849237498237427 [link] [comments]  ( 105 min )
    [N] Releasing the MLPerf automation framework to plug in real-world ML models, data sets and tools
    Hi! Just sharing our open-source project to automate MLPerf benchmarks and make it easier for everyone to plug in their real-world ML models, data sets, frameworks/SDKs and hardware. Here is a simple example of a modular image classification to explain the concept. Feedback is very welcome! submitted by /u/gfursin [link] [comments]  ( 89 min )
    [Discussion] What are the top N journals for Machine Learning and/or for NLP
    Looking for a community opinion on the top machine learning journals in general and/or for NLP. Choose an N significant for you. Perhaps you only care about 3 journals, or maybe 15. Feel free to expand on your preferences. submitted by /u/Geckel [link] [comments]  ( 92 min )
    [D] Random-walk with restarts as a diffusion process? Is it possible to model to reverse process?
    Apologies if I have misunderstood any of the foundations of diffusion models; this was a thought that occurred to me when doing some analysis for my own research using random walks, and I am still learning about diffusion models and the underlying math.

    In my research we do a lot of graph embedding. As in full graph embedding, not just node embeddings. We have recently found that, depending on the "scale" of features we want the embedding to emphasize (i.e. small clusters of nodes connected across small distances vs. large neighborhoods of nodes connected across large distances), we can run a random walk with restarts to smooth out the small-scale features and emphasize large clusters in the graph.

    This is helpful if we want to weight large-scale features more than small-scale ones, but I wondered whether it would be possible to go in the other direction and diminish some of the large-scale features while emphasizing the smaller-scale ones by running some kind of reverse random-walk process. In practice I've found that I can pretty much just set a maximum edge distance, and this usually works OK but not consistently.

    I was unable to find anything like this mentioned already, at least in my classical graph theory references, so I turned to what was fresh in my mind: diffusion models. Could we use the random walk with restarts as the forward diffusion process and then train a model to learn the reverse process, i.e. a "reverse" random walk? Has anything like this been tried before? Would this even make sense to do, since the random walk doesn't "destroy" the signal but converges to a stationary distribution?

    Thank you for any insight! submitted by /u/dimsycamore [link] [comments]  ( 93 min )
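    For reference, the forward smoothing process described above is usually written as the iteration p ← (1 − α)·W·p + α·p0; a small numpy sketch, with a toy column-stochastic transition matrix W (all values hypothetical):

        import numpy as np

        def rwr(W, p0, restart=0.15, iters=100):
            """Random walk with restarts: W column-stochastic, p0 the restart vector."""
            p = p0.copy()
            for _ in range(iters):
                p = (1.0 - restart) * W @ p + restart * p0
            return p

        # Toy 3-node chain graph; columns normalized to sum to 1.
        A = np.array([[0., 1., 0.],
                      [1., 0., 1.],
                      [0., 1., 0.]])
        W = A / A.sum(axis=0, keepdims=True)
        p0 = np.array([1.0, 0.0, 0.0])  # restart at node 0
        print(rwr(W, p0))

    Because the iteration contracts toward a stationary distribution rather than adding noise, "inverting" it is ill-posed, which is what makes the diffusion-model analogy interesting.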
    Virtual conferences [D]
    I quite liked the affordability of attending conferences during COVID. Can anyone name conferences that are keeping virtual attendance an option? submitted by /u/lemlo100 [link] [comments]  ( 100 min )
    [D] Combining LayerNorm and BatchNorm
    Hi everyone, I wondered whether using LayerNorm and BatchNorm together in the same network makes sense. For instance, if you were using a ResNet to extract features from an image and a Transformer with multi-head attention as the classification head, would it make sense to use BatchNorm for the ResNet layers and LayerNorm for the Transformer layers? submitted by /u/TheInnocuousOne [link] [comments]  ( 89 min )
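    This mix is common in practice; a minimal PyTorch sketch of the arrangement described in the question, with BatchNorm in the convolutional backbone and LayerNorm inside the Transformer head (all sizes hypothetical, not a recommendation):

        import torch
        import torch.nn as nn

        class ConvTransformerClassifier(nn.Module):
            def __init__(self, num_classes=10, d_model=64):
                super().__init__()
                # Backbone keeps BatchNorm, as in standard ResNet blocks.
                self.backbone = nn.Sequential(
                    nn.Conv2d(3, d_model, kernel_size=3, stride=2, padding=1),
                    nn.BatchNorm2d(d_model),
                    nn.ReLU(),
                )
                # Transformer layer uses LayerNorm internally (PyTorch default).
                self.encoder = nn.TransformerEncoderLayer(
                    d_model=d_model, nhead=4, batch_first=True
                )
                self.head = nn.Linear(d_model, num_classes)

            def forward(self, x):                      # x: (B, 3, H, W)
                f = self.backbone(x)                   # (B, C, H', W')
                seq = f.flatten(2).transpose(1, 2)     # (B, H'*W', C) token sequence
                return self.head(self.encoder(seq).mean(dim=1))

        model = ConvTransformerClassifier()
        print(model(torch.randn(2, 3, 32, 32)).shape)  # torch.Size([2, 10])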
    [R] 12th International Conference on Artificial Intelligence in Music, Sound, Art and Design (EvoMUSART)
    Hello colleagues, We are organizing the 12th International Conference on Artificial Intelligence in Music, Sound, Art and Design (EvoMUSART) and we think it may be of interest to many of you. The conference will take place in Brno, Czech Republic, between 12 and 14 April 2023. If you work with Artificial Intelligence techniques applied to visual art, music, sound synthesis, architecture, video, poetry, design or other creative tasks, you can present your work at this conference. If not, it is also a great opportunity to learn about the latest research in these fields. For more information, visit the event's webpage: https://www.evostar.org/2023/evomusart/ https://preview.redd.it/w9oarzr2fln91.png?width=4167&format=png&auto=webp&s=2780188336858e5a77559eceda0f8a24d5f1e464 submitted by /u/evomusart_conference [link] [comments]  ( 89 min )
    [D] What are some AI conferences that give outstanding reviewers credits
    As far as I know, CVPR and NeurIPS have outstanding/best reviewer awards. What are some other conferences that give credit to excellent reviewers? submitted by /u/xiikjuy [link] [comments]  ( 89 min )
    [P] Mirage's Stable Diffusion Platform
    Hi! Our team at Mirage created a tool using NVIDIA's textual-inversion to personalize StabilityAI's diffusion model using 3-5 example images. Check it out at: https://www.app.mirageml.com Demo Video: https://www.loom.com/embed/3d20c1c93bf44915974b4a81d536966c I believe this post does not violate the community's rules. Mods, please let me know otherwise. submitted by /u/mirageml [link] [comments]  ( 88 min )
    [D] Job Market in 2022?
    Wondering how people who are currently applying are faring. I'm a Carnegie Mellon master's student in a vision program with several years of production experience in vision, and aside from return offers and the odd start-up, the pickings seem incredibly slim for both myself and really brilliant classmates. Between hiring freezes, many companies downsizing or stopping expansion of ML related departments, the startup bubble bursting (VCs not just throwing cash at every “AI” company), and increased competition from everyone and their mother thinking ML is the next “big thing”, it seems like an already competitive field has gotten even more challenging in the last year. I might be overly pessimistic, but I never recommend undergrads pursue ML anymore unless they have a serious passion or interest for it, and aren't just hyped about the latest OpenAI release. Just about any other CS related specialization offers significantly more security, less competition, and for the degree of education and experience necessary, frequently more pay. I could have a pretty narrow perspective from where I'm at in my career though, so I'd love to hear everyone's thoughts on the future in industry and academia. submitted by /u/RepsNRobots [link] [comments]  ( 89 min )
  • Open

    LOLNeRF: Learn from One Look
    Posted by Daniel Rebain, Student Researcher, and Mark Matthews, Senior Software Engineer, Google Research, Perception Team An important aspect of human vision is our ability to comprehend 3D shape from the 2D images we observe. Achieving this kind of understanding with computer vision systems has been a fundamental challenge in the field. Many successful approaches rely on multi-view data, where two or more images of the same scene are available from different perspectives, which makes it much easier to infer the 3D shape of objects in the images. There are, however, many situations where it would be useful to know 3D structure from a single image, but this problem is generally difficult or impossible to solve. For example, it isn’t necessarily possible to tell the difference between an …  ( 22 min )
  • Open

    DSC Weekly 13 Sept 2022 – The Automation Balance
    In many respects, we are facing not the need for a new form of money but rather a new form of economics - a discipline about a world where scarcity still holds for physical materials but where overabundance is the rule for virtual ones. To me, this is one of the key tenets that need to be hammered out in the metaverse: How do the actual creators of the virtual worlds, and not just the hosts, get paid for their work? The post DSC Weekly 13 Sept 2022 – The Automation Balance appeared first on Data Science Central.  ( 25 min )
    Cybercrime after COVID-19
    The COVID-19 pandemic remains fresh in our memories as it affected many aspects of our lives. Cyberspace is no exception. With the imperative need to stay safe, individuals had to create alternate methods to work, school, communicate and access services. However, this period also saw cybercriminals double their efforts to exploit the situation. The post Cybercrime after COVID-19 appeared first on Data Science Central.  ( 21 min )
    How Long Does It Take To Learn Blockchain?
    Blockchain is a sophisticated technology. It uses cryptography extensively to secure records and build a tamper-proof network, wherein records can’t be altered unless validated by a majority of participating parties on the network. The post How Long Does It Take To Learn Blockchain? appeared first on Data Science Central.  ( 20 min )
    Challenges and Best Practices of Data Cleansing
    Data accuracy is the biggest challenge many businesses encounter in their quest to cleanse data. Having accurate data is the foundation of the usefulness of data in all its stages of use. The post Challenges and Best Practices of Data Cleansing appeared first on Data Science Central.  ( 23 min )
    Image Annotation Overview 2022
    One of the most important jobs in computer vision is image annotation. Computer vision essentially aims to give machine eyes — the capacity to perceive and comprehend the world — through various applications. The post Image Annotation Overview 2022 appeared first on Data Science Central.  ( 21 min )
    Project Management Data Analytics: Benefits and Practices
    Advanced data analytics is a driving power nowadays, covering various human activities and giving businesses worthy insights. Having enough analytical data about your enterprise, employees' and customers' satisfaction, finances, and more, project managers can contribute significantly to decision-making, business growth, and overall business prosperity. The post Project Management Data Analytics: Benefits and Practices appeared first on Data Science Central.  ( 24 min )
  • Open

    A Survey on Efficient Convolutional Neural Networks and Hardware Acceleration
    submitted by /u/Harley109 [link] [comments]  ( 86 min )
    How to use Cyclical Learning Rate to get quick convergence for your Neural Network?-[Article]
    submitted by /u/JoshuaDaD [link] [comments]  ( 87 min )
    Best Neural Networks Courses on Udemy to Consider in 2022 -
    submitted by /u/Lakshmireddys [link] [comments]  ( 87 min )
  • Open

    Get up to Speed: Five Reasons Not to Miss NVIDIA CEO Jensen Huang’s GTC Keynote Sept. 20
    Natural language understanding, the metaverse and the 3D internet, new gaming technology, and advanced AI technologies impacting industries as varied as transportation, healthcare, finance and entertainment are all coming your way. From advances in robotics to supercomputers and hyperscale data centers, the brightest minds in science, industry and the public sector will discuss the latest… The post Get up to Speed: Five Reasons Not to Miss NVIDIA CEO Jensen Huang’s GTC Keynote Sept. 20 appeared first on NVIDIA Blog.  ( 4 min )
    AI on the Stars: Hyperrealistic Avatars Propel Startup to ‘America’s Got Talent’ Finals
    More than 6 million pairs of eyes will be on real-time AI avatar technology in this week’s finale of America’s Got Talent — currently the second-most popular primetime TV show in the U.S. Metaphysic, a member of the NVIDIA Inception global network of technology startups, is one of 11 acts competing for $1 million and… The post AI on the Stars: Hyperrealistic Avatars Propel Startup to ‘America’s Got Talent’ Finals appeared first on NVIDIA Blog.  ( 6 min )
    Concept Designer Ben Mauro Delivers Epic 3D Trailer ‘Huxley’ This Week ‘In the NVIDIA Studio’
    The gripping sci-fi comic Huxley was brought to life in an action-packed 3D trailer full of excitement and intrigue this week In the NVIDIA Studio. The post Concept Designer Ben Mauro Delivers Epic 3D Trailer ‘Huxley’ This Week ‘In the NVIDIA Studio’ appeared first on NVIDIA Blog.  ( 7 min )
  • Open

    Get better insight from reviews using Amazon Comprehend
    “85% of buyers trust online reviews as much as a personal recommendation” – Gartner Consumers are increasingly engaging with businesses through digital surfaces and multiple touchpoints. Statistics show that the majority of shoppers use reviews to determine what products to buy and which services to use. As per Spiegel Research Centre, the purchase likelihood for […]  ( 15 min )
    Prepare data at scale in Amazon SageMaker Studio using serverless AWS Glue interactive sessions
    Amazon SageMaker Studio is the first fully integrated development environment (IDE) for machine learning (ML). It provides a single, web-based visual interface where you can perform all ML development steps, including preparing data and building, training, and deploying models. AWS Glue is a serverless data integration service that makes it easy to discover, prepare, and […]  ( 6 min )
  • Open

    Computing for the health of the planet
    The MIT Schwarzman College of Computing welcomes four new faculty members engaged in research and teaching that address climate risks and other environmental issues.  ( 6 min )
  • Open

    The Five-Generation Workforce: How Digital Tech Can Bring Boomers and Gen Z Together
    From healthcare and manufacturing to marketing and engineering, we are still seeing nearly five different generations share the workforce… Continue reading on Becoming Human: Artificial Intelligence Magazine »  ( 10 min )
  • Open

    Multi-Armed Bandit
    Hey guys, can someone please help me with this ASAP? It would mean a lot! https://preview.redd.it/58qwwbaktmn91.png?width=688&format=png&auto=webp&s=b79dd6e6cd69c5c97cf53b15268cf84c613855ef submitted by /u/Asleep-Ad4480 [link] [comments]  ( 88 min )
    Simulation time for Multi-agent RL problems
    Hello :D I am working on a multi-agent RL problem. I use PPO with the concept of centralized learning and distributed execution. However, the runtime of my model is around 48-96 hours for 50 million steps. Is that alright? I am using a parallelized implementation; the problem is just complex, and each step/episode requires some function executions (other than the learning) that consume wall time. Are such long training times common in multi-agent RL problems? submitted by /u/AhmedNizam_ [link] [comments]  ( 99 min )
    DQN Model giving high variance returns
    I am working on a model to personalize the time at which push notifications are sent to my users using DQN. The model trained fine on the timings. Now, I am trying to increase its complexity by differentiating weekday times from weekend times. For this, I am adding a flag to the state so that the model can know whether it's predicting for a weekday or a weekend. The model is learning the timings for weekends but never crosses the 90%-95% threshold. Also, there is a lot of variance in the reward compared to the weekday return.

    I have tried changing the hyperparameters:
    batch_size: 256
    learning_rate: 1e-3
    no_episodes: 1000
    episode_length: 20
    epsilon: max(1 - (episode_no/no_episodes), 0.05)

    I have created a random state initially, which I evaluate after each episode. I'm including the results for evaluation and the prediction percentage for weekday and weekend as well. Any fresh ideas or inputs are appreciated.

    EDIT: The model learns when the user responds to (clicks) a push notification. Initially the model sends a PN at different times, and every time the user clicks it within a certain time period, the model receives a positive reward (say, +10), and a negative one (-10) otherwise. My state also reflects this, as it consists of the last 5 clicked times and the last 5 not-clicked times, e.g. State = [14, 17, 20, 14, 13, 2, 7, 21, 22, 23]. Here 14, 17, 20, 14, and 13 are the clicked timings, whereas 2, 7, 21, 22, and 23 are the last not-clicked ones. The model is able to learn this easily. But if I add 5+5 more times for the weekend (separately), then the returns are too varied, as the screenshot suggests. https://preview.redd.it/vlhs1i3qnkn91.png?width=985&format=png&auto=webp&s=12777e1ba086944a695db43e90272502296fcd5c submitted by /u/gaurjimmy [link] [comments]  ( 90 min )
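    For concreteness, a minimal sketch of the state encoding described in the post (names and scaling are hypothetical); scaling the hour values keeps the binary weekend flag at a comparable magnitude:

        import numpy as np

        def build_state(clicked, not_clicked, is_weekend):
            """State: last-5 clicked hours, last-5 not-clicked hours, weekend flag.
            Hours are scaled to [0, 1] so the flag is not dwarfed by raw hours."""
            s = np.array(clicked[-5:] + not_clicked[-5:], dtype=np.float32) / 23.0
            return np.concatenate([s, [float(is_weekend)]])

        state = build_state([14, 17, 20, 14, 13], [2, 7, 21, 22, 23], is_weekend=True)
        print(state.shape)  # (11,)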
    How to improve results of DQN model when we want to predict different actions based on condition using single model.
    I am working on a model to personalize the time at which push notifications are sent to my users using DQN. The model trained fine on the timings. Now, I am trying to increase its complexity by differentiating weekday times from weekend times. For this, I am adding a flag to the state so that the model can know whether it's predicting for a weekday or a weekend. The model is learning the timings for weekends but never crosses the 90%-95% threshold. Also, there is a lot of variance in the reward compared to the weekday return.

    I have tried changing the hyperparameters:
    batch_size: 256
    learning_rate: 1e-3
    no_episodes: 1000
    episode_length: 20
    epsilon: max(1 - (episode_no/no_episodes), 0.05)

    I have created a random state initially, which I evaluate after each episode. I'm including the results for evaluation and the prediction percentage for weekday and weekend as well. Any fresh ideas or inputs are appreciated. Results submitted by /u/ApurvaParikh [link] [comments]  ( 89 min )
  • Open

    AttentionHTR: Handwritten Text Recognition Based on Attention Encoder-Decoder Networks. (arXiv:2201.09390v3 [cs.CV] UPDATED)
    This work proposes an attention-based sequence-to-sequence model for handwritten word recognition and explores transfer learning for data-efficient training of HTR systems. To overcome training data scarcity, this work leverages models pre-trained on scene text images as a starting point towards tailoring the handwriting recognition models. ResNet feature extraction and bidirectional LSTM-based sequence modeling stages together form an encoder. The prediction stage consists of a decoder and a content-based attention mechanism. The effectiveness of the proposed end-to-end HTR system has been empirically evaluated on a novel multi-writer dataset Imgur5K and the IAM dataset. The experimental results evaluate the performance of the HTR framework, further supported by an in-depth analysis of the error cases. Source code and pre-trained models are available at https://github.com/dmitrijsk/AttentionHTR.  ( 2 min )
    Explaining Predictions from Machine Learning Models: Algorithms, Users, and Pedagogy. (arXiv:2209.05084v1 [cs.LG])
    Model explainability has become an important problem in machine learning (ML) due to the increased effect that algorithmic predictions have on humans. Explanations can help users understand not only why ML models make certain predictions, but also how these predictions can be changed. In this thesis, we examine the explainability of ML models from three vantage points: algorithms, users, and pedagogy, and contribute several novel solutions to the explainability problem.  ( 2 min )
    Simple and Effective Gradient-Based Tuning of Sequence-to-Sequence Models. (arXiv:2209.04683v1 [cs.CL])
    Recent trends towards training ever-larger language models have substantially improved machine learning performance across linguistic tasks. However, the huge cost of training larger models can make tuning them prohibitively expensive, motivating the study of more efficient methods. Gradient-based hyper-parameter optimization offers the capacity to tune hyper-parameters during training, yet has not previously been studied in a sequence-to-sequence setting. We apply a simple and general gradient-based hyperparameter optimization method to sequence-to-sequence tasks for the first time, demonstrating both efficiency and performance gains over strong baselines for both Neural Machine Translation and Natural Language Understanding (NLU) tasks (via T5 pretraining). For translation, we show the method generalizes across language pairs, is more efficient than Bayesian hyper-parameter optimization, and that learned schedules for some hyper-parameters can out-perform even optimal constant-valued tuning. For T5, we show that learning hyper-parameters during pretraining can improve performance across downstream NLU tasks. When learning multiple hyper-parameters concurrently, we show that the global learning rate can follow a schedule over training that improves performance and is not explainable by the `short-horizon bias' of greedy methods (Wu et al., 2018). We release the code used to facilitate further research.  ( 2 min )
    Data-driven, multi-moment fluid modeling of Landau damping. (arXiv:2209.04726v1 [physics.plasm-ph])
    Deriving governing equations of complex physical systems based on first principles can be quite challenging when there are certain unknown terms and hidden physical mechanisms in the systems. In this work, we apply a deep learning architecture to learn fluid partial differential equations (PDEs) of a plasma system based on the data acquired from a fully kinetic model. The learned multi-moment fluid PDEs are demonstrated to incorporate kinetic effects such as Landau damping. Based on the learned fluid closure, the data-driven, multi-moment fluid modeling can well reproduce all the physical quantities derived from the fully kinetic model. The calculated damping rate of Landau damping is consistent with both the fully kinetic simulation and the linear theory. The data-driven fluid modeling of PDEs for complex physical systems may be applied to improve fluid closure and reduce the computational cost of multi-scale modeling of global systems.  ( 2 min )
    Resisting Deep Learning Models Against Adversarial Attack Transferability via Feature Randomization. (arXiv:2209.04930v1 [cs.CR])
    In the past decades, the rise of artificial intelligence has given us the capabilities to solve the most challenging problems in our day-to-day lives, such as cancer prediction and autonomous navigation. However, these applications might not be reliable if not secured against adversarial attacks. In addition, recent works demonstrated that some adversarial examples are transferable across different models. Therefore, it is crucial to avoid such transferability via robust models that resist adversarial manipulations. In this paper, we propose a feature randomization-based approach that resists eight adversarial attacks targeting deep learning models in the testing phase. Our novel approach consists of changing the training strategy in the target network classifier and selecting random feature samples. We consider attackers under Limited-Knowledge and Semi-Knowledge conditions, undertaking the most prevalent types of adversarial attacks. We evaluate the robustness of our approach using the well-known UNSW-NB15 datasets that include realistic and synthetic attacks. Afterward, we demonstrate that our strategy outperforms the existing state-of-the-art approach, such as the Most Powerful Attack, which consists of fine-tuning the network model against specific adversarial attacks. Finally, our experimental results show that our methodology can secure the target network and resists adversarial attack transferability by over 60%.  ( 2 min )
    Phantom Sponges: Exploiting Non-Maximum Suppression to Attack Deep Object Detectors. (arXiv:2205.13618v2 [cs.CV] UPDATED)
    Adversarial attacks against deep learning-based object detectors have been studied extensively in the past few years. Most of the attacks proposed have targeted the model's integrity (i.e., caused the model to make incorrect predictions), while adversarial attacks targeting the model's availability, a critical aspect in safety-critical domains such as autonomous driving, have not yet been explored by the machine learning research community. In this paper, we propose a novel attack that negatively affects the decision latency of an end-to-end object detection pipeline. We craft a universal adversarial perturbation (UAP) that targets a widely used technique integrated in many object detector pipelines -- non-maximum suppression (NMS). Our experiments demonstrate the proposed UAP's ability to increase the processing time of individual frames by adding "phantom" objects that overload the NMS algorithm while preserving the detection of the original objects (which allows the attack to go undetected for a longer period of time).  ( 2 min )
    An adaptive music generation architecture for games based on the deep learning Transformer model. (arXiv:2207.01698v2 [cs.SD] UPDATED)
    This paper presents an architecture for generating music for video games based on the Transformer deep learning model. Our motivation is to be able to customize the generation according to the taste of the player, who can select a corpus of training examples, corresponding to his preferred musical style. The system generates various musical layers, following the standard layering strategy currently used by composers designing video game music. To adapt the music generated to the game play and to the player(s) situation, we are using an arousal-valence model of emotions, in order to control the selection of musical layers. We discuss current limitations and prospects for the future, such as collaborative and interactive control of the musical components.  ( 2 min )
    Affinity-VAE for disentanglement, clustering and classification of objects in multidimensional image data. (arXiv:2209.04517v1 [cs.CV])
    In this work we present affinity-VAE: a framework for automatic clustering and classification of objects in multidimensional image data based on their similarity. The method expands on the concept of $\beta$-VAEs with an informed similarity-based loss component driven by an affinity matrix. The affinity-VAE is able to create rotationally-invariant, morphologically homogeneous clusters in the latent representation, with improved cluster separation compared with a standard $\beta$-VAE. We explore the extent of latent disentanglement and continuity of the latent spaces on both 2D and 3D image data, including simulated biological electron cryo-tomography (cryo-ET) volumes as an example of a scientific application.  ( 2 min )
    Class-Incremental Learning with Strong Pre-trained Models. (arXiv:2204.03634v2 [cs.CV] UPDATED)
    Class-incremental learning (CIL) has been widely studied under the setting of starting from a small number of classes (base classes). Instead, we explore an understudied real-world setting of CIL that starts with a strong model pre-trained on a large number of base classes. We hypothesize that a strong base model can provide a good representation for novel classes and incremental learning can be done with small adaptations. We propose a 2-stage training scheme, i) feature augmentation -- cloning part of the backbone and fine-tuning it on the novel data, and ii) fusion -- combining the base and novel classifiers into a unified classifier. Experiments show that the proposed method significantly outperforms state-of-the-art CIL methods on the large-scale ImageNet dataset (e.g. +10% overall accuracy than the best). We also propose and analyze understudied practical CIL scenarios, such as base-novel overlap with distribution shift. Our proposed method is robust and generalizes to all analyzed CIL settings. Code is available at https://github.com/amazon-research/sp-cil.  ( 2 min )
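    A naive sketch of the fusion step (ii) in PyTorch, stacking a base-class head and a novel-class head into one unified classifier (sizes hypothetical; logit calibration between the two heads, which the paper addresses, is omitted here):

        import torch
        import torch.nn as nn

        base_head = nn.Linear(512, 100)    # classifier over base classes
        novel_head = nn.Linear(512, 10)    # classifier fine-tuned on novel classes

        # Fusion: concatenate the two heads into one unified classifier.
        fused = nn.Linear(512, 110)
        with torch.no_grad():
            fused.weight.copy_(torch.cat([base_head.weight, novel_head.weight], dim=0))
            fused.bias.copy_(torch.cat([base_head.bias, novel_head.bias], dim=0))

        print(fused(torch.randn(1, 512)).shape)  # torch.Size([1, 110])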
    On The Computational Complexity of Self-Attention. (arXiv:2209.04881v1 [cs.LG])
    Transformer architectures have led to remarkable progress in many state-of-art applications. However, despite their successes, modern transformers rely on the self-attention mechanism, whose time- and space-complexity is quadratic in the length of the input. Several approaches have been proposed to speed up self-attention mechanisms to achieve sub-quadratic running time; however, the large majority of these works are not accompanied by rigorous error guarantees. In this work, we establish lower bounds on the computational complexity of self-attention in a number of scenarios. We prove that the time complexity of self-attention is necessarily quadratic in the input length, unless the Strong Exponential Time Hypothesis (SETH) is false. This argument holds even if the attention computation is performed only approximately, and for a variety of attention mechanisms. As a complement to our lower bounds, we show that it is indeed possible to approximate dot-product self-attention using finite Taylor series in linear-time, at the cost of having an exponential dependence on the polynomial order.  ( 2 min )
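    As a rough illustration of the linear-time approximation mentioned at the end of the abstract, replacing exp(q·k) by its first-order Taylor expansion 1 + q·k lets the sums over keys be computed once, so the cost is linear in sequence length (a numpy sketch only; it ignores the 1/sqrt(d) scaling and can lose the positivity of true softmax weights):

        import numpy as np

        def softmax_attention(Q, K, V):
            # Exact attention: O(n^2) in sequence length n.
            S = np.exp(Q @ K.T)
            return (S @ V) / S.sum(axis=1, keepdims=True)

        def taylor_linear_attention(Q, K, V):
            # exp(q.k) ~= 1 + q.k  =>  numerator = sum_j v_j + Q @ (K^T V), O(n).
            n = K.shape[0]
            num = V.sum(axis=0) + Q @ (K.T @ V)
            den = n + (Q @ K.sum(axis=0))[:, None]
            return num / den

        rng = np.random.default_rng(0)
        Q, K, V = (0.1 * rng.standard_normal((6, 4)) for _ in range(3))
        print(np.max(np.abs(softmax_attention(Q, K, V) - taylor_linear_attention(Q, K, V))))

    The approximation is close only when the dot products are small, consistent with the paper's point that higher-order Taylor series trade an exponential dependence on polynomial order for linear time.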
    Data-Driven Blind Synchronization and Interference Rejection for Digital Communication Signals. (arXiv:2209.04871v1 [eess.SP])
    We study the potential of data-driven deep learning methods for separation of two communication signals from an observation of their mixture. In particular, we assume knowledge on the generation process of one of the signals, dubbed signal of interest (SOI), and no knowledge on the generation process of the second signal, referred to as interference. This form of the single-channel source separation problem is also referred to as interference rejection. We show that capturing high-resolution temporal structures (nonstationarities), which enables accurate synchronization to both the SOI and the interference, leads to substantial performance gains. With this key insight, we propose a domain-informed neural network (NN) design that is able to improve upon both "off-the-shelf" NNs and classical detection and interference rejection methods, as demonstrated in our simulations. Our findings highlight the key role communication-specific domain knowledge plays in the development of data-driven approaches that hold the promise of unprecedented gains.  ( 2 min )
    Explaining Results of Multi-Criteria Decision Making. (arXiv:2209.04582v1 [cs.AI])
    We introduce a method for explaining the results of various linear and hierarchical multi-criteria decision-making (MCDM) techniques such as WSM and AHP. The two key ideas are (A) to maintain a fine-grained representation of the values manipulated by these techniques and (B) to derive explanations from these representations through merging, filtering, and aggregating operations. An explanation in our model presents a high-level comparison of two alternatives in an MCDM problem, presumably an optimal and a non-optimal one, illuminating why one alternative was preferred over the other one. We show the usefulness of our techniques by generating explanations for two well-known examples from the MCDM literature. Finally, we show their efficacy by performing computational experiments.  ( 2 min )
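    To make ideas (A) and (B) concrete, a small Python sketch of a weighted sum model (WSM) that keeps each criterion's weighted contribution and aggregates it into an explanation of why one alternative is preferred (weights and scores hypothetical):

        weights = {"cost": 0.5, "quality": 0.3, "speed": 0.2}
        alt_a   = {"cost": 0.9, "quality": 0.4, "speed": 0.6}
        alt_b   = {"cost": 0.5, "quality": 0.8, "speed": 0.7}

        # Fine-grained representation: keep each criterion's weighted contribution.
        contrib = {c: (weights[c] * alt_a[c], weights[c] * alt_b[c]) for c in weights}
        score_a = sum(a for a, _ in contrib.values())
        score_b = sum(b for _, b in contrib.values())

        # Explanation: criteria sorted by how much they drive the preference.
        deltas = sorted(contrib, key=lambda c: contrib[c][0] - contrib[c][1], reverse=True)
        print(f"A={score_a:.2f} vs B={score_b:.2f}; A wins mainly on: "
              + ", ".join(c for c in deltas if contrib[c][0] > contrib[c][1]))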
    Instruction-driven history-aware policies for robotic manipulations. (arXiv:2209.04899v1 [cs.RO])
    In human environments, robots are expected to accomplish a variety of manipulation tasks given simple natural language instructions. Yet, robotic manipulation is extremely challenging as it requires fine-grained motor control, long-term memory as well as generalization to previously unseen tasks and environments. To address these challenges, we propose a unified transformer-based approach that takes into account multiple inputs. In particular, our transformer architecture integrates (i) natural language instructions and (ii) multi-view scene observations while (iii) keeping track of the full history of observations and actions. Such an approach enables learning dependencies between history and instructions and improves manipulation precision using multiple views. We evaluate our method on the challenging RLBench benchmark and on a real-world robot. Notably, our approach scales to 74 diverse RLBench tasks and outperforms the state of the art. We also address instruction-conditioned tasks and demonstrate excellent generalization to previously unseen variations.  ( 2 min )
    A Thermal Machine Learning Solver For Chip Simulation. (arXiv:2209.04741v1 [cs.LG])
    Thermal analysis provides deeper insights into electronic chip behavior under different temperature scenarios and enables faster design exploration. However, obtaining a detailed and accurate thermal profile of a chip is very time-consuming using FEM or CFD. Therefore, there is an urgent need for speeding up the on-chip thermal solution to address various system scenarios. In this paper, we propose a thermal machine-learning (ML) solver to speed up thermal simulations of chips. The thermal ML-Solver is an extension of the recent novel approach, CoAEMLSim (Composable Autoencoder Machine Learning Simulator) with modifications to the solution algorithm to handle constant and distributed HTC. The proposed method is validated against commercial solvers, such as Ansys MAPDL, as well as a latest ML baseline, UNet, under different scenarios to demonstrate its enhanced accuracy, scalability, and generalizability.  ( 2 min )
    Learning Consumer Preferences from Bundle Sales Data. (arXiv:2209.04942v1 [stat.ML])
    Product bundling is a common selling mechanism used in online retailing. To set profitable bundle prices, the seller needs to learn consumer preferences from the transaction data. When customers purchase bundles or multiple products, classical methods such as discrete choice models cannot be used to estimate customers' valuations. In this paper, we propose an approach to learn the distribution of consumers' valuations toward the products using bundle sales data. The approach reduces it to an estimation problem where the samples are censored by polyhedral regions. Using the EM algorithm and Monte Carlo simulation, our approach can recover the distribution of consumers' valuations. The framework allows for unobserved no-purchases and clustered market segments. We provide theoretical results on the identifiability of the probability model and the convergence of the EM algorithm. The performance of the approach is also demonstrated numerically.  ( 2 min )
    Combined Pruning for Nested Cross-Validation to Accelerate Automated Hyperparameter Optimization for Embedded Feature Selection in High-Dimensional Data with Very Small Sample Sizes. (arXiv:2202.00598v2 [cs.LG] UPDATED)
    Background: Embedded feature selection in high-dimensional data with very small sample sizes requires optimized hyperparameters for the model building process. For this hyperparameter optimization, nested cross-validation must be applied to avoid a biased performance estimation. The resulting repeated training with high-dimensional data leads to very long computation times. Moreover, it is likely to observe a high variance in the individual performance evaluation metrics caused by outliers in tiny validation sets. Therefore, early stopping applying standard pruning algorithms to save time risks discarding promising hyperparameter sets. Result: To speed up feature selection for high-dimensional data with tiny sample size, we adapt the use of a state-of-the-art asynchronous successive halving pruner. In addition, we combine it with two complementary pruning strategies based on domain or prior knowledge. One pruning strategy immediately stops computing trials with semantically meaningless results for the selected hyperparameter combinations. The other is a new extrapolating threshold pruning strategy suitable for nested cross-validation with a high variance of performance evaluation metrics. In repeated experiments, our combined pruning strategy keeps all promising trials. At the same time, the calculation time is substantially reduced compared to using a state-of-the-art asynchronous successive halving pruner alone. Up to 81.3% fewer models were trained achieving the same optimization result. Conclusion: The proposed combined pruning strategy accelerates data analysis or enables deeper searches for hyperparameters within the same computation time. This leads to significant savings in time, money and energy consumption, opening the door to advanced, time-consuming analyses.  ( 3 min )
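    For background, a minimal sketch of the baseline asynchronous successive halving pruner via Optuna (the objective below is a toy stand-in, not the paper's nested cross-validation pipeline; the paper's contribution adds two domain-informed pruning rules on top of such a pruner):

        import optuna

        def objective(trial):
            lr = trial.suggest_float("lr", 1e-4, 1e-1, log=True)
            score = 0.0
            for step in range(20):                  # e.g., CV folds or epochs
                score = 1.0 - (lr - 0.01) ** 2 + 0.001 * step   # toy learning curve
                trial.report(score, step)
                if trial.should_prune():            # successive halving decides here
                    raise optuna.TrialPruned()
            return score

        study = optuna.create_study(
            direction="maximize",
            pruner=optuna.pruners.SuccessiveHalvingPruner(),
        )
        study.optimize(objective, n_trials=30)
        print(study.best_params)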
    Extended Feature Space-Based Automatic Melanoma Detection System. (arXiv:2209.04588v1 [cs.LG])
    Melanoma is the deadliest form of skin cancer. Uncontrollable growth of melanocytes leads to melanoma. Melanoma has been growing wildly in the last few decades. In recent years, the detection of melanoma using image processing techniques has become a dominant research field. The Automatic Melanoma Detection System (AMDS) helps to detect melanoma based on image processing techniques by accepting infected skin area images as input. A single lesion image is a source of multiple features. Therefore, It is crucial to select the appropriate features from the image of the lesion in order to increase the accuracy of AMDS. For melanoma detection, all extracted features are not important. Some of the extracted features are complex and require more computation tasks, which impacts the classification accuracy of AMDS. The feature extraction phase of AMDS exhibits more variability, therefore it is important to study the behaviour of AMDS using individual and extended feature extraction approaches. A novel algorithm ExtFvAMDS is proposed for the calculation of Extended Feature Vector Space. The six models proposed in the comparative study revealed that the HSV feature vector space for automatic detection of melanoma using Ensemble Bagged Tree classifier on Med-Node Dataset provided 99% AUC, 95.30% accuracy, 94.23% sensitivity, and 96.96% specificity.  ( 2 min )
    Explainable Image Quality Assessments in Teledermatological Photography. (arXiv:2209.04699v1 [cs.CV])
    Image quality is a crucial factor in the success of teledermatological consultations. However, up to 50% of images sent by patients have quality issues, thus increasing the time to diagnosis and treatment. An automated, easily deployable, explainable method for assessing image quality is necessary to improve the current teledermatological consultation flow. We introduce ImageQX, a convolutional neural network trained for image quality assessment with a learning mechanism for identifying the most common poor image quality explanations: bad framing, bad lighting, blur, low resolution, and distance issues. ImageQX was trained on 26635 photographs and validated on 9874 photographs, each annotated with image quality labels and poor image quality explanations by up to 12 board-certified dermatologists. The photographic images were taken between 2017-2019 using a mobile skin disease tracking application accessible worldwide. Our method achieves expert-level performance for both image quality assessment and poor image quality explanation. For image quality assessment, ImageQX obtains a macro F1-score of 0.73 which places it within standard deviation of the pairwise inter-rater F1-score of 0.77. For poor image quality explanations, our method obtains F1-scores of between 0.37 and 0.70, similar to the inter-rater pairwise F1-score of between 0.24 and 0.83. Moreover, with a size of only 15 MB, ImageQX is easily deployable on mobile devices. With an image quality detection performance similar to that of dermatologists, incorporating ImageQX into the teledermatology flow can reduce the image evaluation burden on dermatologists, while at the same time reducing the time to diagnosis and treatment for patients. We introduce ImageQX, a first of its kind explainable image quality assessor which leverages domain expertise to improve the quality and efficiency of dermatological care in a virtual setting.  ( 3 min )
    Batch Bayesian Optimization via Particle Gradient Flows. (arXiv:2209.04722v1 [stat.ML])
    Bayesian Optimisation (BO) methods seek to find global optima of objective functions which are only available as a black-box or are expensive to evaluate. Such methods construct a surrogate model for the objective function, quantifying the uncertainty in that surrogate through Bayesian inference. Objective evaluations are sequentially determined by maximising an acquisition function at each step. However, this ancillary optimisation problem can be highly non-trivial to solve, due to the non-convexity of the acquisition function, particularly in the case of batch Bayesian optimisation, where multiple points are selected in every step. In this work we reformulate batch BO as an optimisation problem over the space of probability measures. We construct a new acquisition function based on multipoint expected improvement which is convex over the space of probability measures. Practical schemes for solving this `inner' optimisation problem arise naturally as gradient flows of this objective function. We demonstrate the efficacy of this new method on different benchmark functions and compare with state-of-the-art batch BO methods.  ( 2 min )
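    For background, multipoint expected improvement is commonly estimated by Monte Carlo over the GP posterior at the candidate batch; a small numpy sketch (the posterior mean and covariance are hypothetical inputs, and the paper's measure-space reformulation is not shown):

        import numpy as np

        def batch_expected_improvement(mu, cov, best_f, n_samples=10000, rng=None):
            """Monte Carlo q-EI for a minimization problem.
            mu: (q,) posterior mean at the batch; cov: (q, q) posterior covariance."""
            rng = rng or np.random.default_rng(0)
            samples = rng.multivariate_normal(mu, cov, size=n_samples)  # (n, q)
            improvement = np.maximum(best_f - samples.min(axis=1), 0.0)
            return improvement.mean()

        mu = np.array([0.2, 0.1, 0.3])
        cov = 0.05 * np.eye(3)
        print(batch_expected_improvement(mu, cov, best_f=0.25))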
    Towards Sparsification of Graph Neural Networks. (arXiv:2209.04766v1 [cs.LG])
    As real-world graphs expand in size, larger GNN models with billions of parameters are deployed. High parameter count in such models makes training and inference on graphs expensive and challenging. To reduce the computational and memory costs of GNNs, optimization methods such as pruning the redundant nodes and edges in input graphs have been commonly adopted. However, model compression, which directly targets the sparsification of model layers, has been mostly limited to traditional Deep Neural Networks (DNNs) used for tasks such as image classification and object detection. In this paper, we utilize two state-of-the-art model compression methods (1) train and prune and (2) sparse training for the sparsification of weight layers in GNNs. We evaluate and compare the efficiency of both methods in terms of accuracy, training sparsity, and training FLOPs on real-world graphs. Our experimental results show that on the ia-email, wiki-talk, and stackoverflow datasets for link prediction, sparse training with much lower training FLOPs achieves accuracy comparable to the train-and-prune method. On the brain dataset for node classification, sparse training uses fewer FLOPs (less than 1/7 of the FLOPs of the train-and-prune method) and preserves much better accuracy under extreme model sparsity.  ( 2 min )
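    A minimal sketch of the first method (train and prune) using PyTorch's built-in magnitude pruning, applied here to a generic linear layer standing in for a GNN weight layer (the actual GNN layers depend on the graph library used):

        import torch
        import torch.nn as nn
        import torch.nn.utils.prune as prune

        # Stand-in for a GNN weight layer (e.g., the dense transform in a GCN layer).
        layer = nn.Linear(128, 128)

        # Train-and-prune: after (or during) training, zero the smallest weights.
        prune.l1_unstructured(layer, name="weight", amount=0.9)  # 90% sparsity
        prune.remove(layer, "weight")                            # make the mask permanent

        sparsity = (layer.weight == 0).float().mean().item()
        print(f"weight sparsity: {sparsity:.2%}")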
    Performance-Driven Controller Tuning via Derivative-Free Reinforcement Learning. (arXiv:2209.04854v1 [eess.SY])
    Choosing an appropriate parameter set for the designed controller is critical for the final performance but usually requires a tedious and careful tuning process, which implies a strong need for automatic tuning methods. However, among existing methods, derivative-free ones suffer from poor scalability or low efficiency, while gradient-based ones are often unavailable due to possibly non-differentiable controller structure. To resolve the issues, we tackle the controller tuning problem using a novel derivative-free reinforcement learning (RL) framework, which performs timestep-wise perturbation in parameter space during experience collection and integrates derivative-free policy updates into the advanced actor-critic RL architecture to achieve high versatility and efficiency. To demonstrate the framework's efficacy, we conduct numerical experiments on two concrete examples from autonomous driving, namely, adaptive cruise control with PID controller and trajectory tracking with MPC controller. Experimental results show that the proposed method outperforms popular baselines and highlight its strong potential for controller tuning.  ( 2 min )
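    As a simple baseline for the problem setup, and not the paper's actor-critic method, derivative-free tuning of controller gains can be done with random perturbations in parameter space; a toy Python sketch with a hypothetical closed-loop cost function:

        import numpy as np

        def closed_loop_cost(gains):
            """Hypothetical stand-in: run the controller, return tracking cost."""
            target = np.array([2.0, 0.5, 1.0])           # pretend-optimal PID gains
            return float(np.sum((gains - target) ** 2))

        rng = np.random.default_rng(0)
        gains = np.array([1.0, 1.0, 1.0])                # [Kp, Ki, Kd]
        best = closed_loop_cost(gains)
        for _ in range(200):
            candidate = gains + 0.1 * rng.standard_normal(3)  # parameter-space perturbation
            cost = closed_loop_cost(candidate)
            if cost < best:                               # keep improvements only
                gains, best = candidate, cost
        print(gains.round(2), best)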
    People detection and social distancing classification in smart cities for COVID-19 by using thermal images and deep learning algorithms. (arXiv:2209.04704v1 [cs.CV])
    COVID-19 is a disease caused by a severe respiratory syndrome coronavirus. It was identified in December 2019 in Wuhan, China, and has resulted in an ongoing pandemic with many infected cases, including some deaths. The coronavirus is primarily spread between people during close contact. Motivated by this notion, this research proposes an artificial intelligence system for social distancing classification of persons using thermal images. By exploiting YOLOv2 (You Only Look Once), a deep learning detection technique is developed for detecting and tracking people in indoor and outdoor scenarios. An algorithm is also implemented for measuring and classifying the distance between persons and automatically checking whether social distancing rules are respected. Hence, this work aims at minimizing the spread of the COVID-19 virus by evaluating if and how persons comply with social distancing rules. The proposed approach is applied to images acquired through thermal cameras, to establish a complete AI system for people tracking, social distancing classification, and body temperature monitoring. The training phase is done with two datasets captured from different thermal cameras. The Ground Truth Labeler app is used for labeling the persons in the images. The achieved results show that the proposed method is suitable for the creation of a smart surveillance system in smart cities for people detection, social distancing classification, and body temperature analysis.  ( 3 min )
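    The distance-classification step reduces to thresholding pairwise distances between detected person centroids; a minimal numpy sketch (detector outputs and the pixel-to-meter scale are hypothetical):

        import numpy as np

        def close_pairs(centroids, min_distance_m=2.0, meters_per_pixel=0.05):
            """Flag pairs of detected people closer than the social-distancing limit."""
            pts = np.asarray(centroids, dtype=float)
            diff = pts[:, None, :] - pts[None, :, :]
            dist_m = np.linalg.norm(diff, axis=-1) * meters_per_pixel
            i, j = np.triu_indices(len(pts), k=1)
            return [(int(a), int(b)) for a, b in zip(i, j) if dist_m[a, b] < min_distance_m]

        # Hypothetical centroids (pixels) from a YOLO-style detector.
        print(close_pairs([(100, 200), (120, 210), (400, 400)]))  # [(0, 1)]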
    Delta Hedging Liquidity Positions on Automated Market Makers. (arXiv:2208.03318v2 [cs.CE] UPDATED)
    Liquidity Providers on Automated Market Makers generate millions of USD in transaction fees daily. However, the net value of a Liquidity Position is vulnerable to price changes in the underlying assets in the pool. The dominant measure of loss in a Liquidity Position is Impermanent Loss. Impermanent Loss for Constant Function Market Makers has been widely studied. We propose a new metric to measure Liquidity Position PNL based on price movement from the underlying assets. We show how this new metric more appropriately measures the change in the net value of a Liquidity Position as a function of price movement in the underlying assets. Our second contribution is an algorithm to delta hedge arbitrary Liquidity Positions on both uniform liquidity Automated Market Makers (such as Uniswap v2) and concentrated liquidity Automated Market Makers (such as Uniswap v3) via a combination of derivatives.
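    For background, under a constant-product (Uniswap v2-style) pool both the standard impermanent-loss curve and the LP position's delta have simple closed forms; a small Python sketch of these quantities (the paper's hedging algorithm for concentrated liquidity is more general):

        import math

        def impermanent_loss(price_ratio):
            """IL vs holding, for a two-asset constant-product pool.
            price_ratio r = P_now / P_entry of the risky asset."""
            return 2.0 * math.sqrt(price_ratio) / (1.0 + price_ratio) - 1.0

        def lp_delta(liquidity, price):
            """Position value V(P) = 2 * L * sqrt(P), so delta = dV/dP = L / sqrt(P)."""
            return liquidity / math.sqrt(price)

        print(f"IL at a 2x price move: {impermanent_loss(2.0):.4%}")   # ~ -5.72%
        print(f"delta to hedge: {lp_delta(liquidity=1000.0, price=1600.0):.2f}")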
    Variational Autoencoder Kernel Interpretation and Selection for Classification. (arXiv:2209.04715v1 [cs.LG])
    This work proposes kernel selection approaches for probabilistic classifiers based on features produced by the convolutional encoder of a variational autoencoder. Particularly, the developed methodologies allow the selection of the most relevant subset of latent variables. In the proposed implementation, each latent variable was sampled from the distribution associated with a single kernel of the last encoder's convolution layer, as an individual distribution was created for each kernel. Therefore, choosing relevant features on the sampled latent variables makes it possible to perform kernel selection, filtering the uninformative features and kernels. This leads to a reduction in the number of the model's parameters. Both wrapper and filter methods were evaluated for feature selection. The second was of particular relevance as it is based only on the distributions of the kernels. It was assessed by measuring the Kullback-Leibler divergence between all distributions, hypothesizing that the kernels whose distributions are more similar can be discarded. This hypothesis was confirmed since it was observed that the most similar kernels do not convey relevant information and can be removed. As a result, the proposed methodology is suitable for developing applications for resource-constrained devices.  ( 2 min )
    Fast Regression of the Tritium Breeding Ratio in Fusion Reactors. (arXiv:2104.04026v2 [physics.comp-ph] UPDATED)
    The tritium breeding ratio (TBR) is an essential quantity for the design of modern and next-generation D-T fueled nuclear fusion reactors. Representing the ratio between tritium fuel generated in breeding blankets and fuel consumed during reactor runtime, the TBR depends on reactor geometry and material properties in a complex manner. In this work, we explored the training of surrogate models to produce a cheap but high-quality approximation for a Monte Carlo TBR model in use at the UK Atomic Energy Authority. We investigated possibilities for dimensional reduction of its feature space, reviewed 9 families of surrogate models for potential applicability, and performed hyperparameter optimisation. Here we present the performance and scaling properties of these models, the fastest of which, an artificial neural network, demonstrated $R^2=0.985$ and a mean prediction time of $0.898\ \mu\mathrm{s}$, representing a relative speedup of $8\cdot 10^6$ with respect to the expensive MC model. We further present a novel adaptive sampling algorithm, Quality-Adaptive Surrogate Sampling, capable of interfacing with any of the individually studied surrogates. Our preliminary testing on a toy TBR theory has demonstrated the efficacy of this algorithm for accelerating the surrogate modelling process.
    Automatic Tuberculosis and COVID-19 cough classification using deep learning. (arXiv:2205.05480v2 [cs.LG] UPDATED)
    We present a deep learning based automatic cough classifier which can discriminate tuberculosis (TB) coughs from COVID-19 coughs and healthy coughs. Both TB and COVID-19 are respiratory diseases, contagious, have cough as a predominant symptom and claim thousands of lives each year. The cough audio recordings were collected at both indoor and outdoor settings and also uploaded using smartphones from subjects around the globe, thus containing various levels of noise. This cough data include 1.68 hours of TB coughs, 18.54 minutes of COVID-19 coughs and 1.69 hours of healthy coughs from 47 TB patients, 229 COVID-19 patients and 1498 healthy patients and were used to train and evaluate a CNN, LSTM and Resnet50. These three deep architectures were also pre-trained on 2.14 hours of sneeze, 2.91 hours of speech and 2.79 hours of noise for improved performance. The class-imbalance in our dataset was addressed by using SMOTE data balancing technique and using performance metrics such as F1-score and AUC. Our study shows that the highest F1-scores of 0.9259 and 0.8631 have been achieved from a pre-trained Resnet50 for two-class (TB vs COVID-19) and three-class (TB vs COVID-19 vs healthy) cough classification tasks, respectively. The application of deep transfer learning has improved the classifiers' performance and makes them more robust as they generalise better over the cross-validation folds. Their performances exceed the TB triage test requirements set by the world health organisation (WHO). The features producing the best performance contain higher order of MFCCs suggesting that the differences between TB and COVID-19 coughs are not perceivable by the human ear. This type of cough audio classification is non-contact, cost-effective and can easily be deployed on a smartphone, thus it can be an excellent tool for both TB and COVID-19 screening.
    Do Neural Networks Compress Manifolds Optimally?. (arXiv:2205.08518v2 [cs.IT] UPDATED)
    Artificial Neural-Network-based (ANN-based) lossy compressors have recently obtained striking results on several sources. Their success may be ascribed to an ability to identify the structure of low-dimensional manifolds in high-dimensional ambient spaces. Indeed, prior work has shown that ANN-based compressors can achieve the optimal entropy-distortion curve for some such sources. In contrast, we determine the optimal entropy-distortion tradeoffs for two low-dimensional manifolds with circular structure and show that state-of-the-art ANN-based compressors fail to optimally compress them.
    ScaleFace: Uncertainty-aware Deep Metric Learning. (arXiv:2209.01880v2 [cs.CV] UPDATED)
    The performance of modern deep learning-based systems dramatically depends on the quality of input objects. For example, face recognition quality would be lower for blurry or corrupted inputs. However, it is hard to predict the influence of input quality on the resulting accuracy in more complex scenarios. We propose an approach for deep metric learning that allows direct estimation of the uncertainty with almost no additional computational cost. The developed \textit{ScaleFace} algorithm uses trainable scale values that modify similarities in the space of embeddings. These input-dependent scale values represent a measure of confidence in the recognition result, thus allowing uncertainty estimation. We provide comprehensive experiments on face recognition tasks that show the superior performance of ScaleFace compared to other uncertainty-aware face recognition approaches. We also extend the results to the task of text-to-image retrieval showing that the proposed approach beats the competitors with significant margin.
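    The core mechanism can be sketched compactly: an input-dependent trainable scale multiplies cosine similarities, so low-confidence inputs shrink all logits; a hedged PyTorch sketch (layer sizes hypothetical, not the authors' code):

        import torch
        import torch.nn as nn
        import torch.nn.functional as F

        class ScaledCosineHead(nn.Module):
            """Cosine-similarity logits with an input-dependent scale (confidence)."""
            def __init__(self, dim, num_classes):
                super().__init__()
                self.proto = nn.Parameter(torch.randn(num_classes, dim))
                self.scale_net = nn.Sequential(nn.Linear(dim, 1), nn.Softplus())

            def forward(self, emb):
                cos = F.normalize(emb) @ F.normalize(self.proto).t()  # (B, C) in [-1, 1]
                scale = self.scale_net(emb)                           # (B, 1), > 0
                return scale * cos, scale.squeeze(1)  # logits, per-input confidence

        head = ScaledCosineHead(dim=128, num_classes=10)
        logits, confidence = head(torch.randn(4, 128))
        print(logits.shape, confidence.shape)  # torch.Size([4, 10]) torch.Size([4])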
    On the Optimization Landscape of Dynamic Output Feedback: A Case Study for Linear Quadratic Regulator. (arXiv:2209.05042v1 [cs.LG])
    The convergence of policy gradient algorithms in reinforcement learning hinges on the optimization landscape of the underlying optimal control problem. Theoretical insights into these algorithms can often be acquired from analyzing those of linear quadratic control. However, most of the existing literature only considers the optimization landscape for static full-state or output feedback policies (controllers). We investigate the more challenging case of dynamic output-feedback policies for linear quadratic regulation (abbreviated as dLQR), which is prevalent in practice but has a rather complicated optimization landscape. We first show how the dLQR cost varies with the coordinate transformation of the dynamic controller and then derive the optimal transformation for a given observable stabilizing controller. At the core of our results is the uniqueness of the stationary point of dLQR when it is observable, which is in a concise form of an observer-based controller with the optimal similarity transformation. These results shed light on designing efficient algorithms for general decision-making problems with partially observed information.
    Examining Uniqueness and Permanence of the WAY EEG GAL dataset toward User Authentication. (arXiv:2209.04802v1 [cs.LG])
    This study evaluates the discriminating capacity (uniqueness) of the EEG data from the WAY EEG GAL public dataset to authenticate individuals against one another as well as its permanence. In addition to the EEG data, Luciw et al. provide EMG (electromyography) and kinematics data for engineers and researchers to utilize WAY EEG GAL for further studies. However, evaluating the EMG and kinematics data is outside the scope of this study. The goal of the state-of-the-art is to determine whether EEG data can be utilized to control prosthetic devices. On the other hand, this study aims to evaluate the separability of individuals through EEG data to perform user authentication. A feature importance algorithm is utilized to select the best features for each user to authenticate them against all others. The authentication platform implemented for this study is based on Machine Learning models/classifiers. As an initial test, two pilot studies are performed using Linear Discriminant Analysis (LDA) and Support Vector Machine (SVM) to observe the learning trends of the models by multi-labeling the EEG dataset. Utilizing kNN first as the classifier for user authentication, accuracy around 75% is observed. Thereafter, to improve the performance, both linear and non-linear SVMs are used to perform classification. The overall average accuracies of 85.18% and 86.92% are achieved using linear and non-linear SVMs respectively. In addition to accuracy, F1 scores are also calculated. The overall average F1 scores of 87.51% and 88.94% are achieved for linear and non-linear SVMs respectively. Beyond the overall performance, high performing individuals with 95.3% accuracy (95.3% F1 score) using linear SVM and 97.4% accuracy (97.3% F1 score) using non-linear SVM are also observed.
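    The described pipeline (per-user authentication with linear and non-linear SVMs, scored by accuracy and F1) has a direct scikit-learn analogue; a minimal sketch with synthetic features standing in for the WAY EEG GAL recordings:

        import numpy as np
        from sklearn.model_selection import cross_validate
        from sklearn.pipeline import make_pipeline
        from sklearn.preprocessing import StandardScaler
        from sklearn.svm import SVC

        # Synthetic stand-in: 200 samples x 32 EEG features, genuine user vs. all others.
        rng = np.random.default_rng(0)
        X = rng.standard_normal((200, 32))
        y = (X[:, :4].sum(axis=1) > 0).astype(int)   # hypothetical separable target

        for kernel in ("linear", "rbf"):
            clf = make_pipeline(StandardScaler(), SVC(kernel=kernel))
            scores = cross_validate(clf, X, y, cv=5, scoring=("accuracy", "f1"))
            print(kernel,
                  round(scores["test_accuracy"].mean(), 3),
                  round(scores["test_f1"].mean(), 3))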
    Efficiency Evaluation of Banks with Many Branches using a Heuristic Framework and Dynamic Data Envelopment Optimization Approach: A Real Case Study. (arXiv:2209.04822v1 [math.OC])
    Evaluating the efficiency of organizations and of branches within an organization is a challenging issue for managers. Evaluation criteria allow organizations to rank their internal units, identify their position with respect to their competitors, and implement strategies for improvement and development purposes. Among the methods that have been applied to the evaluation of bank branches, non-parametric methods have captured the attention of researchers in recent years. One of the most widely used non-parametric methods is data envelopment analysis (DEA), which leads to promising results. However, static DEA approaches do not consider time in the model. Therefore, this paper uses a dynamic DEA (DDEA) method to evaluate the branches of a private Iranian bank over three years (2017-2019). The results are then compared with static DEA. After ranking the branches, they are clustered using the K-means method. Finally, a comprehensive sensitivity analysis approach is introduced to help managers decide which variables to change in order to shift a branch from one cluster to a more efficient one.
    Hyperbolic Self-supervised Contrastive Learning Based Network Anomaly Detection. (arXiv:2209.05049v1 [cs.SI])
    Anomaly detection on attributed networks has recently received increasing attention in many research fields, such as cybernetic anomaly detection and financial fraud detection. With the wide application of deep learning on graph representations, existing approaches choose to apply Euclidean graph encoders as their backbone, which may lose important hierarchical information, especially in complex networks. To tackle this problem, we propose an efficient anomaly detection framework using hyperbolic self-supervised contrastive learning. Specifically, we first conduct data augmentation by performing subgraph sampling. We then exploit the hierarchical information in hyperbolic space through exponential and logarithmic mappings and obtain the anomaly score by subtracting the scores of the positive pairs from those of the negative pairs via a discriminating process. Finally, extensive experiments on four real-world datasets demonstrate that our approach outperforms representative baseline approaches.
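    For concreteness, the exponential and logarithmic maps mentioned above have a simple closed form at the origin of the Poincaré ball, a standard model of hyperbolic space; the sketch below uses that standard form with curvature $-c$ and is not tied to the paper's exact encoder.

```python
import torch

def exp_map0(v: torch.Tensor, c: float = 1.0, eps: float = 1e-8):
    """Map a tangent vector at the origin onto the Poincare ball."""
    n = v.norm(dim=-1, keepdim=True).clamp_min(eps)
    return torch.tanh(c**0.5 * n) * v / (c**0.5 * n)

def log_map0(x: torch.Tensor, c: float = 1.0, eps: float = 1e-8):
    """Inverse map: from the Poincare ball back to the tangent space."""
    n = x.norm(dim=-1, keepdim=True).clamp_min(eps)
    scaled = (c**0.5 * n).clamp(max=1.0 - 1e-5)  # stay inside atanh's domain
    return torch.atanh(scaled) * x / (c**0.5 * n)
```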
    AST-Probe: Recovering abstract syntax trees from hidden representations of pre-trained language models. (arXiv:2206.11719v2 [cs.CL] UPDATED)
    The objective of pre-trained language models is to learn contextual representations of textual data. Pre-trained language models have become mainstream in natural language processing and code modeling. Using probes, a technique to study the linguistic properties of hidden vector spaces, previous works have shown that these pre-trained language models encode simple linguistic properties in their hidden representations. However, no previous work has assessed whether these models encode the whole grammatical structure of a programming language. In this paper, we prove the existence of a syntactic subspace, lying in the hidden representations of pre-trained language models, which contains the syntactic information of the programming language. We show that this subspace can be extracted from the models' representations and define a novel probing method, the AST-Probe, that enables recovering the whole abstract syntax tree (AST) of an input code snippet. In our experiments, we show that this syntactic subspace exists in five state-of-the-art pre-trained language models. In addition, we highlight that the middle layers of the models are the ones that encode most of the AST information. Finally, we estimate the optimal size of this syntactic subspace and show that its dimension is substantially lower than the dimensions of the models' representation spaces. This suggests that pre-trained language models use a small part of their representation spaces to encode syntactic information of the programming languages.
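    A minimal sketch of the probing idea, in the style of structural probes (the actual AST-Probe recovers full ASTs and is more involved): learn a projection into a low-dimensional subspace whose squared pairwise distances are trained to match distances in the syntax tree. All names here are ours.

```python
import torch
import torch.nn as nn

class SubspaceProbe(nn.Module):
    """Hedged sketch: hidden_dim and probe_dim are free hyperparameters;
    training would regress predicted distances onto gold tree distances."""
    def __init__(self, hidden_dim: int, probe_dim: int):
        super().__init__()
        self.proj = nn.Linear(hidden_dim, probe_dim, bias=False)

    def forward(self, h: torch.Tensor) -> torch.Tensor:
        p = self.proj(h)                # (seq_len, probe_dim) subspace coords
        return torch.cdist(p, p) ** 2   # predicted pairwise tree distances
```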
    A Comparative Study on Unsupervised Anomaly Detection for Time Series: Experiments and Analysis. (arXiv:2209.04635v1 [cs.LG])
    The continued digitization of societal processes translates into a proliferation of time series data that cover applications such as fraud detection, intrusion detection, and energy management, where anomaly detection is often essential to enable reliability and safety. Many recent studies target anomaly detection for time series data. Indeed, the area of time series anomaly detection is characterized by diverse data, methods, and evaluation strategies, and comparisons in existing studies consider only part of this diversity, which makes it difficult to select the best method for a particular problem setting. To address this shortcoming, we introduce taxonomies for data, methods, and evaluation strategies, provide a comprehensive overview of unsupervised time series anomaly detection using the taxonomies, and systematically evaluate and compare state-of-the-art traditional as well as deep learning techniques. In the empirical study using nine publicly available datasets, we apply the most commonly used performance evaluation metrics to typical methods under a fair implementation standard. Based on the structuring offered by the taxonomies, we report on empirical studies and provide guidelines, in the form of comparative tables, for choosing the methods most suitable for particular application settings. Finally, we propose research directions for this dynamic field.
    Improving Model Training via Self-learned Label Representations. (arXiv:2209.04528v1 [cs.LG])
    Modern neural network architectures have shown remarkable success in several large-scale classification and prediction tasks. Part of the success of these architectures is their flexibility to transform the data from raw input representations (e.g. pixels for vision tasks, or text for natural language processing tasks) to a one-hot output encoding. While much of the work has focused on studying how the input gets transformed to the one-hot encoding, very little work has examined the effectiveness of these one-hot labels. In this work, we demonstrate that more sophisticated label representations are better for classification than the usual one-hot encoding. We propose the Learning with Adaptive Labels (LwAL) algorithm, which simultaneously learns the label representation while training for the classification task. These learned labels can significantly cut down on the training time (usually by more than 50%) while often achieving better test accuracies. Our algorithm introduces negligible additional parameters and has a minimal computational overhead. Along with improved training times, our learned labels are semantically meaningful and can reveal hierarchical relationships that may be present in the data.
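    The idea of co-training label representations admits a compact sketch; the parameterization and loss below are our illustrative assumptions, not the exact LwAL objective.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class AdaptiveLabels(nn.Module):
    """Hedged sketch: label embeddings are trainable parameters updated
    alongside the encoder, replacing fixed one-hot targets."""
    def __init__(self, num_classes: int, emb_dim: int):
        super().__init__()
        self.labels = nn.Parameter(torch.randn(num_classes, emb_dim))

    def loss(self, features: torch.Tensor, y: torch.Tensor) -> torch.Tensor:
        logits = features @ F.normalize(self.labels, dim=-1).t()
        return F.cross_entropy(logits, y)  # gradients flow to both sides
```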
    Git Re-Basin: Merging Models modulo Permutation Symmetries. (arXiv:2209.04836v1 [cs.LG])
    The success of deep learning is thanks to our ability to solve certain massive non-convex optimization problems with relative ease. Despite non-convex optimization being NP-hard, simple algorithms -- often variants of stochastic gradient descent -- exhibit surprising effectiveness in fitting large neural networks in practice. We argue that neural network loss landscapes contain (nearly) a single basin, after accounting for all possible permutation symmetries of hidden units. We introduce three algorithms to permute the units of one model to bring them into alignment with units of a reference model. This transformation produces a functionally equivalent set of weights that lie in an approximately convex basin near the reference model. Experimentally, we demonstrate the single basin phenomenon across a variety of model architectures and datasets, including the first (to our knowledge) demonstration of zero-barrier linear mode connectivity between independently trained ResNet models on CIFAR-10 and CIFAR-100. Additionally, we identify intriguing phenomena relating model width and training time to mode connectivity across a variety of models and datasets. Finally, we discuss shortcomings of a single basin theory, including a counterexample to the linear mode connectivity hypothesis.
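    One of the alignment algorithms can be viewed as an assignment problem per layer; the single-layer snippet below (hypothetical names; the full Git Re-Basin procedure alternates over all layers) permutes the hidden units of model B to best match model A.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def match_units(w_a: np.ndarray, w_b: np.ndarray) -> np.ndarray:
    """Weight matching for one layer: rows are hidden units."""
    cost = -(w_a @ w_b.T)                  # negated unit-to-unit similarity
    _, perm = linear_sum_assignment(cost)  # optimal permutation of B's units
    return w_b[perm]                       # B's rows reordered to align with A
```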
    Kernel Learning for Explainable Climate Science. (arXiv:2209.04947v1 [cs.LG])
    The Upper Indus Basin (UIB) in the Himalayas provides water for 270 million people and countless ecosystems. However, precipitation, a key component of hydrological modelling, is poorly understood in this area. A key challenge surrounding this uncertainty comes from the complex spatio-temporal distribution of precipitation across the basin. In this work we propose Gaussian processes with structured non-stationary kernels to model precipitation patterns in the UIB. Previous attempts to quantify or model precipitation in the Hindu Kush Karakoram Himalayan region have often been qualitative or included crude assumptions and simplifications which cannot be resolved at lower resolutions. This body of research also provides little to no error propagation. We account for the spatial variation in precipitation with a non-stationary Gibbs kernel parameterised with an input-dependent lengthscale. This allows the posterior function samples to adapt to the varying precipitation patterns inherent in the distinct underlying topography of the Indus region. The input-dependent lengthscale is governed by a latent Gaussian process with a stationary squared-exponential kernel, allowing the function-level hyperparameters to vary smoothly. In ablation experiments we motivate each component of the proposed kernel by demonstrating its ability to model the spatial covariance, temporal structure and joint spatio-temporal reconstruction. We benchmark our model against a stationary Gaussian process and a deep Gaussian process.
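    The Gibbs kernel itself is standard and easy to state. In one dimension, with an input-dependent lengthscale $l(x)$, it reads $k(x, x') = \sqrt{2\,l(x)\,l(x') / (l(x)^2 + l(x')^2)}\, \exp\!\big(-(x - x')^2 / (l(x)^2 + l(x')^2)\big)$. A direct NumPy transcription (the latent-GP parameterization of $l$ is omitted here):

```python
import numpy as np

def gibbs_kernel(x1: np.ndarray, x2: np.ndarray, lengthscale):
    """Non-stationary Gibbs kernel in 1D; `lengthscale` is any positive
    function of the input (a plain callable here, not the latent GP)."""
    l1 = lengthscale(x1)[:, None]            # (n1, 1)
    l2 = lengthscale(x2)[None, :]            # (1, n2)
    denom = l1**2 + l2**2
    sqdist = (x1[:, None] - x2[None, :]) ** 2
    return np.sqrt(2 * l1 * l2 / denom) * np.exp(-sqdist / denom)
```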
    Bilevel Optimization with a Lower-level Contraction: Optimal Sample Complexity without Warm-Start. (arXiv:2202.03397v2 [stat.ML] UPDATED)
    We analyze a general class of bilevel problems, in which the upper-level problem consists in the minimization of a smooth objective function and the lower-level problem is to find the fixed point of a smooth contraction map. This class of problems includes instances of meta-learning, equilibrium models, hyperparameter optimization and data poisoning adversarial attacks. Several recent works have proposed algorithms which warm-start the lower-level problem, i.e. they use the previous lower-level approximate solution as a starting point for the lower-level solver. This warm-start procedure allows one to improve the sample complexity in both the stochastic and deterministic settings, achieving in some cases the order-wise optimal sample complexity. However, there are situations, e.g., meta-learning and equilibrium models, in which the warm-start procedure is not well-suited or ineffective. In this work we show that without warm-start, it is still possible to achieve order-wise optimal or near-optimal sample complexity. In particular, we propose a simple method which uses stochastic fixed point iterations at the lower level and projected inexact gradient descent at the upper level, and which reaches an $\epsilon$-stationary point using $O(\epsilon^{-2})$ and $\tilde{O}(\epsilon^{-1})$ samples for the stochastic and the deterministic setting, respectively. Finally, compared to methods using warm-start, our approach yields a simpler analysis that does not need to study the coupled interactions between the upper-level and lower-level iterates.
    What Do Deep Neural Networks Find in Disordered Structures of Glasses?. (arXiv:2208.00349v2 [cond-mat.dis-nn] UPDATED)
    Glass transitions are widely observed in various types of soft matter systems. However, the physical mechanism of these transitions remains elusive, despite years of ambitious research. In particular, an important unanswered question is whether the glass transition is accompanied by a divergence of the correlation lengths of the characteristic static structures. In this study, we develop a deep-neural-network-based method that is used to extract the characteristic local meso-structures solely from instantaneous particle configurations without any information about the dynamics. We first train a neural network to classify configurations of liquids and glasses correctly. Then, we obtain the characteristic structures by quantifying the grounds for the decisions made by the network using Gradient-weighted Class Activation Mapping (Grad-CAM). We considered two qualitatively different glass-forming binary systems, and through comparisons with several established structural indicators, we demonstrate that our system can be used to identify characteristic structures that depend on the details of the systems. Moreover, the extracted structures are remarkably correlated with the nonequilibrium aging dynamics in thermal fluctuations.
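    Grad-CAM itself is a standard, self-contained computation: channel weights come from globally averaged gradients, and the weighted activations pass through a ReLU. A minimal version, assuming the activations and their gradients have already been captured with hooks:

```python
import torch
import torch.nn.functional as F

def grad_cam(activations: torch.Tensor, grads: torch.Tensor) -> torch.Tensor:
    """activations, grads: (B, C, H, W) from the chosen conv layer."""
    weights = grads.mean(dim=(-2, -1), keepdim=True)   # GAP over space
    cam = F.relu((weights * activations).sum(dim=1))   # (B, H, W)
    peak = cam.amax(dim=(-2, -1), keepdim=True).clamp_min(1e-8)
    return cam / peak                                   # normalize to [0, 1]
```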
    Hybrid Supervised and Reinforcement Learning for the Design and Optimization of Nanophotonic Structures. (arXiv:2209.04447v1 [cs.LG])
    From higher computational efficiency to enabling the discovery of novel and complex structures, deep learning has emerged as a powerful framework for the design and optimization of nanophotonic circuits and components. However, both data-driven and exploration-based machine learning strategies have limitations in their effectiveness for nanophotonic inverse design. Supervised machine learning approaches require large quantities of training data to produce high-performance models and have difficulty generalizing beyond training data given the complexity of the design space. Unsupervised and reinforcement learning-based approaches, on the other hand, can have very lengthy training or optimization times associated with them. Here we demonstrate a hybrid supervised learning and reinforcement learning approach to the inverse design of nanophotonic structures and show this approach can reduce training data dependence, improve the generalizability of model predictions, and shorten exploratory training times by orders of magnitude. The presented strategy thus addresses a number of contemporary deep learning-based challenges, while opening the door for new design methodologies that leverage multiple classes of machine learning algorithms to produce more effective and practical solutions for photonic design.
    Decision Tree-Based Predictive Models for Academic Achievement Using College Students' Support Networks. (arXiv:2108.13947v2 [stat.ML] UPDATED)
    In this study, we examine a set of primary data collected from 484 students enrolled in a large public university in the Mid-Atlantic United States region during the early stages of the COVID-19 pandemic. The data, called Ties data, included students' demographic and support network information. The support network data comprised information that highlighted the type of support (i.e. emotional or educational; routine or intense). Using this data set, models for predicting students' academic achievement, quantified by their self-reported GPA, were created using Chi-Square Automatic Interaction Detection (CHAID), a decision tree algorithm, and cforest, a random forest algorithm that uses conditional inference trees. We compare the methods' accuracy and the variation in the set of important variables suggested by each algorithm. Each algorithm found different variables important for different student demographics, with some overlap. For White students, different types of educational support were important in predicting academic achievement, while for non-White students, different types of emotional support were important. The presence of differing types of routine support was important in predicting academic achievement for cisgender women, while differing types of intense support were important for cisgender men.
    An Extensive Data Processing Pipeline for MIMIC-IV. (arXiv:2204.13841v4 [cs.LG] UPDATED)
    An increasing amount of research is being devoted to applying machine learning methods to electronic health record (EHR) data for various clinical purposes. This growing area of research has exposed the challenges of the accessibility of EHRs. MIMIC is a popular, public, and free EHR dataset in a raw format that has been used in numerous studies. The absence of standardized pre-processing steps can be, however, a significant barrier to the wider adoption of this rare resource. Additionally, this absence can reduce the reproducibility of the developed tools and limit the ability to compare results among similar studies. In this work, we provide a highly customizable pipeline to extract, clean, and pre-process the data available in the fourth version of the MIMIC dataset (MIMIC-IV). The pipeline also presents an end-to-end wizard-like package supporting predictive model creation and evaluation. The pipeline covers a range of clinical prediction tasks which can be broadly classified into four categories - readmission, length of stay, mortality, and phenotype prediction. The tool is publicly available at https://github.com/healthylaife/MIMIC-IV-Data-Pipeline.
    A framework for bilevel optimization that enables stochastic and global variance reduction algorithms. (arXiv:2201.13409v2 [stat.ML] UPDATED)
    Bilevel optimization, the problem of minimizing a value function which involves the arg-minimum of another function, appears in many areas of machine learning. In a large-scale empirical risk minimization setting where the number of samples is huge, it is crucial to develop stochastic methods, which only use a few samples at a time to progress. However, computing the gradient of the value function involves solving a linear system, which makes it difficult to derive unbiased stochastic estimates. To overcome this problem we introduce a novel framework, in which the solution of the inner problem, the solution of the linear system, and the main variable evolve at the same time. These directions are written as a sum, making it straightforward to derive unbiased estimates. The simplicity of our approach allows us to develop global variance reduction algorithms, where the dynamics of all variables are subject to variance reduction. We demonstrate that SABA, an adaptation of the celebrated SAGA algorithm in our framework, has an $O(1/T)$ convergence rate and achieves linear convergence under the Polyak-Lojasiewicz assumption. This is the first stochastic algorithm for bilevel optimization that verifies either of these properties. Numerical experiments validate the usefulness of our method.
    Set-based value operators for non-stationary Markovian environments. (arXiv:2207.07271v2 [cs.LG] UPDATED)
    This paper analyzes finite state Markov Decision Processes (MDPs) with uncertain parameters in compact sets and re-examines results from robust MDP via set-based fixed point theory. We generalize the Bellman and policy evaluation operators to operators that contract on the space of value functions and denote them as \emph{value operators}. We generalize these value operators to act on the space of value function sets and denote them as \emph{set-based value operators}. We prove that these set-based value operators are contractions in the space of compact value function sets. Leveraging insights from set theory, we generalize the rectangularity condition for the Bellman operator from classic robust MDP literature to a \emph{containment condition} for a generic value operator, which is weaker and can be applied to a larger set of parameter-uncertain MDPs and contractive operators in dynamic programming and reinforcement learning. We prove that both the rectangularity condition and the containment condition sufficiently ensure that the set-based value operator's fixed point set contains its own supremum and infimum elements. For convex and compact sets of uncertain MDP parameters, we show equivalence between the classic robust value function and the supremum of the fixed point set of the set-based Bellman operator. Under dynamically changing MDP parameters in compact sets, we prove a set convergence result for value iteration, which otherwise may not converge to a single value function.
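    For orientation, the classic Bellman optimality operator that these value operators generalize is, in standard notation,
$$(\mathcal{T}V)(s) \;=\; \max_{a \in \mathcal{A}} \Big[\, r(s,a) + \gamma \sum_{s' \in \mathcal{S}} P(s' \mid s, a)\, V(s') \,\Big],$$
    a $\gamma$-contraction in the sup-norm; the paper lifts such operators to act on compact sets of value functions.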
    TransPolymer: a Transformer-based Language Model for Polymer Property Predictions. (arXiv:2209.01307v2 [cs.LG] UPDATED)
    Accurate and efficient prediction of polymer properties is of great significance in polymer development and design. Conventionally, expensive and time-consuming experiments or simulations are required to evaluate the function of polymers. Recently, Transformer models, equipped with attention mechanisms, have exhibited superior performance in various natural language processing tasks. However, such methods have not been investigated in polymer sciences. Herein, we report TransPolymer, a Transformer-based language model for polymer property prediction. Owing to our proposed polymer tokenizer with chemical awareness, TransPolymer can learn representations directly from polymer sequences. The model learns expressive representations by pretraining on a large unlabeled dataset, followed by finetuning on downstream datasets concerning various polymer properties. TransPolymer achieves superior performance on all eight datasets and surpasses other baselines significantly on most downstream tasks. Moreover, the improvement of the pretrained TransPolymer over the supervised TransPolymer and other language models underscores the significant benefits of pretraining on large unlabeled data for representation learning. Experimental results further demonstrate the important role of the attention mechanism in understanding polymer sequences. We highlight this model as a promising computational tool for promoting rational polymer design and understanding structure-property relationships from a data science perspective.
    Robust Geometric Metric Learning. (arXiv:2202.11550v2 [stat.ML] UPDATED)
    This paper proposes new algorithms for the metric learning problem. We start by noticing that several classical metric learning formulations from the literature can be viewed as modified covariance matrix estimation problems. Leveraging this point of view, a general approach, called Robust Geometric Metric Learning (RGML), is then studied. This method aims at simultaneously estimating the covariance matrix of each class while shrinking them towards their (unknown) barycenter. We focus on two specific cost functions: one associated with the Gaussian likelihood (RGML Gaussian), and one with Tyler's M-estimator (RGML Tyler). In both, the barycenter is defined with the Riemannian distance, which enjoys nice properties of geodesic convexity and affine invariance. The optimization is performed using the Riemannian geometry of symmetric positive definite matrices and its submanifold of unit determinant. Finally, the performance of RGML is assessed on real datasets. Strong performance is exhibited while being robust to mislabeled data.
    Federated Unlearning: How to Efficiently Erase a Client in FL?. (arXiv:2207.05521v2 [cs.LG] UPDATED)
    With privacy legislation empowering users with the right to be forgotten, it has become essential to make a model forget some of its training data. We explore the problem of removing any client's contribution in federated learning (FL). During FL rounds, each client performs local training to learn a model that minimizes the empirical loss on their private data. We propose to perform unlearning at the client (to be erased) by reversing the learning process, i.e., training a model to \emph{maximize} the local empirical loss. In particular, we formulate the unlearning problem as a constrained maximization problem by restricting to an $\ell_2$-norm ball around a suitably chosen reference model to help retain some knowledge learnt from the other clients' data. This allows the client to use projected gradient descent to perform unlearning. The method requires neither global access to the data used for training nor the history of the parameter updates to be stored by the aggregator (server) or any of the clients. Experiments on the MNIST dataset show that the proposed unlearning method is efficient and effective.
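    A hedged sketch of the client-side update (the function and variable names are ours, and the exact projection and step-size schedule may differ from the paper's): ascend the local empirical loss, then project back onto an $\ell_2$ ball around the reference model.

```python
import torch

def unlearn_step(params, ref_params, local_loss, lr=0.01, radius=1.0):
    """One projected gradient *ascent* step on the local empirical loss.
    `params` is a list of tensors with requires_grad=True."""
    grads = torch.autograd.grad(local_loss(params), params)
    with torch.no_grad():
        new = [p + lr * g for p, g in zip(params, grads)]   # maximize loss
        diff = torch.cat([(n - r).flatten() for n, r in zip(new, ref_params)])
        shrink = (radius / diff.norm().clamp_min(1e-12)).clamp(max=1.0)
        return [r + shrink * (n - r) for n, r in zip(new, ref_params)]
```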
    Towards Better Evaluation for Dynamic Link Prediction. (arXiv:2207.10128v2 [cs.LG] UPDATED)
    Despite the prevalence of recent success in learning from static graphs, learning from time-evolving graphs remains an open challenge. In this work, we design new, more stringent evaluation procedures for link prediction specific to dynamic graphs, which reflect real-world considerations, to better compare the strengths and weaknesses of methods. First, we create two visualization techniques to understand the reoccurring patterns of edges over time and show that many edges reoccur at later time steps. Based on this observation, we propose a pure memorization baseline called EdgeBank. EdgeBank achieves surprisingly strong performance across multiple settings because easy negative edges are often used in the current evaluation setting. To evaluate against more difficult negative edges, we introduce two more challenging negative sampling strategies that improve robustness and better match real-world applications. Lastly, we introduce six new dynamic graph datasets from a diverse set of domains missing from current benchmarks, providing new challenges and opportunities for future research. Our code repository is accessible at https://github.com/fpour/DGB.git.
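    The unlimited-memory variant of the memorization baseline fits in a few lines, which is exactly why its strong performance is a useful warning about easy negatives (a sketch; the paper also studies time-windowed variants):

```python
class EdgeBank:
    """Pure memorization baseline: an edge is predicted positive
    iff it has been observed at any earlier time step."""
    def __init__(self):
        self.seen = set()

    def update(self, edges):            # edges seen up to the current time
        self.seen.update(edges)

    def predict(self, u, v) -> float:
        return 1.0 if (u, v) in self.seen else 0.0
```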
    Applications of Multi-Agent Reinforcement Learning in Future Internet: A Comprehensive Survey. (arXiv:2110.13484v3 [cs.AI] UPDATED)
    Future Internet involves several emerging technologies such as 5G and beyond-5G networks, vehicular networks, unmanned aerial vehicle (UAV) networks, and the Internet of Things (IoT). Moreover, the future Internet is becoming heterogeneous and decentralized, with a large number of involved network entities. Each entity may need to make its local decisions to improve the network performance under dynamic and uncertain network environments. Standard learning algorithms such as single-agent Reinforcement Learning (RL) or Deep Reinforcement Learning (DRL) have recently been used to enable each network entity, as an agent, to learn an optimal decision-making policy adaptively through interacting with unknown environments. However, such algorithms fail to model the cooperation or competition among network entities, and simply treat other entities as a part of the environment, which may result in the non-stationarity issue. Multi-agent Reinforcement Learning (MARL) allows each network entity to learn its optimal policy by observing not only the environment, but also other entities' policies. As a result, MARL can significantly improve the learning efficiency of the network entities, and it has been recently used to solve various issues in emerging networks. In this paper, we thus review the applications of MARL in emerging networks. In particular, we provide a tutorial on MARL and a comprehensive survey of its applications in the next-generation Internet: we first introduce single-agent RL and MARL, and then review a number of applications of MARL to solve emerging issues in the future Internet. These issues include network access, transmit power control, computation offloading, content caching, packet routing, trajectory design for UAV-aided networks, and network security.
    Learning from All Vehicles. (arXiv:2203.11934v3 [cs.RO] UPDATED)
    In this paper, we present a system to train driving policies from experiences collected not just from the ego-vehicle, but all vehicles that it observes. This system uses the behaviors of other agents to create more diverse driving scenarios without collecting additional data. The main difficulty in learning from other vehicles is that there is no sensor information. We use a set of supervisory tasks to learn an intermediate representation that is invariant to the viewpoint of the controlling vehicle. This not only provides a richer signal at training time but also allows more complex reasoning during inference. Learning how all vehicles drive helps predict their behavior at test time and can avoid collisions. We evaluate this system in closed-loop driving simulations. Our system outperforms all prior methods on the public CARLA Leaderboard by a wide margin, improving driving score by 25 and route completion rate by 24 points. Our method won the 2021 CARLA Autonomous Driving challenge. Code and data are available at https://github.com/dotchen/LAV.
    Statistical Estimation of Confounded Linear MDPs: An Instrumental Variable Approach. (arXiv:2209.05186v1 [stat.ML])
    In a Markov decision process (MDP), unobservable confounders may exist and affect the data-generating process, so that classic off-policy evaluation (OPE) estimators may fail to identify the true value function of the target policy. In this paper, we study the statistical properties of OPE in confounded MDPs with observable instrumental variables. Specifically, we propose a two-stage estimator based on the instrumental variables and establish its statistical properties in confounded MDPs with a linear structure. For non-asymptotic analysis, we prove a $\mathcal{O}(n^{-1/2})$ error bound, where $n$ is the number of samples. For asymptotic analysis, we prove that the two-stage estimator is asymptotically normal with the typical rate of $\sqrt{n}$. To the best of our knowledge, we are the first to show such statistical results of the two-stage estimator for confounded linear MDPs via instrumental variables.
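    The estimator follows the familiar two-stage least-squares template; a generic NumPy sketch of that template (the paper's version is adapted to the linear MDP structure) is:

```python
import numpy as np

def two_stage_ls(Z: np.ndarray, X: np.ndarray, Y: np.ndarray) -> np.ndarray:
    """Stage 1: regress the endogenous X on the instrument Z.
    Stage 2: regress the outcome Y on the stage-1 fitted values."""
    beta1, *_ = np.linalg.lstsq(Z, X, rcond=None)
    X_hat = Z @ beta1
    beta2, *_ = np.linalg.lstsq(X_hat, Y, rcond=None)
    return beta2
```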
    Support Recovery in Mixture Models with Sparse Parameters. (arXiv:2202.11940v2 [cs.LG] UPDATED)
    Mixture models are widely used to fit complex and multimodal datasets. In this paper we study mixtures with high-dimensional sparse latent parameter vectors and consider the problem of support recovery of those vectors. While parameter learning in mixture models is well-studied, the sparsity constraint remains relatively unexplored. Sparsity of parameter vectors is a natural constraint in a variety of settings, and support recovery is a major step towards parameter estimation. We provide efficient algorithms for support recovery that have a logarithmic sample complexity dependence on the dimensionality of the latent space. Our algorithms are quite general, namely they are applicable to (1) mixtures of many different canonical distributions, including uniform, Poisson, Laplace, and Gaussian distributions, and (2) mixtures of linear regressions and linear classifiers with Gaussian covariates, under different assumptions on the unknown parameters. In most of these settings, our results are the first guarantees on the problem, while in the rest, our results provide improvements on existing work.
    Near-Optimal Distributed Linear-Quadratic Regulator for Networked Systems. (arXiv:2204.05551v2 [math.OC] UPDATED)
    This paper studies the trade-off between the degree of decentralization and the performance of a distributed controller in a linear-quadratic control setting. We study a system of interconnected agents over a graph and a distributed controller, called $\kappa$-distributed control, which lets the agents make control decisions based on the state information within distance $\kappa$ on the underlying graph. This controller can tune its degree of decentralization using the parameter $\kappa$ and thus allows a characterization of the relationship between decentralization and performance. We show that under mild assumptions, including stabilizability, detectability, and a subexponentially growing graph condition, the performance difference between $\kappa$-distributed control and centralized optimal control becomes exponentially small in $\kappa$. This result reveals that distributed control can achieve near-optimal performance with a moderate degree of decentralization, and thus it is an effective controller architecture for large-scale networked systems.
    Centroids Matching: an efficient Continual Learning approach operating in the embedding space. (arXiv:2208.02048v2 [cs.LG] UPDATED)
    Catastrophic forgetting (CF) occurs when a neural network loses the information previously learned while training on a set of samples from a different distribution, i.e., a new task. Existing approaches have achieved remarkable results in mitigating CF, especially in a scenario called task-incremental learning. However, this scenario is not realistic, and limited work has been done to achieve good results in more realistic scenarios. In this paper, we propose a novel regularization method called Centroids Matching that, inspired by meta-learning approaches, fights CF by operating in the feature space produced by the neural network, achieving good results while requiring a small memory footprint. Specifically, the approach classifies the samples directly using the feature vectors produced by the neural network, by matching those vectors with the centroids representing the classes from the current task, or all the tasks up to that point. Centroids Matching is faster than competing baselines, and it can be exploited to efficiently mitigate CF by preserving the distances between the embedding space produced by the model at the end of past tasks and the one currently produced, leading to a method that achieves high accuracy on all the tasks, without using an external memory in easy scenarios, or using a small one for more realistic ones. Extensive experiments demonstrate that Centroids Matching achieves accuracy gains on multiple datasets and scenarios.
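    The classification rule at the heart of the method is easy to sketch: maintain one centroid per class in the embedding space and score samples by negative distance. Names below are illustrative; the paper adds the meta-learning-inspired training and memory handling.

```python
import torch

def class_centroids(feats: torch.Tensor, labels: torch.Tensor, n_cls: int):
    """Mean embedding per class; feats: (N, D), labels: (N,)."""
    return torch.stack([feats[labels == c].mean(dim=0) for c in range(n_cls)])

def centroid_logits(feats: torch.Tensor, centroids: torch.Tensor):
    """Closer centroid -> higher logit; argmax gives the predicted class."""
    return -torch.cdist(feats, centroids)
```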
    On the Nash equilibrium of moment-matching GANs for stationary Gaussian processes. (arXiv:2203.07136v3 [stat.ML] UPDATED)
    Generative Adversarial Networks (GANs) learn an implicit generative model from data samples through a two-player game. In this paper, we study the existence of Nash equilibrium of the game which is consistent as the number of data samples grows to infinity. In a realizable setting where the goal is to estimate the ground-truth generator of a stationary Gaussian process, we show that the existence of consistent Nash equilibrium depends crucially on the choice of the discriminator family. The discriminator defined from second-order statistical moments can result in non-existence of Nash equilibrium, existence of consistent non-Nash equilibrium, or existence and uniqueness of consistent Nash equilibrium, depending on whether symmetry properties of the generator family are respected. We further study empirically the local stability and global convergence of gradient descent-ascent methods towards consistent equilibrium.
    Stream-based Active Learning with Verification Latency in Non-stationary Environments. (arXiv:2204.06822v2 [cs.LG] UPDATED)
    Data stream classification is an important problem in the field of machine learning. Due to the non-stationary nature of the data, where the underlying distribution changes over time (concept drift), the model needs to continuously adapt to new data statistics. Stream-based Active Learning (AL) approaches address this problem by interactively querying a human expert to provide new data labels for the most recent samples, within a limited budget. Existing AL strategies assume that labels are immediately available, while in a real-world scenario the expert requires time to provide a queried label (verification latency), and by the time the requested labels arrive they may no longer be relevant. In this article, we investigate the influence of finite, time-variable, and unknown verification delay on AL approaches in the presence of concept drift. We propose PRopagate (PR), a latency-independent utility estimator which also predicts the requested, but not yet known, labels. Furthermore, we propose a drift-dependent dynamic budget strategy, which uses a variable distribution of the labelling budget over time after a detected drift. A thorough experimental evaluation, with both synthetic and real-world non-stationary datasets and different settings of verification latency and budget, is conducted and analyzed. We empirically show that the proposed method consistently outperforms the state of the art. Additionally, we demonstrate that with variable budget allocation in time, it is possible to boost the performance of AL strategies, without increasing the overall labeling budget.
    The Classification of Optical Galaxy Morphology Using Unsupervised Learning Techniques. (arXiv:2206.06165v2 [cs.LG] UPDATED)
    In recent years, large-scale data-intensive astronomical surveys have resulted in more detailed images being produced than scientists can manually classify. Even attempts to crowd-source this work will soon be outpaced by the large amount of data generated by modern surveys. This has brought into question the viability of human-based methods for classifying galaxy morphology. While supervised learning methods require datasets with existing labels, unsupervised learning techniques do not. Therefore, this paper implements unsupervised learning techniques to classify the Galaxy Zoo DECaLS dataset. A convolutional autoencoder feature extractor was trained and implemented. The resulting features were then clustered via k-means, fuzzy c-means and agglomerative clustering. These clusters were compared against the true volunteer classifications provided by the Galaxy Zoo DECaLS project. The best results, in general, were produced by the agglomerative clustering method. However, the increase in performance compared to k-means clustering was not significant considering the increase in clustering time. After undergoing the appropriate clustering algorithm optimizations, this approach could prove useful for classifying the better-performing questions and could serve as the basis for a novel approach to generating more "human-like" galaxy morphology classifications from unsupervised techniques.
    Reconstruction of Long-Term Historical Demand Data. (arXiv:2209.04693v1 [cs.LG])
    Long-term planning of a robust power system requires the understanding of changing demand patterns. Electricity demand is highly weather sensitive. Thus, the supply side variation from introducing intermittent renewable sources, juxtaposed with variable demand, will introduce additional challenges in the grid planning process. By understanding the spatial and temporal variability of temperature over the US, the response of demand to natural variability and climate change-related effects on temperature can be separated, especially because the effects due to the former factor are not known. Through this project, we aim to better support the technology & policy development process for power systems by developing machine and deep learning 'back-forecasting' models to reconstruct multidecadal demand records and study the natural variability of temperature and its influence on demand.
    The Bayan Algorithm: Detecting Communities in Networks Through Exact and Approximate Optimization of Modularity. (arXiv:2209.04562v1 [cs.SI])
    Community detection is a classic problem in network science with extensive applications in various fields. The most commonly used methods are the algorithms designed to maximize a utility function, modularity, across different ways that a network can be partitioned into communities. Despite their name and design philosophy, current modularity maximization algorithms generally fail to maximize modularity or guarantee any proximity to an optimal solution. We propose the Bayan algorithm which, unlike the existing methods, returns network partitions with a guarantee of either optimality or proximity to an optimal solution. At the core of the Bayan algorithm is a branch-and-cut scheme that solves a sparse integer programming formulation of the modularity maximization problem to optimality or approximate it within a factor. We analyze the performance of Bayan against 22 existing algorithms using synthetic and real networks. Through extensive experiments, we demonstrate Bayan's distinctive capabilities not only in maximizing modularity, but more importantly in accurate retrieval of ground-truth communities. Bayan's comparative level of performance remains stable over variations in the amount of noise in the data (graph) generation process. The performance of Bayan as an exact modularity maximization algorithm also reveals the theoretical capability limits of maximum-modularity partitions in accurate retrieval of communities. Overall our analysis points to Bayan as a suitable choice for a methodologically grounded detection of communities through exact (approximate) maximization of modularity in networks with up to $\sim10^3$ edges (and larger networks). Prospective advances in graph optimization and integer programming can push these limits further.
    An Improved Lightweight YOLOv5 Model Based on Attention Mechanism for Face Mask Detection. (arXiv:2203.16506v3 [cs.CV] UPDATED)
    Coronavirus disease 2019 has brought severe challenges to social stability and public health worldwide. One effective way of curbing the epidemic is to require people to wear masks in public places and to monitor mask-wearing states with suitable automatic detectors. However, existing deep learning-based models struggle to simultaneously achieve the requirements of both high precision and real-time performance. To solve this problem, we propose an improved lightweight face mask detector based on YOLOv5, which can achieve an excellent balance of precision and speed. Firstly, a novel backbone, ShuffleCANet, which combines the ShuffleNetV2 network with a Coordinate Attention mechanism, is proposed. Afterwards, an efficient path aggregation network, BiFPN, is applied as the feature-fusion neck. Furthermore, the localization loss is replaced with alpha-CIoU in the model training phase to obtain higher-quality anchors. Some valuable strategies such as data augmentation, adaptive image scaling, and anchor clustering are also utilized. Experimental results on the AIZOO face mask dataset show the superiority of the proposed model. Compared with the original YOLOv5, the proposed model increases the inference speed by 28.3% while still improving the precision by 0.58%. It achieves the best mean average precision of 95.2% compared with seven other existing models, which is 4.4% higher than the baseline.
    Gradient Descent Temporal Difference-difference Learning. (arXiv:2209.04624v1 [cs.LG])
    Off-policy algorithms, in which a behavior policy differs from the target policy and is used to gain experience for learning, have proven to be of great practical value in reinforcement learning. However, even for simple convex problems such as linear value function approximation, these algorithms are not guaranteed to be stable. To address this, alternative algorithms that are provably convergent in such cases have been introduced, the most well known being gradient descent temporal difference (GTD) learning. This algorithm and others like it, however, tend to converge much more slowly than conventional temporal difference learning. In this paper we propose gradient descent temporal difference-difference (Gradient-DD) learning in order to improve GTD2, a GTD algorithm, by introducing second-order differences in successive parameter updates. We investigate this algorithm in the framework of linear value function approximation, theoretically proving its convergence by applying the theory of stochastic approximation. Studying the model empirically on the random walk task, the Boyan chain task, and Baird's off-policy counterexample, we find substantial improvement over GTD2 and, in several cases, better performance even than conventional TD learning.
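    For reference, the GTD2 baseline that Gradient-DD modifies has a compact linear-function-approximation form; the sketch below is plain GTD2 (the paper's second-order difference term on successive updates is omitted here).

```python
import numpy as np

def gtd2_update(w, h, x, r, x_next, gamma=0.99, alpha=0.01, beta=0.01):
    """One GTD2 step with a linear value function V(s) = w @ x(s)."""
    delta = r + gamma * (w @ x_next) - (w @ x)       # TD error
    w = w + alpha * (h @ x) * (x - gamma * x_next)   # main weights
    h = h + beta * (delta - h @ x) * x               # auxiliary weights
    return w, h
```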
    Active Learning for Optimal Intervention Design in Causal Models. (arXiv:2209.04744v1 [cs.LG])
    An important problem across disciplines is the discovery of interventions that produce a desired outcome. When the space of possible interventions is large, making an exhaustive search infeasible, experimental design strategies are needed. In this context, encoding the causal relationships between the variables, and thus the effect of interventions on the system, is critical in order to identify desirable interventions efficiently. We develop an iterative causal method to identify optimal interventions, as measured by the discrepancy between the post-interventional mean of the distribution and a desired target mean. We formulate an active learning strategy that uses the samples obtained so far from different interventions to update the belief about the underlying causal model, as well as to identify samples that are most informative about optimal interventions and thus should be acquired in the next batch. The approach employs a Bayesian update for the causal model and prioritizes interventions using a carefully designed, causally informed acquisition function. This acquisition function is evaluated in closed form, allowing for efficient optimization. The resulting algorithms are theoretically grounded with information-theoretic bounds and provable consistency results. We illustrate the method on both synthetic data and real-world biological data, namely gene expression data from Perturb-CITE-seq experiments, to identify optimal perturbations that induce a specific cell state transition; the proposed causal approach is observed to achieve better sample efficiency compared to several baselines. In both cases we observe that the causally informed acquisition function notably outperforms existing criteria, allowing for optimal intervention design with significantly fewer experiments.
    Temporal Pattern Mining for Analysis of Longitudinal Clinical Data: Identifying Risk Factors for Alzheimer's Disease. (arXiv:2209.04793v1 [cs.LG])
    A novel framework is proposed for handling the complex task of modelling and analysing longitudinal, multivariate, heterogeneous clinical data. This method uses temporal abstraction to convert the data into a more appropriate form for modelling, temporal pattern mining to discover patterns in the complex, longitudinal data, and machine learning models of survival analysis to select the discovered patterns. The method is applied to a real-world study of Alzheimer's disease (AD), a progressive neurodegenerative disease that has no cure. The patterns discovered were predictive of AD in survival analysis models with a concordance index of up to 0.8. This is the first work that performs survival analysis of AD data using temporal data collections. A visualisation module also provides a clear picture of the discovered patterns for ease of interpretability.
    Pathfinding in Random Partially Observable Environments with Vision-Informed Deep Reinforcement Learning. (arXiv:2209.04801v1 [cs.LG])
    Deep reinforcement learning is a technique for solving problems in a variety of environments, ranging from Atari video games to stock trading. This method leverages deep neural network models to make decisions based on observations of a given environment, with the goal of maximizing a reward function that can incorporate costs and rewards for reaching goals. With the aim of pathfinding, reward conditions can include reaching a specified target area along with costs for movement. In this work, multiple Deep Q-Network (DQN) agents are trained to operate in a partially observable environment with the goal of reaching a target zone in minimal travel time. The agent operates based on a visual representation of its surroundings, and thus has a restricted capability to observe the environment. A comparison between DQN, DQN-GRU, and DQN-LSTM is performed to examine each model's capabilities with two different types of input. Through this evaluation, it is shown that, with equivalent training and analogous model architectures, a DQN model is able to outperform its recurrent counterparts.
    Rethink Decision Tree Traversal. (arXiv:2209.04825v1 [cs.LG])
    We show how to evaluate binary decision tree traversal in the language of matrix computation, motivated by \textit{QuickScorer} in \cite{lucchese2015quickscorer}. Our main contribution is a novel matrix representation of the hierarchical structure of the decision tree. We also propose several equivalent algorithms for binary decision tree traversal based on rigorous theoretical analysis. The core idea is to find the relation between the input and the exit leaf node. Here we not only evaluate decisions without recursive traversal but also dive into the partitioning nature of tree-based methods.
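    To make the "traversal as matrix computation" idea concrete, here is one valid construction (a hedged sketch, not necessarily the paper's representation): evaluate every internal-node test at once, then identify the unique leaf whose ancestors all agree with the observed decisions.

```python
import numpy as np

def traverse_by_matrix(x, feat, thr, M, depth, leaf_value):
    """M[j, i] is +1 if leaf j lies in the 'true' subtree of internal node i,
    -1 if in the 'false' subtree, and 0 if node i is not an ancestor of
    leaf j; depth[j] counts leaf j's ancestors. Only the exit leaf satisfies
    agreement == depth, since any other leaf disagrees with at least one
    of its ancestors."""
    s = np.where(x[feat] <= thr, 1.0, -1.0)          # outcome of every test
    agreement = M @ s                                # agreeing minus disagreeing
    exit_leaf = int(np.argmax(agreement == depth))   # unique exact match
    return leaf_value[exit_leaf]
```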
    DeepSTI: Towards Tensor Reconstruction using Fewer Orientations in Susceptibility Tensor Imaging. (arXiv:2209.04504v1 [eess.IV])
    Susceptibility tensor imaging (STI) is an emerging magnetic resonance imaging technique that characterizes the anisotropic tissue magnetic susceptibility with a second-order tensor model. STI has the potential to provide information for both the reconstruction of white matter fiber pathways and detection of myelin changes in the brain at mm resolution or less, which would be of great value for understanding brain structure and function in healthy and diseased brain. However, the application of STI in vivo has been hindered by its cumbersome and time-consuming acquisition requirement of measuring susceptibility induced MR phase changes at multiple (usually more than six) head orientations. This complexity is enhanced by the limitation in head rotation angles due to physical constraints of the head coil. As a result, STI has not yet been widely applied in human studies in vivo. In this work, we tackle these issues by proposing an image reconstruction algorithm for STI that leverages data-driven priors. Our method, called DeepSTI, learns the data prior implicitly via a deep neural network that approximates the proximal operator of a regularizer function for STI. The dipole inversion problem is then solved iteratively using the learned proximal network. Experimental results using both simulation and in vivo human data demonstrate great improvement over state-of-the-art algorithms in terms of the reconstructed tensor image, principal eigenvector maps and tractography results, while allowing for tensor reconstruction with MR phase measured at much less than six different orientations. Notably, promising reconstruction results are achieved by our method from only one orientation in human in vivo, and we demonstrate a potential application of this technique for estimating lesion susceptibility anisotropy in patients with multiple sclerosis.
    Examining stability of machine learning methods for predicting dementia at early phases of the disease. (arXiv:2209.04643v1 [cs.LG])
    Dementia is a neuropsychiatric brain disorder that usually occurs when one or more brain cells stop working partially or entirely. Diagnosing this disorder in its early phases is vital for rescuing patients' lives from bad consequences and providing them with better healthcare. Machine learning methods have been proven to be accurate in predicting dementia in the early phases of the disease. The prediction of dementia depends heavily on the type of collected data, which are usually gathered as Normalized Whole Brain Volume (nWBV) and Atlas Scaling Factor (ASF) values, normally measured and corrected from Magnetic Resonance Imaging (MRI) scans. Other biological features, such as age and gender, can also help in the diagnosis of dementia. Although many studies use machine learning for predicting dementia, no conclusion has been reached on the stability of these methods, i.e., which of them is more accurate under different experimental conditions. Therefore, this paper investigates conclusion stability regarding the performance of machine learning algorithms for dementia prediction. To accomplish this, a large number of experiments were run using 7 machine learning algorithms and two feature reduction algorithms, namely Information Gain (IG) and Principal Component Analysis (PCA). To examine the stability of these algorithms, the IG feature-selection threshold was varied from 20% to 100% and the PCA dimension from 2 to 8, resulting in 7×9 + 7×7 = 112 experiments. In each experiment, various classification evaluation data were recorded. The obtained results show that, among the seven algorithms, the support vector machine and naive Bayes are the most stable while the selection threshold changes. It was also found that using IG appears more efficient than using PCA for predicting dementia.
    Shape Analysis for Pediatric Upper Body Motor Function Assessment. (arXiv:2209.04710v1 [cs.LG])
    Neuromuscular disorders, such as Spinal Muscular Atrophy (SMA) and Duchenne Muscular Dystrophy (DMD), cause progressive muscular degeneration and loss of motor function in 1 in 6,000 children. Traditional upper-limb motor function assessments do not quantitatively measure patient-performed motions, which makes it difficult to track progress in incremental changes. Assessing motor function in children with neuromuscular disorders is particularly challenging because they can be nervous or excited during experiments, or simply be too young to follow precise instructions. These challenges translate to confounding factors, such as performing different parts of the arm curl slower or faster (phase variability), which affect the assessed motion quality. This paper uses curve registration and shape analysis to temporally align trajectories while simultaneously extracting a mean reference shape. Distances from this mean shape are used to assess the quality of motion. The proposed metric is invariant to confounding factors, such as phase variability, while suggesting several clinically relevant insights. First, there are statistically significant differences between functional scores for the control and patient populations ($p = 0.0213 \le 0.05$). Next, several patients in the patient cohort are able to perform motion on par with the healthy cohort, and vice versa. Our metric, which is computed based on wearables, is related to the Brooke score ($p = 0.00063 \le 0.05$), as well as to motor function assessments based on dynamometry ($p = 0.0006 \le 0.05$). These results show promise towards ubiquitous motion quality assessment in daily life.
    Analyzing Wearables Dataset to Predict ADLs and Falls: A Pilot Study. (arXiv:2209.04785v1 [cs.LG])
    Healthcare is an important aspect of human life, and the use of technologies in healthcare has increased manifold since the pandemic. Internet of Things based systems and devices proposed in the literature can help elders, children and adults facing or experiencing health problems. This paper exhaustively reviews thirty-nine wearable-based datasets that can be used for evaluating systems that recognize Activities of Daily Living and falls. A comparative analysis on the SisFall dataset using five machine learning methods, i.e., logistic regression, linear discriminant analysis, k-nearest neighbors, decision tree and naive Bayes, is performed in Python. The dataset is modified in two ways: in the first, all the attributes present in the dataset are used as they are, with binary labels; in the second, the magnitude of the three axes (x, y, z) is computed for each of the three sensors and then used in the experiment together with the label attribute. The experiments are performed on one subject, ten subjects and all the subjects, and compared in terms of accuracy, precision and recall. The results obtained from this study prove that kNN outperforms the other machine learning methods in terms of accuracy, precision and recall. It is also concluded that personalization of data improves accuracy.
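    The second preprocessing variant reduces each tri-axial sensor to a single magnitude channel, which is a one-liner:

```python
import numpy as np

def sensor_magnitude(x: np.ndarray, y: np.ndarray, z: np.ndarray):
    """Collapse the three axes of one sensor into a magnitude signal."""
    return np.sqrt(x**2 + y**2 + z**2)
```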
    Defend Data Poisoning Attacks on Voice Authentication. (arXiv:2209.04547v1 [cs.CR])
    With the advances in deep learning, speaker verification has achieved very high accuracy and is gaining popularity as a biometric authentication option in many areas of daily life, especially the growing market of web services. Compared to traditional passwords, "vocal passwords" are much more convenient, as they relieve people from memorizing different passwords. However, new machine learning attacks are putting these voice authentication systems at risk. Without a strong security guarantee, attackers could access legitimate users' web accounts by fooling the deep neural network (DNN) based voice recognition models. In this paper, we demonstrate an easy-to-implement data poisoning attack on voice authentication systems, which can hardly be captured by existing defense mechanisms. We therefore propose a more robust defense method, called Guardian, a convolutional neural network-based discriminator. The Guardian discriminator integrates a series of novel techniques including bias reduction, input augmentation, and ensemble learning. Our approach is able to distinguish about 95% of attacked accounts from normal accounts, which is much more effective than existing approaches, which achieve only 60% accuracy.
    Ask Before You Act: Generalising to Novel Environments by Asking Questions. (arXiv:2209.04665v1 [cs.AI])
    Solving temporally-extended tasks is a challenge for most reinforcement learning (RL) algorithms [arXiv:1906.07343]. We investigate the ability of an RL agent to learn to ask natural language questions as a tool to understand its environment and achieve greater generalisation performance in novel, temporally-extended environments. We do this by endowing the agent with the ability to ask "yes-no" questions of an all-knowing Oracle. This allows the agent to obtain guidance about the task at hand while limiting its access to new information. To study the emergence of such natural language questions in the context of temporally-extended tasks, we first train our agent in a Mini-Grid environment. We then transfer the trained agent to a different, harder environment. We observe a significant increase in generalisation performance compared to a baseline agent unable to ask questions. Through grounding its understanding of natural language in its environment, the agent can reason about the dynamics of its environment to the point that it can ask new, relevant questions when deployed in a novel environment.
    The Space of Adversarial Strategies. (arXiv:2209.04521v1 [cs.CR])
    Adversarial examples, inputs designed to induce worst-case behavior in machine learning models, have been extensively studied over the past decade. Yet, our understanding of this phenomenon stems from a rather fragmented pool of knowledge; at present, there are a handful of attacks, each with disparate assumptions in threat models and incomparable definitions of optimality. In this paper, we propose a systematic approach to characterize worst-case (i.e., optimal) adversaries. We first introduce an extensible decomposition of attacks in adversarial machine learning by atomizing attack components into surfaces and travelers. With our decomposition, we enumerate over components to create 576 attacks (568 of which were previously unexplored). Next, we propose the Pareto Ensemble Attack (PEA): a theoretical attack that upper-bounds attack performance. With our new attacks, we measure performance relative to the PEA on robust and non-robust models, seven datasets, and three extended lp-based threat models incorporating compute costs, formalizing the Space of Adversarial Strategies. From our evaluation we find attack performance to be highly contextual: the domain, model robustness, and threat model can have a profound influence on attack efficacy. Our investigation suggests that future studies measuring the security of machine learning should: (1) be contextualized to the domain and threat models, and (2) go beyond the handful of known attacks used today.
    Revisiting Active Sets for Gaussian Process Decoders. (arXiv:2209.04636v1 [stat.ML])
    Decoders built on Gaussian processes (GPs) are enticing due to the marginalisation over the non-linear function space. Such models (also known as GP-LVMs) are often expensive and notoriously difficult to train in practice, but can be scaled using variational inference and inducing points. In this paper, we revisit active set approximations. We develop a new stochastic estimate of the log-marginal likelihood based on recently discovered links to cross-validation, and propose a computationally efficient approximation thereof. We demonstrate that the resulting stochastic active sets (SAS) approximation significantly improves the robustness of GP decoder training while reducing computational cost. The SAS-GP obtains more structure in the latent space, scales to many datapoints and learns better representations than variational autoencoders, which is rarely the case for GP decoders.
    Application of Machine Learning for Online Reputation Systems. (arXiv:2209.04650v1 [cs.LG])
    Users on the internet usually require venues to provide better purchasing recommendations. This can be provided by a reputation system that processes ratings to produce recommendations. The rating aggregation process is a main part of a reputation system, producing a global opinion about product quality. Frequently used naive methods do not consider consumer profiles in their calculation and cannot discover unfair ratings or trends emerging in new ratings. Other, more sophisticated rating aggregation methods that use a weighted average technique focus on only one or a few aspects of consumer profile data. This paper proposes a new reputation system that uses machine learning to predict the reliability of consumers from their profiles. In particular, we construct a new consumer profile dataset by extracting a set of factors that have great impact on consumer reliability, which serve as input to machine learning algorithms. The predicted weight is then integrated with a weighted average method to compute the product reputation score. The proposed model has been evaluated on three MovieLens benchmarking datasets using 10-fold cross-validation, and its performance has been compared to previously published rating aggregation models. The obtained results are promising, suggesting that the proposed approach could be a potential solution for reputation systems, and the comparison demonstrates the accuracy of our models. Finally, the proposed approach can be integrated with online recommendation systems to provide better purchasing recommendations and improve the user experience in online shopping.
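    A minimal sketch of the aggregation step described above, assuming a fitted regressor predicts a per-consumer reliability weight from profile features; the dummy model, the placeholder feature shapes, and the clipping of weights to [0, 1] are our illustrative assumptions.

        import numpy as np

        class UniformModel:
            # Dummy stand-in for a fitted reliability regressor.
            def predict(self, profiles):
                return np.ones(len(profiles))

        def reputation_score(ratings, profiles, model):
            weights = np.clip(model.predict(profiles), 0.0, 1.0)  # predicted reliability
            return float(np.average(ratings, weights=weights))    # weighted average rating

        ratings = np.array([5.0, 4.0, 1.0, 5.0])
        profiles = np.zeros((4, 3))  # placeholder consumer-profile features
        print(reputation_score(ratings, profiles, UniformModel()))  # -> 3.75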
    Bayesian Algorithm Execution for Tuning Particle Accelerator Emittance with Partial Measurements. (arXiv:2209.04587v1 [physics.acc-ph])
    Traditional black-box optimization methods are inefficient when dealing with multi-point measurement, i.e. when each query in the control domain requires a set of measurements in a secondary domain to calculate the objective. In particle accelerators, emittance tuning from quadrupole scans is an example of optimization with multi-point measurements. Although the emittance is a critical parameter for the performance of high-brightness machines, including X-ray lasers and linear colliders, comprehensive optimization is often limited by the time required for tuning. Here, we extend the recently-proposed Bayesian Algorithm Execution (BAX) to the task of optimization with multi-point measurements. BAX achieves sample-efficiency by selecting and modeling individual points in the joint control-measurement domain. We apply BAX to emittance minimization at the Linac Coherent Light Source (LCLS) and the Facility for Advanced Accelerator Experimental Tests II (FACET-II) particle accelerators. In an LCLS simulation environment, we show that BAX delivers a 20x increase in efficiency while also being more robust to noise compared to traditional optimization methods. Additionally, we ran BAX live at both LCLS and FACET-II, matching the hand-tuned emittance at FACET-II and achieving an optimal emittance that was 24% lower than that obtained by hand-tuning at LCLS. We anticipate that our approach can readily be adapted to other types of optimization problems involving multi-point measurements commonly found in scientific instruments.
    Deep Learning with Non-Linear Factor Models: Adaptability and Avoidance of Curse of Dimensionality. (arXiv:2209.04512v1 [stat.ML])
    In this paper, we connect the deep learning literature with non-linear factor models and show that deep learning estimation makes a substantial improvement in the non-linear additive factor model literature. We provide bounds on the expected risk and show that these upper bounds are uniform over a set of multiple response variables by extending the theorems of Schmidt-Hieber (2020). We show that our risk bound does not depend on the number of factors. In order to construct a covariance matrix estimator for asset returns, we develop a novel data-dependent estimator of the error covariance matrix in deep neural networks. The estimator relies on a flexible adaptive thresholding technique which is robust to outliers in the innovations. We prove that the estimator is consistent in spectral norm. Using that result, we then show consistency and rates of convergence of the covariance and precision matrix estimators for asset returns. The rates of convergence in both results do not depend on the number of factors; this is a new result in the factor model literature, since the number of factors is an impediment to better estimation and prediction. Except for the precision matrix result, all our results hold even when the number of assets is larger than the time span, with both quantities growing. Various Monte Carlo simulations confirm our large-sample findings and reveal the superior accuracy of the DNN-FM in estimating the true underlying functional form connecting the factors and observable variables, as well as the covariance and precision matrices, compared to competing approaches. Moreover, in an out-of-sample portfolio forecasting application it outperforms alternative portfolio strategies in most cases in terms of out-of-sample portfolio standard deviation and Sharpe ratio.
    Learning sparse auto-encoders for green AI image coding. (arXiv:2209.04448v1 [eess.IV])
    Recently, convolutional auto-encoders (CAE) were introduced for image coding. They achieved performance improvements over the state-of-the-art JPEG2000 method. However, these performances were obtained using massive CAEs featuring a large number of parameters and whose training required heavy computational power. In this paper, we address the problem of lossy image compression using a CAE with a small memory footprint and low computational power usage. In order to overcome the computational cost issue, the majority of the literature uses Lagrangian proximal regularization methods, which are time consuming themselves. In this work, we propose a constrained approach and a new structured sparse learning method. We design an algorithm and test it on three constraints: the classical $\ell_1$ constraint, the $\ell_{1,\infty}$ and the new $\ell_{1,1}$ constraint. Experimental results show that the $\ell_{1,1}$ constraint provides the best structured sparsity, resulting in a high reduction of memory and computational cost, with similar rate-distortion performance as with dense networks.
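    Constrained approaches of this kind hinge on projecting weights onto sparsity-inducing balls. Below is a minimal sketch of the standard Euclidean projection onto the $\ell_1$ ball (Duchi et al., 2008), one plausible building block here; applying it row-wise to a weight matrix yields an $\ell_{1,1}$-type projection. This is our illustration, not the paper's exact algorithm.

        import numpy as np

        def project_l1_ball(v, z=1.0):
            # Euclidean projection of v onto the l1 ball of radius z.
            if np.abs(v).sum() <= z:
                return v
            u = np.sort(np.abs(v))[::-1]          # sorted magnitudes, descending
            css = np.cumsum(u)
            rho = np.nonzero(u * np.arange(1, len(v) + 1) > (css - z))[0][-1]
            theta = (css[rho] - z) / (rho + 1.0)  # soft-threshold level
            return np.sign(v) * np.maximum(np.abs(v) - theta, 0.0)

        w = np.array([0.8, -0.5, 0.3, -0.1])
        print(project_l1_ball(w, z=1.0))  # -> [0.6, -0.3, 0.1, 0.0], sparsified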
    Accelerated Primal-Dual Methods for Convex-Strongly-Concave Saddle Point Problems. (arXiv:2209.04604v1 [math.OC])
    In this work, we investigate Primal-Dual (PD) methods for convex-strongly-concave saddle point problems (SPP). In many cases, the computation of the proximal oracle over the primal-only function is inefficient. Hence, we use its first-order linear approximation in the proximal step, resulting in a Linearized PD (LPD) method. Even when the coupling term is bilinear, we observe that LPD has a suboptimal dependence on the Lipschitz constant of the primal-only function. In contrast, LPD has optimal convergence for the strongly-convex-concave case. This observation motivates us to present our accelerated linearized primal-dual (ALPD) algorithm for solving convex-strongly-concave SPP. ALPD is a single-loop algorithm that combines features of Nesterov's accelerated gradient descent (AGD) and LPD. We show that when the coupling term is semi-linear (which contains bilinear as a special case), ALPD obtains the optimal dependence on the Lipschitz constant of the primal-only function; hence, it is an optimal algorithm. When the coupling term has a general nonlinear form, the ALPD algorithm has suboptimal dependence on the Lipschitz constant of the primal part of the coupling term. To improve this dependence, we present an inexact APD algorithm. This algorithm performs AGD iterations in the inner loop to find an approximate solution to a proximal subproblem of APD. We show that inexact APD maintains the optimal number of gradient evaluations (gradient complexity) for the primal-only and dual parts of the problem, and significantly improves the gradient complexity of the primal coupling term.
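    As a concrete illustration of the linearization idea, here is a toy LPD loop for the bilinear saddle point $\min_x \max_y \frac{1}{2}x^\top Ax + \langle Kx, y\rangle - \frac{\mu}{2}\|y\|^2$: the proximal step on the smooth primal-only term is replaced by a plain gradient step, while the dual step keeps its exact prox. Problem data and step sizes are illustrative, and the acceleration and inexactness machinery of ALPD/APD is not shown.

        import numpy as np

        rng = np.random.default_rng(0)
        n, m = 20, 10
        K = rng.normal(size=(m, n))                  # bilinear coupling
        A = rng.normal(size=(n, n))
        A = A.T @ A + np.eye(n)                      # f(x) = 0.5 x'Ax (smooth, convex)
        mu = 1.0                                     # g(y) = (mu/2)||y||^2 (strongly convex)

        x, y = np.zeros(n), np.zeros(m)
        tau, sigma = 0.01, 0.01                      # untuned step sizes
        for _ in range(5000):
            x = x - tau * (A @ x + K.T @ y)          # linearized (gradient) primal step
            y = (y + sigma * (K @ x)) / (1.0 + sigma * mu)  # exact prox step on g
        print("saddle-point residual:", np.linalg.norm(A @ x + K.T @ y))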
    Robustness through Cognitive Dissociation Mitigation in Contrastive Adversarial Training. (arXiv:2203.08959v3 [cs.LG] UPDATED)
    In this paper, we introduce a novel neural network training framework that increases a model's robustness to adversarial attacks while maintaining high clean accuracy by combining contrastive learning (CL) with adversarial training (AT). We propose to improve robustness to adversarial attacks by learning feature representations that are consistent under both data augmentations and adversarial perturbations. We leverage contrastive learning by treating an adversarial example as another positive example, and aim to maximize the similarity between random augmentations of data samples and their adversarial counterparts, while constantly updating the classification head in order to avoid a cognitive dissociation between the classification head and the embedding space. This dissociation is caused by the fact that CL updates the network up to the embedding space while freezing the classification head, which is used to generate new positive adversarial examples. We validate our method, Contrastive Learning with Adversarial Features (CLAF), on the CIFAR-10 dataset, on which it improves both robust accuracy and clean accuracy over alternative supervised and self-supervised adversarial learning methods.
    Social-Implicit: Rethinking Trajectory Prediction Evaluation and The Effectiveness of Implicit Maximum Likelihood Estimation. (arXiv:2203.03057v2 [cs.CV] UPDATED)
    Best-of-N (BoN) Average Displacement Error (ADE) / Final Displacement Error (FDE) is the most widely used metric for evaluating trajectory prediction models. Yet, BoN does not evaluate the whole set of generated samples, resulting in an incomplete view of a model's prediction quality and performance. We propose a new metric, Average Mahalanobis Distance (AMD), to tackle this issue. AMD quantifies how close the whole set of generated samples is to the ground truth. We also introduce the Average Maximum Eigenvalue (AMV) metric, which quantifies the overall spread of the predictions. Our metrics are validated empirically by showing that ADE/FDE is not sensitive to distribution shifts, giving a biased sense of accuracy, unlike the AMD/AMV metrics. We introduce the usage of Implicit Maximum Likelihood Estimation (IMLE) as a replacement for traditional generative models to train our model, Social-Implicit. The IMLE training mechanism aligns with the AMD/AMV objective of predicting trajectories that are close to the ground truth with a tight spread. Social-Implicit is a memory-efficient deep model with only 5.8K parameters that runs in real time at about 580 Hz and achieves competitive results. An interactive demo of the problem can be seen at https://www.abduallahmohamed.com/social-implicit-amdamv-adefde-demo . Code is available at https://github.com/abduallahmohamed/Social-Implicit .
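    A minimal sketch of the two proposed quantities, under our simplifying assumption that the N generated samples for a single prediction are summarized by their empirical mean and covariance: AMD is then the Mahalanobis distance of the ground truth from that distribution, and AMV the covariance's largest eigenvalue.

        import numpy as np

        def amd_amv(samples, gt):
            # samples: (N, d) generated predictions; gt: (d,) ground truth.
            mu = samples.mean(axis=0)
            cov = np.cov(samples, rowvar=False) + 1e-6 * np.eye(samples.shape[1])
            diff = gt - mu
            amd = float(np.sqrt(diff @ np.linalg.inv(cov) @ diff))  # Mahalanobis distance
            amv = float(np.linalg.eigvalsh(cov).max())              # largest eigenvalue (spread)
            return amd, amv

        samples = np.random.default_rng(0).normal(size=(20, 2))  # 20 predicted (x, y) points
        print(amd_amv(samples, np.array([0.1, -0.2])))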
    TruVR: Trustworthy Cybersickness Detection using Explainable Machine Learning. (arXiv:2209.05257v1 [cs.HC])
    Cybersickness can be characterized by nausea, vertigo, headache, eye strain, and other discomforts when using virtual reality (VR) systems. The previously reported machine learning (ML) and deep learning (DL) algorithms for detecting (classification) and predicting (regression) VR cybersickness use black-box models; thus, they lack explainability. Moreover, VR sensors generate a massive amount of data, resulting in complex and large models. Therefore, having inherent explainability in cybersickness detection models can significantly improve the model's trustworthiness and provide insight into why and how the ML/DL model arrived at a specific decision. To address this issue, we present three explainable machine learning (xML) models to detect and predict cybersickness: 1) explainable boosting machine (EBM), 2) decision tree (DT), and 3) logistic regression (LR). We evaluate the xML-based models with publicly available physiological and gameplay datasets for cybersickness. The results show that the EBM can detect cybersickness with an accuracy of 99.75% and 94.10% for the physiological and gameplay datasets, respectively. When predicting cybersickness, the EBM achieved a Root Mean Square Error (RMSE) of 0.071 for the physiological dataset and 0.27 for the gameplay dataset. Furthermore, the EBM-based global explanation reveals exposure length, rotation, and acceleration as key features causing cybersickness in the gameplay dataset, whereas galvanic skin response and heart rate are most significant in the physiological dataset. Our results also suggest that EBM-based local explanations can identify cybersickness-causing factors for individual samples. We believe the proposed xML-based cybersickness detection method can help future researchers understand, analyze, and design simpler cybersickness detection and reduction models.  ( 3 min )
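    A minimal sketch of an EBM baseline in the spirit of the paper, assuming the interpretml package's glassbox API; the synthetic features standing in for physiological signals and the proxy label are purely illustrative.

        import numpy as np
        from interpret.glassbox import ExplainableBoostingClassifier

        rng = np.random.default_rng(0)
        X = rng.normal(size=(500, 4))                   # stand-ins: e.g. GSR, HR, exposure, rotation
        y = (X[:, 0] + 0.5 * X[:, 1] > 0).astype(int)   # proxy cybersickness label

        ebm = ExplainableBoostingClassifier(random_state=0)
        ebm.fit(X, y)
        print("train accuracy:", ebm.score(X, y))
        # ebm.explain_global() exposes per-feature shape functions, the kind of
        # global explanation used to rank cybersickness-causing features.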
    Advanced Manufacturing Configuration by Sample-efficient Batch Bayesian Optimization. (arXiv:2205.11827v2 [cs.LG] UPDATED)
    We propose a framework for the configuration and operation of expensive-to-evaluate advanced manufacturing methods, based on Bayesian optimization. The framework unifies a tailored acquisition function, a parallel acquisition procedure, and the integration of process information providing context to the optimization procedure. The novel acquisition function is demonstrated, analyzed and compared on state-of-the-art benchmarking problems. We apply the optimization approach to atmospheric plasma spraying and fused deposition modeling. Our results demonstrate that the proposed framework can efficiently find input parameters that produce the desired outcome and minimize the process cost.
    Fully-automated patient-level malaria assessment on field-prepared thin blood film microscopy images, including Supplementary Information. (arXiv:1908.01901v2 [cs.LG] UPDATED)
    Malaria is a life-threatening disease affecting millions. Microscopy-based assessment of thin blood films is a standard method to (i) determine malaria species and (ii) quantitate high-parasitemia infections. Full automation of malaria microscopy by machine learning (ML) is a challenging task because field-prepared slides vary widely in quality and presentation, and artifacts often heavily outnumber relatively rare parasites. In this work, we describe a complete, fully-automated framework for thin film malaria analysis that applies ML methods, including convolutional neural nets (CNNs), trained on a large and diverse dataset of field-prepared thin blood films. Quantitation and species identification results are close to sufficiently accurate for the concrete needs of drug resistance monitoring and clinical use-cases on field-prepared samples. We focus our methods and our performance metrics on the field use-case requirements. We discuss key issues and important metrics for the application of ML methods to malaria microscopy.  ( 3 min )
    Federated Reinforcement Learning for Collective Navigation of Robotic Swarms. (arXiv:2202.01141v2 [cs.RO] UPDATED)
    The recent advancement of Deep Reinforcement Learning (DRL) has contributed to robotics by allowing automatic controller design. Automatic controller design is a crucial approach for designing swarm robotic systems, which require more complex controllers than single-robot systems in order to achieve a desired collective behaviour. Although DRL-based controller design methods have shown their effectiveness, reliance on a central training server is a critical problem in real-world environments where robot-server communication is unstable or limited. We propose a novel Federated Learning (FL) based DRL training strategy (FLDDPG) for use in swarm robotic applications. Through comparison with baseline strategies under a limited communication bandwidth scenario, we show that FLDDPG results in higher robustness and better generalisation to a different environment and to real robots, while the baseline strategies suffer from the limitation of communication bandwidth. This result suggests that the proposed method can benefit swarm robotic systems operating in environments with limited communication bandwidth, e.g., in high-radiation, underwater, or subterranean environments.
    Model-Based Deep Learning. (arXiv:2012.08405v3 [eess.SP] UPDATED)
    Signal processing, communications, and control have traditionally relied on classical statistical modeling techniques. Such model-based methods utilize mathematical formulations that represent the underlying physics, prior information and additional domain knowledge. Simple classical models are useful but sensitive to inaccuracies and may lead to poor performance when real systems display complex or dynamic behavior. On the other hand, purely data-driven approaches that are model-agnostic are becoming increasingly popular as datasets become abundant and the power of modern deep learning pipelines increases. Deep neural networks (DNNs) use generic architectures which learn to operate from data, and demonstrate excellent performance, especially for supervised problems. However, DNNs typically require massive amounts of data and immense computational resources, limiting their applicability for some signal processing scenarios. We are interested in hybrid techniques that combine principled mathematical models with data-driven systems to benefit from the advantages of both approaches. Such model-based deep learning methods exploit both partial domain knowledge, via mathematical structures designed for specific problems, as well as learning from limited data. In this article we survey the leading approaches for studying and designing model-based deep learning systems. We divide hybrid model-based/data-driven systems into categories based on their inference mechanism. We provide a comprehensive review of the leading approaches for combining model-based algorithms with deep learning in a systematic manner, along with concrete guidelines and detailed signal processing oriented examples from recent literature. Our aim is to facilitate the design and study of future systems on the intersection of signal processing and machine learning that incorporate the advantages of both domains.
    Learning to Compose Soft Prompts for Compositional Zero-Shot Learning. (arXiv:2204.03574v2 [cs.LG] UPDATED)
    We introduce compositional soft prompting (CSP), a parameter-efficient learning technique to improve the zero-shot compositionality of large-scale pretrained vision-language models (VLMs). VLMs can represent arbitrary classes as natural language prompts in their flexible text encoders, but they underperform state-of-the-art methods on compositional zero-shot benchmark tasks. To improve VLMs, we propose a novel form of soft prompting. We treat the attributes and objects that are composed to define classes as learnable tokens of vocabulary and tune them on multiple prompt compositions. During inference, we recompose the learned attribute-object vocabulary in new combinations. We show that CSP outperforms the original VLM on benchmark datasets by an average of 10.9 percentage points on AUC. CSP also outperforms CoOp, a soft prompting method that tunes the prefix context, by an average of 5.8 percentage points on AUC. We perform additional experiments to show that CSP improves generalization to attribute-only classification, higher-order attribute-attribute-object compositions, and combinations of pretrained attributes and fine-tuned objects.  ( 2 min )
    Iterative Teaching by Label Synthesis. (arXiv:2110.14432v4 [cs.LG] UPDATED)
    In this paper, we consider the problem of iterative machine teaching, where a teacher provides examples sequentially based on the current iterative learner. In contrast to previous methods that have to scan over the entire pool and select teaching examples from it in each iteration, we propose a label synthesis teaching framework where the teacher randomly selects input teaching examples (e.g., images) and then synthesizes suitable outputs (e.g., labels) for them. We show that this framework can avoid costly example selection while still provably achieving exponential teachability. We propose multiple novel teaching algorithms in this framework. Finally, we empirically demonstrate the value of our framework.  ( 2 min )
    Graph Neural Modeling of Network Flows. (arXiv:2209.05208v1 [cs.LG])
    Network flow problems, which involve distributing traffic over a network such that the underlying infrastructure is used effectively, are ubiquitous in transportation and logistics. Due to the appeal of data-driven optimization, these problems have increasingly been approached using graph learning methods. Among them, the Multi-Commodity Network Flow (MCNF) problem is of particular interest given its generality, since it concerns the distribution of multiple flows (also called demands) of different sizes between several sources and sinks. The widely-used objective that we focus on is the maximum utilization of any link in the network, given traffic demands and a routing strategy. In this paper, we propose a novel approach based on Graph Neural Networks (GNNs) for the MCNF problem which uses distinctly parametrized message functions along each link, akin to a relational model where all edge types are unique. We show that our proposed method yields substantial gains over existing graph learning methods that constrain the routing unnecessarily. We extensively evaluate the proposed approach by means of an Internet routing case study using 17 Service Provider topologies and two flow routing schemes. We find that, in many networks, an MLP is competitive with a generic GNN that does not use our mechanism. Furthermore, we shed some light on the relationship between graph structure and the difficulty of data-driven routing of flows, an aspect that has not been considered in the existing work in the area.  ( 3 min )
    Monkeypox virus detection using pre-trained deep learning-based approaches. (arXiv:2209.04444v1 [eess.IV])
    The Monkeypox virus is slowly emerging as COVID-19 infections decline around the world. People fear that it could develop into another pandemic like COVID-19, so it is crucial to detect cases early, before widespread community transmission. AI-based detection could help identify cases at an early stage. In this paper, we first compare 13 different pre-trained deep learning (DL) models for Monkeypox virus detection. We fine-tune each of them with the addition of universal custom layers and analyse the results using four well-established measures: Precision, Recall, F1-score, and Accuracy. After identifying the best-performing DL models, we ensemble them to improve overall performance using majority voting over their probabilistic outputs. We perform our experiments on a publicly available dataset, on which our ensemble method achieves Precision, Recall, F1-score, and Accuracy of 85.44%, 85.47%, 85.40%, and 87.13%, respectively. These encouraging results suggest that the proposed approach can support health practitioners in mass screening.
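    The ensembling step reduces to averaging the per-model class-probability outputs and taking the argmax, as in this minimal sketch; the probability arrays are illustrative stand-ins for the fine-tuned models' softmax outputs.

        import numpy as np

        def soft_vote(prob_list):
            # Average the per-model class probabilities, then pick the top class.
            return np.mean(np.stack(prob_list), axis=0).argmax(axis=1)

        p1 = np.array([[0.7, 0.3], [0.4, 0.6]])   # model 1: P(class) per sample
        p2 = np.array([[0.6, 0.4], [0.3, 0.7]])   # model 2
        p3 = np.array([[0.3, 0.7], [0.6, 0.4]])   # model 3
        print(soft_vote([p1, p2, p3]))             # -> [0 1]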
    Gluformer: Transformer-Based Personalized Glucose Forecasting with Uncertainty Quantification. (arXiv:2209.04526v1 [cs.LG])
    Deep learning models achieve state-of-the-art results in predicting blood glucose trajectories, with a wide range of architectures being proposed. However, the adoption of such models in clinical practice is slow, largely due to the lack of uncertainty quantification for the provided predictions. In this work, we propose to model the future glucose trajectory conditioned on the past as an infinite mixture of basis distributions (e.g., Gaussian, Laplace). This change allows us to learn the uncertainty and predict more accurately in cases where the trajectory has a heterogeneous or multi-modal distribution. To estimate the parameters of the predictive distribution, we utilize the Transformer architecture. We empirically demonstrate the superiority of our method over existing state-of-the-art techniques, both in terms of accuracy and uncertainty, on synthetic and benchmark glucose data sets.  ( 2 min )
    GFCL: A GRU-based Federated Continual Learning Framework against Data Poisoning Attacks in IoV. (arXiv:2204.11010v2 [cs.LG] UPDATED)
    The integration of machine learning (ML) in 5G-based Internet of Vehicles (IoV) networks has enabled intelligent transportation and smart traffic management. Nonetheless, securing these systems against adversarial poisoning attacks is an increasingly challenging task. Specifically, Deep Reinforcement Learning (DRL) is one of the widely used ML designs in IoV applications. Standard ML security techniques are not effective in DRL, where the algorithm learns to solve sequential decision-making problems through continuous interaction with an environment that is time-varying, dynamic, and mobile. In this paper, we propose a Gated Recurrent Unit (GRU)-based federated continual learning (GFCL) anomaly detection framework against Sybil-based data poisoning attacks in IoV. The objective is to present a lightweight and scalable framework that learns and detects illegitimate behavior without requiring an a priori training dataset of attack samples. We use a GRU to predict future data sequences in order to analyze and detect illegitimate behavior from vehicles in a federated learning-based distributed manner. We investigate the performance of our framework using real-world vehicle mobility traces. The results demonstrate the effectiveness of our proposed solution in terms of different performance metrics.  ( 3 min )
    Conditional Gradients for the Approximately Vanishing Ideal. (arXiv:2202.03349v12 [cs.LG] UPDATED)
    The vanishing ideal of a set of points $X\subseteq \mathbb{R}^n$ is the set of polynomials that evaluate to $0$ over all points $\mathbf{x} \in X$ and admits an efficient representation by a finite set of polynomials called generators. To accommodate noise in the data set, we introduce the pairwise conditional gradients approximate vanishing ideal algorithm (PCGAVI), which constructs a set of generators of the approximate vanishing ideal. The constructed generators capture polynomial structures in data and give rise to a feature map that can, for example, be used in combination with a linear classifier for supervised learning. In PCGAVI, we construct the set of generators by solving constrained convex optimization problems with the pairwise conditional gradients algorithm. Thus, PCGAVI constructs generators that are not only few in number but also sparse, making the corresponding feature transformation robust and compact. Furthermore, we derive several learning guarantees for PCGAVI that make the algorithm better motivated theoretically than related generator-constructing methods.  ( 3 min )
    Robust Uncertainty Bounds in Reproducing Kernel Hilbert Spaces: A Convex Optimization Approach. (arXiv:2104.09582v3 [cs.LG] UPDATED)
    The problem of establishing out-of-sample bounds for the values of an unknown ground-truth function is considered. Kernels and their associated Hilbert spaces are the main formalism employed herein, along with an observational model where outputs are corrupted by bounded measurement noise. The noise can originate from any compactly supported distribution, and no independence assumptions are made on the available data. In this setting, we show how computing tight, finite-sample uncertainty bounds amounts to solving parametric quadratically constrained linear programs. Next, properties of our approach are established and its relationship with other methods is studied. Numerical experiments are presented to exemplify how the theory can be applied in a number of scenarios, and to contrast it with other closed-form alternatives.  ( 2 min )
    Fine-grain Inference on Out-of-Distribution Data with Hierarchical Classification. (arXiv:2209.04493v1 [cs.LG])
    Machine learning methods must be trusted to make appropriate decisions in real-world environments, even when faced with out-of-distribution (OOD) samples. Many current approaches simply aim to detect OOD examples and alert the user when an unrecognized input is given. However, when the OOD sample significantly overlaps with the training data, a binary anomaly detection is not interpretable or explainable, and provides little information to the user. We propose a new model for OOD detection that makes predictions at varying levels of granularity: as the inputs become more ambiguous, the model's predictions become coarser and more conservative. Consider an animal classifier that encounters an unknown bird species and a car. Both cases are OOD, but the user gains more information if the classifier recognizes that its uncertainty over the particular species is too large and predicts "bird" instead of detecting it as OOD. Furthermore, we diagnose the classifier's performance at each level of the hierarchy, improving the explainability and interpretability of the model's predictions. We demonstrate the effectiveness of hierarchical classifiers for both fine- and coarse-grained OOD tasks.
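    One way to picture the coarsening behaviour is the toy rule below: accept the fine-grained label if it is confident, otherwise pool probability mass up a label hierarchy and only report OOD if even the coarse level is uncertain. The hierarchy, threshold, and pooling rule are our illustrative assumptions, not the paper's exact inference procedure.

        HIERARCHY = {"sparrow": "bird", "eagle": "bird", "sedan": "car", "suv": "car"}

        def hierarchical_predict(probs, tau=0.8):
            # probs: dict mapping fine-grained class -> probability.
            best = max(probs, key=probs.get)
            if probs[best] >= tau:
                return best  # confident fine-grained prediction
            coarse = {}
            for leaf, parent in HIERARCHY.items():
                coarse[parent] = coarse.get(parent, 0.0) + probs.get(leaf, 0.0)
            best = max(coarse, key=coarse.get)
            return best if coarse[best] >= tau else "OOD"  # coarser label or reject

        # An unknown bird species: no single leaf is confident, but "bird" is.
        print(hierarchical_predict({"sparrow": 0.45, "eagle": 0.40, "sedan": 0.10, "suv": 0.05}))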
    An Interactive Automation for Human Biliary Tree Diagnosis Using Computer Vision. (arXiv:2209.04646v1 [eess.IV])
    The biliary tree is a network of tubes that connects the liver to the gallbladder, an organ right beneath it. The bile duct is the major tube in the biliary tree. The dilatation of a bile duct is a key indicator of more serious problems in the human body, such as stones and tumors, which are frequently caused by the pancreas or the papilla of Vater. The detection of bile duct dilatation can be challenging for beginner or untrained medical personnel in many circumstances; even professionals are unable to detect bile duct dilatation with the naked eye. This research presents a unique vision-based model for initial diagnosis of the biliary tree. To segment the biliary tree from Magnetic Resonance Imaging (MRI) scans, the framework uses different image processing approaches. After the image's region of interest is segmented, numerous calculations are performed on it to extract 10 features, including major and minor axes, bile duct area, biliary tree area, compactness, and some textural features (contrast, mean, variance and correlation). This study used a database of images from King Hussein Medical Center in Amman, Jordan, which included 200 MRI images: 100 normal cases and 100 patients with dilated bile ducts. After the characteristics are extracted, various classifiers are used to determine the patients' condition (normal or dilated). The findings demonstrate that the extracted features perform well with all classifiers in terms of accuracy and area under the curve. This study is unique in that it uses an automated approach to segment the biliary tree from MRI images, as well as scientifically correlating the retrieved features with biliary tree status, which has not been done before in the literature.
    Deep Baseline Network for Time Series Modeling and Anomaly Detection. (arXiv:2209.04561v1 [cs.LG])
    Deep learning has seen increasing applications to time series in recent years. In time series anomaly detection scenarios, such as in finance, Internet of Things, and data center operations, time series usually exhibit very flexible baselines depending on various external factors. Anomalies unveil themselves by lying far away from the baseline. However, detection is not always easy, due to challenges including baseline shifting, lack of labels, noise interference, real-time detection in streaming data, and result interpretability. In this paper, we develop a novel deep architecture to properly extract the baseline from time series, namely the Deep Baseline Network (DBLN). Using this deep network, we can easily locate the baseline position and then provide a reliable and interpretable anomaly detection result. Empirical evaluation on both synthetic and public real-world datasets shows that our purely unsupervised algorithm achieves superior performance compared with state-of-the-art methods and has good practical applications.
    Two-step reinforcement learning for model-free redesign of nonlinear optimal regulator. (arXiv:2103.03808v2 [eess.SY] UPDATED)
    In many practical control applications, the performance level of a closed-loop system degrades over time due to changes in plant characteristics. Thus, there is a strong need for redesigning a controller without going through the system modeling process, which is often difficult for closed-loop systems. Reinforcement learning (RL) is one of the promising approaches that enable model-free redesign of optimal controllers for nonlinear dynamical systems based only on measurements of the closed-loop system. However, the learning process of RL requires a considerable number of trial-and-error experiments using the poorly controlled system, which may accumulate wear on the plant. To overcome this limitation, we propose a model-free two-step design method that improves the transient learning performance of RL in an optimal regulator redesign problem for unknown nonlinear systems. Specifically, we first design a linear control law that attains some degree of control performance in a model-free manner, and then train the nonlinear optimal control law with online RL while using the designed linear control law in parallel. We introduce an offline RL algorithm for the design of the linear control law and theoretically guarantee its convergence to the LQR controller under mild assumptions. Numerical simulations show that the proposed method improves the transient learning performance and the efficiency of hyperparameter tuning in RL.  ( 3 min )
    Bilevel Optimization for Feature Selection in the Data-Driven Newsvendor Problem. (arXiv:2209.05093v1 [cs.LG])
    We study the feature-based newsvendor problem, in which a decision-maker has access to historical data consisting of demand observations and exogenous features. In this setting, we investigate feature selection, aiming to derive sparse, explainable models with improved out-of-sample performance. Up to now, state-of-the-art methods utilize regularization, which penalizes the number of selected features or the norm of the solution vector. As an alternative, we introduce a novel bilevel programming formulation. The upper-level problem selects a subset of features that minimizes an estimate of the out-of-sample cost of ordering decisions based on a held-out validation set. The lower-level problem learns the optimal coefficients of the decision function on a training set, using only the features selected by the upper-level. We present a mixed integer linear program reformulation for the bilevel program, which can be solved to optimality with standard optimization solvers. Our computational experiments show that the method accurately recovers ground-truth features already for instances with a sample size of a few hundred observations. In contrast, regularization-based techniques often fail at feature recovery or require thousands of observations to obtain similar accuracy. Regarding out-of-sample generalization, we achieve improved or comparable cost performance.  ( 2 min )
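    To make the lower-level problem concrete, the sketch below fits a linear ordering rule by minimizing the empirical newsvendor cost with a plain subgradient loop; this stands in for the exact lower-level solve, and the bilevel feature-selection layer and MILP reformulation are not shown. Costs, data, and step sizes are illustrative.

        import numpy as np

        def newsvendor_cost(q, d, cu, co):
            # Average underage/overage cost of ordering q against demand d.
            return float(np.mean(cu * np.maximum(d - q, 0) + co * np.maximum(q - d, 0)))

        def fit_linear_rule(X, d, cu=2.0, co=1.0, lr=0.05, iters=500):
            # Subgradient descent on the empirical newsvendor cost of q(x) = X @ theta.
            theta = np.zeros(X.shape[1])
            for _ in range(iters):
                q = X @ theta
                grad = X.T @ np.where(q < d, -cu, co) / len(d)
                theta -= lr * grad
            return theta

        rng = np.random.default_rng(0)
        X = np.c_[np.ones(200), rng.normal(size=200)]   # intercept + one feature
        d = 10 + 3 * X[:, 1] + rng.normal(size=200)     # demand linked to the feature
        theta = fit_linear_rule(X, d)
        print("average cost:", newsvendor_cost(X @ theta, d, 2.0, 1.0))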
    Low Precision Decentralized Distributed Training over IID and non-IID Data. (arXiv:2111.09389v3 [cs.LG] UPDATED)
    Decentralized distributed learning is the key to enabling large-scale machine learning (training) on edge devices utilizing private user-generated local data, without relying on the cloud. However, the practical realization of such on-device training is limited by communication and compute bottlenecks. In this paper, we propose and show the convergence of low precision decentralized training, which aims to reduce the computational complexity and communication cost of decentralized training. Many feedback-based compression techniques have been proposed in the literature to reduce communication costs. To the best of our knowledge, there is no work that applies and evaluates compute-efficient training techniques such as quantization, pruning, etc., for peer-to-peer decentralized learning setups. Since real-world applications have a significant skew in the data distribution, we design "Range-EvoNorm" as the normalization activation layer, which is better suited for low precision training over non-IID data. Moreover, we show that the proposed low precision training can be used in synergy with other communication compression methods, decreasing the communication cost further. Our experiments indicate that 8-bit decentralized training has minimal accuracy loss compared to its full precision counterpart, even with non-IID data. However, when low precision training is accompanied by communication compression through sparsification, we observe a 1-2% drop in accuracy. The proposed low precision decentralized training decreases computational complexity, memory usage, and communication cost by 4x and compute energy by a factor of ~20x, while trading off less than 1% accuracy for both IID and non-IID data. In particular, with higher skew values, we observe an increase in accuracy (by ~0.5%) with low precision training, indicating the regularization effect of quantization.  ( 3 min )
    Wake-Cough: cough spotting and cougher identification for personalised long-term cough monitoring. (arXiv:2110.03771v2 [cs.SD] UPDATED)
    We present 'wake-cough', an application of wake-word spotting to coughs using a Resnet50 and the identification of coughers using i-vectors, for the purpose of a long-term, personalised cough monitoring system. Coughs, recorded in a quiet (73$\pm$5 dB) and noisy (34$\pm$17 dB) environment, were used to extract i-vectors, x-vectors and d-vectors, used as features to the classifiers. The system achieves 90.02% accuracy when using an MLP to discriminate between 51 coughers using 2-sec long cough segments in the noisy environment. When discriminating between 5 and 14 coughers using longer (100 sec) segments in the quiet environment, this accuracy improves to 99.78% and 98.39% respectively. Unlike speech, i-vectors outperform x-vectors and d-vectors in identifying coughers. These coughs were added as an extra class to the Google Speech Commands dataset and features were extracted by preserving the end-to-end time-domain information in a trigger phrase. The highest accuracy of 88.58% is achieved in spotting coughs among 35 other trigger phrases using a Resnet50. Thus, wake-cough represents a personalised, non-intrusive cough monitoring system, which is power-efficient as on-device wake-word detection can keep a smartphone-based monitoring device mostly dormant. This makes wake-cough extremely attractive in multi-bed ward environments to monitor patients' long-term recovery from lung ailments such as tuberculosis (TB) and COVID-19.  ( 3 min )
    LEMON: LanguagE ModeL for Negative Sampling of Knowledge Graph Embeddings. (arXiv:2203.04703v2 [cs.AI] UPDATED)
    Knowledge Graph Embedding models have become an important area of machine learning. These models provide a latent representation of entities and relations in a knowledge graph, which can then be used in downstream machine learning tasks such as link prediction. The learning process of such models can be performed by contrasting positive and negative triples. While all triples of a KG are considered positive, negative triples are usually not readily available. Therefore, the choice of the sampling method used to obtain negative triples plays a crucial role in the performance and effectiveness of Knowledge Graph Embedding models. Most current methods fetch negative samples from a random distribution of entities in the underlying Knowledge Graph, which often also yields meaningless triples. Other known methods use adversarial techniques or generative neural networks, which reduce the efficiency of the process. In this paper, we propose an approach for generating informative negative samples that considers available complementary knowledge about entities. In particular, pre-trained language models are used to obtain representations of symbolic entities from their textual information and to form neighborhood clusters based on the distances between entities. Our comprehensive evaluations demonstrate the effectiveness of the proposed approach on benchmark Knowledge Graphs with textual information for the link prediction task.
    A Note on the Efficient Evaluation of PAC-Bayes Bounds. (arXiv:2209.05188v1 [cs.LG])
    When utilising PAC-Bayes theory for risk certification, it is usually necessary to estimate and bound the Gibbs risk of the PAC-Bayes posterior. Many works in the literature employ a method for this which requires a large number of passes of the dataset, incurring high computational cost. This manuscript presents a very general alternative which makes computational savings on the order of the dataset size.  ( 2 min )
    The Interplay Between Implicit Bias and Benign Overfitting in Two-Layer Linear Networks. (arXiv:2108.11489v3 [stat.ML] UPDATED)
    The recent success of neural network models has shone light on a rather surprising statistical phenomenon: statistical models that perfectly fit noisy data can generalize well to unseen test data. Understanding this phenomenon of $\textit{benign overfitting}$ has attracted intense theoretical and empirical study. In this paper, we consider interpolating two-layer linear neural networks trained with gradient flow on the squared loss and derive bounds on the excess risk when the covariates satisfy sub-Gaussianity and anti-concentration properties, and the noise is independent and sub-Gaussian. By leveraging recent results that characterize the implicit bias of this estimator, our bounds emphasize the role of both the quality of the initialization as well as the properties of the data covariance matrix in achieving low excess risk.  ( 2 min )
    Preserving Privacy in Federated Learning with Ensemble Cross-Domain Knowledge Distillation. (arXiv:2209.04599v1 [cs.CR])
    Federated Learning (FL) is a machine learning paradigm where local nodes collaboratively train a central model while the training data remains decentralized. Existing FL methods typically share model parameters or employ co-distillation to address the issue of unbalanced data distribution. However, they suffer from communication bottlenecks. More importantly, they risk privacy leakage. In this work, we develop a privacy-preserving and communication-efficient method in a FL framework with one-shot offline knowledge distillation using unlabeled, cross-domain public data. We propose a quantized and noisy ensemble of local predictions from completely trained local models for stronger privacy guarantees without sacrificing accuracy. Based on extensive experiments on image classification and text classification tasks, we show that our privacy-preserving method outperforms baseline FL algorithms in both accuracy and communication efficiency.
    Diffusion Models in Vision: A Survey. (arXiv:2209.04747v1 [cs.CV])
    Denoising diffusion models represent a recent emerging topic in computer vision, demonstrating remarkable results in the area of generative modeling. A diffusion model is a deep generative model that is based on two stages, a forward diffusion stage and a reverse diffusion stage. In the forward diffusion stage, the input data is gradually perturbed over several steps by adding Gaussian noise. In the reverse stage, a model is tasked with recovering the original input data by learning to gradually reverse the diffusion process, step by step. Diffusion models are widely appreciated for the quality and diversity of the generated samples, despite their known computational burden, i.e. low speeds due to the high number of steps involved during sampling. In this survey, we provide a comprehensive review of articles on denoising diffusion models applied in vision, comprising both theoretical and practical contributions in the field. First, we identify and present three generic diffusion modeling frameworks, which are based on denoising diffusion probabilistic models, noise conditioned score networks, and stochastic differential equations. We further discuss the relations between diffusion models and other deep generative models, including variational auto-encoders, generative adversarial networks, energy-based models, autoregressive models and normalizing flows. Then, we introduce a multi-perspective categorization of diffusion models applied in computer vision. Finally, we illustrate the current limitations of diffusion models and envision some interesting directions for future research.
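    The forward stage has a well-known closed form in the DDPM parameterization: $x_t = \sqrt{\bar\alpha_t}\,x_0 + \sqrt{1-\bar\alpha_t}\,\epsilon$ with $\bar\alpha_t = \prod_{s\le t}(1-\beta_s)$. A minimal sketch, with a standard linear variance schedule assumed for illustration:

        import numpy as np

        T = 1000
        betas = np.linspace(1e-4, 0.02, T)        # linear variance schedule
        alpha_bars = np.cumprod(1.0 - betas)      # cumulative products \bar{alpha}_t

        def q_sample(x0, t, rng):
            # Jump directly from clean data x0 to the noised x_t.
            eps = rng.normal(size=x0.shape)
            return np.sqrt(alpha_bars[t]) * x0 + np.sqrt(1.0 - alpha_bars[t]) * eps

        rng = np.random.default_rng(0)
        x0 = rng.uniform(-1, 1, size=(8, 8))      # stand-in for a normalised image
        xt = q_sample(x0, t=999, rng=rng)
        print("remaining signal scale:", np.sqrt(alpha_bars[-1]))  # ~0: almost pure noise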
    Unsupervised Domain Adaptation for Extra Features in the Target Domain Using Optimal Transport. (arXiv:2209.04594v1 [cs.LG])
    Domain adaptation aims to transfer knowledge of labeled instances from a source domain to a target domain to fill the gap between the domains. Most domain adaptation methods assume that the source and target domains have the same dimensionality. Methods that are applicable when the number of features differs in each domain have rarely been studied, especially when no label information is given for the test data obtained from the target domain. In this paper, it is assumed that common features exist in both domains and that additional new features are observed in the target domain; hence, the dimensionality of the target domain is higher than that of the source domain. To leverage the homogeneity of the common features, the adaptation between these source and target domains is formulated as an optimal transport (OT) problem. In addition, a learning bound in the target domain for the proposed OT-based method is derived. The proposed algorithm is validated using both simulated and real-world data.
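    A minimal sketch of the coupling idea using the POT library: solve exact OT between source and target samples on the shared features only, then relate target instances to labelled source instances through the transport plan. The data, the label-propagation rule, and the feature split are our illustrative assumptions, not the paper's full method.

        import numpy as np
        import ot  # POT: Python Optimal Transport

        rng = np.random.default_rng(0)
        Xs = rng.normal(size=(30, 5))              # source: 5 common features
        Xt = rng.normal(size=(40, 8))              # target: 5 common + 3 extra features
        M = ot.dist(Xs, Xt[:, :5])                 # cost matrix on the common features only
        a = np.full(30, 1 / 30)                    # uniform source weights
        b = np.full(40, 1 / 40)                    # uniform target weights
        G = ot.emd(a, b, M)                        # optimal transport plan
        ys = rng.integers(0, 2, size=30)           # source labels
        yt_hat = ys[G.argmax(axis=0)]              # assign each target its main source match
        print(yt_hat[:10])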
    Structured Q-learning For Antibody Design. (arXiv:2209.04698v1 [cs.LG])
    Optimizing combinatorial structures is core to many real-world problems, such as those encountered in life sciences. For example, one of the crucial steps involved in antibody design is to find an arrangement of amino acids in a protein sequence that improves its binding with a pathogen. Combinatorial optimization of antibodies is difficult due to extremely large search spaces and non-linear objectives. Even for modest antibody design problems, where proteins have a sequence length of eleven, we are faced with searching over 2.05 x 10^14 structures. Applying traditional Reinforcement Learning algorithms such as Q-learning to combinatorial optimization results in poor performance. We propose Structured Q-learning (SQL), an extension of Q-learning that incorporates structural priors for combinatorial optimization. Using a molecular docking simulator, we demonstrate that SQL finds high binding energy sequences and performs favourably against baselines on eight challenging antibody design tasks, including designing antibodies for SARS-CoV.
    Robust Adaptive Submodular Maximization. (arXiv:2107.11333v4 [cs.DS] UPDATED)
    The goal of a sequential decision making problem is to design an interactive policy that adaptively selects a group of items, where each selection is based on the feedback from the past, in order to maximize the expected utility of selected items. It has been shown that the utility functions of many real-world applications are adaptive submodular. However, most existing studies on adaptive submodular optimization focus on the average case. Unfortunately, a policy that has a good average-case performance may have very poor performance under the worst-case realization. In this study, we propose to study two variants of adaptive submodular optimization problems, namely, worst-case adaptive submodular maximization and robust submodular maximization. The first problem aims to find a policy that maximizes the worst-case utility and the latter one aims to find a policy, if any, that achieves both near optimal average-case utility and worst-case utility simultaneously. We introduce a new class of stochastic functions, called worst-case submodular functions. For the worst-case adaptive submodular maximization problem subject to a $p$-system constraint, we develop an adaptive worst-case greedy policy that achieves a $\frac{1}{p+1}$ approximation ratio against the optimal worst-case utility if the utility function is worst-case submodular. For the robust adaptive submodular maximization problem subject to cardinality constraints (resp. partition matroid constraints), if the utility function is both worst-case submodular and adaptive submodular, we develop a hybrid adaptive policy that achieves an approximation close to $1-e^{-\frac{1}{2}}$ (resp. $1/3$) under both worst- and average-case settings simultaneously. We also describe several applications of our theoretical results, including pool-based active learning, stochastic submodular set cover and adaptive viral marketing.  ( 3 min )
    Free Energy Node Embedding via Generalized Skip-gram with Negative Sampling. (arXiv:2105.09182v2 [cs.LG] UPDATED)
    A widely established set of unsupervised node embedding methods can be interpreted as consisting of two distinctive steps: i) the definition of a similarity matrix based on the graph of interest followed by ii) an explicit or implicit factorization of such matrix. Inspired by this viewpoint, we propose improvements in both steps of the framework. On the one hand, we propose to encode node similarities based on the free energy distance, which interpolates between the shortest path and the commute time distances, thus, providing an additional degree of flexibility. On the other hand, we propose a matrix factorization method based on a loss function that generalizes that of the skip-gram model with negative sampling to arbitrary similarity matrices. Compared with factorizations based on the widely used $\ell_2$ loss, the proposed method can better preserve node pairs associated with higher similarity scores. Moreover, it can be easily implemented using advanced automatic differentiation toolkits and computed efficiently by leveraging GPU resources. Node clustering, node classification, and link prediction experiments on real-world datasets demonstrate the effectiveness of incorporating free-energy-based similarities as well as the proposed matrix factorization compared with state-of-the-art alternatives.  ( 3 min )
    A Deterministic Approximation to Neural SDEs. (arXiv:2006.08973v6 [cs.LG] UPDATED)
    Neural Stochastic Differential Equations (NSDEs) model the drift and diffusion functions of a stochastic process as neural networks. While NSDEs are known to make accurate predictions, their uncertainty quantification properties have so far remained unexplored. We report the empirical finding that obtaining well-calibrated uncertainty estimates from NSDEs is computationally prohibitive. As a remedy, we develop a computationally affordable deterministic scheme which accurately approximates the transition kernel when the dynamics are governed by an NSDE. Our method introduces a bidimensional moment matching algorithm: vertical along the neural net layers and horizontal along the time direction, which benefits from an original combination of effective approximations. Our deterministic approximation of the transition kernel is applicable to both training and prediction. We observe in multiple experiments that the uncertainty calibration quality of our method can be matched by Monte Carlo sampling only at the price of high computational cost. Thanks to the numerical stability of deterministic training, our method also improves prediction accuracy.  ( 3 min )
    GRNN: Generative Regression Neural Network -- A Data Leakage Attack for Federated Learning. (arXiv:2105.00529v3 [cs.LG] UPDATED)
    Data privacy has become an increasingly important issue in Machine Learning (ML), and many approaches have been developed to tackle this challenge, e.g. cryptography (Homomorphic Encryption (HE), Differential Privacy (DP), etc.) and collaborative training (Secure Multi-Party Computation (MPC), Distributed Learning and Federated Learning (FL)). These techniques have a particular focus on data encryption or secure local computation; they transfer intermediate information to a third party to compute the final result. Gradient exchange is commonly considered a secure way of training a robust model collaboratively in Deep Learning (DL). However, recent research has demonstrated that sensitive information can be recovered from the shared gradient. Generative Adversarial Networks (GANs), in particular, have shown to be effective in recovering such information. However, GAN-based techniques require additional information, such as class labels, which are generally unavailable in privacy-preserved learning. In this paper, we show that, in an FL system, image-based privacy data can be fully recovered from the shared gradient alone via our proposed Generative Regression Neural Network (GRNN). We formulate the attack as a regression problem and optimize two branches of the generative model by minimizing the distance between gradients. We evaluate our method on several image classification tasks. The results illustrate that our proposed GRNN outperforms state-of-the-art methods with better stability, stronger robustness, and higher accuracy. It also imposes no convergence requirement on the global FL model. Moreover, we demonstrate information leakage using face re-identification. Some defense strategies are also discussed in this work.  ( 3 min )
    Continual learning benefits from multiple sleep mechanisms: NREM, REM, and Synaptic Downscaling. (arXiv:2209.05245v1 [cs.NE])
    Learning new tasks and skills in succession without losing prior learning (i.e., catastrophic forgetting) is a computational challenge for both artificial and biological neural networks, yet artificial systems struggle to achieve parity with their biological analogues. Mammalian brains employ numerous neural operations in support of continual learning during sleep. These are ripe for artificial adaptation. Here, we investigate how modeling three distinct components of mammalian sleep together affects continual learning in artificial neural networks: (1) a veridical memory replay process observed during non-rapid eye movement (NREM) sleep; (2) a generative memory replay process linked to REM sleep; and (3) a synaptic downscaling process which has been proposed to tune signal-to-noise ratios and support neural upkeep. We find benefits from the inclusion of all three sleep components when evaluating performance on a continual learning CIFAR-100 image classification benchmark. Maximum accuracy improved during training and catastrophic forgetting was reduced during later tasks. While some catastrophic forgetting persisted over the course of network training, higher levels of synaptic downscaling led to better retention of early tasks and further facilitated the recovery of early task accuracy during subsequent training. One key takeaway is that there is a trade-off when choosing the level of synaptic downscaling: more aggressive downscaling better protects early tasks, while less downscaling enhances the ability to learn new tasks. Intermediate levels can strike a balance, with the highest overall accuracies during training. Overall, our results both provide insight into how to adapt sleep components to enhance artificial continual learning systems and highlight areas for future neuroscientific sleep research to further such systems.  ( 3 min )
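    A toy sketch of the synaptic-downscaling component alone (the NREM and REM replay mechanisms are omitted); the multiplicative factor and the pruning floor are illustrative assumptions, not the paper's exact mechanism.

        import torch

        @torch.no_grad()
        def synaptic_downscale(model, factor=0.9, floor=1e-4):
            # Multiplicatively shrink all weights and zero out those that fall
            # below a small floor, crudely mimicking sleep-related downscaling.
            for p in model.parameters():
                p.mul_(factor)
                p[p.abs() < floor] = 0.0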
    Exploring Autoencoder-based Error-bounded Compression for Scientific Data. (arXiv:2105.11730v6 [cs.LG] UPDATED)
    Error-bounded lossy compression is becoming an indispensable technique for the success of today's scientific projects with vast volumes of data produced during simulations or instrument data acquisitions. Not only can it significantly reduce data size, but it also can control the compression errors based on user-specified error bounds. Autoencoder (AE) models have been widely used in image compression, but few AE-based compression approaches support error-bounding features, which are highly demanded by scientific applications. To address this issue, we explore using convolutional autoencoders to improve error-bounded lossy compression for scientific data, with the following three key contributions. (1) We provide an in-depth investigation of the characteristics of various autoencoder models and develop an error-bounded autoencoder-based framework in terms of the SZ model. (2) We optimize the compression quality for the main stages in our designed AE-based error-bounded compression framework, fine-tuning the block sizes and latent sizes and also optimizing the compression efficiency of latent vectors. (3) We evaluate our proposed solution using five real-world scientific datasets and compare it with six other related works. Experiments show that our solution exhibits a very competitive compression quality among all the compressors in our tests. In absolute terms, it can obtain a much better compression quality (100% ~ 800% improvement in compression ratio with the same data distortion) compared with SZ2.1 and ZFP in cases with a high compression ratio.  ( 3 min )
    ProductAE: Towards Training Larger Channel Codes based on Neural Product Codes. (arXiv:2110.04466v2 [cs.IT] UPDATED)
    There have been significant research activities in recent years to automate the design of channel encoders and decoders via deep learning. Due to the dimensionality challenge in channel coding, it is prohibitively complex to design and train relatively large neural channel codes via deep learning techniques. Consequently, most of the results in the literature are limited to relatively short codes having less than 100 information bits. In this paper, we construct ProductAEs, a computationally efficient family of deep-learning-driven (encoder, decoder) pairs, that aim at enabling the training of relatively large channel codes (both encoders and decoders) with a manageable training complexity. We build upon the ideas from classical product codes, and propose constructing large neural codes using smaller code components. More specifically, instead of directly training the encoder and decoder for a large neural code of dimension $k$ and blocklength $n$, we provide a framework that requires training neural encoders and decoders for the code parameters $(n_1,k_1)$ and $(n_2,k_2)$ such that $n_1 n_2=n$ and $k_1 k_2=k$. Our training results show significant gains, over all ranges of signal-to-noise ratio (SNR), for a code of parameters $(225,100)$ and a moderate-length code of parameters $(441,196)$, over polar codes under successive cancellation (SC) decoding. Moreover, our results demonstrate meaningful gains over the Turbo Autoencoder (TurboAE) and state-of-the-art classical codes. This is the first work to design product autoencoders and a pioneering work on training large channel codes.  ( 3 min )
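    The underlying classical construction can be sketched with plain generator matrices over GF(2); in ProductAE the two component encoders below would be replaced by trained neural encoders (the interface is an assumption).

        import numpy as np

        def product_encode(msg_bits, G1, G2):
            # msg_bits: (k1*k2,) bits; G1: (k1, n1) and G2: (k2, n2) generators.
            k1, n1 = G1.shape
            k2, n2 = G2.shape
            M = msg_bits.reshape(k1, k2)
            rows = M @ G2 % 2             # encode each row:    (k1, n2)
            full = G1.T @ rows % 2        # encode each column: (n1, n2)
            return full.reshape(n1 * n2)  # blocklength n = n1 * n2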
    Solving non-linear Kolmogorov equations in large dimensions by using deep learning: a numerical comparison of discretization schemes. (arXiv:2012.07747v3 [math.NA] UPDATED)
    Non-linear partial differential Kolmogorov equations are successfully used to describe a wide range of time-dependent phenomena in the natural sciences, engineering, and finance. For example, in physical systems, the Allen-Cahn equation describes pattern formation associated with phase transitions. In finance, the Black-Scholes equation describes the evolution of the price of derivative investment instruments. Such modern applications often require solving these equations in high-dimensional regimes in which classical approaches are ineffective. Recently, an interesting new approach based on deep learning has been introduced by E, Han, and Jentzen [1][2]. The main idea is to construct a deep network which is trained on samples of the discretized stochastic differential equation underlying Kolmogorov's equation. The network is able to approximate, numerically at least, the solutions of the Kolmogorov equation with polynomial complexity in whole spatial domains. In this contribution we study variants of the deep networks by using different discretization schemes of the stochastic differential equation. We compare the performance of the associated networks on benchmark examples, and show that, for some discretization schemes, improvements in the accuracy are possible without affecting the observed computational complexity.  ( 3 min )
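    A sketch of how two such discretization schemes generate training samples; mu, sigma, and dsigma are user-supplied drift, diffusion, and diffusion-derivative functions (e.g., those of an Allen-Cahn or Black-Scholes model), and the interface is an assumption.

        import numpy as np

        def simulate(x0, mu, sigma, dsigma, T=1.0, n_steps=100, scheme="euler", rng=None):
            rng = rng or np.random.default_rng()
            dt = T / n_steps
            x = np.asarray(x0, dtype=float).copy()
            for _ in range(n_steps):
                dW = rng.normal(scale=np.sqrt(dt), size=x.shape)
                step = mu(x) * dt + sigma(x) * dW
                if scheme == "milstein":  # higher strong order via the Ito correction
                    step += 0.5 * sigma(x) * dsigma(x) * (dW ** 2 - dt)
                x = x + step
            return x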
    Monitoring of functional profiles combining the notion of Fr\'echet mean and the framework of deformation models with application in ambient air pollution surveillance. (arXiv:2010.02968v2 [stat.ME] UPDATED)
    A framework suitable for monitoring functional profiles, combining the notion of the Fr\'echet mean and the concept of deformation models, is developed and proposed. The generalized sense of mean that the notion of the Fr\'echet mean offers is employed to capture the typical functional shape of the data, while the concept of deformation models allows for interpretable parameterizations of a profile's deviations from the typical shape. Functional EWMA-type control charts are built and proposed based on shape characteristics of the functional data, allowing for (a) identifying shifts from the in-control behaviour and (b) relating potential shifts to significant deviations of certain qualitative characteristics (e.g., amplitude or phase deformations). The functional monitoring scheme is applied to ambient air pollution surveillance. In particular, the method is applied to a synthetic data example, to assess its performance under various conditions, and to a real-world example using sensor data from an area in the city of Athens, where air pollutant profiles and their characteristics are successfully analyzed and out-of-control behaviours are identified.  ( 3 min )
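    For reference, a scalar EWMA chart implements the recursion $z_t = \lambda x_t + (1-\lambda) z_{t-1}$; the functional charts in the paper apply the same idea to shape characteristics extracted from each profile. The in-control estimates below are a simplification.

        import numpy as np

        def ewma_chart(x, lam=0.2, L=3.0):
            # Returns the EWMA statistics and the upper/lower control limits.
            mu0, sigma = x.mean(), x.std(ddof=1)  # in practice: in-control estimates
            z, z_prev = np.empty(len(x)), mu0
            for t, xt in enumerate(x):
                z_prev = lam * xt + (1 - lam) * z_prev
                z[t] = z_prev
            t = np.arange(1, len(x) + 1)
            width = L * sigma * np.sqrt(lam / (2 - lam) * (1 - (1 - lam) ** (2 * t)))
            return z, mu0 + width, mu0 - width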
    Memorization and Generalization in Neural Code Intelligence Models. (arXiv:2106.08704v3 [cs.LG] UPDATED)
    Deep Neural Networks (DNNs) are increasingly being used in software engineering and code intelligence tasks. These are powerful tools that are capable of learning highly generalizable patterns from large datasets through millions of parameters. At the same time, their large capacity can render them prone to memorizing data points. Recent work suggests that the memorization risk manifests especially strongly when the training dataset is noisy, involving many ambiguous or questionable samples, and memorization is the only recourse. The goal of this paper is to evaluate and compare the extent of memorization and generalization in neural code intelligence models. It aims to provide insights on how memorization may impact the learning behavior of neural models in code intelligence systems. To observe the extent of memorization in models, we add random noise to the original training dataset and use various metrics to quantify the impact of noise on various aspects of training and testing. We evaluate several state-of-the-art neural code intelligence models and benchmarks based on Java, Python, and Ruby codebases. Our results highlight important risks: millions of trainable parameters allow the neural networks to memorize anything, including noisy data, and provide a false sense of generalization. We observed that all models manifest some form of memorization. This can be potentially troublesome in most code intelligence tasks, where models rely on noise-prone and repetitive data sources, such as code from GitHub. To the best of our knowledge, we provide the first study to quantify memorization effects in the domain of software engineering and code intelligence systems. This work raises awareness and provides new insights into important issues of training neural models in code intelligence systems that are usually overlooked by software engineering researchers.  ( 3 min )
    On the Hyperparameters in Stochastic Gradient Descent with Momentum. (arXiv:2108.03947v2 [cs.LG] UPDATED)
    Following the same routine as [SSJ20], we continue the theoretical analysis of stochastic gradient descent with momentum (SGD with momentum) in this paper. In contrast to plain SGD, we demonstrate that for SGD with momentum it is the two hyperparameters together, the learning rate and the momentum coefficient, that play the significant role in the linear rate of convergence in non-convex optimization. Our analysis is based on the use of a hyperparameters-dependent stochastic differential equation (hp-dependent SDE) that serves as a continuous surrogate for SGD with momentum. Similarly, we establish the linear convergence for the continuous-time formulation of SGD with momentum and obtain an explicit expression for the optimal linear rate by analyzing the spectrum of the Kramers-Fokker-Planck operator. By comparison, we demonstrate how the optimal linear rate of convergence and the final gap, which for plain SGD depend only on the learning rate, vary as the momentum coefficient increases from zero to one. We then propose a mathematical interpretation of why SGD with momentum converges faster, and is more robust to the choice of learning rate, than standard SGD in practice. Finally, we show that in the presence of noise, Nesterov momentum is essentially no different from standard momentum.  ( 3 min )
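    The two hyperparameters in question are exactly those of the heavy-ball update; a bare-bones reference implementation for context (names illustrative, grad_fn supplies the stochastic gradient):

        import numpy as np

        def sgd_momentum(grad_fn, x0, lr=0.01, beta=0.9, n_iters=1000):
            x = np.asarray(x0, dtype=float).copy()
            v = np.zeros_like(x)
            for _ in range(n_iters):
                v = beta * v - lr * grad_fn(x)  # beta: momentum coefficient
                x = x + v                       # lr: learning rate
            return x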
    A Deep Learning Approach To Estimation Using Measurements Received Over a Network. (arXiv:2201.08020v2 [cs.LG] UPDATED)
    We propose a novel deep neural network (DNN) based approximation architecture to learn estimates of measurements. We detail an algorithm that enables training of the DNN. The DNN estimator only uses measurements, if and when they are received over a communication network. The measurements are communicated over a network as packets, at a rate unknown to the estimator. Packets may suffer drops and need retransmission. They may suffer waiting delays as they traverse a network path. Works on estimation often assume knowledge of the dynamic model of the measured system, which may not be available in practice. The DNN estimator does not assume knowledge of the dynamic system model or the communication network. It does not require a history of measurements, often used by other works. The DNN estimator results in significantly smaller average estimation error than the commonly used Time-varying Kalman Filter and the Unscented Kalman Filter, in simulations of linear and nonlinear dynamic systems. The DNN need not be trained separately for different communication network settings. It is robust to errors in the estimation of network delays that occur due to imperfect time synchronization between the measurement source and the estimator. Last but not least, our simulations shed light on the rate of updates that results in low estimation error.  ( 3 min )
    Exploration and Incentives in Reinforcement Learning. (arXiv:2103.00360v3 [cs.LG] UPDATED)
    How do you incentivize self-interested agents to $\textit{explore}$ when they prefer to $\textit{exploit}$? We consider complex exploration problems, where each agent faces the same (but unknown) MDP. In contrast with traditional formulations of reinforcement learning, agents control the choice of policies, whereas an algorithm can only issue recommendations. However, the algorithm controls the flow of information, and can incentivize the agents to explore via information asymmetry. We design an algorithm which explores all reachable states in the MDP. We achieve provable guarantees similar to those for incentivizing exploration in static, stateless exploration problems studied previously. To the best of our knowledge, this is the first work to consider mechanism design in a stateful, reinforcement learning setting.  ( 2 min )
    Forecasting Daily COVID-19 Related Calls in VA Health Care System: Predictive Model Development. (arXiv:2111.13980v3 [cs.LG] UPDATED)
    Background: COVID-19 has become a challenge worldwide and proper planning of medical resources is the key to combating it. In the US Veteran Affairs Health Care System (VA), many of the enrollees are susceptible to COVID-19. Predicting COVID-19 demand in order to allocate medical resources promptly thus becomes a critical issue. When VA enrollees have COVID-19 symptoms, it is recommended that their first step be to call the VA Call Center. For confirmed COVID-19 patients, the median time from the first symptom to hospital admission was seven days. By predicting the number of COVID-19 related calls, we could predict imminent surges in healthcare use and plan medical resources ahead. Objective: The study aims to develop a method to forecast the daily number of COVID-19 related calls for each of the 110 VA medical centers. Methods: In the proposed method, we pre-trained a model using a cluster of medical centers and fine-tuned it for individual medical centers. At the cluster level, we performed feature selection to select significant features and an automatic hyper-parameter search to select optimal hyper-parameter value combinations for the model. Conclusions: This study proposed an accurate method to forecast the daily number of COVID-19 related calls for VA medical centers. The proposed method was able to overcome modeling challenges by grouping similar medical centers into clusters to enlarge the dataset for training models, and by using hyper-parameter search to automatically find optimal hyper-parameter value combinations for models. With the proposed method, surges in healthcare use can be predicted ahead of time. This allows health care practitioners to better plan medical resources and combat COVID-19.  ( 3 min )
    SeanNet: Semantic Understanding Network for Localization Under Object Dynamics. (arXiv:2110.02276v2 [cs.RO] UPDATED)
    We aim for domestic robots to perform long-term indoor service. Under the object-level scene dynamics induced by daily human activities, a robot needs to robustly localize itself in the environment subject to scene uncertainties. Previous works have addressed visual-based localization in static environments, yet the object-level scene dynamics challenge existing methods for the long-term deployment of the robot. This paper proposes a SEmantic understANding Network (SeanNet) architecture that enables an effective learning process with coupled visual and semantic inputs. With a dataset that contains object dynamics, we propose a cascaded contrastive learning scheme to train the SeanNet to learn a vector scene embedding. Subsequently, we can measure the similarity between the currently observed scene and the target scene, which enables robust localization under object-level dynamics. In our experiments, we benchmark SeanNet against state-of-the-art image-encoding networks (baselines) on scene similarity measures. The SeanNet architecture with the proposed training method achieves an accuracy of 85.02\%, which is higher than the baselines. We further integrate SeanNet and the other networks as localizers into a visual navigation application. We demonstrate that SeanNet achieves higher success rates compared to the baselines.  ( 3 min )
    Spotting Virus from Satellites: Modeling the Circulation of West Nile Virus Through Graph Neural Networks. (arXiv:2209.05251v1 [cs.CV])
    The occurrence of West Nile Virus (WNV) represents one of the most common mosquito-borne zoonotic viral infections. Its circulation is usually associated with climatic and environmental conditions suitable for vector proliferation and virus replication. On top of that, several statistical models have been developed to shape and forecast WNV circulation: in particular, the recent massive availability of Earth Observation (EO) data, coupled with the continuous advances in the field of Artificial Intelligence, offers valuable opportunities. In this paper, we seek to predict WNV circulation by feeding Deep Neural Networks (DNNs) with satellite images, which have been extensively shown to hold environmental and climatic features. Notably, while previous approaches analyze each geographical site independently, we propose a spatial-aware approach that also considers the characteristics of nearby sites. Specifically, we build upon Graph Neural Networks (GNN) to aggregate features from neighbouring places, and further extend these modules to consider multiple relations, such as the difference in temperature and soil moisture between two sites, as well as the geographical distance. Moreover, we inject time-related information directly into the model to take into account the seasonality of virus spread. We design an experimental setting that combines satellite images - from Landsat and Sentinel missions - with ground truth observations of WNV circulation in Italy. We show that our proposed Multi-Adjacency Graph Attention Network (MAGAT) consistently leads to higher performance when paired with an appropriate pre-training stage. Finally, we assess the importance of each component of MAGAT in our ablation studies.  ( 3 min )
    Weight Expansion: A New Perspective on Dropout and Generalization. (arXiv:2201.09209v2 [cs.LG] UPDATED)
    While dropout is known to be a successful regularization technique, insights into the mechanisms that lead to this success are still lacking. We introduce the concept of \emph{weight expansion}, an increase in the signed volume of a parallelotope spanned by the column or row vectors of the weight covariance matrix, and show that weight expansion is an effective means of increasing the generalization in a PAC-Bayesian setting. We provide a theoretical argument that dropout leads to weight expansion and extensive empirical support for the correlation between dropout and weight expansion. To support our hypothesis that weight expansion can be regarded as an \emph{indicator} of the enhanced generalization capability endowed by dropout, and not just as a mere by-product, we have studied other methods that achieve weight expansion (resp.\ contraction), and found that they generally lead to an increased (resp.\ decreased) generalization ability. This suggests that dropout is an attractive regularizer, because it is a computationally cheap method for obtaining weight expansion. This insight justifies the role of dropout as a regularizer, while paving the way for identifying regularizers that promise improved generalization through weight expansion.  ( 3 min )
    On the Stability of Nonlinear Receding Horizon Control: A Geometric Perspective. (arXiv:2103.15010v2 [math.OC] UPDATED)
    The widespread adoption of nonlinear Receding Horizon Control (RHC) strategies by industry has led to more than 30 years of intense research efforts to provide stability guarantees for these methods. However, current theoretical guarantees require that each (generally nonconvex) planning problem can be solved to (approximate) global optimality, which is an unrealistic requirement for the derivative-based local optimization methods generally used in practical implementations of RHC. This paper takes the first step towards understanding stability guarantees for nonlinear RHC when the inner planning problem is solved to first-order stationary points, but not necessarily global optima. Special attention is given to feedback linearizable systems, and a mixture of positive and negative results is provided. We establish that, under certain strong conditions, first-order solutions to RHC exponentially stabilize linearizable systems. Crucially, this guarantee requires that state costs applied to the planning problems are in a certain sense `compatible' with the global geometry of the system, and a simple counter-example demonstrates the necessity of this condition. These results highlight the need to rethink the role of global geometry in the context of optimization-based control.  ( 3 min )
    Gradient-Free Methods for Saddle-Point Problem. (arXiv:2005.05913v4 [math.OC] UPDATED)
    In the paper, we generalize the approach of Gasnikov et al., 2017, which makes it possible to solve (stochastic) convex optimization problems with an inexact gradient-free oracle, to convex-concave saddle-point problems. The proposed approach performs at least as well as the best existing approaches. But for a special set-up (simplex-type constraints and closeness of the Lipschitz constants in the 1- and 2-norms), our approach reduces the required number of oracle calls (function evaluations) by a factor of $\frac{n}{\log n}$. Our method uses a stochastic approximation of the gradient via finite differences. In this case, the function must be specified not only on the optimization set itself, but in a certain neighbourhood of it. In the second part of the paper, we analyze the case where such an assumption cannot be made, propose a general approach to modifying the method to solve this problem, and apply this approach to particular cases of some classical sets.  ( 3 min )
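    The standard randomized two-point estimator behind such methods can be sketched as follows; note that f must indeed be evaluable in a small neighbourhood of the feasible set, as the abstract points out (names illustrative).

        import numpy as np

        def two_point_grad(f, x, tau=1e-4, rng=None):
            # Estimate the gradient of f at x from two function evaluations
            # along a random direction e drawn uniformly from the unit sphere.
            rng = rng or np.random.default_rng()
            e = rng.normal(size=x.shape)
            e /= np.linalg.norm(e)
            return x.size * (f(x + tau * e) - f(x - tau * e)) / (2 * tau) * e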
    Subquadratic Kronecker Regression with Applications to Tensor Decomposition. (arXiv:2209.04876v1 [cs.DS])
    Kronecker regression is a highly-structured least squares problem $\min_{\mathbf{x}} \lVert \mathbf{K}\mathbf{x} - \mathbf{b} \rVert_{2}^2$, where the design matrix $\mathbf{K} = \mathbf{A}^{(1)} \otimes \cdots \otimes \mathbf{A}^{(N)}$ is a Kronecker product of factor matrices. This regression problem arises in each step of the widely-used alternating least squares (ALS) algorithm for computing the Tucker decomposition of a tensor. We present the first subquadratic-time algorithm for solving Kronecker regression to a $(1+\varepsilon)$-approximation that avoids the exponential term $O(\varepsilon^{-N})$ in the running time. Our techniques combine leverage score sampling and iterative methods. By extending our approach to block-design matrices where one block is a Kronecker product, we also achieve subquadratic-time algorithms for (1) Kronecker ridge regression and (2) updating the factor matrix of a Tucker decomposition in ALS, which is not a pure Kronecker regression problem, thereby improving the running time of all steps of Tucker ALS. We demonstrate the speed and accuracy of this Kronecker regression algorithm on synthetic data and real-world image tensors.  ( 2 min )
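    Part of what makes Kronecker structure exploitable is that $\mathbf{K}$ never needs to be formed: for two factors and a row-major vec, $(\mathbf{A}^{(1)} \otimes \mathbf{A}^{(2)})\,\mathrm{vec}(X) = \mathrm{vec}(\mathbf{A}^{(1)} X \mathbf{A}^{(2)\top})$, reducing the product to two small matrix multiplies. A sketch of this identity only (the paper's algorithm additionally uses leverage-score sampling, not shown):

        import numpy as np

        def kron_matvec(A1, A2, x):
            # Compute (A1 kron A2) @ x without materializing the Kronecker product.
            m1, n1 = A1.shape
            m2, n2 = A2.shape
            X = x.reshape(n1, n2)  # inverse of the row-major vec
            return (A1 @ X @ A2.T).reshape(m1 * m2)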
    The shape and simplicity biases of adversarially robust ImageNet-trained CNNs. (arXiv:2006.09373v6 [cs.CV] UPDATED)
    Increasingly more similarities between human vision and convolutional neural networks (CNNs) have been revealed in the past few years. Yet, vanilla CNNs often fall short in generalizing to adversarial or out-of-distribution (OOD) examples on which humans demonstrate superior performance. Adversarial training is a leading learning algorithm for improving the robustness of CNNs on adversarial and OOD data; however, little is known about the properties, specifically the shape bias and internal features, learned inside adversarially-robust CNNs. In this paper, we perform a thorough, systematic study to understand the shape bias and some internal mechanisms that enable the generalizability of AlexNet, GoogLeNet, and ResNet-50 models trained via adversarial training. We find that while standard ImageNet classifiers have a strong texture bias, their robust (R) counterparts rely heavily on shapes. Remarkably, adversarial training induces three simplicity biases into hidden neurons in the process of "robustifying" CNNs. That is, each convolutional neuron in R networks often changes to detecting (1) pixel-wise smoother patterns, i.e., a mechanism that blocks high-frequency noise from passing through the network; (2) more lower-level features, i.e., textures and colors (instead of objects); and (3) fewer types of inputs. Our findings reveal the interesting mechanisms that made networks more adversarially robust and also explain some recent findings, e.g., why R networks benefit from a much larger capacity (Xie et al. 2020) and can act as a strong image prior in image synthesis (Santurkar et al. 2019).  ( 3 min )
    $\mu$DARTS: Model Uncertainty-Aware Differentiable Architecture Search. (arXiv:2107.11500v2 [cs.LG] UPDATED)
    We present a Model Uncertainty-aware Differentiable ARchiTecture Search ($\mu$DARTS) that optimizes neural networks to simultaneously achieve high accuracy and low uncertainty. We introduce concrete dropout within DARTS cells and include a Monte-Carlo regularizer within the training loss to optimize the concrete dropout probabilities. A predictive variance term is introduced in the validation loss to enable searching for architecture with minimal model uncertainty. The experiments on CIFAR10, CIFAR100, SVHN, and ImageNet verify the effectiveness of $\mu$DARTS in improving accuracy and reducing uncertainty compared to existing DARTS methods. Moreover, the final architecture obtained from $\mu$DARTS shows higher robustness to noise at the input image and model parameters compared to the architecture obtained from existing DARTS methods.  ( 2 min )
    A Nonparametric Contextual Bandit with Arm-level Eligibility Control for Customer Service Routing. (arXiv:2209.05278v1 [cs.LG])
    Amazon Customer Service provides real-time support for millions of customer contacts every year. While the bot-resolver helps automate some traffic, we still see high demand for human agents, also called subject matter experts (SMEs). Customers reach out with questions in different domains (return policy, device troubleshooting, etc.). Depending on their training, not all SMEs are eligible to handle all contacts. Routing contacts to eligible SMEs turns out to be a non-trivial problem because SMEs' domain eligibility is subject to training quality and can change over time. To optimally recommend SMEs while simultaneously learning the true eligibility status, we propose to formulate the routing problem with a nonparametric contextual bandit algorithm (K-Boot) plus an eligibility control (EC) algorithm. K-Boot models the reward with a kernel smoother over similar past samples selected by $k$-NN and uses bootstrap Thompson sampling for exploration. EC filters arms (SMEs) by the initially system-claimed eligibility and dynamically validates the reliability of this information. The proposed K-Boot is a general bandit algorithm, and EC is applicable to other bandits. Our simulation studies show that K-Boot performs on par with state-of-the-art bandit models, and EC boosts K-Boot performance when a stochastic eligibility signal exists.  ( 2 min )
    Detecting Network-based Internet Censorship via Latent Feature Representation Learning. (arXiv:2209.05152v1 [cs.LG])
    Internet censorship is a phenomenon of societal importance and attracts investigation from multiple disciplines. Several research groups, such as Censored Planet, have deployed large scale Internet measurement platforms to collect network reachability data. However, existing studies generally rely on manually designed rules (i.e., using censorship fingerprints) to detect network-based Internet censorship from the data. While this rule-based approach yields a high true positive detection rate, it suffers from several challenges: it requires human expertise, is laborious, and cannot detect any censorship not captured by the rules. Seeking to overcome these challenges, we design and evaluate a classification model based on latent feature representation learning and an image-based classification model to detect network-based Internet censorship. To infer latent feature representations from network reachability data, we propose a sequence-to-sequence autoencoder to capture the structure and the order of data elements in the data. To estimate the probability of censorship events from the inferred latent features, we rely on a densely connected multi-layer neural network model. Our image-based classification model encodes a network reachability data record as a gray-scale image and classifies the image as censored or not using a dense convolutional neural network. We compare and evaluate both approaches using data sets from Censored Planet via a hold-out evaluation. Both classification models are capable of detecting network-based Internet censorship as we were able to identify instances of censorship not detected by the known fingerprints. Latent feature representations likely encode more nuances in the data since the latent feature learning approach discovers a greater quantity, and a more diverse set, of new censorship instances.  ( 3 min )
    Evading the Simplicity Bias: Training a Diverse Set of Models Discovers Solutions with Superior OOD Generalization. (arXiv:2105.05612v3 [cs.LG] UPDATED)
    Neural networks trained with SGD were recently shown to rely preferentially on linearly-predictive features and to ignore complex, equally-predictive ones. This simplicity bias can explain their lack of robustness out of distribution (OOD). The more complex the task to learn, the more likely it is that statistical artifacts (i.e., selection biases, spurious correlations) are simpler than the mechanisms to learn. We demonstrate that the simplicity bias can be mitigated and OOD generalization improved. We train a set of similar models to fit the data in different ways using a penalty on the alignment of their input gradients. We show theoretically and empirically that this induces the learning of more complex predictive patterns. OOD generalization fundamentally requires information beyond i.i.d. examples, such as multiple training environments, counterfactual examples, or other side information. Our approach shows that we can defer this requirement to an independent model selection stage. We obtain SOTA results in visual recognition on biased data and generalization across visual domains. The method, the first to evade the simplicity bias, highlights the need for a better understanding and control of inductive biases in deep learning.  ( 3 min )
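    A sketch of the diversity mechanism: a pairwise penalty on the cosine alignment of the models' input gradients, to be added to the sum of task losses (the exact penalty form in the paper may differ; this is an illustrative variant).

        import torch
        import torch.nn.functional as F

        def gradient_alignment_penalty(models, x, y, loss_fn):
            grads = []
            for m in models:
                xi = x.clone().requires_grad_(True)
                loss = loss_fn(m(xi), y)
                g, = torch.autograd.grad(loss, xi, create_graph=True)
                grads.append(g.flatten(start_dim=1))
            penalty = 0.0
            for i in range(len(grads)):
                for j in range(i + 1, len(grads)):
                    penalty = penalty + F.cosine_similarity(grads[i], grads[j], dim=1).mean()
            return penalty  # add lambda * penalty to the sum of task losses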
    Stochastic Compositional Gradient Descent under Compositional Constraints. (arXiv:2012.09400v4 [cs.LG] UPDATED)
    This work studies constrained stochastic optimization problems where the objective and constraint functions are convex and expressed as compositions of stochastic functions. The problem arises in the context of fair classification, fair regression, and the design of queuing systems. Of particular interest is the large-scale setting where an oracle provides the stochastic gradients of the constituent functions, and the goal is to solve the problem with a minimal number of calls to the oracle. Owing to the compositional form, the stochastic gradients provided by the oracle do not yield unbiased estimates of the objective or constraint gradients. Instead, we construct approximate gradients by tracking the inner function evaluations, resulting in a quasi-gradient saddle point algorithm. We prove that the proposed algorithm is guaranteed to find the optimal and feasible solution almost surely. We further establish that the proposed algorithm requires $\mathcal{O}(1/\epsilon^4)$ data samples in order to obtain an $\epsilon$-approximate optimal point while also ensuring zero constraint violation. The result matches the sample complexity of the stochastic compositional gradient descent method for unconstrained problems and improves upon the best-known sample complexity results for the constrained settings. The efficacy of the proposed algorithm is tested on both fair classification and fair regression problems. The numerical results show that the proposed algorithm outperforms the state-of-the-art algorithms in terms of the convergence rate.  ( 3 min )
    Ordinal Graph Gamma Belief Network for Social Recommender Systems. (arXiv:2209.05106v1 [cs.IR])
    To build recommender systems that not only consider user-item interactions represented as ordinal variables, but also exploit the social network describing the relationships between the users, we develop a hierarchical Bayesian model termed ordinal graph factor analysis (OGFA), which jointly models user-item and user-user interactions. OGFA not only achieves good recommendation performance, but also extracts interpretable latent factors corresponding to representative user preferences. We further extend OGFA to ordinal graph gamma belief network, which is a multi-stochastic-layer deep probabilistic model that captures the user preferences and social communities at multiple semantic levels. For efficient inference, we develop a parallel hybrid Gibbs-EM algorithm, which exploits the sparsity of the graphs and is scalable to large datasets. Our experimental results show that the proposed models not only outperform recent baselines on recommendation datasets with explicit or implicit feedback, but also provide interpretable latent representations.  ( 2 min )
    Optimal mesh generation for a blade passage using deep reinforcement learning. (arXiv:2209.05280v1 [cs.LG])
    A mesh generation method that can generate an optimal mesh for a blade passage in a single attempt is developed using deep reinforcement learning (DRL). Unlike conventional methods, where meshing parameters must be specified by the user or iteratively optimized from scratch for a newly given geometry, the developed method employs DRL-based multi-condition (MC) optimization to optimally define meshing parameters for various geometries. The method involves the following steps: (1) development of a base algorithm for structured mesh generation of a blade passage; (2) formulation of an MC optimization problem to optimize meshing parameters introduced while developing the base algorithm; and (3) development of a DRL-based mesh generation algorithm by solving the MC optimization problem using DRL. As a result, the developed algorithm is able to successfully generate optimal meshes in a single attempt for various blades.  ( 2 min )
    Fairness in Forecasting of Observations of Linear Dynamical Systems. (arXiv:2209.05274v1 [cs.LG])
    In machine learning, training data often capture the behaviour of multiple subgroups of some underlying human population. When the nature of training data for subgroups is not controlled carefully, under-representation bias arises. To counter this effect, we introduce two natural notions, subgroup fairness and instantaneous fairness, to address such under-representation bias in time-series forecasting problems. We present globally convergent methods for the fairness-constrained learning problems, using hierarchies of convexifications of non-commutative polynomial optimisation problems. Our empirical results on a biased data set motivated by insurance applications and the well-known COMPAS data set demonstrate the efficacy of our methods. We also show that by exploiting sparsity in the convexifications, we can reduce the run time of our methods considerably.  ( 2 min )
    Modeling Dependent Structure for Utterances in ASR Evaluation. (arXiv:2209.05281v1 [eess.AS])
    The bootstrap resampling method has been popular for performing significance analysis on word error rate (WER) in automatic speech recognition (ASR) evaluations. To deal with the issue of dependent speech data, the blockwise bootstrap approach has also been proposed: by dividing utterances into uncorrelated blocks, it resamples these blocks instead of the original data. However, it is always nontrivial to uncover the dependence structure among utterances, which could lead to subjective findings in statistical testing. In this paper, we present graphical-lasso-based methods to explicitly model such dependency and estimate the independent blocks of utterances in a rigorous way. The blockwise bootstrap is then applied on top of the inferred blocks. We show that the resulting variance estimator for WER is consistent under mild conditions. We also demonstrate the validity of the proposed approach on LibriSpeech data.  ( 2 min )
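    Once independent blocks of utterances have been inferred (e.g., by the graphical lasso step), the blockwise bootstrap itself is simple; a sketch assuming per-utterance error and word counts are available:

        import numpy as np

        def blockwise_bootstrap_wer(errors, words, blocks, n_boot=10000, rng=None):
            # errors/words: per-utterance error and word counts;
            # blocks: list of index arrays, one per independent block.
            rng = rng or np.random.default_rng()
            wers = np.empty(n_boot)
            for b in range(n_boot):
                chosen = rng.choice(len(blocks), size=len(blocks), replace=True)
                idx = np.concatenate([blocks[i] for i in chosen])
                wers[b] = errors[idx].sum() / words[idx].sum()
            return wers  # e.g. np.var(wers) estimates the WER variance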
    A Differentiable Loss Function for Learning Heuristics in A*. (arXiv:2209.05206v1 [cs.LG])
    Optimization of heuristic functions for the A* algorithm, realized by deep neural networks, is usually done by minimizing a squared loss on the estimated cost-to-goal values. This paper argues that this does not necessarily lead to a faster A* search, since the algorithm's execution relies on relative values instead of absolute ones. As a mitigation, we propose an L* loss, which upper-bounds the number of excessively expanded states inside the A* search. The L* loss, when used in the optimization of state-of-the-art deep neural networks for automated planning in maze domains like Sokoban and maze with teleports, significantly improves the fraction of solved problems and the quality of the found plans, and reduces the number of expanded states to approximately 50%.  ( 2 min )
    Amortised Inference in Structured Generative Models with Explaining Away. (arXiv:2209.05212v1 [cs.LG])
    A key goal of unsupervised learning is to go beyond density estimation and sample generation to reveal the structure inherent within observed data. Such structure can be expressed in the pattern of interactions between explanatory latent variables captured through a probabilistic graphical model. Although the learning of structured graphical models has a long history, much recent work in unsupervised modelling has instead emphasised flexible deep-network-based generation, either transforming independent latent generators to model complex data or assuming that distinct observed variables are derived from different latent nodes. Here, we extend the output of amortised variational inference to incorporate structured factors over multiple variables, able to capture the observation-induced posterior dependence between latents that results from "explaining away" and thus allow complex observations to depend on multiple nodes of a structured graph. We show that appropriately parameterised factors can be combined efficiently with variational message passing in elaborate graphical structures. We instantiate the framework based on Gaussian Process Factor Analysis models, and empirically evaluate its improvement over existing methods on synthetic data with known generative processes. We then fit the structured model to high-dimensional neural spiking time-series from the hippocampus of freely moving rodents, demonstrating that the model identifies latent signals that correlate with behavioural covariates.  ( 2 min )
    An Improved Algorithm For Online Reranking. (arXiv:2209.04870v1 [cs.DS])
    We study a fundamental model of online preference aggregation, where an algorithm maintains an ordered list of $n$ elements. An input is a stream of preferred sets $R_1, R_2, \dots, R_t, \dots$. Upon seeing $R_t$ and without knowledge of any future sets, an algorithm has to rerank elements (change the list ordering), so that at least one element of $R_t$ is found near the list front. The incurred cost is a sum of the list update costs (the number of swaps of neighboring list elements) and access costs (position of the first element of $R_t$ on the list). This scenario occurs naturally in applications such as ordering items in an online shop using aggregated preferences of shop customers. The theoretical underpinning of this problem is known as Min-Sum Set Cover. Unlike previous work (Fotakis et al., ICALP 2020, NIPS 2020) that mostly studied the performance of an online algorithm ALG against the static optimal solution (a single optimal list ordering), in this paper, we study an arguably harder variant where the benchmark is the provably stronger optimal dynamic solution OPT (that may also modify the list ordering). In terms of an online shop, this means that the aggregated preferences of its user base evolve with time. We construct a computationally efficient randomized algorithm whose competitive ratio (ALG-to-OPT cost ratio) is $O(r^2)$ and prove the existence of a deterministic $O(r^4)$-competitive algorithm. Here, $r$ is the maximum cardinality of sets $R_t$. This is the first algorithm whose ratio does not depend on $n$: the previously best algorithm for this problem was $O(r^{3/2} \cdot \sqrt{n})$-competitive and $\Omega(r)$ is a lower bound on the performance of any deterministic online algorithm.  ( 3 min )
    Exploring Simple and Transferable Recognition-Aware Image Processing. (arXiv:1910.09185v4 [cs.CV] UPDATED)
    Recent progress in image recognition has stimulated the deployment of vision systems at an unprecedented scale. As a result, visual data are now often consumed not only by humans but also by machines. Existing image processing methods only optimize for better human perception, yet the resulting images may not be accurately recognized by machines. This can be undesirable, e.g., the images can be improperly handled by search engines or recommendation systems. In this work, we examine simple approaches to improve machine recognition of processed images: optimizing the recognition loss directly on the image processing network or through an intermediate input transformation model. Interestingly, the processing model's ability to enhance recognition quality can transfer when evaluated on models of different architectures, recognized categories, tasks and training datasets. This makes the methods applicable even when we do not have the knowledge of future recognition models, e.g., when uploading processed images to the Internet. We conduct experiments on multiple image processing tasks paired with ImageNet classification and PASCAL VOC detection as recognition tasks. With these simple yet effective methods, substantial accuracy gain can be achieved with strong transferability and minimal image quality loss. Through a user study we further show that the accuracy gain can transfer to a black-box cloud model. Finally, we try to explain this transferability phenomenon by demonstrating the similarities of different models' decision boundaries. Code is available at https://github.com/liuzhuang13/Transferable_RA .  ( 3 min )
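    The first approach reduces to a joint objective; a minimal sketch in which a frozen classifier's loss is added to the usual pixel-level loss (the loss choices and the weighting alpha are assumptions, not the paper's exact setup):

        import torch.nn.functional as F

        def recognition_aware_loss(proc_net, recog_net, x_in, x_target, y, alpha=0.1):
            # recog_net's parameters should be frozen (requires_grad=False);
            # gradients still flow through its *input*, steering proc_net
            # toward outputs that the recognizer classifies correctly.
            out = proc_net(x_in)
            pixel_loss = F.l1_loss(out, x_target)
            recog_loss = F.cross_entropy(recog_net(out), y)
            return pixel_loss + alpha * recog_loss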
    Reproducibility in machine learning for medical imaging. (arXiv:2209.05097v1 [cs.CV])
    Reproducibility is a cornerstone of science, as the replication of findings is the process through which they become knowledge. It is widely considered that many fields of science are undergoing a reproducibility crisis. This has led to the publication of various guidelines aiming to improve research reproducibility. This didactic chapter is intended as an introduction to reproducibility for researchers in the field of machine learning for medical imaging. We first distinguish between different types of reproducibility. For each of them, we define it, describe the requirements to achieve it, and discuss its utility. The chapter ends with a discussion of the benefits of reproducibility and with a plea for a non-dogmatic approach to this concept and its implementation in research practice.  ( 2 min )
    SmartKex: Machine Learning Assisted SSH Keys Extraction From The Heap Dump. (arXiv:2209.05243v1 [cs.CR])
    Digital forensics is the process of extracting, preserving, and documenting evidence in digital devices. A commonly used method in digital forensics is to extract data from the main memory of a digital device. However, the main challenge is identifying the important data to be extracted. Several pieces of crucial information reside in the main memory, such as usernames, passwords, and cryptographic keys like SSH session keys. In this paper, we propose SmartKex, a machine-learning-assisted method to extract session keys from heap memory snapshots of an OpenSSH process. In addition, we release an openly available dataset and the corresponding toolchain for creating additional data. Finally, we compare SmartKex with naive brute-force methods and empirically show that SmartKex can extract the session keys with high accuracy and high throughput. With the provided resources, we intend to strengthen the research on the intersection between digital forensics, cybersecurity, and machine learning.  ( 2 min )
    Adaptive Perturbation-Based Gradient Estimation for Discrete Latent Variable Models. (arXiv:2209.04862v1 [cs.LG])
    The integration of discrete algorithmic components in deep learning architectures has numerous applications. Recently, Implicit Maximum Likelihood Estimation (IMLE, Niepert, Minervini, and Franceschi 2021), a class of gradient estimators for discrete exponential family distributions, was proposed by combining implicit differentiation through perturbation with the path-wise gradient estimator. However, due to the finite-difference approximation of the gradients, it is especially sensitive to the choice of the finite-difference step size, which needs to be specified by the user. In this work, we present Adaptive IMLE (AIMLE), the first adaptive gradient estimator for complex discrete distributions: it adaptively identifies the target distribution for IMLE by trading off the density of gradient information with the degree of bias in the gradient estimates. We empirically evaluate our estimator on synthetic examples, as well as on Learning to Explain, Discrete Variational Auto-Encoders, and Neural Relational Inference tasks. In our experiments, we show that our adaptive gradient estimator can produce faithful estimates while requiring orders of magnitude fewer samples than other gradient estimators.  ( 2 min )
    Efficient Approximate Kernel Based Spike Sequence Classification. (arXiv:2209.04952v1 [cs.LG])
    Machine learning (ML) models, such as SVM, for tasks like classification and clustering of sequences, require a definition of distance/similarity between pairs of sequences. Several methods have been proposed to compute the similarity between sequences, such as the exact approach that counts the number of matches between $k$-mers (sub-sequences of length $k$) and an approximate approach that estimates pairwise similarity scores. Although exact methods yield better classification performance, they pose high computational costs, limiting their applicability to a small number of sequences. The approximate algorithms are proven to be more scalable and perform comparably to (sometimes better than) the exact methods -- they are designed in a "general" way to deal with different types of sequences (e.g., music, protein, etc.). Although general applicability is a desired property of an algorithm, it is not what is needed in all scenarios. For example, in the current COVID-19 (coronavirus) pandemic, there is a need for an approach that can deal specifically with the coronavirus. To this end, we propose a series of ways to improve the performance of the approximate kernel (using minimizers and information gain) in order to enhance its predictive performance on coronavirus sequences. More specifically, we improve the quality of the approximate kernel using domain knowledge (computed using information gain) and efficient preprocessing (using minimizer computation) to classify coronavirus spike protein sequences corresponding to different variants (e.g., Alpha, Beta, Gamma). We report results using different classification and clustering algorithms and evaluate their performance using multiple evaluation metrics. Using two datasets, we show that our proposed method helps improve the kernel's performance compared to the baseline and state-of-the-art approaches in the healthcare domain.  ( 3 min )
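    One of the two ingredients, minimizer selection, can be sketched in a few lines: within each window of $w$ consecutive $k$-mers only the lexicographically smallest is kept, shrinking the set of $k$-mers the kernel must process (the parameter values are illustrative).

        def minimizers(seq, k=3, w=5):
            kmers = [seq[i:i + k] for i in range(len(seq) - k + 1)]
            picked = set()
            for i in range(len(kmers) - w + 1):
                window = kmers[i:i + w]
                j = min(range(w), key=lambda t: window[t])
                picked.add((i + j, window[j]))  # (position, minimizer)
            return sorted(picked)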
    HandMime: Sign Language Fingerspelling Acquisition via Imitation Learning. (arXiv:2209.05135v1 [cs.RO])
    Learning fine-grained movements is among the most challenging topics in robotics. This holds true especially for robotic hands. Robotic sign language acquisition or, more specifically, fingerspelling sign language acquisition in robots can be considered a specific instance of such a challenge. In this paper, we propose an approach for learning dexterous motor imitation from video examples, without the use of any additional information. We build a URDF model of a robotic hand with a single actuator for each joint. By leveraging pre-trained deep vision models, we extract the 3D pose of the hand from RGB videos. Then, using state-of-the-art reinforcement learning algorithms for motion imitation (namely, proximal policy optimisation), we train a policy to reproduce the movement extracted from the demonstrations. We identify the best set of hyperparameters to perform imitation based on a reference motion. Additionally, we demonstrate the ability of our approach to generalise over 6 different fingerspelled letters.  ( 2 min )
    Personalized Federated Learning with Communication Compression. (arXiv:2209.05148v1 [cs.LG])
    In contrast to training traditional machine learning (ML) models in data centers, federated learning (FL) trains ML models over local datasets contained on resource-constrained heterogeneous edge devices. Existing FL algorithms aim to learn a single global model for all participating devices, which may not be helpful to all participating devices due to the heterogeneity of the data across devices. Recently, Hanzely and Richt\'{a}rik (2020) proposed a new formulation for training personalized FL models aimed at balancing the trade-off between the traditional global model and the local models that could be trained by individual devices using their private data only. They derived a new algorithm, called Loopless Gradient Descent (L2GD), to solve it and showed that this algorithm leads to improved communication complexity guarantees in regimes where more personalization is required. In this paper, we equip their L2GD algorithm with a bidirectional compression mechanism to further reduce the communication bottleneck between the local devices and the server. Unlike other compression-based algorithms used in the FL setting, our compressed L2GD algorithm operates on a probabilistic communication protocol, where communication does not happen on a fixed schedule. Moreover, our compressed L2GD algorithm maintains a similar convergence rate as vanilla SGD without compression. To empirically validate the efficiency of our algorithm, we perform diverse numerical experiments on both convex and non-convex problems, using various compression techniques.  ( 3 min )
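    For context, the uncompressed L2GD dynamic being extended can be sketched as a coin flip between local gradient steps and a communication step that pulls local models toward their average; the step-size scalings follow the unbiased-estimator construction of Hanzely and Richt\'{a}rik (2020), and the interface below is an assumption (the paper's compression operators are not shown).

        import numpy as np

        def l2gd(local_grads, X, lam, lr, p=0.1, n_iters=1000, rng=None):
            # X: (n_devices, dim) local models; local_grads[i](x) returns the
            # gradient of device i's local loss at x; lam weights the penalty
            # that couples local models to their average.
            rng = rng or np.random.default_rng()
            n = X.shape[0]
            for _ in range(n_iters):
                if rng.random() < p:  # communication step (probability p)
                    X_bar = X.mean(axis=0, keepdims=True)
                    X = X - lr * lam / (n * p) * (X - X_bar)
                else:                 # local steps, no communication
                    G = np.stack([g(x) for g, x in zip(local_grads, X)])
                    X = X - lr / (n * (1 - p)) * G
            return X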
    Dimensionality Reduction using Elastic Measures. (arXiv:2209.04933v1 [cs.LG])
    With the recent surge in big data analytics for hyper-dimensional data, there is renewed interest in dimensionality reduction techniques for machine learning applications. For these methods to yield performance gains and improve understanding of the underlying data, a proper metric needs to be identified. This step is often overlooked, and metrics are typically chosen without consideration of the underlying geometry of the data. In this paper, we present a method for incorporating elastic metrics into the t-distributed Stochastic Neighbor Embedding (t-SNE) and Uniform Manifold Approximation and Projection (UMAP). We apply our method to functional data, which is uniquely characterized by rotations, parameterization, and scale. If these properties are ignored, they can lead to incorrect analysis and poor classification performance. Through our method we demonstrate improved performance on shape identification tasks for three benchmark data sets (MPEG-7, the Car data set, and the Plane data set of Thankoor), where we achieve F1 scores of 0.77, 0.95, and 1.00, respectively.  ( 2 min )
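    In terms of plumbing, once pairwise elastic distances between functional samples are computed (the elastic metric itself being the paper's contribution), they can be handed to off-the-shelf tools as a precomputed matrix; a sketch assuming a user-supplied distance callable:

        import numpy as np
        from sklearn.manifold import TSNE

        def embed_with_custom_metric(curves, pairwise_dist):
            # curves: list of (T, d) arrays; pairwise_dist(a, b) -> float.
            n = len(curves)
            D = np.zeros((n, n))
            for i in range(n):
                for j in range(i + 1, n):
                    D[i, j] = D[j, i] = pairwise_dist(curves[i], curves[j])
            return TSNE(metric="precomputed", init="random").fit_transform(D)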
    Bias Challenges in Counterfactual Data Augmentation. (arXiv:2209.05104v1 [cs.LG])
    Deep learning models tend not to be out-of-distribution robust primarily due to their reliance on spurious features to solve the task. Counterfactual data augmentations provide a general way of (approximately) achieving representations that are counterfactual-invariant to spurious features, a requirement for out-of-distribution (OOD) robustness. In this work, we show that counterfactual data augmentations may not achieve the desired counterfactual-invariance if the augmentation is performed by a {\em context-guessing machine}, an abstract machine that guesses the most-likely context of a given input. We theoretically analyze the invariance imposed by such counterfactual data augmentations and describe an exemplar NLP task where counterfactual data augmentation by a context-guessing machine does not lead to robust OOD classifiers.  ( 2 min )
    A Comparative Study of Classical and Quantum Machine Learning Models for Sentimental Analysis. (arXiv:2209.05142v1 [quant-ph])
    We analyse and classify the sentiments of a text dataset constructed from movie reviews. For this, we use a kernel-based approach from quantum machine learning. To compose a quantum kernel, we use a circuit constructed from a combination of different Pauli rotation gates, where the rotation parameter is a classical non-linear function of the data points obtained from the text data. To assess the performance of the proposed model, we compare the quantum model against a decision tree, a gradient boosting classifier, and classical and quantum support vector machines. Our results show that the quantum kernel model, i.e. the quantum support vector machine, outperforms all other algorithms used for analysis in terms of all evaluation metrics. In comparison to a classical support vector machine, the quantum support vector machine leads to significantly better results even as the number of features or dimensions increases. The results clearly demonstrate a $9.4\%$ increase in precision score for the quantum support vector machine over a classical support vector machine when the number of features is $15$.  ( 2 min )
    CARE: Certifiably Robust Learning with Reasoning via Variational Inference. (arXiv:2209.05055v1 [cs.LG])
    Despite great recent advances achieved by deep neural networks (DNNs), they are often vulnerable to adversarial attacks. Intensive research efforts have been made to improve the robustness of DNNs; however, most empirical defenses can be adaptively attacked again, and the theoretically certified robustness is limited, especially on large-scale datasets. One potential root cause of such vulnerabilities for DNNs is that although they have demonstrated powerful expressiveness, they lack the reasoning ability to make robust and reliable predictions. In this paper, we aim to integrate domain knowledge to enable robust learning with the reasoning paradigm. In particular, we propose a certifiably robust learning with reasoning pipeline (CARE), which consists of a learning component and a reasoning component. Concretely, we use a set of standard DNNs to serve as the learning component to make semantic predictions, and we leverage probabilistic graphical models, such as Markov logic networks (MLN), to serve as the reasoning component to enable knowledge/logic reasoning. However, it is known that the exact inference of MLN (reasoning) is #P-complete, which limits the scalability of the pipeline. To this end, we propose to approximate the MLN inference via variational inference based on an efficient expectation maximization algorithm. In particular, we leverage graph convolutional networks (GCNs) to encode the posterior distribution during variational inference and update the parameters of GCNs (E-step) and the weights of knowledge rules in MLN (M-step) iteratively. We conduct extensive experiments on different datasets and show that CARE achieves significantly higher certified robustness compared with the state-of-the-art baselines. We additionally conduct ablation studies to demonstrate the empirical robustness of CARE and the effectiveness of different knowledge integration strategies.  ( 3 min )
    Gradient-Free Methods for Deterministic and Stochastic Nonsmooth Nonconvex Optimization. (arXiv:2209.05045v1 [math.OC])
    Nonsmooth nonconvex optimization problems broadly emerge in machine learning and business decision making, whereas two core challenges impede the development of efficient solution methods with finite-time convergence guarantees: the lack of a computationally tractable optimality criterion and the lack of computationally powerful oracles. The contributions of this paper are two-fold. First, we establish the relationship between the celebrated Goldstein subdifferential~\citep{Goldstein-1977-Optimization} and uniform smoothing, thereby providing the basis and intuition for the design of gradient-free methods that guarantee finite-time convergence to a set of Goldstein stationary points. Second, we propose the gradient-free method (GFM) and stochastic GFM (SGFM) for solving a class of nonsmooth nonconvex optimization problems and prove that both of them can return a $(\delta,\epsilon)$-Goldstein stationary point of a Lipschitz function $f$ at an expected convergence rate of $O(d^{3/2}\delta^{-1}\epsilon^{-4})$ where $d$ is the problem dimension. Two-phase versions of GFM and SGFM are also proposed and proven to achieve improved large-deviation results. Finally, we demonstrate the effectiveness of 2-SGFM on training ReLU neural networks with the \textsc{Mnist} dataset.  ( 2 min )
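    As a concrete illustration of the uniform-smoothing connection, here is a minimal sketch (our illustration under the abstract's setup, not the authors' exact algorithm) of a single gradient-free step: a two-point finite-difference estimate along a random unit direction approximates the gradient of the $\delta$-smoothed function, and descending along it targets $(\delta,\epsilon)$-Goldstein stationary points.

        import numpy as np

        def gfm_step(f, x, delta, lr, rng):
            # Sample a uniform direction on the unit sphere.
            u = rng.normal(size=x.shape)
            u /= np.linalg.norm(u)
            d = x.size
            # Two-point estimate of the gradient of the delta-smoothed f.
            g = d * (f(x + delta * u) - f(x - delta * u)) / (2 * delta) * u
            return x - lr * g

        rng = np.random.default_rng(0)
        f = lambda x: np.abs(x).sum()              # nonsmooth test function
        x = rng.normal(size=10)
        for _ in range(500):
            x = gfm_step(f, x, delta=0.1, lr=0.01, rng=rng)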
    SELTO: Sample-Efficient Learned Topology Optimization. (arXiv:2209.05098v1 [cs.LG])
    We present a sample-efficient deep learning strategy for topology optimization. Our end-to-end approach is supervised and includes physics-based preprocessing and equivariant networks. We analyze how different components of our deep learning pipeline influence the number of required training samples via a large-scale comparison. The results demonstrate that including physical concepts not only drastically improves the sample efficiency but also the predictions' physical correctness. Finally, we publish two topology optimization datasets containing problems and corresponding ground truth solutions. We are confident that these datasets will improve comparability and future progress in the field.  ( 2 min )
    TMSS: An End-to-End Transformer-based Multimodal Network for Segmentation and Survival Prediction. (arXiv:2209.05036v1 [eess.IV])
    When oncologists estimate cancer patient survival, they rely on multimodal data. Even though some multimodal deep learning methods have been proposed in the literature, the majority rely on having two or more independent networks that share knowledge at a later stage in the overall model. On the other hand, oncologists do not do this in their analysis but rather fuse the information in their brain from multiple sources such as medical images and patient history. This work proposes a deep learning method that mimics oncologists' analytical behavior when quantifying cancer and estimating patient survival. We propose TMSS, an end-to-end Transformer-based Multimodal network for Segmentation and Survival prediction that leverages the strength of transformers in handling different modalities. The model was trained and validated for segmentation and prognosis tasks on the training dataset from the HEad & NeCK TumOR segmentation and outcome prediction in PET/CT images challenge (HECKTOR). We show that the proposed prognostic model significantly outperforms state-of-the-art methods with a concordance index of 0.763+/-0.14 while achieving a comparable dice score of 0.772+/-0.030 to a standalone segmentation model. The code is publicly available.  ( 2 min )
    "Calibeating": Beating Forecasters at Their Own Game. (arXiv:2209.04892v1 [econ.TH])
    In order to identify expertise, forecasters should not be tested by their calibration score, which can always be made arbitrarily small, but rather by their Brier score. The Brier score is the sum of the calibration score and the refinement score; the latter measures how good the sorting into bins with the same forecast is, and thus attests to "expertise." This raises the question of whether one can gain calibration without losing expertise, which we refer to as "calibeating." We provide an easy way to calibeat any forecast, by a deterministic online procedure. We moreover show that calibeating can be achieved by a stochastic procedure that is itself calibrated, and then extend the results to simultaneously calibeating multiple procedures, and to deterministic procedures that are continuously calibrated.  ( 2 min )
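    One simple deterministic recalibration scheme in this spirit (a minimal sketch of the idea, not the paper's procedure) keeps the expert's sorting into bins, and hence the refinement, but replaces each forecast with the running empirical frequency observed so far in its bin, which drives the calibration score down over time.

        import numpy as np

        def calibeat(expert_forecasts, outcomes, n_bins=10):
            counts = np.zeros(n_bins)    # times each forecast bin was used
            hits = np.zeros(n_bins)      # positive outcomes seen in each bin
            ours = []
            for p, y in zip(expert_forecasts, outcomes):
                b = min(int(p * n_bins), n_bins - 1)
                # Forecast the bin's running empirical frequency (0.5 if unseen).
                ours.append(hits[b] / counts[b] if counts[b] > 0 else 0.5)
                counts[b] += 1
                hits[b] += y
            return np.array(ours)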
    A novel learning-based robust model predictive control energy management strategy for fuel cell electric vehicles. (arXiv:2209.04995v1 [cs.SY])
    The multi-source electromechanical coupling makes the energy management of fuel cell electric vehicles (FCEVs) highly nonlinear and complex, especially for 4-wheel-drive (4WD) FCEVs. Accurate state observation of such a complicated nonlinear system is the foundation of effective energy management in FCEVs. Aiming at releasing the energy-saving potential of FCEVs, a novel learning-based robust model predictive control (LRMPC) strategy is proposed for a 4WD FCEV, contributing to a suitable power distribution among multiple energy sources. The strategy, based on machine learning (ML), translates knowledge of the nonlinear system into an explicit control scheme with superior robust performance. First, ML methods with high regression accuracy and superior generalization ability are trained offline to establish a precise state observer for the state of charge (SOC). Explicit data tables for the SOC generated by the state observer are then used to capture accurate state changes, with input features including the vehicle status and the states of vehicle components. In particular, the vehicle velocity estimation that provides the future speed reference is constructed with a deep forest. These components, the explicit data tables and the velocity estimator, are then combined with model predictive control (MPC) to unlock state-of-the-art energy-saving ability for the multi-degree-of-freedom system in FCEVs, yielding LRMPC. Finally, a detailed assessment is performed in simulation to validate the performance of LRMPC. The results highlight its optimal energy-saving control effect and strong real-time applicability.  ( 3 min )
    Symbolic Knowledge Extraction from Opaque Predictors Applied to Cosmic-Ray Data Gathered with LISA Pathfinder. (arXiv:2209.04697v1 [astro-ph.HE])
    Machine learning models are nowadays ubiquitous in space missions, performing a wide variety of tasks ranging from the prediction of multivariate time series to the detection of specific patterns in the input data. Adopted models are usually deep neural networks or other complex machine learning algorithms providing predictions that are opaque, i.e., that do not allow human users to understand the rationale behind them. Several techniques exist in the literature to combine the impressive predictive performance of opaque machine learning models with human-intelligible prediction explanations, for instance the application of symbolic knowledge extraction procedures. In this paper, we report the results of different knowledge extractors applied to an ensemble predictor capable of reproducing cosmic-ray data gathered on board the LISA Pathfinder space mission. A discussion about the readability/fidelity trade-off of the extracted knowledge is also presented.  ( 2 min )
    Vision Transformer with Convolutional Encoder-Decoder for Hand Gesture Recognition using 24 GHz Doppler Radar. (arXiv:2209.05032v1 [eess.SP])
    Transformers combined with convolutional encoders have been recently used for hand gesture recognition (HGR) using micro-Doppler signatures. We propose a vision-transformer-based architecture for HGR with multi-antenna continuous-wave Doppler radar receivers. The proposed architecture consists of three modules: a convolutional encoder-decoder, an attention module with three transformer layers, and a multi-layer perceptron. The novel convolutional decoder helps to feed patches with larger sizes to the attention module for improved feature extraction. Experimental results obtained with a dataset corresponding to a two-antenna continuous-wave Doppler radar receiver operating at 24 GHz (published by Skaria et al.) confirm that the proposed architecture achieves an accuracy of 98.3% which substantially surpasses the state-of-the-art on the used dataset.  ( 2 min )
    Learning When to Say "I Don't Know". (arXiv:2209.04944v1 [cs.CV])
    We propose a new Reject Option Classification technique to identify and remove regions of uncertainty in the decision space for a given neural classifier and dataset. Such existing formulations employ a learned rejection (remove)/selection (keep) function and require either a known cost for rejecting examples or strong constraints on the accuracy or coverage of the selected examples. We consider an alternative formulation by instead analyzing the complementary reject region and employing a validation set to learn per-class softmax thresholds. The goal is to maximize the accuracy of the selected examples subject to a natural randomness allowance on the rejected examples (rejecting more incorrect than correct predictions). We provide results showing the benefits of the proposed method over na\"ively thresholding calibrated/uncalibrated softmax scores with 2-D points, imagery, and text classification datasets using state-of-the-art pretrained models. Source code is available at https://github.com/osu-cvl/learning-idk.  ( 2 min )
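    The following minimal sketch illustrates per-class softmax thresholding with a validation set; the arrays, the percentile-based acceptance rule, and the rejection marker are our hypothetical choices for illustration, not the paper's exact learning objective.

        import numpy as np

        def fit_thresholds(probs, labels, q=10):
            # probs: (n, C) validation softmax scores; labels: (n,) true classes.
            n, C = probs.shape
            preds = probs.argmax(axis=1)
            thr = np.zeros(C)
            for c in range(C):
                correct = probs[(preds == c) & (labels == c), c]
                # Accept class-c predictions scoring above a low percentile
                # of the correctly classified validation examples.
                thr[c] = np.percentile(correct, q) if len(correct) else 1.0
            return thr

        def predict_or_reject(probs, thr):
            preds = probs.argmax(axis=1)
            accept = probs[np.arange(len(preds)), preds] >= thr[preds]
            return np.where(accept, preds, -1)     # -1 encodes "I don't know"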
    A Complex Network based Graph Embedding Method for Link Prediction. (arXiv:2209.04884v1 [cs.LG])
    Graph embedding methods aim at finding useful graph representations by mapping nodes to a low-dimensional vector space. It is a task with important downstream applications, such as link prediction, graph reconstruction, data visualization, node classification, and language modeling. In recent years, the field of graph embedding has witnessed a shift from linear algebraic approaches towards local, gradient-based optimization methods combined with random walks and deep neural networks to tackle the problem of embedding large graphs. However, despite this improvement in the optimization tools, graph embedding methods are still generically designed in a way that is oblivious to the particularities of real-life networks. Indeed, there has been significant progress in understanding and modeling complex real-life networks in recent years. However, the obtained results have had a minor influence on the development of graph embedding algorithms. This paper aims to remedy this by designing a graph embedding method that takes advantage of recent valuable insights from the field of network science. More precisely, we present a novel graph embedding approach based on the popularity-similarity and local attraction paradigms. We evaluate the performance of the proposed approach on the link prediction task on a large number of real-life networks. We show, using extensive experimental analysis, that the proposed method outperforms state-of-the-art graph embedding algorithms. We also demonstrate its robustness to data scarcity and the choice of embedding dimensionality.  ( 3 min )
    An Investigation of Smart Contract for Collaborative Machine Learning Model Training. (arXiv:2209.05017v1 [cs.LG])
    Machine learning (ML) has penetrated various fields in the era of big data. The advantage of collaborative machine learning (CML) over most conventional ML lies in the joint effort of decentralized nodes or agents that results in better model performance and generalization. As the training of ML models requires a massive amount of good quality data, it is necessary to eliminate concerns about data privacy and ensure high-quality data. To solve this problem, we cast our eyes on the integration of CML and smart contracts. Based on blockchain, smart contracts enable automatic execution of data preserving and validation, as well as the continuity of CML model training. In our simulation experiments, we define incentive mechanisms on the smart contract, investigate important factors such as the number of features in the dataset (num_words), the size of the training data, and the cost for the data holders to submit data, and conclude how these factors impact the performance metrics of the model: the accuracy of the trained model, the gap between the accuracies of the model before and after simulation, and the time for a malicious agent to use up its balance. For instance, we observe in the experiments that increasing num_words leads to higher model accuracy and eliminates the negative influence of malicious agents in a shorter time. Statistical analyses show that with the help of smart contracts, the influence of invalid data is efficiently diminished and model robustness is maintained. We also discuss the gaps in existing research and put forward possible future directions for further work.  ( 3 min )
    On topological data analysis for structural dynamics: an introduction to persistent homology. (arXiv:2209.05134v1 [stat.ML])
    Topological methods can provide a way of proposing new metrics and methods of scrutinising data that otherwise may be overlooked. In this work, a method of quantifying the shape of data, via a field called topological data analysis, is introduced. The main tool within topological data analysis (TDA) is persistent homology, a method of quantifying the shape of data over a range of length scales. The required background and a method of computing persistent homology are briefly discussed in this work. Ideas from topological data analysis are then used in nonlinear dynamics to analyse some common attractors, by calculating their embedding dimension and then assessing their general topologies. A method is also proposed that uses topological data analysis to determine the optimal delay for a time-delay embedding. TDA is also applied to a Z24 Bridge case study in structural health monitoring, where it is used to scrutinise different data partitions, classified by the conditions at which the data were collected. A metric from topological data analysis is used to compare data between the partitions. The results presented demonstrate that the presence of damage alters the manifold shape more significantly than the effects of temperature.  ( 3 min )
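    As a small concrete piece of the pipeline above, here is a minimal sketch of the time-delay embedding whose delay the proposed TDA-based method selects; persistent homology would then be computed on the resulting point cloud (the sine-wave series is a toy example):

        import numpy as np

        def delay_embed(x, dim, tau):
            # Map a scalar time series into R^dim using delay tau.
            n = len(x) - (dim - 1) * tau
            return np.stack([x[i * tau : i * tau + n] for i in range(dim)], axis=1)

        t = np.linspace(0, 30, 2000)
        cloud = delay_embed(np.sin(t), dim=3, tau=40)  # traces a closed loop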
    Bounding The Rademacher Complexity of Fourier Neural Operator. (arXiv:2209.05150v1 [cs.LG])
    A Fourier neural operator (FNO) is one of the physics-inspired machine learning methods. In particular, it is a neural operator. In recent times, several types of neural operators have been developed, e.g., deep operator networks, GNO, and MWTO. Compared with other models, the FNO is computationally efficient and can learn nonlinear operators between function spaces independent of a certain finite basis. In this study, we investigated bounds on the Rademacher complexity of the FNO based on specific group norms. Using capacities based on these norms, we bound the generalization error of the FNO model. In addition, we investigated the correlation between the empirical generalization error and the proposed capacity of the FNO. Based on this investigation, we gained insight into the impact of the model architecture on the generalization error and estimated the amount of information about FNO models stored in various types of capacities.  ( 2 min )
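    For reference, the quantity being bounded is the empirical Rademacher complexity of the FNO hypothesis class $\mathcal{F}$ on a sample $S = (x_1, \ldots, x_n)$; in the standard (scalar-valued) form, with our notation,

        $$\hat{\mathfrak{R}}_S(\mathcal{F}) = \mathbb{E}_{\sigma}\Big[\sup_{f \in \mathcal{F}} \frac{1}{n} \sum_{i=1}^{n} \sigma_i f(x_i)\Big], \qquad \sigma_i \overset{\text{i.i.d.}}{\sim} \mathrm{Unif}\{-1, +1\},$$

    which the paper controls through group norms of the FNO parameters.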
    Graph Polynomial Convolution Models for Node Classification of Non-Homophilous Graphs. (arXiv:2209.05020v1 [cs.LG])
    We investigate efficient learning from higher-order graph convolution and learning directly from adjacency matrices for node classification. We revisit the scaled graph residual network, remove the ReLU activation from the residual layers, and apply a single weight matrix at each residual layer. We show that the resulting model leads to new graph convolution models that are polynomials of the normalized adjacency matrix, the residual weight matrix, and the residual scaling parameter. Additionally, we propose adaptive learning that balances graph polynomial convolution models and direct learning from the adjacency matrix. Furthermore, we propose fully adaptive models that learn scaling parameters at each residual layer. We show that the generalization bounds of the proposed methods are polynomial in the eigenvalue spectrum, the scaling parameters, and the upper bounds on the residual weights. By theoretical analysis, we argue that the proposed models can obtain improved generalization bounds by limiting the higher orders of convolution and learning directly from the adjacency matrix. Using a wide set of real data, we demonstrate that the proposed methods obtain improved accuracy for node classification of non-homophilous graphs.  ( 2 min )
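    A minimal sketch of a graph convolution of the polynomial flavor described above follows; the normalization, the scaling parameter alpha, and the per-order weight matrices are illustrative placeholders rather than the paper's exact parameterization.

        import numpy as np

        def poly_graph_conv(A, X, weights, alpha=0.5):
            # Symmetrically normalize the adjacency matrix (with self-loops).
            A = A + np.eye(A.shape[0])
            d_inv_sqrt = 1.0 / np.sqrt(A.sum(axis=1))
            A_hat = d_inv_sqrt[:, None] * A * d_inv_sqrt[None, :]
            out, prop = 0.0, X
            for k, W in enumerate(weights):        # sum_k alpha^k A_hat^k X W_k
                out = out + (alpha ** k) * prop @ W
                prop = A_hat @ prop                # next power of A_hat applied to X
            return out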
    Clifford Neural Layers for PDE Modeling. (arXiv:2209.04934v1 [cs.LG])
    Partial differential equations (PDEs) see widespread use in sciences and engineering to describe simulation of physical processes as scalar and vector fields interacting and coevolving over time. Due to the computationally expensive nature of their standard solution methods, neural PDE surrogates have become an active research topic to accelerate these simulations. However, current methods do not explicitly take into account the relationship between different fields and their internal components, which are often correlated. Viewing the time evolution of such correlated fields through the lens of multivector fields allows us to overcome these limitations. Multivector fields consist of scalar, vector, as well as higher-order components, such as bivectors and trivectors. Their algebraic properties, such as multiplication, addition and other arithmetic operations can be described by Clifford algebras. To our knowledge, this paper presents the first usage of such multivector representations together with Clifford convolutions and Clifford Fourier transforms in the context of deep learning. The resulting Clifford neural layers are universally applicable and will find direct use in the areas of fluid dynamics, weather forecasting, and the modeling of physical systems in general. We empirically evaluate the benefit of Clifford neural layers by replacing convolution and Fourier operations in common neural PDE surrogates by their Clifford counterparts on two-dimensional Navier-Stokes and weather modeling tasks, as well as three-dimensional Maxwell equations. Clifford neural layers consistently improve generalization capabilities of the tested neural PDE surrogates.  ( 3 min )
  • Open

    SocialInteractionGAN: Multi-person Interaction Sequence Generation. (arXiv:2103.05916v2 [cs.NE] UPDATED)
    Prediction of human actions in social interactions has important applications in the design of social robots or artificial avatars. In this paper, we focus on a unimodal representation of interactions and propose to tackle interaction generation in a data-driven fashion. In particular, we model human interaction generation as a discrete multi-sequence generation problem and present SocialInteractionGAN, a novel adversarial architecture for conditional interaction generation. Our model builds on a recurrent encoder-decoder generator network and a dual-stream discriminator, which jointly evaluates the realism of interactions and individual action sequences and operates at different time scales. Crucially, contextual information on interacting participants is shared among agents and reinjected in both the generation and the discriminator evaluation processes. Experiments show that albeit dealing with low dimensional data, SocialInteractionGAN succeeds in producing highly realistic action sequences of interacting people, comparing favorably to a diversity of recurrent and convolutional discriminator baselines, and we argue that this work constitutes a first step towards higher dimensional and multimodal interaction generation. Evaluations are conducted using classical GAN metrics, which we specifically adapt for discrete sequential data. Our model is shown to properly learn the dynamics of interaction sequences, while exploiting the full range of available actions.
    Advanced Manufacturing Configuration by Sample-efficient Batch Bayesian Optimization. (arXiv:2205.11827v2 [cs.LG] UPDATED)
    We propose a framework for the configuration and operation of expensive-to-evaluate advanced manufacturing methods, based on Bayesian optimization. The framework unifies a tailored acquisition function, a parallel acquisition procedure, and the integration of process information providing context to the optimization procedure. The novel acquisition function is demonstrated, analyzed and compared on state-of-the-art benchmarking problems. We apply the optimization approach to atmospheric plasma spraying and fused deposition modeling. Our results demonstrate that the proposed framework can efficiently find input parameters that produce the desired outcome and minimize the process cost.
    Statistical Learning Theory for Control: A Finite Sample Perspective. (arXiv:2209.05423v1 [eess.SY])
    This tutorial survey provides an overview of recent non-asymptotic advances in statistical learning theory as relevant to control and system identification. While there has been substantial progress across all areas of control, the theory is most well-developed when it comes to linear system identification and learning for the linear quadratic regulator, which are the focus of this manuscript. From a theoretical perspective, much of the labor underlying these advances has been in adapting tools from modern high-dimensional statistics and learning theory. While highly relevant to control theorists interested in integrating tools from machine learning, the foundational material has not always been easily accessible. To remedy this, we provide a self-contained presentation of the relevant material, outlining all the key ideas and the technical machinery that underpin recent results. We also present a number of open problems and future directions.
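    As a concrete instance of the linear system identification setting the survey covers, here is a minimal least-squares sketch (our toy example): estimate $(A, B)$ from a single trajectory of $x_{t+1} = A x_t + B u_t + w_t$; the non-asymptotic theory discussed in the manuscript bounds the error of exactly this kind of estimator as a function of the trajectory length.

        import numpy as np

        rng = np.random.default_rng(0)
        A_true = np.array([[0.9, 0.2], [0.0, 0.8]])
        B_true = np.array([[0.0], [1.0]])
        x, X, U, Y = np.zeros(2), [], [], []
        for _ in range(500):                       # roll out one trajectory
            u = rng.normal(size=1)
            x_next = A_true @ x + B_true @ u + 0.1 * rng.normal(size=2)
            X.append(x); U.append(u); Y.append(x_next)
            x = x_next

        Z = np.hstack([np.array(X), np.array(U)])  # regressors [x_t, u_t]
        theta, *_ = np.linalg.lstsq(Z, np.array(Y), rcond=None)
        A_hat, B_hat = theta[:2].T, theta[2:].T    # least-squares estimates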
    Model interpretation using improved local regression with variable importance. (arXiv:2209.05371v1 [stat.ML])
    A fundamental question on the use of ML models concerns the explanation of their predictions for increasing transparency in decision-making. Although several interpretability methods have emerged, some gaps regarding the reliability of their explanations have been identified. For instance, most methods are unstable (meaning that they give very different explanations with small changes in the data), and do not cope well with irrelevant features (that is, features not related to the label). This article introduces two new interpretability methods, namely VarImp and SupClus, that overcome these issues by using local regression fits with a weighted distance that takes into account variable importance. Whereas VarImp generates explanations for each instance and can be applied to datasets with more complex relationships, SupClus interprets clusters of instances with similar explanations and can be applied to simpler datasets where clusters can be found. We compare our methods with state-of-the-art approaches and show that they yield better explanations according to several metrics, particularly in high-dimensional problems with irrelevant features, as well as when the relationship between features and target is non-linear.
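    The core mechanic shared by the two methods can be sketched as a local linear fit around the instance being explained, with kernel weights computed from a distance that scales each feature by an importance score; the importance vector, bandwidth, and ridge term below are hypothetical illustration choices.

        import numpy as np

        def local_explanation(X, y, x0, importance, bandwidth=1.0):
            # Importance-weighted distances from x0 to each training point.
            d = np.sqrt((((X - x0) * importance) ** 2).sum(axis=1))
            w = np.exp(-(d / bandwidth) ** 2)      # kernel weights
            Z = np.hstack([np.ones((len(X), 1)), X - x0])
            WZ = Z * w[:, None]
            beta = np.linalg.solve(Z.T @ WZ + 1e-8 * np.eye(Z.shape[1]), WZ.T @ y)
            return beta[1:]                        # local slopes = explanation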
    Fast Bayesian Variable Selection in Binomial and Negative Binomial Regression. (arXiv:2106.14981v2 [stat.ME] UPDATED)
    Bayesian variable selection is a powerful tool for data analysis, as it offers a principled method for variable selection that accounts for prior information and uncertainty. However, wider adoption of Bayesian variable selection has been hampered by computational challenges, especially in difficult regimes with a large number of covariates or non-conjugate likelihoods. Generalized linear models for count data, which are prevalent in biology, ecology, economics, and beyond, represent an important special case. Here we introduce an efficient MCMC scheme for variable selection in binomial and negative binomial regression that exploits Tempered Gibbs Sampling (Zanella and Roberts, 2019) and that includes logistic regression as a special case. In experiments we demonstrate the effectiveness of our approach, including on cancer data with seventeen thousand covariates.
    Decision Tree-Based Predictive Models for Academic Achievement Using College Students' Support Networks. (arXiv:2108.13947v2 [stat.ML] UPDATED)
    In this study, we examine a set of primary data collected from 484 students enrolled in a large public university in the Mid-Atlantic United States region during the early stages of the COVID-19 pandemic. The data, called Ties data, included students' demographic and support network information. The support network data comprised information on the type of support (i.e. emotional or educational; routine or intense). Using this data set, models for predicting students' academic achievement, quantified by their self-reported GPA, were created using Chi-Square Automatic Interaction Detection (CHAID), a decision tree algorithm, and cforest, a random forest algorithm that uses conditional inference trees. We compare the methods' accuracy and variation in the set of important variables suggested by each algorithm. Each algorithm found different variables important for different student demographics, with some overlap. For White students, different types of educational support were important in predicting academic achievement, while for non-White students, different types of emotional support were important. Differing types of routine support were important in predicting academic achievement for cisgender women, while differing types of intense support were important for cisgender men.  ( 3 min )
    Bias Challenges in Counterfactual Data Augmentation. (arXiv:2209.05104v1 [cs.LG])
    Deep learning models tend not to be out-of-distribution robust primarily due to their reliance on spurious features to solve the task. Counterfactual data augmentations provide a general way of (approximately) achieving representations that are counterfactual-invariant to spurious features, a requirement for out-of-distribution (OOD) robustness. In this work, we show that counterfactual data augmentations may not achieve the desired counterfactual-invariance if the augmentation is performed by a {\em context-guessing machine}, an abstract machine that guesses the most-likely context of a given input. We theoretically analyze the invariance imposed by such counterfactual data augmentations and describe an exemplar NLP task where counterfactual data augmentation by a context-guessing machine does not lead to robust OOD classifiers.  ( 2 min )
    Wasserstein Distributional Learning. (arXiv:2209.04991v1 [stat.ME])
    Learning conditional densities and identifying factors that influence the entire distribution are vital tasks in data-driven applications. Conventional approaches work mostly with summary statistics, and are hence inadequate for a comprehensive investigation. Recently, there have been developments on functional regression methods to model density curves as functional outcomes. A major challenge for developing such models lies in the inherent constraint of non-negativity and unit integral for the functional space of density outcomes. To overcome this fundamental issue, we propose Wasserstein Distributional Learning (WDL), a flexible density-on-scalar regression modeling framework that starts with the Wasserstein distance $W_2$ as a proper metric for the space of density outcomes. We then introduce a heterogeneous and flexible class of Semi-parametric Conditional Gaussian Mixture Models (SCGMM) as the model class $\mathfrak{F} \otimes \mathcal{T}$. The resulting metric space $(\mathfrak{F} \otimes \mathcal{T}, W_2)$ satisfies the required constraints and offers a dense and closed functional subspace. For fitting the proposed model, we further develop an efficient algorithm based on Majorization-Minimization optimization with boosted trees. Compared with methods in the previous literature, WDL better characterizes and uncovers the nonlinear dependence of the conditional densities, and their derived summary statistics. We demonstrate the effectiveness of the WDL framework through simulations and real-world applications.  ( 2 min )
    Statistical Estimation of Confounded Linear MDPs: An Instrumental Variable Approach. (arXiv:2209.05186v1 [stat.ML])
    In a Markov decision process (MDP), unobservable confounders may exist and affect the data generating process, so that the classic off-policy evaluation (OPE) estimators may fail to identify the true value function of the target policy. In this paper, we study the statistical properties of OPE in confounded MDPs with observable instrumental variables. Specifically, we propose a two-stage estimator based on the instrumental variables and establish its statistical properties in confounded MDPs with a linear structure. For non-asymptotic analysis, we prove a $\mathcal{O}(n^{-1/2})$-error bound where $n$ is the number of samples. For asymptotic analysis, we prove that the two-stage estimator is asymptotically normal with a typical rate of $n^{1/2}$. To the best of our knowledge, we are the first to show such statistical results for the two-stage estimator for confounded linear MDPs via instrumental variables.  ( 2 min )
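    The mechanics of a two-stage instrumental-variable estimator are easiest to see in a static toy regression (our illustration; the paper's estimator is built for MDP data, which this sketch does not model): the first stage projects the confounded regressor onto the instrument, and the second stage regresses the outcome on that projection.

        import numpy as np

        rng = np.random.default_rng(0)
        n = 5000
        z = rng.normal(size=n)                     # instrument
        u = rng.normal(size=n)                     # unobserved confounder
        x = 0.8 * z + u + 0.1 * rng.normal(size=n)
        y = 1.5 * x + u + 0.1 * rng.normal(size=n)  # true effect = 1.5

        x_hat = z * (z @ x) / (z @ z)              # stage 1: project x onto z
        beta_iv = (x_hat @ y) / (x_hat @ x_hat)    # stage 2: approx. 1.5
        beta_ols = (x @ y) / (x @ x)               # confounded: biased upward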
    Kernel Learning for Explainable Climate Science. (arXiv:2209.04947v1 [cs.LG])
    The Upper Indus Basin, Himalayas provides water for 270 million people and countless ecosystems. However, precipitation, a key component of hydrological modelling, is poorly understood in this area. A key challenge surrounding this uncertainty comes from the complex spatial-temporal distribution of precipitation across the basin. In this work we propose Gaussian processes with structured non-stationary kernels to model precipitation patterns in the UIB. Previous attempts to quantify or model precipitation in the Hindu Kush Karakoram Himalayan region have often been qualitative or have included crude assumptions and simplifications which cannot be resolved at lower resolutions. This body of research also provides little to no error propagation. We account for the spatial variation in precipitation with a non-stationary Gibbs kernel parameterised with an input-dependent lengthscale. This allows the posterior function samples to adapt to the varying precipitation patterns inherent in the distinct underlying topography of the Indus region. The input-dependent lengthscale is governed by a latent Gaussian process with a stationary squared-exponential kernel, allowing the function-level hyperparameters to vary smoothly. In ablation experiments we motivate each component of the proposed kernel by demonstrating its ability to model the spatial covariance, temporal structure and joint spatio-temporal reconstruction. We benchmark our model with a stationary Gaussian process and a deep Gaussian process.
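    A minimal sketch of the non-stationary Gibbs kernel with an input-dependent lengthscale $l(x)$ is given below; the linear lengthscale function is a hypothetical placeholder, whereas in the paper $l(x)$ is itself governed by a latent Gaussian process with a squared-exponential kernel.

        import numpy as np

        def gibbs_kernel(x1, x2, lengthscale_fn):
            l1 = lengthscale_fn(x1)[:, None]
            l2 = lengthscale_fn(x2)[None, :]
            sq = l1 ** 2 + l2 ** 2
            prefactor = np.sqrt(2.0 * l1 * l2 / sq)
            return prefactor * np.exp(-((x1[:, None] - x2[None, :]) ** 2) / sq)

        x = np.linspace(0, 10, 200)
        K = gibbs_kernel(x, x, lambda s: 0.3 + 0.2 * s)  # lengthscale grows with x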
    On the Hyperparameters in Stochastic Gradient Descent with Momentum. (arXiv:2108.03947v2 [cs.LG] UPDATED)
    Following the same routine as [SSJ20], we continue to present the theoretical analysis for stochastic gradient descent with momentum (SGD with momentum) in this paper. Differently, for SGD with momentum, we demonstrate that it is the two hyperparameters together, the learning rate and the momentum coefficient, that play the significant role in the linear rate of convergence in non-convex optimization. Our analysis is based on the use of a hyperparameters-dependent stochastic differential equation (hp-dependent SDE) that serves as a continuous surrogate for SGD with momentum. Similarly, we establish the linear convergence for the continuous-time formulation of SGD with momentum and obtain an explicit expression for the optimal linear rate by analyzing the spectrum of the Kramers-Fokker-Planck operator. By comparison, we demonstrate how the optimal linear rate of convergence and the final gap (which for SGD depend only on the learning rate) vary as the momentum coefficient increases from zero to one. We then propose a mathematical interpretation of why SGD with momentum converges faster and is more robust to the learning rate than standard SGD in practice. Finally, we show that, in the presence of noise, Nesterov momentum has no essential difference from standard momentum.
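    For reference, the update being analyzed is the standard heavy-ball scheme below (a minimal sketch with synthetic gradient noise); the learning rate lr and momentum coefficient mu are the two hyperparameters that, per the paper, jointly determine the linear rate of convergence.

        import numpy as np

        def sgd_momentum(grad_fn, x0, lr=0.05, mu=0.9, steps=200, seed=0):
            rng = np.random.default_rng(seed)
            x, v = x0.astype(float).copy(), np.zeros_like(x0, dtype=float)
            for _ in range(steps):
                g = grad_fn(x) + 0.1 * rng.normal(size=x.shape)  # noisy gradient
                v = mu * v + g                     # momentum buffer
                x = x - lr * v
            return x

        x_min = sgd_momentum(lambda x: 2 * x, 5.0 * np.ones(5))  # minimizes ||x||^2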
    Understanding the Behavior of Belief Propagation. (arXiv:2209.05464v1 [cs.AI])
    Probabilistic graphical models are a powerful concept for modeling high-dimensional distributions. Besides modeling distributions, probabilistic graphical models also provide an elegant framework for performing statistical inference; because of the high-dimensional nature, however, one must often use approximate methods for this purpose. Belief propagation performs approximate inference, is efficient, and looks back on a long success-story. Yet, in most cases, belief propagation lacks any performance and convergence guarantees. Many realistic problems, however, are represented by graphical models with loops, in which case belief propagation is neither guaranteed to provide accurate estimates nor even to converge. This thesis investigates how the model parameters influence the performance of belief propagation. We are particularly interested in their influence on (i) the number of fixed points, (ii) the convergence properties, and (iii) the approximation quality.  ( 2 min )
    A framework for bilevel optimization that enables stochastic and global variance reduction algorithms. (arXiv:2201.13409v2 [stat.ML] UPDATED)
    Bilevel optimization, the problem of minimizing a value function which involves the arg-minimum of another function, appears in many areas of machine learning. In a large scale empirical risk minimization setting where the number of samples is huge, it is crucial to develop stochastic methods, which only use a few samples at a time to progress. However, computing the gradient of the value function involves solving a linear system, which makes it difficult to derive unbiased stochastic estimates. To overcome this problem we introduce a novel framework, in which the solution of the inner problem, the solution of the linear system, and the main variable evolve at the same time. These directions are written as a sum, making it straightforward to derive unbiased estimates. The simplicity of our approach allows us to develop global variance reduction algorithms, where the dynamics of all variables is subject to variance reduction. We demonstrate that SABA, an adaptation of the celebrated SAGA algorithm in our framework, has $O(\frac{1}{T})$ convergence rate, and that it achieves linear convergence under the Polyak-Lojasiewicz assumption. This is the first stochastic algorithm for bilevel optimization that verifies either of these properties. Numerical experiments validate the usefulness of our method.
    A Note on the Efficient Evaluation of PAC-Bayes Bounds. (arXiv:2209.05188v1 [cs.LG])
    When utilising PAC-Bayes theory for risk certification, it is usually necessary to estimate and bound the Gibbs risk of the PAC-Bayes posterior. Many works in the literature employ a method for this which requires a large number of passes of the dataset, incurring high computational cost. This manuscript presents a very general alternative which makes computational savings on the order of the dataset size.
    Fully-automated patient-level malaria assessment on field-prepared thin blood film microscopy images, including Supplementary Information. (arXiv:1908.01901v2 [cs.LG] UPDATED)
    Malaria is a life-threatening disease affecting millions. Microscopy-based assessment of thin blood films is a standard method to (i) determine malaria species and (ii) quantitate high-parasitemia infections. Full automation of malaria microscopy by machine learning (ML) is a challenging task because field-prepared slides vary widely in quality and presentation, and artifacts often heavily outnumber relatively rare parasites. In this work, we describe a complete, fully-automated framework for thin film malaria analysis that applies ML methods, including convolutional neural nets (CNNs), trained on a large and diverse dataset of field-prepared thin blood films. Quantitation and species identification results are nearly accurate enough for the concrete needs of drug resistance monitoring and clinical use-cases on field-prepared samples. We focus our methods and our performance metrics on the field use-case requirements. We discuss key issues and important metrics for the application of ML methods to malaria microscopy.
    Granger Causal Chain Discovery for Sepsis-Associated Derangements via Multivariate Hawkes Processes. (arXiv:2209.04480v1 [stat.AP])
    Modern health care systems are conducting continuous, automated surveillance of the electronic medical record (EMR) to identify adverse events with increasing frequency; however, many events such as sepsis do not have clearly elucidated prodromes (i.e., event chains) that can be used to identify and intercept the adverse event early in its course. Currently there does not exist a reliable framework for discovering or describing causal chains that precede adverse hospital events. Clinically relevant and interpretable results require a framework that can (1) infer temporal interactions across multiple patient features found in EMR data (e.g., labs, vital signs, etc.) and (2) identify pattern(s) which precede and are specific to an impending adverse event (e.g., sepsis). In this work, we propose a linear multivariate Hawkes process model, coupled with the link function $g(x) = x^+$ to allow potential inhibition effects, in order to recover a Granger Causal (GC) graph. We develop a two-phase gradient-based scheme to maximize a surrogate of the likelihood to estimate the problem parameters. This two-phase algorithm is scalable and shown to be effective via our numerical simulation. It is subsequently extended to a data set of patients admitted to the Grady hospital system in Atlanta, GA, where the fitted Granger Causal graph identifies several highly interpretable chains that precede sepsis.  ( 3 min )
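    The conditional intensity of the model described above can be sketched as follows (with exponential kernels as an illustrative choice): entry alpha[i, j] encodes the Granger-causal influence of feature j on feature i, negative entries act as inhibition, and the $g(x) = x^+$ link rectifies the result.

        import numpy as np

        def intensity(t, events, mu, alpha, beta):
            # events: list over dimensions j of arrays of past event times.
            lam = mu.copy()
            for j, times in enumerate(events):
                past = times[times < t]
                lam = lam + alpha[:, j] * np.exp(-beta * (t - past)).sum()
            return np.maximum(lam, 0.0)            # the g(x) = x^+ link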
    Robust Geometric Metric Learning. (arXiv:2202.11550v2 [stat.ML] UPDATED)
    This paper proposes new algorithms for the metric learning problem. We start by noticing that several classical metric learning formulations from the literature can be viewed as modified covariance matrix estimation problems. Leveraging this point of view, a general approach, called Robust Geometric Metric Learning (RGML), is then studied. This method aims at simultaneously estimating the covariance matrix of each class while shrinking them towards their (unknown) barycenter. We focus on two specific costs functions: one associated with the Gaussian likelihood (RGML Gaussian), and one with Tyler's M -estimator (RGML Tyler). In both, the barycenter is defined with the Riemannian distance, which enjoys nice properties of geodesic convexity and affine invariance. The optimization is performed using the Riemannian geometry of symmetric positive definite matrices and its submanifold of unit determinant. Finally, the performance of RGML is asserted on real datasets. Strong performance is exhibited while being robust to mislabeled data.  ( 2 min )
    Monitoring of functional profiles combining the notion of Fr\'echet mean and the framework of deformation models with application in ambient air pollution surveillance. (arXiv:2010.02968v2 [stat.ME] UPDATED)
    A framework suitable for monitoring functional profiles combining the notion of Fr\'echet mean and the concept of deformation models is developed and proposed. The generalized sense of mean that the notion of the Fr\'echet mean offers is employed to capture the typical functional shape of the data, while the concept of deformation models allows for interpretable parameterizations of a profile's deviations from the typical shape. Functional EWMA-type control charts are built and proposed based on shape characteristics of the functional data, allowing for (a) identifying shifts from the in-control behaviour and (b) providing causal relationships of the potential shifts with significant deviations of certain qualitative characteristics (e.g. amplitude or phase deformations). The functional monitoring scheme is implemented to assess ambient air pollution. In particular, the method is applied to a synthetic data example to assess its performance under various conditions, and to a real-world example using sensor data from an area in the city of Athens, where air pollutant profiles and their characteristics are successfully analyzed and out-of-control behaviours are identified.  ( 3 min )
    Revisiting Active Sets for Gaussian Process Decoders. (arXiv:2209.04636v1 [stat.ML])
    Decoders built on Gaussian processes (GPs) are enticing due to the marginalisation over the non-linear function space. Such models (also known as GP-LVMs) are often expensive and notoriously difficult to train in practice, but can be scaled using variational inference and inducing points. In this paper, we revisit active set approximations. We develop a new stochastic estimate of the log-marginal likelihood based on recently discovered links to cross-validation, and propose a computationally efficient approximation thereof. We demonstrate that the resulting stochastic active sets (SAS) approximation significantly improves the robustness of GP decoder training while reducing computational cost. The SAS-GP obtains more structure in the latent space, scales to many datapoints and learns better representations than variational autoencoders, which is rarely the case for GP decoders.  ( 2 min )
    Learning Consumer Preferences from Bundle Sales Data. (arXiv:2209.04942v1 [stat.ML])
    Product bundling is a common selling mechanism used in online retailing. To set profitable bundle prices, the seller needs to learn consumer preferences from the transaction data. When customers purchase bundles or multiple products, classical methods such as discrete choice models cannot be used to estimate customers' valuations. In this paper, we propose an approach to learn the distribution of consumers' valuations toward the products using bundle sales data. The approach reduces it to an estimation problem where the samples are censored by polyhedral regions. Using the EM algorithm and Monte Carlo simulation, our approach can recover the distribution of consumers' valuations. The framework allows for unobserved no-purchases and clustered market segments. We provide theoretical results on the identifiability of the probability model and the convergence of the EM algorithm. The performance of the approach is also demonstrated numerically.  ( 2 min )
    Modeling Dependent Structure for Utterances in ASR Evaluation. (arXiv:2209.05281v1 [eess.AS])
    The bootstrap resampling method has been popular for performing significance analysis on word error rate (WER) in automatic speech recognition (ASR) evaluations. To deal with the issue of dependent speech data, the blockwise bootstrap approach has also been proposed, which divides utterances into uncorrelated blocks and resamples these blocks instead of the original data. However, it is always nontrivial to uncover the dependence structure among utterances, which could lead to subjective findings in statistical testing. In this paper, we present graphical lasso based methods to explicitly model such dependency and estimate the independent blocks of utterances in a rigorous way. The blockwise bootstrap is then applied on top of the inferred blocks. We show that the resulting variance estimator for WER is consistent under mild conditions. We also demonstrate the validity of the proposed approach on LibriSpeech data.  ( 2 min )
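    The blockwise bootstrap itself reduces to a few lines once blocks are available; in this minimal sketch the block assignment is taken as given (in the paper it is inferred via graphical lasso), and WER is recomputed on each resample of whole blocks.

        import numpy as np

        def blockwise_bootstrap_wer(errors, words, blocks, n_boot=1000, seed=0):
            # errors, words: per-utterance error and word counts; blocks: block
            # id of each utterance (assumed independent across blocks).
            rng = np.random.default_rng(seed)
            ids = np.unique(blocks)
            wers = []
            for _ in range(n_boot):
                chosen = rng.choice(ids, size=len(ids), replace=True)
                tot_err = sum(errors[blocks == b].sum() for b in chosen)
                tot_words = sum(words[blocks == b].sum() for b in chosen)
                wers.append(tot_err / tot_words)
            wers = np.array(wers)
            return wers.mean(), wers.var()         # WER estimate and its variance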
    Time-uniform central limit theory, asymptotic confidence sequences, and anytime-valid causal inference. (arXiv:2103.06476v4 [math.ST] UPDATED)
    Confidence intervals based on the central limit theorem (CLT) are a cornerstone of classical statistics. Despite being only asymptotically valid, they are ubiquitous because they permit statistical inference under very weak assumptions, and can often be applied to problems even when nonasymptotic inference is impossible. This paper introduces time-uniform analogues of such asymptotic confidence intervals. To elaborate, our methods take the form of confidence sequences (CS) -- sequences of confidence intervals that are uniformly valid over time. CSs provide valid inference at arbitrary stopping times, incurring no penalties for "peeking" at the data, unlike classical confidence intervals which require the sample size to be fixed in advance. Existing CSs in the literature are nonasymptotic, and hence do not enjoy the aforementioned broad applicability of asymptotic confidence intervals. Our work bridges the gap by giving a definition for "asymptotic CSs", and deriving a universal asymptotic CS that requires only weak CLT-like assumptions. While the CLT approximates the distribution of a sample average by that of a Gaussian at a fixed sample size, we use strong invariance principles (stemming from the seminal 1970s work of Komlos, Major, and Tusnady) to uniformly approximate the entire sample average process by an implicit Gaussian process. We demonstrate their utility by deriving nonparametric asymptotic CSs for the average treatment effect based on doubly robust estimators in observational studies, for which no nonasymptotic methods can exist even in the fixed-time regime (due to confounding bias). These enable doubly robust causal inference that can be continuously monitored and adaptively stopped.  ( 3 min )
    Centroids Matching: an efficient Continual Learning approach operating in the embedding space. (arXiv:2208.02048v2 [cs.LG] UPDATED)
    Catastrophic forgetting (CF) occurs when a neural network loses the information previously learned while training on a set of samples from a different distribution, i.e., a new task. Existing approaches have achieved remarkable results in mitigating CF, especially in a scenario called task incremental learning. However, this scenario is not realistic, and limited work has been done to achieve good results on more realistic scenarios. In this paper, we propose a novel regularization method called Centroids Matching, that, inspired by meta-learning approaches, fights CF by operating in the feature space produced by the neural network, achieving good results while requiring a small memory footprint. Specifically, the approach classifies the samples directly using the feature vectors produced by the neural network, by matching those vectors with the centroids representing the classes from the current task, or all the tasks up to that point. Centroids Matching is faster than competing baselines, and it can be exploited to efficiently mitigate CF, by preserving the distances between the embedding space produced by the model when past tasks were over, and the one currently produced, leading to a method that achieves high accuracy on all the tasks, without using an external memory when operating on easy scenarios, or using a small one for more realistic ones. Extensive experiments demonstrate that Centroids Matching achieves accuracy gains on multiple datasets and scenarios.  ( 3 min )
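    The classification rule at the heart of the approach is a nearest-centroid match in the embedding space, sketched below with placeholder arrays (in the method the embeddings come from the network being trained and the centroids from the current or all past tasks):

        import numpy as np

        def class_centroids(embeddings, labels):
            classes = np.unique(labels)
            cents = np.stack([embeddings[labels == c].mean(axis=0) for c in classes])
            return classes, cents

        def predict(embeddings, classes, centroids):
            # Squared Euclidean distance from each sample to each centroid.
            d = ((embeddings[:, None, :] - centroids[None, :, :]) ** 2).sum(-1)
            return classes[d.argmin(axis=1)]       # nearest-centroid rule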
    Deep Learning with Non-Linear Factor Models: Adaptability and Avoidance of Curse of Dimensionality. (arXiv:2209.04512v1 [stat.ML])
    In this paper, we connect the deep learning literature with non-linear factor models and show that deep learning estimation makes a substantial improvement in the non-linear additive factor model literature. We provide bounds on the expected risk and show that these upper bounds are uniform over a set of multiple response variables by extending Schmidt-Hieber (2020) theorems. We show that our risk bound does not depend on the number of factors. In order to construct a covariance matrix estimator for asset returns, we develop a novel data-dependent estimator of the error covariance matrix in deep neural networks. The estimator is based on a flexible adaptive thresholding technique which is robust to outliers in the innovations. We prove that the estimator is consistent in spectral norm. Using that result, we show consistency and rates of convergence of the covariance matrix and precision matrix estimators for asset returns. The rates of convergence in both results do not depend on the number of factors; this is a new result in the factor model literature, since the number of factors is an impediment to better estimation and prediction. Except for the precision matrix result, all our results hold even when the number of assets is larger than the time span, with both quantities growing. Various Monte Carlo simulations confirm our large sample findings and reveal superior accuracy of the DNN-FM in estimating the true underlying functional form which connects the factors and observable variables, as well as the covariance and precision matrix, compared to competing approaches. Moreover, in an out-of-sample portfolio forecasting application it outperforms alternative portfolio strategies in most cases in terms of out-of-sample portfolio standard deviation and Sharpe ratio.  ( 3 min )
    Batch Bayesian Optimization via Particle Gradient Flows. (arXiv:2209.04722v1 [stat.ML])
    Bayesian Optimisation (BO) methods seek to find global optima of objective functions which are only available as a black-box or are expensive to evaluate. Such methods construct a surrogate model for the objective function, quantifying the uncertainty in that surrogate through Bayesian inference. Objective evaluations are sequentially determined by maximising an acquisition function at each step. However, this ancillary optimisation problem can be highly non-trivial to solve, due to the non-convexity of the acquisition function, particularly in the case of batch Bayesian optimisation, where multiple points are selected in every step. In this work we reformulate batch BO as an optimisation problem over the space of probability measures. We construct a new acquisition function based on multipoint expected improvement which is convex over the space of probability measures. Practical schemes for solving this `inner' optimisation problem arise naturally as gradient flows of this objective function. We demonstrate the efficacy of this new method on different benchmark functions and compare with state-of-the-art batch BO methods.  ( 2 min )
    Weight Expansion: A New Perspective on Dropout and Generalization. (arXiv:2201.09209v2 [cs.LG] UPDATED)
    While dropout is known to be a successful regularization technique, insights into the mechanisms that lead to this success are still lacking. We introduce the concept of \emph{weight expansion}, an increase in the signed volume of a parallelotope spanned by the column or row vectors of the weight covariance matrix, and show that weight expansion is an effective means of increasing the generalization in a PAC-Bayesian setting. We provide a theoretical argument that dropout leads to weight expansion and extensive empirical support for the correlation between dropout and weight expansion. To support our hypothesis that weight expansion can be regarded as an \emph{indicator} of the enhanced generalization capability endowed by dropout, and not just as a mere by-product, we have studied other methods that achieve weight expansion (resp.\ contraction), and found that they generally lead to an increased (resp.\ decreased) generalization ability. This suggests that dropout is an attractive regularizer, because it is a computationally cheap method for obtaining weight expansion. This insight justifies the role of dropout as a regularizer, while paving the way for identifying regularizers that promise improved generalization through weight expansion.  ( 3 min )
    Convergence of Batch Stochastic Gradient Descent Methods with Approximate Gradients and/or Noisy Measurements: Theory and Computational Results. (arXiv:2209.05372v1 [math.OC])
    In this paper, we study convex optimization using a very general formulation called BSGD (Block Stochastic Gradient Descent). At each iteration, some but not necessarily all components of the argument are updated. The direction of the update can be one of two possibilities: (i) a noise-corrupted measurement of the true gradient, or (ii) an approximate gradient computed using a first-order approximation, using function values that might themselves be corrupted by noise. This formulation embraces most of the currently used stochastic gradient methods. We establish conditions for BSGD to converge to the global minimum, based on stochastic approximation theory. Then we verify the predicted convergence through numerical experiments. Our results show that when approximate gradients are used, BSGD converges while momentum-based methods can diverge. However, not just our BSGD, but also standard (full-update) gradient descent, and various momentum-based methods, all converge, even with noisy gradients.  ( 2 min )
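    One BSGD iteration of flavor (ii) can be sketched as follows (our illustration of the formulation, not the paper's code): only a chosen block of coordinates is updated, and the block gradient is a first-order finite-difference approximation built from noisy function values.

        import numpy as np

        def bsgd_step(f, x, block, lr, h=1e-3, noise=0.01, rng=None):
            rng = rng or np.random.default_rng()
            g = np.zeros_like(x)
            for i in block:                        # approximate block gradient
                e = np.zeros_like(x)
                e[i] = h
                # Forward difference from noise-corrupted function values.
                g[i] = ((f(x + e) + noise * rng.normal())
                        - (f(x) + noise * rng.normal())) / h
            x = x.copy()
            x[block] -= lr * g[block]
            return x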
    Support Recovery in Mixture Models with Sparse Parameters. (arXiv:2202.11940v2 [cs.LG] UPDATED)
    Mixture models are widely used to fit complex and multimodal datasets. In this paper we study mixtures with high dimensional sparse latent parameter vectors and consider the problem of support recovery of those vectors. While parameter learning in mixture models is well-studied, the sparsity constraint remains relatively unexplored. Sparsity of parameter vectors is a natural constraint in variety of settings, and support recovery is a major step towards parameter estimation. We provide efficient algorithms for support recovery that have a logarithmic sample complexity dependence on the dimensionality of the latent space. Our algorithms are quite general, namely they are applicable to 1) mixtures of many different canonical distributions including Uniform, Poisson, Laplace, Gaussians, etc. 2) Mixtures of linear regressions and linear classifiers with Gaussian covariates under different assumptions on the unknown parameters. In most of these settings, our results are the first guarantees on the problem while in the rest, our results provide improvements on existing works.  ( 2 min )
    A Deterministic Approximation to Neural SDEs. (arXiv:2006.08973v6 [cs.LG] UPDATED)
Neural Stochastic Differential Equations (NSDEs) model the drift and diffusion functions of a stochastic process as neural networks. While NSDEs are known to make accurate predictions, their uncertainty quantification properties have remained unexplored so far. We report the empirical finding that obtaining well-calibrated uncertainty estimates from NSDEs is computationally prohibitive. As a remedy, we develop a computationally affordable deterministic scheme which accurately approximates the transition kernel when the dynamics are governed by an NSDE. Our method introduces a bidimensional moment matching algorithm: vertical along the neural net layers and horizontal along the time direction, which benefits from an original combination of effective approximations. Our deterministic approximation of the transition kernel is applicable to both training and prediction. We observe in multiple experiments that the uncertainty calibration quality of our method can be matched by Monte Carlo sampling only after introducing high computational cost. Thanks to the numerical stability of deterministic training, our method also improves prediction accuracy.  ( 3 min )
If Influence Functions are the Answer, Then What is the Question? (arXiv:2209.05364v1 [cs.LG])
    Influence functions efficiently estimate the effect of removing a single training data point on a model's learned parameters. While influence estimates align well with leave-one-out retraining for linear models, recent works have shown this alignment is often poor in neural networks. In this work, we investigate the specific factors that cause this discrepancy by decomposing it into five separate terms. We study the contributions of each term on a variety of architectures and datasets and how they vary with factors such as network width and training time. While practical influence function estimates may be a poor match to leave-one-out retraining for nonlinear networks, we show they are often a good approximation to a different object we term the proximal Bregman response function (PBRF). Since the PBRF can still be used to answer many of the questions motivating influence functions, such as identifying influential or mislabeled examples, our results suggest that current algorithms for influence function estimation give more informative results than previous error analyses would suggest.  ( 2 min )
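    As a concrete reference point for the linear case mentioned in the abstract, the following sketch compares the classical influence-function estimate $H^{-1}\nabla L_i$ against actual leave-one-out retraining for ordinary least squares; the data and sizes are arbitrary, and the small residual gap (a factor of $1/(1-h_i)$, where $h_i$ is the leverage of point $i$) is expected.

    import numpy as np

    rng = np.random.default_rng(0)
    n, d = 200, 5
    X = rng.normal(size=(n, d))
    y = X @ rng.normal(size=d) + 0.1 * rng.normal(size=n)

    theta = np.linalg.solve(X.T @ X, X.T @ y)   # least-squares fit on all points
    H = X.T @ X                                 # Hessian of the squared loss

    i = 0                                       # point whose removal we estimate
    g_i = X[i] * (X[i] @ theta - y[i])          # gradient of point i's loss
    estimate = np.linalg.solve(H, g_i)          # influence-function prediction

    mask = np.arange(n) != i                    # actual leave-one-out retraining
    theta_loo = np.linalg.solve(X[mask].T @ X[mask], X[mask].T @ y[mask])
    actual = theta_loo - theta

    print(np.linalg.norm(estimate - actual) / np.linalg.norm(actual))  # small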
    On the Nash equilibrium of moment-matching GANs for stationary Gaussian processes. (arXiv:2203.07136v3 [stat.ML] UPDATED)
    Generative Adversarial Networks (GANs) learn an implicit generative model from data samples through a two-player game. In this paper, we study the existence of Nash equilibrium of the game which is consistent as the number of data samples grows to infinity. In a realizable setting where the goal is to estimate the ground-truth generator of a stationary Gaussian process, we show that the existence of consistent Nash equilibrium depends crucially on the choice of the discriminator family. The discriminator defined from second-order statistical moments can result in non-existence of Nash equilibrium, existence of consistent non-Nash equilibrium, or existence and uniqueness of consistent Nash equilibrium, depending on whether symmetry properties of the generator family are respected. We further study empirically the local stability and global convergence of gradient descent-ascent methods towards consistent equilibrium.  ( 2 min )
    Bilevel Optimization with a Lower-level Contraction: Optimal Sample Complexity without Warm-Start. (arXiv:2202.03397v2 [stat.ML] UPDATED)
We analyze a general class of bilevel problems, in which the upper-level problem consists in the minimization of a smooth objective function and the lower-level problem is to find the fixed point of a smooth contraction map. This type of problem includes instances of meta-learning, equilibrium models, hyperparameter optimization and data poisoning adversarial attacks. Several recent works have proposed algorithms which warm-start the lower-level problem, i.e. they use the previous lower-level approximate solution as a starting point for the lower-level solver. This warm-start procedure allows one to improve the sample complexity in both the stochastic and deterministic settings, achieving in some cases the order-wise optimal sample complexity. However, there are situations, e.g., meta-learning and equilibrium models, in which the warm-start procedure is not well-suited or ineffective. In this work we show that without warm-start, it is still possible to achieve order-wise optimal or near-optimal sample complexity. In particular, we propose a simple method which uses stochastic fixed point iterations at the lower level and projected inexact gradient descent at the upper level, and which reaches an $\epsilon$-stationary point using $O(\epsilon^{-2})$ and $\tilde{O}(\epsilon^{-1})$ samples for the stochastic and the deterministic setting, respectively. Finally, compared to methods using warm-start, our approach yields a simpler analysis that does not need to study the coupled interactions between the upper-level and lower-level iterates.  ( 3 min )
    The Interplay Between Implicit Bias and Benign Overfitting in Two-Layer Linear Networks. (arXiv:2108.11489v3 [stat.ML] UPDATED)
    The recent success of neural network models has shone light on a rather surprising statistical phenomenon: statistical models that perfectly fit noisy data can generalize well to unseen test data. Understanding this phenomenon of $\textit{benign overfitting}$ has attracted intense theoretical and empirical study. In this paper, we consider interpolating two-layer linear neural networks trained with gradient flow on the squared loss and derive bounds on the excess risk when the covariates satisfy sub-Gaussianity and anti-concentration properties, and the noise is independent and sub-Gaussian. By leveraging recent results that characterize the implicit bias of this estimator, our bounds emphasize the role of both the quality of the initialization as well as the properties of the data covariance matrix in achieving low excess risk.  ( 2 min )
    Data Augmentation by Selecting Mixed Classes Considering Distance Between Classes. (arXiv:2209.05122v1 [cs.CV])
Data augmentation is an essential technique for improving accuracy in deep-learning-based object recognition. Methods that generate mixed data from multiple samples, such as mixup, can acquire diversity that is not present in the training data, and thus contribute significantly to accuracy improvement. However, since the data selected for mixing are randomly sampled throughout the training process, there are cases where appropriate classes or data are not selected. In this study, we propose a data augmentation method that calculates the distance between classes based on class probabilities and can select data from suitable classes to be mixed during training. The mixed data are dynamically adjusted according to the training trend of each class to facilitate training. The proposed method is applied in combination with conventional methods for generating mixed data. Evaluation experiments show that the proposed method improves recognition performance on general and long-tailed image recognition datasets.  ( 2 min )
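    For readers unfamiliar with the baseline being extended, here is a minimal sketch of standard mixup with random pairing; the paper's contribution replaces that random pairing with class selection driven by inter-class distance, which is not reproduced here.

    import numpy as np

    def mixup(x1, y1, x2, y2, alpha=0.2, rng=np.random.default_rng(0)):
        # convex combination of two samples and their one-hot labels
        lam = rng.beta(alpha, alpha)
        return lam * x1 + (1 - lam) * x2, lam * y1 + (1 - lam) * y2

    rng = np.random.default_rng(1)
    x1, x2 = rng.random((8, 8)), rng.random((8, 8))   # toy 8x8 "images"
    y1, y2 = np.eye(3)[0], np.eye(3)[2]               # one-hot labels, 3 classes
    x_mix, y_mix = mixup(x1, y1, x2, y2)
    print(y_mix)   # e.g. [lam, 0, 1 - lam]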
    Reproducibility in machine learning for medical imaging. (arXiv:2209.05097v1 [cs.CV])
Reproducibility is a cornerstone of science, as the replication of findings is the process through which they become knowledge. It is widely considered that many fields of science are undergoing a reproducibility crisis. This has led to the publication of various guidelines intended to improve research reproducibility. This didactic chapter is intended as an introduction to reproducibility for researchers in the field of machine learning for medical imaging. We first distinguish between different types of reproducibility. For each of them, we define it, describe the requirements to achieve it, and discuss its utility. The chapter ends with a discussion of the benefits of reproducibility and with a plea for a non-dogmatic approach to this concept and its implementation in research practice.  ( 2 min )

  • Open

    Zombie Apocalypse Post-apocalyptic punk cities ravaged
    submitted by /u/Available_Tadpole829 [link] [comments]  ( 87 min )
    Prompt: 31 AI Generated Poems
    submitted by /u/perfectlmao [link] [comments]  ( 87 min )
    Question about AI and urban planning!
Hey everybody! I am a student who is going to start an urban planning master's soon, and I am curious about the effect you think AI will have on planning. With tools like stable diffusion and language models getting better by the day, do you think urban planning has a future? Or will it be handed over to AI systems? There has been some past discussion about this over on the planning subreddit and they seem pretty skeptical, but I wanted to see what people more familiar with AI's capabilities think! :) Thank you so much! :) submitted by /u/Jumpy_Ad830 [link] [comments]  ( 87 min )
    Win 1k Credits on Pixelz AI - See below
    submitted by /u/mdfnb [link] [comments]  ( 90 min )
    AI Dream 1hour EPIC 1000 Subscribers Celebration!
    submitted by /u/LordPewPew777 [link] [comments]  ( 86 min )
    Meta AI Open Sources Flashlight: Fast and Flexible Machine Learning Toolkit in C++
While deep learning and machine learning (ML) frameworks perform well, customizing their underlying components has always been challenging. Low-level internals can be mistakenly obfuscated, closed-source, or hand-tuned for specific purposes, making it difficult and time-consuming to find the proper code to alter. To fuel ground-breaking research, FAIR developed Flashlight, a new open-source ML toolkit written in C++ that allows teams to quickly and efficiently modify deep learning and ML frameworks to better suit their needs. Flashlight was built from the ground up to be fully adjustable by the user. It’s easy to use because it includes the fundamental elements of a research environment. Because of its basic design and lack of language bindings, rebuilding the whole Flashlight library and its training pipelines takes only a few seconds whenever its essential components are modified. Continue reading | Check out the paper and github link submitted by /u/ai-lover [link] [comments]  ( 87 min )
    AI image of moonshine Bill Hader
    submitted by /u/mandeheks [link] [comments]  ( 86 min )
    Diffusers-Interpret v0.4.0 is out! Explainability for Stable Diffusion
    submitted by /u/JClub [link] [comments]  ( 87 min )
    AI Dream 81 - Wild new Project! Part 6
    submitted by /u/LordPewPew777 [link] [comments]  ( 87 min )
    The Easiest Way to Use Stable Diffusion Right Now
    submitted by /u/pwillia7 [link] [comments]  ( 89 min )
    Stable Diffusion AI Art Deforum Notebook now with Cadence Mode Faster th...
    submitted by /u/prfitofthesngularity [link] [comments]  ( 87 min )
    PyTorch: Meta transitions AI framework to PyTorch Foundation
    submitted by /u/Zirius_Sadfaces [link] [comments]  ( 87 min )
    "Google’s New AI: Fly INTO Photos! 🐦"
    submitted by /u/the_anonymizer [link] [comments]  ( 87 min )
    Liquid Robotica
    submitted by /u/widgia [link] [comments]  ( 87 min )
    God Save The Queen by Sex Pistols, Lyrics Illustrated by Midjourney
    submitted by /u/Swisheater [link] [comments]  ( 101 min )
    9 Best Artificial Intelligence books for beginners to expert to read in 2022 -
    submitted by /u/Lakshmireddys [link] [comments]  ( 87 min )
    Best image upscaler?
Hi, I am looking to increase the resolution of an image, as I plan to print it on a poster (60cm x 40cm) but the original image (749x1114) is a bit small for this project. Therefore, I am looking for a way to increase the resolution while maintaining the best possible quality. What do you recommend? I already tried Topaz Gigapixel AI, but the letters at the bottom (the rules) don't look quite right when upscaled. I attach the original image. https://preview.redd.it/i2f7nx6qhcn91.jpg?width=749&format=pjpg&auto=webp&s=a0ac977640c40b61383b6da892b2475bde617eae submitted by /u/eStwart [link] [comments]  ( 87 min )
    Google earth scanning AI
Does anyone know if an AI exists that effectively scans Google Earth / Maps in a given radius and identifies certain cars? Pretty much to find rare cars. If not, how would one go about making / training this? (Never done this before.) Thank you! submitted by /u/Few_Sample7624 [link] [comments]  ( 87 min )
    Anyone know where to access ALL of the kim-jung-gi anatomy clips i see everywhere on youtube? Thanks :-)
    submitted by /u/Aggravating-Door80 [link] [comments]  ( 87 min )
    [P] envd: A command-line tool to create development environments for AI/ML
GitHub: https://github.com/tensorchord/envd What is envd? envd (ɪnˈvdɪ) is a command-line tool that helps you create container-based development environments for AI/ML. Development environments are full of Python and system dependencies, CUDA, BASH scripts, Dockerfiles, SSH configurations, Kubernetes YAMLs, and many other clunky things that are always breaking. envd solves this problem: declare the list of dependencies (CUDA, Python packages, your favorite IDE, and so on) in build.envd, simply run envd up, and develop in an isolated environment. https://i.redd.it/jynmoy30vbn91.gif Why use envd? Environments built with envd provide the following features out-of-the-box: ❤️ Knowledge reuse in your team envd build functions can be reused. Use the include function to import any gi…  ( 89 min )
  • Open

    Automation of Ports – Global Supply Chains
International trade grew to a record $28.5 trillion in 2021, according to an estimate by UNCTAD - a 25 percent increase from 2020. Recently, Covid-19 stop-and-go maneuvers have highlighted the difficulties that modern ports face. These include handling an overwhelming amount of goods and the corresponding paperwork and other administrative tasks, congestion, delays, a lack of coordination between terminal operators, and, most importantly, the dependency on manpower. In addition, environmental issues have taken the front seat. The post Automation of Ports – Global Supply Chains appeared first on Data Science Central.  ( 20 min )
    How AI/ML will Impact iOS App Development in 2023
    Artificial intelligence has gained huge popularity in the last few years, with its application surging across every business sector. AI has impressively gained massive acceptance in the mobile tech world by bringing diverse facilities to our fingertips. The post How AI/ML will Impact iOS App Development in 2023 appeared first on Data Science Central.  ( 21 min )
    Art and AI: The Line Blurs Further
    Let's look at this story in more detail because in my view, it shows something much more fundamental which is often overlooked. In my view, the future of all jobs will be in collaborating with AI ! The post Art and AI: The Line Blurs Further appeared first on Data Science Central.  ( 20 min )
  • Open

    Save the date: Join AWS at NVIDIA GTC, September 19–22
    Register free for NVIDIA GTC to learn from experts on how AI and the evolution of the 3D internet are profoundly impacting industries—and society as a whole. We have prepared several AWS sessions to give you guidance on how to use AWS services powered by NVIDIA technology to meet your goals. Amazon Elastic Compute Cloud […]  ( 4 min )
    How Medidata used Amazon SageMaker asynchronous inference to accelerate ML inference predictions up to 30 times faster
    This post is co-written with Rajnish Jain, Priyanka Kulkarni and Daniel Johnson from Medidata. Medidata is leading the digital transformation of life sciences, creating hope for millions of patients. Medidata helps generate the evidence and insights to help pharmaceutical, biotech, medical devices, and diagnostics companies as well as academic researchers with accelerating value, minimizing risk, […]  ( 11 min )
  • Open

    [P] Introducing VDP: open-source visual data ETL
Hello folks! Just want to share that we announced the release of our project VDP in Introducing VDP: open-source visual data ETL last month 🚀. The goal of VDP is to streamline the end-to-end visual data processing pipeline: (1) Extract unstructured visual data from pre-built data sources such as cloud/on-prem storage or IoT devices, (2) Transform it into analysable structured data with Vision AI models imported from various ML platforms, and (3) Load the transformed data into warehouses, applications, or other destinations. ⭐️ VDP on GitHub 👉 VDP documentation submitted by /u/SnooDogs5688 [link] [comments]  ( 88 min )
    [D] How do I add minimum size constraints to k-means clustering?
Hi everyone, I'm trying to add cluster size constraints to my k-means clustering algorithm. How do I add minimum cluster size constraints? I was able to add max cluster size constraints using the code snippet below:

    while cluster_size[assignment[j]] == max_size:
        j += 1
    labels[i] = assignment[j]
    cluster_size[assignment[j]] += 1

I'm trying to modify this to add min size constraints but couldn't. Can anyone please help me with this? Thanks in advance! submitted by /u/tinkerpal [link] [comments]  ( 90 min )
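    One heuristic way to get a minimum-size constraint, mirroring the max-size trick in the post above: run the assignment as usual, then repair any undersized cluster by pulling in the nearest points from clusters that can spare them. A sketch under assumed variable names (X, labels, centers), not an optimal constrained assignment:

    import numpy as np

    def repair_min_size(X, labels, centers, min_size):
        # for every cluster below min_size, pull in the nearest points
        # from clusters that stay at or above min_size after losing one
        k = centers.shape[0]
        counts = np.bincount(labels, minlength=k)
        for c in np.flatnonzero(counts < min_size):
            while counts[c] < min_size:
                cand = np.flatnonzero((labels != c) & (counts[labels] > min_size))
                d = np.linalg.norm(X[cand] - centers[c], axis=1)
                j = cand[np.argmin(d)]        # closest stealable point
                counts[labels[j]] -= 1
                labels[j] = c
                counts[c] += 1
        return labels

    X = np.random.default_rng(0).random((20, 2))
    labels = np.array([0] * 12 + [1] * 6 + [2] * 2)   # cluster 2 is undersized
    centers = np.array([X[labels == i].mean(axis=0) for i in range(3)])
    print(np.bincount(repair_min_size(X, labels, centers, min_size=5)))  # all >= 5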
    [D] PyTorch Distributed Data Parallelism: Under The Hood
    https://lambdalabs.com/blog/multi-node-pytorch-distributed-training-guide/ This is a step-by-step guide that: Walks you through how to scale your PyTorch training across multiple nodes. Provides examples that showcase the boilerplate of PyTorch DDP training code. Shows you how to launch applications using PyTorch’s distributed.launch and torchrun methods, as well as Open MPI’s mpirun method. submitted by /u/mippie_moe [link] [comments]  ( 88 min )
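    For orientation, a minimal single-file sketch of the kind of DDP boilerplate the guide covers; the toy model and data are placeholders, and on GPU nodes the backend would be "nccl" with the model moved to the local device.

    import torch
    import torch.distributed as dist
    from torch.nn.parallel import DistributedDataParallel as DDP

    def main():
        # torchrun sets RANK, LOCAL_RANK and WORLD_SIZE in the environment
        dist.init_process_group(backend="gloo")     # use "nccl" on GPU nodes
        model = torch.nn.Linear(10, 1)
        ddp_model = DDP(model)                      # gradients are all-reduced
        opt = torch.optim.SGD(ddp_model.parameters(), lr=0.01)

        for _ in range(10):
            x, y = torch.randn(32, 10), torch.randn(32, 1)
            loss = torch.nn.functional.mse_loss(ddp_model(x), y)
            opt.zero_grad()
            loss.backward()                         # triggers the all-reduce
            opt.step()

        if dist.get_rank() == 0:
            print("final loss:", loss.item())
        dist.destroy_process_group()

    if __name__ == "__main__":
        main()   # launch with: torchrun --nproc_per_node=2 ddp_minimal.py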
    [D] Taichi VS Triton ?
    How do they compare? What has better optimization? Which is easier to learn? Potential task where I intend to use them : matrix multiplication for SA, augmentation for vision problems, edge deployment for IoT or browser (voice recognition, TTS, translation, etc) submitted by /u/tororo-in [link] [comments]  ( 89 min )
    [D] PyTorch is moving to the Linux Foundation
    https://pytorch.org/blog/PyTorchfoundation/ I wonder if this will lead to a lot of departures at Meta. submitted by /u/m___ke [link] [comments]  ( 94 min )
    [D] How to understand the bias term in language model head (when we tie the word embeddings)?
I was learning the masked language modeling codebase in Huggingface Transformers. Just a question to understand the language model head. Here is the final linear layer, where we project the hidden size to the vocab size (https://github.com/huggingface/transformers/blob/f2fbe4475386bfcfb3b83d0a3223ba216a3c3a91/src/transformers/models/bert/modeling_bert.py#L685-L702):

    self.decoder = nn.Linear(config.hidden_size, config.vocab_size, bias=False)
    self.bias = nn.Parameter(torch.zeros(config.vocab_size))
    self.decoder.bias = self.bias

We set the bias term to zero at the moment. And later, when we initialize the weights, we tie the weight of the linear layer and the word embedding. But we don't do such a thing for the bias term. I wonder how we can understand that and why we want to initialize the …  ( 89 min )
    [P] (code release) Fine-tune your own stable-diffusion vae decoder and dalle-mini decoder
    A few weeks ago, before stable-diffusion was officially released, I found that fine-tuning Dalle-mini's VQGAN decoder can improve the performance on anime images. See: https://preview.redd.it/eekf9hjt3gn91.png?width=1280&format=png&auto=webp&s=25938a4ad284e6cfff958ad0d69968cd2c01ed18 And with a few lines of code change, I was able to train the stable-diffusion VAE decoder. See: https://preview.redd.it/45xogflo5gn91.png?width=1129&format=png&auto=webp&s=43f98e863b918bba9d7471a0cfa7de4dcc8df98c You can find the exact training code used in this repo: https://github.com/cccntu/fine-tune-models/ More details about the models are also in the repo. And you can play with the former model at https://github.com/cccntu/anim_e submitted by /u/cccntu [link] [comments]  ( 89 min )
    [Project] Machine learning and dependent types, with Idris and XLA
I've built a library to explore machine learning with functional programming and dependent types, including statically-verified shapes like transpose : Tensor [m, n] dtype -> Tensor [n, m] dtype. It compiles with XLA, and shares a similar approach to JAX and Dex. I've made significant progress since my post last December: I've implemented much of the linear algebra API, and it now runs on GPU. I next want autodiff, gradient descent and vectorized map (a la JAX's vmap). I've started work on these but there's still much to do. submitted by /u/tmp-1379 [link] [comments]  ( 99 min )
    [Discussion] Embedding based on binary tests
Hi everyone, I would like to build an embedding of people's tastes based on their preferences in binary choices. In other words: we have many people, we have many products, and we have asked people many times which product they prefer between two randomly given ones. From this, we want to build an embedding of people's tastes. Does anyone have any idea what kind of mathematical problem this is and what kind of algorithms exist? Thanks submitted by /u/marcollo63 [link] [comments]  ( 90 min )
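    One standard formulation of this problem is a Bradley-Terry-style pairwise-preference model: give each person and each product a latent vector, and model the probability that a person prefers product a over b as a sigmoid of the difference of inner products. A sketch with made-up sizes and random stand-in data (real triples would come from the collected choices):

    import torch

    n_people, n_products, dim = 100, 50, 8
    g = torch.Generator().manual_seed(0)
    person = torch.randint(0, n_people, (5000,), generator=g)
    won = torch.randint(0, n_products, (5000,), generator=g)     # preferred item
    lost = torch.randint(0, n_products, (5000,), generator=g)    # rejected item

    P = torch.nn.Embedding(n_people, dim)      # taste vectors (the goal)
    Q = torch.nn.Embedding(n_products, dim)    # product vectors
    opt = torch.optim.Adam(list(P.parameters()) + list(Q.parameters()), lr=0.01)

    for _ in range(200):
        # P(person prefers a over b) = sigmoid(<taste, a> - <taste, b>)
        margin = (P(person) * (Q(won) - Q(lost))).sum(dim=1)
        loss = torch.nn.functional.softplus(-margin).mean()   # -log sigmoid
        opt.zero_grad()
        loss.backward()
        opt.step()

    tastes = P.weight.detach()                 # learned embedding of people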
    [Project] Has anyone used ML to categorize Facebook group posts data?
    I’m imagining a way to catalog all of the crucial “tribal knowledge” that’s currently just lost in the sands of the feed, or at least “prep” the post data with keyword metadata of some kind that could be manually sorted out. submitted by /u/mike_the_seventh [link] [comments]  ( 88 min )
    AlexaTM 20B Large Language Model [R]
Alexa Teacher Models (AlexaTM) are transformer-based seq-to-seq large-scale multilingual language models. Given only a few examples of a task in a new language, AlexaTM can transfer what it knows to the new language with no extra human supervision. It achieves state-of-the-art (SOTA) performance on 1-shot summarization tasks, outperforming the much larger 540B PaLM decoder model. AlexaTM 20B also achieves SOTA in 1-shot machine translation, especially for low-resource languages, across almost all language pairs supported by the model (Arabic, English, French, German, Hindi, Italian, Japanese, Marathi, Portuguese, Spanish, Tamil, and Telugu) on the Flores-101 dataset. In the zero-shot setting, AlexaTM 20B outperforms GPT-3 (175B) on the SuperGLUE and SQuADv2 datasets and provides SOTA performance on multilingual tasks such as XNLI, XCOPA, Paws-X, and XWinograd. And its carbon footprint during training is only one-fifth of GPT-3's. I have made a video on AlexaTM 20B. Do check it out. https://youtu.be/Wd8amyHlrA4 submitted by /u/Sea-Photo5230 [link] [comments]  ( 100 min )
    [Project] Create a ML model to classify spectrograms
Hi All, I have a large sample of spectrograms of short audio samples (say 10 different categories). How do I build a machine learning model that classifies new spectrograms into these 10 categories + 1 'other' class? It's part of a small informal project I have picked up, and if possible, kindly recommend multiple methods so I can evaluate which method works best. TIA! Apologies if this seems very trivial, but I'm kind of stuck on how to even start. (not an ML student but my mentor insists on using ML for this task) submitted by /u/geeksid2k [link] [comments]  ( 92 min )
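    A common starting point, assuming the spectrograms can be treated as single-channel images: a small CNN with 11 outputs (the 10 categories plus one 'other' class) trained with cross-entropy. A minimal PyTorch sketch with placeholder input sizes:

    import torch
    import torch.nn as nn

    class SpectrogramCNN(nn.Module):
        # small CNN treating a spectrogram as a 1-channel image
        def __init__(self, n_classes=11):
            super().__init__()
            self.features = nn.Sequential(
                nn.Conv2d(1, 16, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
                nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
                nn.AdaptiveAvgPool2d((4, 4)),   # makes the head size-independent
            )
            self.classifier = nn.Linear(32 * 4 * 4, n_classes)

        def forward(self, x):          # x: (batch, 1, freq_bins, time_steps)
            return self.classifier(self.features(x).flatten(1))

    model = SpectrogramCNN()
    logits = model(torch.randn(8, 1, 128, 64))   # dummy batch of spectrograms
    print(logits.shape)                          # torch.Size([8, 11])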
    C++ cross platform library for conv-net inference only, recommendations? [D]
Anyone got a recommendation for a library to do convolutional neural net inference (actually, image-to-image transformation), running on as many platforms as possible, with the priorities being: Windows, Linux, macOS/Apple Silicon; Nvidia GPUs, AMD APUs, and Apple silicon as the most important targets. NOT CUDA, because of the latter devices. It seems the ML world revolves around Python for training, but I'm guessing an appetite for C++ libs must exist for embedded/"edge" use cases. I don't want to involve Python in this at all. How safe is OpenCL these days? I implemented CNN training myself many years ago in OpenCL, but I'm suspicious about its future given Apple's attitude these days; and rather than rolling my own, something well supported that has a chance of using all the accelerators would be better. Ideally I want to integrate this in an SDL2 project. I did a test running convolutions in GL shaders (WebGL subset), but I think this gets limited by going through texture lookups instead of general-purpose compute reads, so that's my baseline if I can't find something better. submitted by /u/dobkeratops [link] [comments]  ( 90 min )
    [P] Choosing Edge Board for neural network inference in 2022
Over the last few months, I published a few videos about the most popular edge boards for computer vision (and other NN models). I tried to compare cheap boards (<150 USD), and not only in terms of speed: I tried to compare the platforms by their "usability": how easy it is to export networks, how good the support is, and how easy they are to work with. Here is a summarising article - https://medium.com/@zlodeibaal/choosing-computer-vision-board-in-2022-b27eb4ca7a7c And here is the benchmark result - https://docs.google.com/spreadsheets/d/1BMj8WImysOSuiT-6O3g15gqHnYF-pUGUhi8VmhhAat4/edit submitted by /u/Wormkeeper [link] [comments]  ( 89 min )
    [R] Maritime Computer Vision Workshop (MaCVi)
Some time ago, I asked for some advice regarding organizing a workshop as part of WACV 2023. Some people gave feedback, which I highly appreciated. Now, the workshop titled "1st Workshop on Maritime Computer Vision (MaCVi)" will take place in January in Waikoloa, Hawaii as part of WACV 2023. Just wanted to drop the news in case anyone is interested. We will accept paper submissions (to be published in the WACV Workshop Proceedings and included in IEEE Xplore), offer 4 challenges (UAV-based object detection & tracking, and USV-based (autonomous boats) obstacle segmentation and detection), and provide keynote talks, mostly on industry-related maritime CV (whaleseeker, searchwing, ...). Key dates are end of October for submission of papers and participation in the challenges, and January for the actual workshop. Find all dates on https://seadronessee.cs.uni-tuebingen.de/wacv23. Teaser here ;) https://www.youtube.com/watch?v=qjJP80Q9Xo4 Let me know if you have any questions. Cheers! submitted by /u/SP4ETZUENDER [link] [comments]  ( 89 min )
    [D] How to understand the frequency of Convolution kernel in Graph Convolutional Network?
I read this paper recently. It argues that GCNs favor low-frequency features, and that combining features of different frequencies improves results. I don't know how to understand the notion of frequency in a GCN; does anyone have some advice? Sincere thanks. submitted by /u/waa007 [link] [comments]  ( 101 min )
    [D] Masters Programmes or PhD (taught?) with a strong focus on Applied Mathematics and Statistics
So I’m currently working at a research institute in applied research. My background is interdisciplinary and I did my master's in CS. I really enjoy the research and I’m hoping to do a PhD once I have sorted out my financial situation a little, my contract here ends, and I have published a paper. My CS master's was a little bit of a downer and I certainly didn’t learn as much as I hoped for. I would say my ML knowledge is solid; I have an understanding of advanced topics and decent programming skills due to industry and research experience. I do notice, though, that I am lacking some foundational math and statistics skills. My undergraduate degree only involved some math, and my master's close to none. And I really notice this; I have a strong urge to really dive deep into understanding the maths, especially conceptually. I’m comfortable with the basics like linear algebra and vector calculus; Bayesian statistics and applied math are a little foreign to me. I hope to get a really solid understanding of maths and applied maths because I want to do applied ML research. I’m hoping to learn the skills so that when I have a problem and I know what I would like to optimize for, I know how to represent this mathematically, which formulas to use, etc. And I really just want to dig deep into the math. Anyway, I’m looking for a programme in the EU where I would learn exactly that. I tried teaching myself but my work projects suck up too much time. The next question would also be whether it's necessary to do another master's or if I can jump right into a PhD. Anyway, I’d appreciate recommendations on mathematically rigorous programmes with a focus on statistics and applied math! submitted by /u/ameli__c [link] [comments]  ( 90 min )
    [R] LLM.int8(): 8-bit Matrix Multiplication for Transformers at Scale
    https://arxiv.org/abs/2208.07339 submitted by /u/That_Violinist_18 [link] [comments]  ( 101 min )
    [D] What would be the best way to match an image to a database of existing images?
I run a business where we print customer artwork on tees and hoodies, and we want to build a system where we can take a photo of a printed garment and associate it with an existing image in our database of customer logos, to see which order the item belongs to. Are there any starting points for me to look into for building such a system? EDIT: Thanks for all the great answers! To add a bit more context, we only have a few hundred current and recent artworks that we need to associate the printed tee shirt with. Some of them may be a complex full-colour logo, others are simple words or short sentences, or something in between. All are stored in our database as 300 DPI transparent PNG files. The reason we need to associate the t-shirt to an order is that, by this association, we print the correct FedEx or other courier label. We currently use a QR code sticker on each item, but these often fall off and take a bit of time to scan. I was hoping to put a camera on the table where the t-shirts are folded so the system can identify which customer the t-shirt belongs to, and print the correct courier label. submitted by /u/onthedoleagain [link] [comments]  ( 97 min )
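    Since the catalogue here is only a few hundred logos, one low-effort baseline is embedding retrieval: run every logo once through a pretrained CNN with the classification head removed, embed each garment photo the same way, and pick the cosine-nearest logo. A sketch with random tensors standing in for real, properly preprocessed images; closing the photo-versus-artwork domain gap may additionally require fine-tuning or metric learning:

    import torch
    import torch.nn.functional as F
    import torchvision.models as models

    backbone = models.resnet18(weights="DEFAULT")   # downloads pretrained weights
    backbone.fc = torch.nn.Identity()               # drop the classification head
    backbone.eval()

    @torch.no_grad()
    def embed(batch):                # batch: (N, 3, 224, 224), normalized images
        return F.normalize(backbone(batch), dim=1)

    logo_db = embed(torch.randn(300, 3, 224, 224))  # offline: embed all logos once
    photo = embed(torch.randn(1, 3, 224, 224))      # online: embed the photo
    scores = photo @ logo_db.T                      # cosine similarities
    print("best match: logo index", scores.argmax().item())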
    [R] Learning with Differentiable Algorithms
    submitted by /u/hardmaru [link] [comments]  ( 103 min )
  • Open

    How do you compute rewards when you are using parallel environments?
    I am looking at this repo: https://github.com/marlbenchmark/on-policy/blob/af4dc22aaf05b281d9e2e4f43c9ebb9eca48137e/onpolicy/runner/separated/mpe_runner.py#L67 And I am wondering: how do you compute rewards when you have multiple parallel environments? Do you take an average? I don't see any reference here to multiple rollout threads, so what happens when you do have some? Thanks! submitted by /u/No_Possibility_7588 [link] [comments]  ( 89 min )
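    In the usual vectorized-environment pattern, step() returns one reward and one done flag per environment; each environment accumulates its own episodic return, and averaging happens only at logging time, over completed episodes. A small sketch with random stand-ins for the environment outputs:

    import numpy as np

    n_envs, n_steps = 8, 128
    returns = np.zeros(n_envs)        # running episodic return per environment
    finished = []                     # completed-episode returns, for logging

    for t in range(n_steps):
        # stand-in for vec_env.step(actions): per-env rewards and done flags
        rewards = np.random.randn(n_envs)
        dones = np.random.rand(n_envs) < 0.05

        returns += rewards            # each env accumulates its own return
        for i in np.flatnonzero(dones):
            finished.append(returns[i])   # log the episode, then reset that slot
            returns[i] = 0.0

    if finished:
        print("mean episodic return:", np.mean(finished))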
    What modifications can maximize the efficacy of the REINFORCE algorithm for a policy gradient task?
I am straying out of my domain knowledge to attempt a basic reinforcement learning task in a toy environment and have become fairly familiar with the REINFORCE algorithm for policy gradient agents, especially PyTorch’s implementation (found here). It is clear to me now that there are superior methods to train RL agents (PPO, for instance), but as I read, these feel beyond my current intellectual or time resources. As such, I’d like to eke out as much power through modifications of REINFORCE as possible before determining how I might move on. So, are there modifications to the REINFORCE training algorithm that might yield benefits without straying far into new algorithm territory? Or perhaps, what is the “SOTA” version of REINFORCE? For instance, perhaps a simple gradient clip in some way approximates some of PPO’s benefits? Or maybe setting a baseline reward based on a rolling reward set of previous episodes? ----------------------------- If a specific context is useful, I’m applying this to a self-made simple grid environment where agents receive points mainly for moving closer to and acquiring “targets.” There are other rewards of lesser significance, but the key point is that the environment is set up for a sort of “continuous play,” such that agents very frequently receive rewards (due to closeness) and occasionally receive reward spikes (due to getting a target), but there is no true episode definition other than an arbitrary timestep length. I am not using batches (perhaps that would be useful?), as I have found that gradually stepping up the episode length gives the agents quicker access to simpler rewards and acts as a sort of scaffolding toward more complex behavior. Agents might reasonably gather the “large” rewards in as few as 10-25 steps. Generally things are working fine, and I am most interested in how to extract as much value from the agent updating mechanism (REINFORCE) as possible. submitted by /u/jshkk [link] [comments]  ( 90 min )
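    For a concrete version of the modifications asked about, here is a sketch of a single REINFORCE update with the usual low-cost upgrades: discounted returns, return normalization as a crude baseline, and gradient clipping. The policy, optimizer, and per-episode buffers of log-probabilities and rewards are assumed to exist:

    import torch

    def reinforce_update(policy, optimizer, log_probs, rewards,
                         gamma=0.99, clip_norm=1.0):
        returns, g = [], 0.0
        for r in reversed(rewards):              # discounted return-to-go
            g = r + gamma * g
            returns.append(g)
        returns = torch.tensor(list(reversed(returns)))
        # normalizing returns acts as a simple, cheap baseline
        returns = (returns - returns.mean()) / (returns.std() + 1e-8)

        loss = -(torch.stack(log_probs) * returns).sum()
        optimizer.zero_grad()
        loss.backward()
        # clipping crudely limits update size, loosely in the spirit of PPO
        torch.nn.utils.clip_grad_norm_(policy.parameters(), clip_norm)
        optimizer.step()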
    Learning continuous actions from images.
Hi everyone, I'm working on a robotics project where I'm learning a task from raw RGB images. The issue I'm having is that when I use a discrete action space it works very well, whereas whenever I use a continuous action space the model doesn't converge and the reward fluctuates a lot. Does anyone have an explanation? submitted by /u/Many_Reception_4921 [link] [comments]  ( 87 min )
How to view a 3D image using a custom callback
Hi guys, I have a custom environment with a 3D numpy image, and I am doing path planning in this 3D array/image. What I want to do is create a custom callback so that I can do real-time tracking of the path formed, i.e. I don't want to plot the image every time, just the path on the image. How can I do that? submitted by /u/Historical-Stock-750 [link] [comments]  ( 88 min )
    How to convert timestep based learning to episodic learning
I am using a custom environment to do path planning with the DDPG algorithm:

    model = DDPG("MlpPolicy", env, action_noise=action_noise, verbose=1)
    model.learn(total_timesteps=10000, log_interval=1)
    model.save("sb3_ddpg_model")

Here model.learn is used for timestep-based learning, but I want to convert it to something like 3000 steps per episode with multiple episodes. How can I achieve that? submitted by /u/Historical-Stock-750 [link] [comments]  ( 99 min )
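    With stable-baselines3, episode length is a property of the environment rather than of learn(): a TimeLimit wrapper caps episodes at 3000 steps, and looping over learn() with reset_num_timesteps=False splits training into repeated chunks under one continuous timestep counter. A sketch assuming env and action_noise are already defined:

    from gym.wrappers import TimeLimit
    from stable_baselines3 import DDPG

    env = TimeLimit(env, max_episode_steps=3000)   # episode ends after 3000 steps

    model = DDPG("MlpPolicy", env, action_noise=action_noise, verbose=1)
    for _ in range(10):                            # 10 chunks of 3000 timesteps
        model.learn(total_timesteps=3000, log_interval=1,
                    reset_num_timesteps=False)
    model.save("sb3_ddpg_model")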
  • Open

    Graphing Japanese Prefectures
    The two previous posts looked at adjacency networks. The first used examples of US states and Texas counties. The second post made suggestions for using these networks in a classroom. This post is a continuation of the previous post using examples from Japan. Japan is divided into 8 regions and 47 prefectures. Here is a […] Graphing Japanese Prefectures first appeared on John D. Cook.  ( 5 min )
    Classroom exercise with networks
    In the previous post I looked at graphs created from representing geographic regions with nodes and connecting nodes with edges if the corresponding regions share a border. It’s an interesting exercise to recover the geographic regions from the network. For example, take a look at the graph for the continental United States. It’s easy to […] Classroom exercise with networks first appeared on John D. Cook.  ( 5 min )
    Adjacency networks
    Suppose you want to color a map with no two bordering regions having the same color. If this is a map on a plane, you can do this using only four colors, but maybe you’d like to use more. You can reduce the problem to coloring the nodes in a graph. Each node corresponds to […] Adjacency networks first appeared on John D. Cook.  ( 4 min )
  • Open

    Face skin analyzer with Fast.ai and Gradio
Fast.ai is a revolutionary library created by Jeremy Howard, a former Kaggle No. 1 Grandmaster. He has developed the Fast.ai course…  ( 7 min )
  • Open

    A quick question about feeding the output of one neural network to another to adjust predicted values. Is it sensible or could it create/reinforce bias?
Hello, so basically I am working, in a very slow manner, on a project that will try to predict sequential data. Let's say that we have 2 neural nets: 1) an LSTM neural net to predict the next item of a given sequence, and 2) an ordinary dense neural net to adjust the prediction of the first one. So let's say the LSTM predicts that the value will be 25, but in real life the value is 24.33. Is there any sense in feeding the prediction of 25, with a label of 24.33, to the second neural network to adjust the prediction? My data consists of 15,000 items per attribute. I am just asking because I am not sure if it's even something that might give good results, or something that will just reinforce or create bias. And since creating, training and evaluating neural nets takes a lot of time, I thought I'd better ask :) Thanks for the help in advance :) submitted by /u/skollehatti [link] [comments]  ( 89 min )
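    A sketch of the key pitfall in the question above: if the corrector is fit on predictions for the LSTM's own training data, it learns the training-set optimism and reinforces bias; fitting it on held-out predictions (stacking) avoids that. The LSTM is replaced here by a deliberately biased stand-in function, and the "second net" by a trivial linear fit:

    import numpy as np

    rng = np.random.default_rng(0)
    X = rng.random((15000, 5))
    y = X.sum(axis=1) + rng.normal(0, 0.1, 15000)    # toy target

    def lstm_predict(x):
        # hypothetical stand-in for the trained LSTM, biased on purpose
        return x.sum(axis=1) * 1.03

    # fit the corrector only on data the first model did NOT train on
    hold = slice(12000, None)
    p = lstm_predict(X[hold])
    a, b = np.polyfit(p, y[hold], deg=1)             # trivial linear corrector

    def corrected_predict(x):
        return a * lstm_predict(x) + b

    print("held-out MAE:", np.mean(np.abs(corrected_predict(X[hold]) - y[hold])))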
  • Open

    The need for a more human-centered approach to designing and validating transparent AI in medical image analysis -- Guidelines and Evidence from a Systematic Review. (arXiv:2112.12596v2 [cs.HC] UPDATED)
Transparency in Machine Learning (ML) attempts to reveal the working mechanisms of complex models. Transparent ML promises to advance the human factors engineering goals of human-centered AI for the target users. From a human-centered design perspective, transparency is not a property of the ML model but an affordance, i.e. a relationship between algorithm and user; as a result, iterative prototyping and evaluation with users is critical to attaining adequate solutions that afford transparency. However, following human-centered design principles in healthcare and medical image analysis is challenging due to the limited availability of and access to end users. To investigate the state of transparent ML in medical image analysis, we conducted a systematic review of the literature. Our review reveals multiple severe shortcomings in the design and validation of transparent ML for medical image analysis applications. We find that most studies to date approach transparency as a property of the model itself, similar to task performance, without considering end users during either development or evaluation. Additionally, the lack of user research and the sporadic validation of transparency claims put contemporary research on transparent ML for medical image analysis at risk of being incomprehensible to users, and thus, clinically irrelevant. To alleviate these shortcomings in forthcoming research while acknowledging the challenges of human-centered design in healthcare, we introduce the INTRPRT guideline, a systematic design directive for transparent ML systems in medical image analysis. The INTRPRT guideline suggests formative user research as the first step of transparent model design to understand user needs and domain requirements. Following this process produces evidence to support design choices, and ultimately, increases the likelihood that the algorithms afford transparency.  ( 3 min )
    Explainability Is in the Mind of the Beholder: Establishing the Foundations of Explainable Artificial Intelligence. (arXiv:2112.14466v2 [cs.AI] UPDATED)
    Explainable artificial intelligence and interpretable machine learning are research domains growing in importance. Yet, the underlying concepts remain somewhat elusive and lack generally agreed definitions. While recent inspiration from social sciences has refocused the work on needs and expectations of human recipients, the field still misses a concrete conceptualisation. We take steps towards addressing this challenge by reviewing the philosophical and social foundations of human explainability, which we then translate into the technological realm. In particular, we scrutinise the notion of algorithmic black boxes and the spectrum of understanding determined by explanatory processes and explainees' background knowledge. This approach allows us to define explainability as (logical) reasoning applied to transparent insights (into, possibly black-box, predictive systems) interpreted under background knowledge and placed within a specific context -- a process that engenders understanding in a selected group of explainees. We then employ this conceptualisation to revisit strategies for evaluating explainability as well as the much disputed trade-off between transparency and predictive power, including its implications for ante-hoc and post-hoc techniques along with fairness and accountability established by explainability. We furthermore discuss components of the machine learning workflow that may be in need of interpretability, building on a range of ideas from human-centred explainability, with a particular focus on explainees, contrastive statements and explanatory processes. Our discussion reconciles and complements current research to help better navigate open questions -- rather than attempting to address any individual issue -- thus laying a solid foundation for a grounded discussion and future progress of explainable artificial intelligence and interpretable machine learning.  ( 3 min )
    Controllable Data Generation by Deep Learning: A Review. (arXiv:2207.09542v3 [cs.LG] UPDATED)
Designing and generating new data under targeted properties has attracted attention in various critical applications such as molecule design, image editing and speech synthesis. Traditional hand-crafted approaches rely heavily on expert experience and intensive human effort, yet still suffer from insufficient scientific knowledge and low throughput to support effective and efficient data generation. Recently, the advancement of deep learning has induced expressive methods that can learn the underlying representation and properties of data. Such capability provides new opportunities for figuring out the mutual relationship between the structural patterns and functional properties of the data and leveraging such relationships to generate structural data with the desired properties. This article provides a systematic review of this promising research area, commonly known as controllable deep data generation. First, the potential challenges are raised and preliminaries are provided. Then controllable deep data generation is formally defined, a taxonomy of the various techniques is proposed, and the evaluation metrics in this specific domain are summarized. After that, exciting applications of controllable deep data generation are introduced and existing works are experimentally analyzed and compared. Finally, promising future directions of controllable deep data generation are highlighted and five potential challenges are identified.  ( 3 min )
    Exact Recovery in the General Hypergraph Stochastic Block Model. (arXiv:2105.04770v2 [cs.IT] UPDATED)
    This paper investigates fundamental limits of exact recovery in the general d-uniform hypergraph stochastic block model (d-HSBM), wherein n nodes are partitioned into k disjoint communities with relative sizes (p1,..., pk). Each subset of nodes with cardinality d is generated independently as an order-d hyperedge with a certain probability that depends on the ground-truth communities that the d nodes belong to. The goal is to exactly recover the k hidden communities based on the observed hypergraph. We show that there exists a sharp threshold such that exact recovery is achievable above the threshold and impossible below the threshold (apart from a small regime of parameters that will be specified precisely). This threshold is represented in terms of a quantity which we term as the generalized Chernoff-Hellinger divergence between communities. Our result for this general model recovers prior results for the standard SBM and d-HSBM with two symmetric communities as special cases. En route to proving our achievability results, we develop a polynomial-time two-stage algorithm that meets the threshold. The first stage adopts a certain hypergraph spectral clustering method to obtain a coarse estimate of communities, and the second stage refines each node individually via local refinement steps to ensure exact recovery.  ( 3 min )
    A PDE-Based Analysis of the Symmetric Two-Armed Bernoulli Bandit. (arXiv:2202.05767v2 [cs.LG] UPDATED)
    This work addresses a version of the two-armed Bernoulli bandit problem where the sum of the means of the arms is one (the symmetric two-armed Bernoulli bandit). In a regime where the gap between these means goes to zero and the number of prediction periods approaches infinity, we obtain the leading order terms of the expected regret and pseudoregret for this problem by associating each of them with a solution of a linear parabolic partial differential equation. Our results improve upon the previously known results; specifically, we explicitly compute the leading order term of the optimal regret and pseudoregret in three different scaling regimes for the gap. Additionally, we obtain new non-asymptotic bounds for any given time horizon.  ( 2 min )
    Extending Open Bandit Pipeline to Simulate Industry Challenges. (arXiv:2209.04147v1 [cs.LG])
Bandit algorithms are often used in the e-commerce industry to train Machine Learning (ML) systems when pre-labeled data is unavailable. However, the industry setting poses various challenges that make implementing bandit algorithms in practice non-trivial. In this paper, we elaborate on the challenges of off-policy optimisation, delayed reward, concept drift, reward design, and business rules constraints that practitioners at Booking.com encounter when applying bandit algorithms. Our main contribution is an extension to the Open Bandit Pipeline (OBP) framework. We provide simulation components for some of the above-mentioned challenges to provide future practitioners, researchers, and educators with a resource to address challenges encountered in the e-commerce industry.  ( 2 min )
    Self-Supervised Learning of Context-Aware Pitch Prosody Representations. (arXiv:2007.09060v4 [cs.SD] CROSS LISTED)
    In music and speech, meaning is derived at multiple levels of context. Affect, for example, can be inferred both by a short sound token and by sonic patterns over a longer temporal window such as an entire recording. In this letter, we focus on inferring meaning from this dichotomy of contexts. We show how contextual representations of short sung vocal lines can be implicitly learned from fundamental frequency ($F_0$) and thus be used as a meaningful feature space for downstream Music Information Retrieval (MIR) tasks. We propose three self-supervised deep learning paradigms which leverage pseudotask learning of these two levels of context to produce latent representation spaces. We evaluate the usefulness of these representations by embedding unseen pitch contours into each space and conducting downstream classification tasks. Our results show that contextual representation can enhance downstream classification by as much as 15\% as compared to using traditional statistical contour features.  ( 2 min )
    Saliency Guided Adversarial Training for Learning Generalizable Features with Applications to Medical Imaging Classification System. (arXiv:2209.04326v1 [eess.IV])
This work tackles a central machine learning problem of performance degradation on out-of-distribution (OOD) test sets. The problem is particularly salient in medical-imaging-based diagnosis systems that appear to be accurate but fail when tested in new hospitals/datasets. Recent studies indicate that such systems might learn shortcut and non-relevant features instead of generalizable features (so-called good features). We hypothesize that adversarial training can eliminate shortcut features, whereas saliency guided training can filter out non-relevant features; both are nuisance features accounting for the performance degradation on OOD test sets. With that, we formulate a novel model training scheme for the deep neural network to learn good features for classification and/or detection tasks, ensuring a consistent generalization performance on OOD test sets. The experimental results qualitatively and quantitatively demonstrate the superior performance of our method using the benchmark CXR image data sets on classification tasks.  ( 2 min )
    Using Probabilistic Machine Learning to Better Model Temporal Patterns in Parameterizations: a case study with the Lorenz 96 model. (arXiv:2203.14814v3 [cs.LG] UPDATED)
    The modelling of small-scale processes is a major source of error in climate models, hindering the accuracy of low-cost models which must approximate such processes through parameterization. Red noise is essential to many operational parameterization schemes, helping model temporal correlations. We show how to build on the successes of red noise by combining the known benefits of stochasticity with machine learning. This is done using a physically-informed recurrent neural network within a probabilistic framework. Our model is competitive and often superior to both a bespoke baseline and an existing probabilistic machine learning approach (GAN) when applied to the Lorenz 96 atmospheric simulation. This is due to its superior ability to model temporal patterns compared to standard first-order autoregressive schemes. It also generalises to unseen scenarios. We evaluate across a number of metrics from the literature, and also discuss the benefits of using the probabilistic metric of hold-out likelihood.  ( 3 min )
    JR2net: A Joint Non-Linear Representation and Recovery Network for Compressive Spectral Imaging. (arXiv:2205.07770v2 [eess.IV] UPDATED)
    Deep learning models are state-of-the-art in compressive spectral imaging (CSI) recovery. These methods use a deep neural network (DNN) as an image generator to learn non-linear mapping from compressed measurements to the spectral image. For instance, the deep spectral prior approach uses a convolutional autoencoder network (CAE) in the optimization algorithm to recover the spectral image by using a non-linear representation. However, the CAE training is detached from the recovery problem, which does not guarantee optimal representation of the spectral images for the CSI problem. This work proposes a joint non-linear representation and recovery network (JR2net), linking the representation and recovery task into a single optimization problem. JR2net consists of an optimization-inspired network following an ADMM formulation that learns a non-linear low-dimensional representation and simultaneously performs the spectral image recovery, trained via the end-to-end approach. Experimental results show the superiority of the proposed method with improvements up to 2.57 dB in PSNR and performance around 2000 times faster than state-of-the-art methods.  ( 2 min )
    Differentially Private Decoding in Large Language Models. (arXiv:2205.13621v2 [cs.CL] UPDATED)
    Recent large-scale natural language processing (NLP) systems use a pre-trained Large Language Model (LLM) on massive and diverse corpora as a headstart. In practice, the pre-trained model is adapted to a wide array of tasks via fine-tuning on task-specific datasets. LLMs, while effective, have been shown to memorize instances of training data thereby potentially revealing private information processed during pre-training. The potential leakage might further propagate to the downstream tasks for which LLMs are fine-tuned. On the other hand, privacy-preserving algorithms usually involve retraining from scratch, which is prohibitively expensive for LLMs. In this work, we propose a simple, easy to interpret, and computationally lightweight perturbation mechanism to be applied to an already trained model at the decoding stage. Our perturbation mechanism is model-agnostic and can be used in conjunction with any LLM. We provide theoretical analysis showing that the proposed mechanism is differentially private, and experimental results showing a privacy-utility trade-off.  ( 2 min )
    Review of Pedestrian Trajectory Prediction Methods: Comparing Deep Learning and Knowledge-based Approaches. (arXiv:2111.06740v2 [cs.LG] UPDATED)
In crowd scenarios, predicting the trajectories of pedestrians is a complex and challenging task that depends on many external factors. The topology of the scene and the interactions between the pedestrians are just some of them. Due to advancements in data science and data collection technologies, deep learning methods have recently become a research hotspot in numerous domains. Therefore, it is not surprising that more and more researchers apply these methods to predict the trajectories of pedestrians. This paper compares these relatively new deep learning algorithms with classical knowledge-based models that are widely used to simulate pedestrian dynamics. It provides a comprehensive literature review of both approaches, explores technical and application oriented differences, and addresses open questions as well as future development directions. Our investigations point out that the pertinence of knowledge-based models for predicting local trajectories is nowadays questionable because of the high accuracy of the deep learning algorithms. Nevertheless, the ability of deep-learning algorithms for large-scale simulation and the description of collective dynamics remains to be demonstrated. Furthermore, the comparison shows that the combination of both approaches (the hybrid approach) seems promising for overcoming disadvantages like the missing explainability of the deep learning approach.  ( 3 min )
    Rare but Severe Neural Machine Translation Errors Induced by Minimal Deletion: An Empirical Study on Chinese and English. (arXiv:2209.02145v2 [cs.CL] UPDATED)
    We examine the inducement of rare but severe errors in English-Chinese and Chinese-English in-domain neural machine translation by minimal deletion of the source text with character-based models. By deleting a single character, we find that we can induce severe errors in the translation. We categorize these errors and compare the results of deleting single characters and single words. We also examine the effect of training data size on the number and types of pathological cases induced by these minimal perturbations, finding significant variation.  ( 2 min )
    Data-Driven Deep Learning Based Hybrid Beamforming for Aerial Massive MIMO-OFDM Systems with Implicit CSI. (arXiv:2201.06778v3 [eess.SP] UPDATED)
    In an aerial hybrid massive multiple-input multiple-output (MIMO) and orthogonal frequency division multiplexing (OFDM) system, how to design a spectral-efficient broadband multi-user hybrid beamforming with a limited pilot and feedback overhead is challenging. To this end, by modeling the key transmission modules as an end-to-end (E2E) neural network, this paper proposes a data-driven deep learning (DL)-based unified hybrid beamforming framework for both the time division duplex (TDD) and frequency division duplex (FDD) systems with implicit channel state information (CSI). For TDD systems, the proposed DL-based approach jointly models the uplink pilot combining and downlink hybrid beamforming modules as an E2E neural network. While for FDD systems, we jointly model the downlink pilot transmission, uplink CSI feedback, and downlink hybrid beamforming modules as an E2E neural network. Different from conventional approaches separately processing different modules, the proposed solution simultaneously optimizes all modules with the sum rate as the optimization object. Therefore, by perceiving the inherent property of air-to-ground massive MIMO-OFDM channel samples, the DL-based E2E neural network can establish the mapping function from the channel to the beamformer, so that the explicit channel reconstruction can be avoided with reduced pilot and feedback overhead. Besides, practical low-resolution phase shifters (PSs) introduce the quantization constraint, leading to the intractable gradient backpropagation when training the neural network. To mitigate the performance loss caused by the phase quantization error, we adopt the transfer learning strategy to further fine-tune the E2E neural network based on a pre-trained network that assumes the ideal infinite-resolution PSs. Numerical results show that our DL-based schemes have considerable advantages over state-of-the-art schemes.  ( 3 min )
    The Distributed Discrete Gaussian Mechanism for Federated Learning with Secure Aggregation. (arXiv:2102.06387v4 [cs.LG] UPDATED)
    We consider training models on private data that are distributed across user devices. To ensure privacy, we add on-device noise and use secure aggregation so that only the noisy sum is revealed to the server. We present a comprehensive end-to-end system, which appropriately discretizes the data and adds discrete Gaussian noise before performing secure aggregation. We provide a novel privacy analysis for sums of discrete Gaussians and carefully analyze the effects of data quantization and modular summation arithmetic. Our theoretical guarantees highlight the complex tension between communication, privacy, and accuracy. Our extensive experimental results demonstrate that our solution is essentially able to match the accuracy to central differential privacy with less than 16 bits of precision per value.  ( 2 min )
    EDeNN: Event Decay Neural Networks for low latency vision. (arXiv:2209.04362v1 [cs.CV])
    Despite the success of neural networks in computer vision tasks, digital 'neurons' are a very loose approximation of biological neurons. Today's learning approaches are designed to function on digital devices with digital data representations such as image frames. In contrast, biological vision systems are generally much more capable and efficient than state-of-the-art digital computer vision algorithms. Event cameras are an emerging sensor technology which imitates biological vision with asynchronously firing pixels, eschewing the concept of the image frame. To leverage modern learning techniques, many event-based algorithms are forced to accumulate events back to image frames, somewhat squandering the advantages of event cameras. We follow the opposite paradigm and develop a new type of neural network which operates closer to the original event data stream. We demonstrate state-of-the-art performance in angular velocity regression and competitive optical flow estimation, while avoiding difficulties related to training spiking neural networks (SNNs). Furthermore, the processing latency of our proposed approach is less than 1/10 that of any other implementation, while continuous inference increases this improvement by another order of magnitude.  ( 2 min )
    Random Vector Functional Link Networks for Function Approximation on Manifolds. (arXiv:2007.15776v2 [stat.ML] UPDATED)
    The learning speed of feed-forward neural networks is notoriously slow and has presented a bottleneck in deep learning applications for several decades. For instance, gradient-based learning algorithms, which are used extensively to train neural networks, tend to work slowly when all of the network parameters must be iteratively tuned. To counter this, both researchers and practitioners have tried introducing randomness to reduce the learning requirement. Based on the original construction of Igelnik and Pao, single layer neural-networks with random input-to-hidden layer weights and biases have seen success in practice, but the necessary theoretical justification is lacking. In this paper, we begin to fill this theoretical gap. We provide a (corrected) rigorous proof that the Igelnik and Pao construction is a universal approximator for continuous functions on compact domains, with approximation error decaying asymptotically like $O(1/\sqrt{n})$ for the number $n$ of network nodes. We then extend this result to the non-asymptotic setting, proving that one can achieve any desired approximation error with high probability provided $n$ is sufficiently large. We further adapt this randomized neural network architecture to approximate functions on smooth, compact submanifolds of Euclidean space, providing theoretical guarantees in both the asymptotic and non-asymptotic forms. Finally, we illustrate our results on manifolds with numerical experiments.  ( 3 min )
    Hcore-Init: Neural Network Initialization based on Graph Degeneracy. (arXiv:2004.07636v2 [cs.LG] UPDATED)
    Neural networks are the pinnacle of Artificial Intelligence, as in recent years we witnessed many novel architectures, learning and optimization techniques for deep learning. Capitalizing on the fact that neural networks inherently constitute multipartite graphs among neuron layers, we aim to analyze directly their structure to extract meaningful information that can improve the learning process. To our knowledge, graph mining techniques for enhancing learning in neural networks have not been thoroughly investigated. In this paper we propose an adapted version of the k-core structure for the complete weighted multipartite graph extracted from a deep learning architecture. As a multipartite graph is a combination of bipartite graphs, that are in turn the incidence graphs of hypergraphs, we design k-hypercore decomposition, the hypergraph analogue of k-core degeneracy. We applied k-hypercore to several neural network architectures, more specifically to convolutional neural networks and multilayer perceptrons for image recognition tasks after a very short pretraining. Then we used the information provided by the hypercore numbers of the neurons to re-initialize the weights of the neural network, thus biasing the gradient optimization scheme. Extensive experiments show that k-hypercore outperforms the state-of-the-art initialization methods.  ( 3 min )
    Trustworthy Federated Learning via Blockchain. (arXiv:2209.04418v1 [cs.LG])
    The safety-critical scenarios of artificial intelligence (AI), such as autonomous driving, Internet of Things, smart healthcare, etc., have raised critical requirements for trustworthy AI to guarantee privacy and security with reliable decisions. As a nascent branch of trustworthy AI, federated learning (FL) has been regarded as a promising privacy-preserving framework for training a global AI model over collaborative devices. However, security challenges still exist in the FL framework, e.g., Byzantine attacks from malicious devices and model tampering attacks from a malicious server, which will degrade or destroy the accuracy of the trained global AI model. In this paper, we propose a decentralized blockchain-based FL (B-FL) architecture that uses a secure global aggregation algorithm to resist malicious devices, and deploys a practical Byzantine fault tolerance consensus protocol with high effectiveness and low energy consumption among multiple edge servers to prevent model tampering by the malicious server. However, to implement the B-FL system at the network edge, multiple rounds of cross-validation in the blockchain consensus protocol will induce long training latency. We thus formulate a network optimization problem that jointly considers bandwidth and power allocation for the minimization of the long-term average training latency consisting of progressive learning rounds. We further propose to transform the network optimization problem into a Markov decision process and leverage a deep reinforcement learning based algorithm to provide high system performance with low computational complexity. Simulation results demonstrate that B-FL can resist malicious attacks from edge devices and servers, and that the training latency of B-FL can be significantly reduced by the deep reinforcement learning based algorithm compared with baseline algorithms.  ( 3 min )
    The Surprising Positive Knowledge Transfer in Continual 3D Object Shape Reconstruction. (arXiv:2101.07295v5 [cs.LG] UPDATED)
    Continual learning has been extensively studied for classification tasks with methods developed to primarily avoid catastrophic forgetting, a phenomenon where earlier learned concepts are forgotten in favor of more recent samples. In this work, we present a set of continual 3D object shape reconstruction tasks, including complete 3D shape reconstruction from different input modalities, as well as visible surface (2.5D) reconstruction, which surprisingly demonstrate positive knowledge (backward and forward) transfer when training with solely standard SGD and without additional heuristics. We provide evidence that continuously updated representation learning of single-view 3D shape reconstruction improves the performance on learned and novel categories over time. We provide a novel analysis of knowledge transfer ability by looking at the output distribution shift across sequential learning tasks. Finally, we show that the robustness of these tasks leads to the potential of having a proxy representation learning task for continual classification. The codebase, dataset and pre-trained models released with this article can be found at https://github.com/rehg-lab/CLRec  ( 3 min )
    MICO: Selective Search with Mutual Information Co-training. (arXiv:2209.04378v1 [cs.IR])
    In contrast to traditional exhaustive search, selective search first clusters documents into several groups before all the documents are searched exhaustively by a query, to limit the search executed within one group or only a few groups. Selective search is designed to reduce the latency and computation in modern large-scale search systems. In this study, we propose MICO, a Mutual Information CO-training framework for selective search with minimal supervision using the search logs. After training, MICO not only clusters the documents but also routes unseen queries to the relevant clusters for efficient retrieval. In our empirical experiments, MICO significantly improves the performance on multiple metrics of selective search and outperforms a number of existing competitive baselines.  ( 2 min )
    Differentially Private Stochastic Gradient Descent with Low-Noise. (arXiv:2209.04188v1 [stat.ML])
    In this paper, by introducing a low-noise condition, we study privacy and utility (generalization) performances of differentially private stochastic gradient descent (SGD) algorithms in a setting of stochastic convex optimization (SCO) for both pointwise and pairwise learning problems. For pointwise learning, we establish sharper excess risk bounds of order $\mathcal{O}\Big( \frac{\sqrt{d\log(1/\delta)}}{n\epsilon} \Big)$ and $\mathcal{O}\Big( {n^{- \frac{1+\alpha}{2}}}+\frac{\sqrt{d\log(1/\delta)}}{n\epsilon}\Big)$ for the $(\epsilon,\delta)$-differentially private SGD algorithm for strongly smooth and $\alpha$-H\"older smooth losses, respectively, where $n$ is the sample size and $d$ is the dimensionality. For pairwise learning, inspired by \cite{lei2020sharper,lei2021generalization}, we propose a simple private SGD algorithm based on gradient perturbation which satisfies $(\epsilon,\delta)$-differential privacy, and develop novel utility bounds for the proposed algorithm. In particular, we prove that our algorithm can achieve excess risk rates $\mathcal{O}\Big(\frac{1}{\sqrt{n}}+\frac{\sqrt{d\log(1/\delta)}}{n\epsilon}\Big)$ with gradient complexity $\mathcal{O}(n)$ and $\mathcal{O}\big(n^{\frac{2-\alpha}{1+\alpha}}+n\big)$ for strongly smooth and $\alpha$-H\"older smooth losses, respectively. Further, faster learning rates are established in a low-noise setting for both smooth and non-smooth losses. To the best of our knowledge, this is the first utility analysis which provides excess population bounds better than $\mathcal{O}\Big(\frac{1}{\sqrt{n}}+\frac{\sqrt{d\log(1/\delta)}}{n\epsilon}\Big)$ for privacy-preserving pairwise learning.  ( 2 min )
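    A minimal sketch of the gradient-perturbation primitive such private SGD algorithms build on: clip each stochastic gradient to a fixed L2 norm, then add Gaussian noise scaled to the clipping threshold. The constants are illustrative and not calibrated to any particular $(\epsilon,\delta)$ budget.

        import numpy as np

        rng = np.random.default_rng(0)

        def private_sgd_step(w, grad, lr, clip, sigma):
            g = grad / max(1.0, np.linalg.norm(grad) / clip)     # clip to L2 norm <= clip
            g = g + rng.normal(0.0, sigma * clip, size=g.shape)  # Gaussian perturbation
            return w - lr * g

        # toy 1-d least squares with true slope 2.0
        X = rng.normal(size=100)
        y = 2.0 * X + 0.1 * rng.normal(size=100)
        w = np.zeros(1)
        for t in range(300):
            i = rng.integers(100)
            grad = np.atleast_1d(2.0 * (w[0] * X[i] - y[i]) * X[i])
            w = private_sgd_step(w, grad, lr=0.05, clip=1.0, sigma=0.5)
        print(w)   # near 2.0 despite the injected noise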
    BERT is to NLP what AlexNet is to CV: Can Pre-Trained Language Models Identify Analogies?. (arXiv:2105.04949v4 [cs.CL] UPDATED)
    Analogies play a central role in human commonsense reasoning. The ability to recognize analogies such as "eye is to seeing what ear is to hearing", sometimes referred to as analogical proportions, shapes how we structure knowledge and understand language. Surprisingly, however, the task of identifying such analogies has not yet received much attention in the language model era. In this paper, we analyze the capabilities of transformer-based language models on this unsupervised task, using benchmarks obtained from educational settings, as well as more commonly used datasets. We find that off-the-shelf language models can identify analogies to a certain extent, but struggle with abstract and complex relations, and results are highly sensitive to model architecture and hyperparameters. Overall the best results were obtained with GPT-2 and RoBERTa, while configurations using BERT were not able to outperform word embedding models. Our results raise important questions for future work about how, and to what extent, pre-trained language models capture knowledge about abstract semantic relations.  ( 3 min )
    Investigation of a Machine learning methodology for the SKA pulsar search pipeline. (arXiv:2209.04430v1 [astro-ph.IM])
    The SKA pulsar search pipeline will be used for real-time detection of pulsars. Modern radio telescopes such as the SKA will be generating petabytes of data in their full scale of operation. Hence, experience-based and data-driven algorithms become indispensable for applications such as candidate detection. Here we describe our findings from testing a state-of-the-art object detection algorithm called Mask R-CNN to detect candidate signatures in the SKA pulsar search pipeline. We have trained the Mask R-CNN model to detect candidate images. A custom annotation tool was developed to mark the regions of interest in large datasets efficiently. We have successfully demonstrated this algorithm by detecting candidate signatures on a simulation dataset. The paper presents details of this work with a highlight on future prospects.  ( 2 min )
    Zydeco-Style Spike Sorting Low Power VLSI Architecture for IoT BCI Implants. (arXiv:2209.04427v1 [cs.AR])
    Brain Computer Interface (BCI) technology has great potential for addressing limitations of brain signal analysis, resolving mental disorders, and restoring missing limb functionality via neural-controlled implants. However, no safe implant suitable for daily-life usage is available yet. Most of the proposed implants have several implementation issues, such as infection hazards and heat dissipation, which limit their usability and make it more challenging to pass regulations and quality control in production. A wireless implant does not require a chronic wound in the skull. However, the current complex clustering neuron identification algorithms inside the implant chip consume a lot of power and bandwidth, causing higher heat dissipation issues and draining the implant's battery. Spike sorting is the core unit of an invasive BCI chip and plays a significant role in power consumption, accuracy, and area. Therefore, in this study, we propose a low-power adaptive simplified VLSI architecture, "Zydeco-Style," for BCI spike sorting that is computationally less complex with higher accuracy, achieving up to 93.5% in the worst-case scenario. The architecture uses a low-power Bluetooth wireless communication module to communicate with external IoT medical ICU devices. The proposed architecture was implemented and simulated in Verilog. In addition, we propose a conceptual implant design.  ( 3 min )
    Survey: Leakage and Privacy at Inference Time. (arXiv:2107.01614v2 [cs.LG] UPDATED)
    Leakage of data from publicly available Machine Learning (ML) models is an area of growing significance as commercial and government applications of ML can draw on multiple sources of data, potentially including users' and clients' sensitive data. We provide a comprehensive survey of contemporary advances on several fronts, covering involuntary data leakage which is natural to ML models, potential malevolent leakage which is caused by privacy attacks, and currently available defence mechanisms. We focus on inference-time leakage, as the most likely scenario for publicly available models. We first discuss what leakage is in the context of different data, tasks, and model architectures. We then propose a taxonomy across involuntary and malevolent leakage, available defences, followed by the currently available assessment metrics and applications. We conclude with outstanding challenges and open questions, outlining some promising directions for future research.  ( 2 min )
    Risk-Averse Multi-Armed Bandits with Unobserved Confounders: A Case Study in Emotion Regulation in Mobile Health. (arXiv:2209.04356v1 [cs.LG])
    In this paper, we consider a risk-averse multi-armed bandit (MAB) problem where the goal is to learn a policy that minimizes the risk of low expected return, as opposed to maximizing the expected return itself, which is the objective in the usual approach to risk-neutral MAB. Specifically, we formulate this problem as a transfer learning problem between an expert and a learner agent in the presence of contexts that are only observable by the expert but not by the learner. Thus, such contexts are unobserved confounders (UCs) from the learner's perspective. Given a dataset generated by the expert that excludes the UCs, the goal for the learner is to identify the true minimum-risk arm with fewer online learning steps, while avoiding possible biased decisions due to the presence of UCs in the expert's data.  ( 2 min )
    Fast Neural Kernel Embeddings for General Activations. (arXiv:2209.04121v1 [cs.LG])
    The infinite-width limit has shed light on generalization and optimization aspects of deep learning by establishing connections between neural networks and kernel methods. Despite their importance, the utility of these kernel methods was limited in large-scale learning settings due to their (super-)quadratic runtime and memory complexities. Moreover, most prior works on neural kernels have focused on the ReLU activation, mainly due to its popularity but also due to the difficulty of computing such kernels for general activations. In this work, we overcome such difficulties by providing methods to work with general activations. First, we compile and expand the list of activation functions admitting exact dual activation expressions to compute neural kernels. When the exact computation is unknown, we present methods to effectively approximate them. We propose a fast sketching method that approximates any multi-layered Neural Network Gaussian Process (NNGP) kernel and Neural Tangent Kernel (NTK) matrices for a wide range of activation functions, going beyond the commonly analyzed ReLU activation. This is done by showing how to approximate the neural kernels using the truncated Hermite expansion of any desired activation functions. While most prior works require data points on the unit sphere, our methods do not suffer from such limitations and are applicable to any dataset of points in $\mathbb{R}^d$. Furthermore, we provide a subspace embedding for NNGP and NTK matrices with near input-sparsity runtime and near-optimal target dimension which applies to any \emph{homogeneous} dual activation functions with rapidly convergent Taylor expansion. Empirically, with respect to exact convolutional NTK (CNTK) computation, our method achieves $106\times$ speedup for approximate CNTK of a 5-layer Myrtle network on CIFAR-10 dataset.  ( 3 min )
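    A minimal sketch of the truncated-Hermite idea on unit-norm inputs: if $a_r$ are the coefficients of an activation in the normalized probabilists' Hermite basis, the dual activation at correlation $\rho$ is $\sum_r a_r^2 \rho^r$. The truncation order and quadrature size below are illustrative; the closed-form ReLU arc-cosine kernel serves as a sanity check.

        import numpy as np
        from numpy.polynomial import hermite_e as H
        from math import factorial, sqrt, pi

        def hermite_coeffs(act, order, quad=200):
            # a_r = E[act(g) He_r(g)] / sqrt(r!) via Gauss-Hermite quadrature
            x, w = H.hermegauss(quad)                    # weight exp(-x^2/2)
            a = []
            for r in range(order + 1):
                e_r = np.zeros(r + 1); e_r[r] = 1.0
                He_r = H.hermeval(x, e_r)
                a.append((w * act(x) * He_r).sum() / (sqrt(2 * pi) * sqrt(factorial(r))))
            return np.array(a)

        def dual_activation(rho, a):
            # E[act(u)act(v)] for unit Gaussians u, v with correlation rho
            return sum(a_r ** 2 * rho ** r for r, a_r in enumerate(a))

        relu = lambda z: np.maximum(z, 0.0)
        a = hermite_coeffs(relu, order=30)
        rho = 0.5
        exact = (sqrt(1 - rho ** 2) + rho * (pi - np.arccos(rho))) / (2 * pi)
        print(dual_activation(rho, a), exact)            # the two nearly coincide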
    Fuzzy Attention Neural Network to Tackle Discontinuity in Airway Segmentation. (arXiv:2209.02048v2 [eess.IV] UPDATED)
    Airway segmentation is crucial for the examination, diagnosis, and prognosis of lung diseases, while its manual delineation is unduly burdensome. To alleviate this time-consuming and potentially subjective manual procedure, researchers have proposed methods to automatically segment airways from computerized tomography (CT) images. However, some small-sized airway branches (e.g., bronchus and terminal bronchioles) significantly aggravate the difficulty of automatic segmentation by machine learning models. In particular, the variance of voxel values and the severe data imbalance in airway branches make the computational module prone to discontinuous and false-negative predictions, especially for cohorts with different lung diseases. The attention mechanism has shown the capacity to segment complex structures, while fuzzy logic can reduce the uncertainty in feature representations. Therefore, the integration of deep attention networks and fuzzy theory, given by the fuzzy attention layer, should be a promising solution for better generalization and robustness. This paper presents an efficient method for airway segmentation, comprising a novel fuzzy attention neural network and a comprehensive loss function to enhance the spatial continuity of airway segmentation. The deep fuzzy set is formulated by a set of voxels in the feature map and a learnable Gaussian membership function. Different from the existing attention mechanism, the proposed channel-specific fuzzy attention addresses the issue of heterogeneous features in different channels. Furthermore, a novel evaluation metric is proposed to assess both the continuity and completeness of airway structures. The efficiency, generalization and robustness of the proposed method are demonstrated by training on normal lung disease while testing on datasets of lung cancer, COVID-19 and pulmonary fibrosis.
    On a Conjecture Regarding the Adam Optimizer. (arXiv:2111.08162v4 [cs.LG] UPDATED)
    Why does the Adam optimizer work so well in deep-learning applications? Adam's originators, Kingma and Ba, presented a mathematical argument that was meant to help explain its success, but Bock and colleagues have since reported that a key piece is missing from that argument $-$ an unproven lemma which we will call Bock's conjecture. Here we show that this conjecture is false, but we prove a modified version of it $-$ a generalization of a result of Reddi and colleagues $-$ which can take its place in analyses of Adam.
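    For reference, the update rule at issue is the standard Adam iteration of Kingma and Ba (the disputed lemma concerns its convergence analysis, not the rule itself): with gradient $g_t$, moment estimates $m_t = \beta_1 m_{t-1} + (1-\beta_1) g_t$ and $v_t = \beta_2 v_{t-1} + (1-\beta_2) g_t^2$, bias corrections $\hat{m}_t = m_t/(1-\beta_1^t)$ and $\hat{v}_t = v_t/(1-\beta_2^t)$, and parameter update $\theta_t = \theta_{t-1} - \alpha\,\hat{m}_t/(\sqrt{\hat{v}_t}+\varepsilon)$.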
    Grounding Hindsight Instructions in Multi-Goal Reinforcement Learning for Robotics. (arXiv:2204.04308v2 [cs.LG] UPDATED)
    This paper focuses on robotic reinforcement learning with sparse rewards for natural language goal representations. An open problem is the sample-inefficiency that stems from the compositionality of natural language, and from the grounding of language in sensory data and actions. We address these issues with three contributions. We first present a mechanism for hindsight instruction replay utilizing expert feedback. Second, we propose a seq2seq model to generate linguistic hindsight instructions. Finally, we present a novel class of language-focused learning tasks. We show that hindsight instructions improve the learning performance, as expected. In addition, we also provide an unexpected result: We show that the learning performance of our agent can be improved by one third if, in a sense, the agent learns to talk to itself in a self-supervised manner. We achieve this by learning to generate linguistic instructions that would have been appropriate as a natural language goal for an originally unintended behavior. Our results indicate that the performance gain increases with the task-complexity.
    PhML-DyR: A Physics-Informed ML framework for Dynamic Reconfiguration in Power Systems. (arXiv:2206.06789v2 [eess.SY] UPDATED)
    A transformation of the US electricity sector is underway with aggressive targets to achieve 100% carbon pollution-free electricity by 2035. To achieve this objective while maintaining a safe and reliable power grid, new operating paradigms are needed that enable computationally fast and accurate decision making in a dynamic and uncertain environment. We propose a novel physics-informed machine learning framework for the task of dynamic grid reconfiguration (PhML-DyR), a key task in power systems. Dynamic reconfiguration (DyR) is a process by which switch-states are dynamically set so as to lead to an optimal grid topology that minimizes line losses. To address the underlying computational complexities of NP-hardness due to the mixed nature of the decision variables, we propose the use of physics-informed ML (PhML) which integrates both operating constraints and topological and connectivity constraints into a neural network framework. Our PhML approach learns to simultaneously optimize grid topology and generator dispatch to meet loads, increase efficiency, and remain within safe operating limits. We demonstrate the effectiveness of PhML-DyR on a canonical grid, showing a 23% reduction in electricity loss and improved voltage profiles. We also show an order-of-magnitude reduction in constraint violations, as well as reduced training time, using PhML-DyR.
    Newton methods based convolution neural networks using parallel processing. (arXiv:2112.01401v2 [cs.LG] UPDATED)
    Training of convolutional neural networks is a high-dimensional and non-convex optimization problem. At present, it is inefficient in situations where parametric learning rates cannot be confidently set. Some past works have introduced Newton methods for training deep neural networks. Newton methods for convolutional neural networks involve complicated operations. Finding the Hessian matrix in second-order methods becomes very complex as we mainly use the finite differences method with the image data. Newton methods for convolutional neural networks deal with this by using sub-sampled Hessian Newton methods. In this paper, we have used the complete data instead of the sub-sampled methods that only handle partial data at a time. Further, we have used parallel processing instead of serial processing in mini-batch computations. The results obtained using parallel processing in this study outperform the previous serial approach in terms of training time.
    On Positional and Structural Node Features for Graph Neural Networks on Non-attributed Graphs. (arXiv:2107.01495v2 [cs.LG] UPDATED)
    Graph neural networks (GNNs) have been widely used in various graph-related problems such as node classification and graph classification, where superior performance is mainly established when natural node features are available. However, it is not well understood how GNNs work without natural node features, especially regarding the various ways to construct artificial ones. In this paper, we point out the two types of artificial node features, i.e., positional and structural node features, and provide insights on why each of them is more appropriate for certain tasks, i.e., positional node classification, structural node classification, and graph classification. Extensive experimental results on 10 benchmark datasets validate our insights, thus leading to a practical guideline on the choices between different artificial node features for GNNs on non-attributed graphs. The code is available at https://github.com/zjzijielu/gnn-positional-structural-node-features.
    Condensing Graphs via One-Step Gradient Matching. (arXiv:2206.07746v3 [cs.LG] UPDATED)
    As training deep learning models on large datasets takes a lot of time and resources, it is desirable to construct a small synthetic dataset with which we can train deep learning models sufficiently. Recent works have explored solutions for condensing image datasets through complex bi-level optimization. For instance, dataset condensation (DC) matches network gradients w.r.t. large-real data and small-synthetic data, where the network weights are optimized for multiple steps at each outer iteration. However, existing approaches have their inherent limitations: (1) they are not directly applicable to graphs where the data is discrete; and (2) the condensation process is computationally expensive due to the involved nested optimization. To bridge the gap, we investigate efficient dataset condensation tailored for graph datasets where we model the discrete graph structure as a probabilistic model. We further propose a one-step gradient matching scheme, which performs gradient matching for only a single step without training the network weights. Our theoretical analysis shows this strategy can generate synthetic graphs that lead to lower classification loss on real graphs. Extensive experiments on various graph datasets demonstrate the effectiveness and efficiency of the proposed method. In particular, we are able to reduce the dataset size by 90% while approximating up to 98% of the original performance and our method is significantly faster than multi-step gradient matching (e.g. 15x in CIFAR10 for synthesizing 500 graphs). Code is available at \url{https://github.com/amazon-research/DosCond}.
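    A minimal sketch of one-step gradient matching, shown on continuous features for brevity rather than the discrete graph structure the paper targets: gradients on the learnable synthetic set are matched to gradients on real data at a fixed random initialization, with no inner loop over network weights. The tiny MLP and the cosine objective are illustrative choices.

        import torch
        import torch.nn.functional as F

        torch.manual_seed(0)
        net = torch.nn.Sequential(torch.nn.Linear(16, 32), torch.nn.ReLU(),
                                  torch.nn.Linear(32, 2))
        params = list(net.parameters())
        x_real = torch.randn(128, 16)
        y_real = torch.randint(0, 2, (128,))
        x_syn = torch.randn(8, 16, requires_grad=True)    # learnable synthetic set
        y_syn = torch.randint(0, 2, (8,))
        opt = torch.optim.Adam([x_syn], lr=0.01)

        # gradients on real data at the fixed random initialization (computed once)
        g_real = torch.autograd.grad(F.cross_entropy(net(x_real), y_real), params)

        for step in range(200):
            g_syn = torch.autograd.grad(F.cross_entropy(net(x_syn), y_syn),
                                        params, create_graph=True)
            # one-step matching: align gradient sets without updating net's weights
            loss = sum(1.0 - F.cosine_similarity(a.flatten(), b.flatten(), dim=0)
                       for a, b in zip(g_real, g_syn))
            opt.zero_grad(); loss.backward(); opt.step()
        print(loss.item())   # gradient-matching distance after optimization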
    Bit-Metric Decoding Rate in Multi-User MIMO Systems: Applications. (arXiv:2203.06273v3 [cs.IT] UPDATED)
    This is the second part of a two-part paper that focuses on link-adaptation (LA) and physical layer (PHY) abstraction for multi-user MIMO (MU-MIMO) systems with non-linear receivers. The first part proposes a new metric, called bit-metric decoding rate (BMDR) for a detector, as being the equivalent of post-equalization signal-to-interference-noise ratio (SINR) for non-linear receivers. Since this BMDR does not have a closed form expression, a machine-learning based approach to estimate it effectively is presented. In this part, the concepts developed in the first part are utilized to develop novel algorithms for LA, dynamic detector selection from a list of available detectors, and PHY abstraction in MU-MIMO systems with arbitrary receivers. Extensive simulation results that substantiate the efficacy of the proposed algorithms are presented.
    RASR: Risk-Averse Soft-Robust MDPs with EVaR and Entropic Risk. (arXiv:2209.04067v1 [cs.LG])
    Prior work on safe Reinforcement Learning (RL) has studied risk-aversion to randomness in dynamics (aleatory) and to model uncertainty (epistemic) in isolation. We propose and analyze a new framework to jointly model the risk associated with epistemic and aleatory uncertainties in finite-horizon and discounted infinite-horizon MDPs. We call this framework that combines Risk-Averse and Soft-Robust methods RASR. We show that when the risk-aversion is defined using either EVaR or the entropic risk, the optimal policy in RASR can be computed efficiently using a new dynamic program formulation with a time-dependent risk level. As a result, the optimal risk-averse policies are deterministic but time-dependent, even in the infinite-horizon discounted setting. We also show that particular RASR objectives reduce to risk-averse RL with mean posterior transition probabilities. Our empirical results show that our new algorithms consistently mitigate uncertainty as measured by EVaR and other standard risk measures.
    HyperMAML: Few-Shot Adaptation of Deep Models with Hypernetworks. (arXiv:2205.15745v2 [cs.LG] UPDATED)
    The aim of Few-Shot learning methods is to train models which can easily adapt to previously unseen tasks, based on small amounts of data. One of the most popular and elegant Few-Shot learning approaches is Model-Agnostic Meta-Learning (MAML). The main idea behind this method is to learn the general weights of the meta-model, which are further adapted to specific problems in a small number of gradient steps. However, the model's main limitation lies in the fact that the update procedure is realized by gradient-based optimisation. In consequence, MAML cannot always modify weights to the required extent in one or even a few gradient iterations. On the other hand, using many gradient steps results in a complex and time-consuming optimization procedure, which is hard to train in practice, and may lead to overfitting. In this paper, we propose HyperMAML, a novel generalization of MAML, where the training of the update procedure is also part of the model. Namely, in HyperMAML, instead of updating the weights with gradient descent, we use for this purpose a trainable Hypernetwork. Consequently, in this framework, the model can generate significant updates whose range is not limited to a fixed number of gradient steps. Experiments show that HyperMAML consistently outperforms MAML and performs comparably to other state-of-the-art techniques in a number of standard Few-Shot learning benchmarks.
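    A minimal sketch of the core idea, under simplifying assumptions: a trainable hypernetwork maps a summary of the support set directly to a weight update for the classifier head, replacing MAML's inner gradient loop. The prototype-style summary and all layer sizes are illustrative, not the paper's architecture.

        import torch

        torch.manual_seed(0)
        feat_dim, n_way = 64, 5
        head = torch.nn.Linear(feat_dim, n_way)        # the head MAML would adapt
        # hypernetwork: support-set summary -> flattened weight/bias update for head
        hyper = torch.nn.Sequential(
            torch.nn.Linear(feat_dim * n_way, 256), torch.nn.ReLU(),
            torch.nn.Linear(256, n_way * feat_dim + n_way))

        def adapted_logits(support_feats, support_labels, query_feats):
            # per-class mean of support features as a prototype-style task summary
            summary = torch.stack([support_feats[support_labels == c].mean(0)
                                   for c in range(n_way)]).flatten()
            delta = hyper(summary)
            dW = delta[:n_way * feat_dim].view(n_way, feat_dim)
            db = delta[n_way * feat_dim:]
            # a single generated update replaces MAML's inner gradient iterations
            return query_feats @ (head.weight + dW).t() + (head.bias + db)

        s_f = torch.randn(25, feat_dim)                 # 5-way 5-shot support features
        s_y = torch.arange(n_way).repeat(5)
        q_f = torch.randn(10, feat_dim)
        print(adapted_logits(s_f, s_y, q_f).shape)      # torch.Size([10, 5])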
    ANITA: An Optimal Loopless Accelerated Variance-Reduced Gradient Method. (arXiv:2103.11333v3 [math.OC] UPDATED)
    In this paper, we propose a novel accelerated gradient method called ANITA for solving fundamental finite-sum optimization problems. Concretely, we consider both general convex and strongly convex settings: i) For general convex finite-sum problems, ANITA improves the previous state-of-the-art result given by Varag (Lan et al., 2019). In particular, for large-scale problems or when the convergence error is not very small, i.e., $n \geq \frac{1}{\epsilon^2}$, ANITA obtains the \emph{first} optimal result $O(n)$, matching the lower bound $\Omega(n)$ provided by Woodworth and Srebro (2016), while previous results are $O(n \log \frac{1}{\epsilon})$ of Varag (Lan et al., 2019) and $O(\frac{n}{\sqrt{\epsilon}})$ of Katyusha (Allen-Zhu, 2017). ii) For strongly convex finite-sum problems, we also show that ANITA can achieve the optimal convergence rate $O\big((n+\sqrt{\frac{nL}{\mu}})\log\frac{1}{\epsilon}\big)$ matching the lower bound $\Omega\big((n+\sqrt{\frac{nL}{\mu}})\log\frac{1}{\epsilon}\big)$ provided by Lan and Zhou (2015). Besides, ANITA enjoys a simpler loopless algorithmic structure, unlike previous accelerated algorithms such as Varag (Lan et al., 2019) and Katyusha (Allen-Zhu, 2017), which use double-loop structures. Moreover, we provide a novel \emph{dynamic multi-stage convergence analysis}, which is the key technical part for improving previous results to the optimal rates. We believe that our new theoretical rates and novel convergence analysis for the fundamental finite-sum problem will directly lead to key improvements for many other related problems, such as distributed/federated/decentralized optimization problems (e.g., Li and Richt\'arik, 2021). Finally, the numerical experiments show that ANITA converges faster than the previous state-of-the-art Varag (Lan et al., 2019), validating our theoretical results and confirming the practical superiority of ANITA.
    Constrained multi-objective optimization of process design parameters in settings with scarce data: an application to adhesive bonding. (arXiv:2112.08760v2 [cs.NE] UPDATED)
    Adhesive joints are increasingly used in industry for a wide variety of applications because of their favorable characteristics such as high strength-to-weight ratio, design flexibility, limited stress concentrations, planar force transfer, good damage tolerance and fatigue resistance. Finding the optimal process parameters for an adhesive bonding process is challenging: the optimization is inherently multi-objective (aiming to maximize break strength while minimizing cost) and constrained (the process should not result in any visual damage to the materials, and stress tests should not result in failures that are adhesion-related). Real life physical experiments in the lab are expensive to perform; traditional evolutionary approaches (such as genetic algorithms) are then ill-suited to solve the problem, due to the prohibitive amount of experiments required for evaluation. In this research, we successfully applied specific machine learning techniques (Gaussian Process Regression and Logistic Regression) to emulate the objective and constraint functions based on a \emph{limited} amount of experimental data. The techniques are embedded in a Bayesian optimization algorithm, which succeeds in detecting Pareto-optimal process settings in a highly efficient way (i.e., requiring a limited number of extra experiments).
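    A minimal sketch of the surrogate-assisted loop described above, on synthetic data: a Gaussian process emulates the expensive break-strength objective, a logistic-regression model emulates the probability that a setting satisfies the constraints, and candidates are ranked by an optimistic objective estimate weighted by predicted feasibility. The acquisition rule here is a simple stand-in for the paper's Bayesian optimization machinery, and the data-generating functions are purely illustrative.

        import numpy as np
        from sklearn.gaussian_process import GaussianProcessRegressor
        from sklearn.linear_model import LogisticRegression

        rng = np.random.default_rng(0)
        X = rng.uniform(0, 1, size=(20, 2))              # process settings tried so far
        strength = np.sin(3 * X[:, 0]) + X[:, 1]         # measured break strength (toy)
        feasible = (X.sum(axis=1) < 1.0).astype(int)     # e.g. no visual damage (toy)

        gp = GaussianProcessRegressor().fit(X, strength)     # objective emulator
        clf = LogisticRegression().fit(X, feasible)          # constraint emulator

        cand = rng.uniform(0, 1, size=(500, 2))              # candidate settings
        mu, sd = gp.predict(cand, return_std=True)
        p_feas = clf.predict_proba(cand)[:, 1]
        score = (mu + 1.0 * sd) * p_feas                     # optimistic and feasible
        print(cand[np.argmax(score)])                        # next experiment to run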
    Solving Multistage Stochastic Linear Programming via Regularized Linear Decision Rules: An Application to Hydrothermal Dispatch Planning. (arXiv:2110.03146v2 [math.OC] UPDATED)
    The solution of multistage stochastic linear problems (MSLP) represents a challenge for many applications. Long-term hydrothermal dispatch planning (LHDP) materializes this challenge in a real-world problem that affects electricity markets, economies, and natural resources worldwide. No closed-form solutions are available for MSLP and the definition of non-anticipative policies with high-quality out-of-sample performance is crucial. Linear decision rules (LDR) provide an interesting simulation-based framework for finding high-quality policies for MSLP through two-stage stochastic models. In practical applications, however, the number of parameters to be estimated when using an LDR may be close to or higher than the number of scenarios of the sample average approximation problem, thereby generating an in-sample overfit and poor performance in out-of-sample simulations. In this paper, we propose a novel regularized LDR to solve MSLP based on the AdaLASSO (adaptive least absolute shrinkage and selection operator). The goal is to use the parsimony principle as largely studied in high-dimensional linear regression models to obtain better out-of-sample performance for an LDR applied to MSLP. Computational experiments show that the overfit threat is non-negligible when using the classical non-regularized LDR to solve the LHDP, one of the most studied MSLP with relevant applications in industry. Our analysis highlights the following benefits of the proposed framework in comparison to the non-regularized benchmark: 1) significant reductions in the number of non-zero coefficients (model parsimony), 2) substantial cost reductions in out-of-sample evaluations, and 3) improved spot-price profiles.
    Learning Branching Heuristics for Propositional Model Counting. (arXiv:2007.03204v2 [cs.LG] UPDATED)
    Propositional model counting, or #SAT, is the problem of computing the number of satisfying assignments of a Boolean formula. Many problems from different application areas, including many discrete probabilistic inference problems, can be translated into model counting problems to be solved by #SAT solvers. Exact #SAT solvers, however, are often not scalable to industrial size instances. In this paper, we present Neuro#, an approach for learning branching heuristics to improve the performance of exact #SAT solvers on instances from a given family of problems. We experimentally show that our method reduces the step count on similarly distributed held-out instances and generalizes to much larger instances from the same problem family. It is able to achieve these results on a number of different problem families having very different structures. In addition to step count improvements, Neuro# can also achieve orders of magnitude wall-clock speedups over the vanilla solver on larger instances in some problem families, despite the runtime overhead of querying the model.  ( 3 min )
    Shapley value-based approaches to explain the robustness of classifiers in machine learning. (arXiv:2209.04254v1 [cs.LG])
    In machine learning, the use of algorithm-agnostic approaches is an emerging area of research for explaining the contribution of individual features towards the predicted outcome. Whilst there is a focus on explaining the prediction itself, little has been done on explaining the robustness of these models, that is, how each feature contributes towards achieving that robustness. In this paper, we propose the use of Shapley values to explain the contribution of each feature towards the model's robustness, measured in terms of the Receiver Operating Characteristic (ROC) curve and the Area under the ROC curve (AUC). With the help of an illustrative example, we demonstrate the proposed idea of explaining the ROC curve and visualising the uncertainties in these curves. For imbalanced datasets, the use of the Precision-Recall Curve (PRC) is considered more appropriate; therefore, we also demonstrate how to explain the PRCs with the help of Shapley values.
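    A minimal Monte Carlo sketch of the idea: take the value of a feature coalition to be the test AUC of a model trained on those features only, and estimate each feature's Shapley value as its average marginal contribution over random feature orderings. The model, dataset, and number of permutations are illustrative choices.

        import numpy as np
        from sklearn.datasets import make_classification
        from sklearn.linear_model import LogisticRegression
        from sklearn.metrics import roc_auc_score
        from sklearn.model_selection import train_test_split

        rng = np.random.default_rng(0)
        X, y = make_classification(n_samples=400, n_features=5, random_state=0)
        Xtr, Xte, ytr, yte = train_test_split(X, y, random_state=0)

        def coalition_auc(S):
            # the "value" of feature subset S: test AUC of a model using only S
            model = LogisticRegression(max_iter=1000).fit(Xtr[:, S], ytr)
            return roc_auc_score(yte, model.predict_proba(Xte[:, S])[:, 1])

        d, n_perm = X.shape[1], 50
        phi = np.zeros(d)
        for _ in range(n_perm):
            order, S, prev = rng.permutation(d), [], 0.5   # empty set scores AUC 0.5
            for j in order:
                cur = coalition_auc(S + [j])
                phi[j] += cur - prev                        # marginal contribution of j
                S.append(j); prev = cur
        phi /= n_perm
        print(phi, phi.sum() + 0.5)   # contributions telescope to the full-model AUC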
    SC-Square: Future Progress with Machine Learning?. (arXiv:2209.04361v1 [cs.SC])
    The algorithms employed by our communities are often underspecified, and thus have multiple implementation choices, which do not affect the correctness of the output, but do impact the efficiency or even tractability of its production. In this extended abstract, to accompany a keynote talk at the 2021 SC-Square Workshop, we survey recent work (both the author's and from the literature) on the use of Machine Learning technology to improve algorithms of interest to SC-Square.
    Knowledge-based Deep Learning for Modeling Chaotic Systems. (arXiv:2209.04259v1 [cs.LG])
    Deep Learning has received increased attention due to its unbeatable success in many fields, such as computer vision, natural language processing, recommendation systems, and most recently in simulating multiphysics problems and predicting nonlinear dynamical systems. However, modeling and forecasting the dynamics of chaotic systems remains an open research problem since training deep learning models requires big data, which is not always available in many cases. Such deep learners can be trained from additional information obtained from simulated results and by enforcing the physical laws of the chaotic systems. This paper considers extreme events and their dynamics and proposes elegant models based on deep neural networks, called knowledge-based deep learning (KDL). Our proposed KDL can learn the complex patterns governing chaotic systems by jointly training on real and simulated data directly from the dynamics and their differential equations. This knowledge is transferred to model and forecast real-world chaotic events exhibiting extreme behavior. We validate the efficiency of our model by assessing it on three real-world benchmark datasets: El Nino sea surface temperature, San Juan Dengue viral infection, and Bj{\o}rn{\o}ya daily precipitation, all governed by extreme events' dynamics. Using prior knowledge of extreme events and physics-based loss functions to lead the neural network learning, we ensure physically consistent, generalizable, and accurate forecasting, even in a small data regime.
    Majority Vote for Distributed Differentially Private Sign Selection. (arXiv:2209.04419v1 [cs.CR])
    Privacy-preserving data analysis has become prevalent in recent years. In this paper, we propose a distributed group differentially private majority vote mechanism for the sign selection problem in a distributed setup. To achieve this, we apply iterative peeling to the stability function and use the exponential mechanism to recover the signs. As applications, we study private sign selection for mean estimation and linear regression problems in distributed systems. Our method recovers the support and signs with the optimal signal-to-noise ratio as in the non-private scenario, which is better than contemporary works on private variable selection. Moreover, the sign selection consistency is justified with theoretical guarantees. Simulation studies are conducted to demonstrate the effectiveness of our proposed method.
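    A minimal sketch of the voting-and-release step, under simplifying assumptions: each machine votes with the sign of its local estimate per coordinate, and the aggregate sign is released through the exponential mechanism with the vote count as the utility. The paper's iterative peeling and group-privacy accounting are omitted for brevity, and all problem sizes are toy values.

        import numpy as np

        rng = np.random.default_rng(0)
        n_machines, d, eps = 50, 10, 1.0
        theta = np.where(np.arange(d) < 5, 0.5, -0.5)       # true coefficient signs
        local = theta + rng.normal(0.0, 1.0, size=(n_machines, d))
        votes = np.sign(local)                              # each machine's sign votes

        released = np.empty(d)
        for j in range(d):
            pos = (votes[:, j] > 0).sum()                   # utility of releasing +1
            neg = n_machines - pos                          # utility of releasing -1
            # exponential mechanism; one machine changing its vote shifts a count by 1
            p = np.exp(eps * np.array([pos, neg], dtype=float) / 2.0)
            p /= p.sum()
            released[j] = rng.choice([1.0, -1.0], p=p)
        print(released)   # matches np.sign(theta) with high probability per coordinate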
    ApproxTrain: Fast Simulation of Approximate Multipliers for DNN Training and Inference. (arXiv:2209.04161v1 [cs.AR])
    Edge training of Deep Neural Networks (DNNs) is a desirable goal for continuous learning; however, it is hindered by the enormous computational power required by training. Hardware approximate multipliers have shown their effectiveness for gaining resource-efficiency in DNN inference accelerators; however, training with approximate multipliers is largely unexplored. To build resource efficient accelerators with approximate multipliers supporting DNN training, a thorough evaluation of training convergence and accuracy for different DNN architectures and different approximate multipliers is needed. This paper presents ApproxTrain, an open-source framework that allows fast evaluation of DNN training and inference using simulated approximate multipliers. ApproxTrain is as user-friendly as TensorFlow (TF) and requires only a high-level description of a DNN architecture along with C/C++ functional models of the approximate multiplier. We improve the speed of the simulation at the multiplier level by using a novel LUT-based approximate floating-point (FP) multiplier simulator on GPU (AMSim). ApproxTrain leverages CUDA and efficiently integrates AMSim into the TensorFlow library, in order to overcome the absence of native hardware approximate multipliers in commercial GPUs. We use ApproxTrain to evaluate the convergence and accuracy of DNN training with approximate multipliers for small and large datasets (including ImageNet) using LeNets and ResNets architectures. The evaluations demonstrate similar convergence behavior and negligible change in test accuracy compared to FP32 and bfloat16 multipliers. Compared to CPU-based approximate multiplier simulations in training and inference, the GPU-accelerated ApproxTrain is more than 2500x faster. Based on highly optimized closed-source cuDNN/cuBLAS libraries with native hardware multipliers, the original TensorFlow is only 8x faster than ApproxTrain.
    Robust-by-Design Classification via Unitary-Gradient Neural Networks. (arXiv:2209.04293v1 [cs.LG])
    The use of neural networks in safety-critical systems requires safe and robust models, due to the existence of adversarial attacks. Knowing the minimal adversarial perturbation of any input x, or, equivalently, knowing the distance of x from the classification boundary, allows evaluating the classification robustness, providing certifiable predictions. Unfortunately, state-of-the-art techniques for computing such a distance are computationally expensive and hence not suited for online applications. This work proposes a novel family of classifiers, namely Signed Distance Classifiers (SDCs), that, from a theoretical perspective, directly output the exact distance of x from the classification boundary, rather than a probability score (e.g., SoftMax). SDCs represent a family of robust-by-design classifiers. To practically address the theoretical requirements of an SDC, a novel network architecture named Unitary-Gradient Neural Network is presented. Experimental results show that the proposed architecture approximates a signed distance classifier, hence allowing an online certifiable classification of x at the cost of a single inference.
    Survey on Deep Fuzzy Systems in regression applications: a view on interpretability. (arXiv:2209.04230v1 [cs.LG])
    Regression problems have been more and more embraced by deep learning (DL) techniques. The increasing number of papers recently published in this domain, including surveys and reviews, shows that deep regression has captured the attention of the community due to efficiency and good accuracy in systems with high-dimensional data. However, many DL methodologies have complex structures that are not readily transparent to human users. Assessing the interpretability of these models is an essential factor for addressing problems in sensitive areas such as cyber-security systems, medical, financial surveillance, and industrial processes. Fuzzy logic systems (FLS) are inherently interpretable models, well known in the literature, capable of using nonlinear representations for complex systems through linguistic terms with membership degrees mimicking human thought. Within an atmosphere of explainable artificial intelligence, it is necessary to consider a trade-off between accuracy and interpretability for developing intelligent models. This paper aims to investigate the state-of-the-art on existing methodologies that combine DL and FLS, namely deep fuzzy systems, to address regression problems, a topic that is currently not sufficiently explored in the literature and thus deserves a comprehensive survey.
    Anomaly Detection through Unsupervised Federated Learning. (arXiv:2209.04184v1 [cs.LG])
    Federated learning (FL) is proving to be one of the most promising paradigms for leveraging distributed resources, enabling a set of clients to collaboratively train a machine learning model while keeping the data decentralized. The explosive growth of interest in the topic has led to rapid advancements in several core aspects like communication efficiency, handling non-IID data, privacy, and security capabilities. However, the majority of FL works only deal with supervised tasks, assuming that clients' training sets are labeled. To leverage the enormous unlabeled data on distributed edge devices, in this paper, we aim to extend the FL paradigm to unsupervised tasks by addressing the problem of anomaly detection in decentralized settings. In particular, we propose a novel method in which, through a preprocessing phase, clients are grouped into communities, each having similar majority (i.e., inlier) patterns. Subsequently, each community of clients trains the same anomaly detection model (i.e., autoencoders) in a federated fashion. The resulting model is then shared and used to detect anomalies within the clients of the same community that joined the corresponding federated process. Experiments show that our method is robust, and it can detect communities consistent with the ideal partitioning in which groups of clients having the same inlier patterns are known. Furthermore, the performance is significantly better than those in which clients train models exclusively on local data and comparable with federated models of ideal communities' partition.
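    A minimal sketch of the federated phase for one community, with toy sizes: clients train local autoencoders on their own data, the server averages the weights FedAvg-style, and anomalies are flagged by reconstruction error above a quantile threshold. The two-layer autoencoder, manual gradient steps, and threshold rule are illustrative simplifications of the method described above.

        import numpy as np

        rng = np.random.default_rng(0)

        def init_weights():
            return [rng.normal(0, 0.1, (8, 4)), rng.normal(0, 0.1, (4, 8))]

        def reconstruct(W, X):
            return np.maximum(X @ W[0], 0.0) @ W[1]           # tiny ReLU autoencoder

        def local_update(W, X, lr=1e-2, steps=50):
            W = [w.copy() for w in W]
            for _ in range(steps):                            # manual gradient steps
                H = np.maximum(X @ W[0], 0.0)
                err = H @ W[1] - X
                g1 = H.T @ err / len(X)
                g0 = X.T @ ((err @ W[1].T) * (H > 0)) / len(X)
                W[0] -= lr * g0; W[1] -= lr * g1
            return W

        clients = [rng.normal(0, 1, (200, 8)) for _ in range(5)]   # one community
        W = init_weights()
        for rnd in range(20):                                 # federated rounds
            updates = [local_update(W, X) for X in clients]
            W = [np.mean([u[i] for u in updates], axis=0) for i in range(2)]  # FedAvg

        errs = np.linalg.norm(reconstruct(W, clients[0]) - clients[0], axis=1)
        tau = np.quantile(errs, 0.95)                         # community threshold
        print((errs > tau).sum(), "points flagged on client 0")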
    Towards Confidence-guided Shape Completion for Robotic Applications. (arXiv:2209.04300v1 [cs.CV])
    Many robotic tasks involving some form of 3D visual perception greatly benefit from a complete knowledge of the working environment. However, robots often have to tackle unstructured environments and their onboard visual sensors can only provide incomplete information due to limited workspaces, clutter or object self-occlusion. In recent years, deep learning architectures for shape completion have begun taking traction as effective means of inferring a complete 3D object representation from partial visual data. Nevertheless, most of the existing state-of-the-art approaches provide a fixed output resolution in the form of voxel grids, strictly related to the size of the neural network output stage. While this is enough for some tasks, e.g. obstacle avoidance in navigation, grasping and manipulation require finer resolutions and simply scaling up the neural network outputs is computationally expensive. In this paper, we address this limitation by proposing an object shape completion method based on an implicit 3D representation providing a confidence value for each reconstructed point. As a second contribution, we propose a gradient-based method for efficiently sampling such implicit function at an arbitrary resolution, tunable at inference time. We experimentally validate our approach by comparing reconstructed shapes with ground truths, and by deploying our shape completion algorithm in a robotic grasping pipeline. In both cases, we compare results with a state-of-the-art shape completion approach.
    Design of a Supervisory Control System for Autonomous Operation of Advanced Reactors. (arXiv:2209.04334v1 [eess.SY])
    Advanced reactors deployed in the coming decades will face deregulated energy markets, and may adopt flexible operation to boost profitability. To aid in the transition from baseload to flexible operation paradigm, autonomous operation is sought. This work focuses on the control aspect of autonomous operation. Specifically, a hierarchical control system is designed to support constraint enforcement during routine operational transients. Within the system, data-driven modeling, physics-based state observation, and classical control algorithms are integrated to provide an adaptable and robust solution. A 320 MW Fluoride-cooled High-temperature Pebble-bed Reactor is the design basis for demonstrating the control system. The hierarchical control system consists of a supervisory layer and low-level layer. The supervisory layer receives requests to change the system's operating conditions, and accepts or rejects them based on constraints that have been assigned. Constraints are issued to keep the plant within an optimal operating region. The low-level layer interfaces with the actuators of the system to fulfill requested changes, while maintaining tracking and regulation duties. To accept requests at the supervisory layer, the Reference Governor algorithm was adopted. To model the dynamics of the reactor, a system identification algorithm, Dynamic Mode Decomposition, was utilized. To estimate the evolution of process variables that cannot be directly measured, the Unscented Kalman Filter was adopted, incorporating a nonlinear model of nuclear dynamics. The composition of these algorithms led to a numerical demonstration of constraint enforcement during a 40 % power drop transient. Adaptability of the proposed system was demonstrated by modifying the constraint values, and enforcing them during the transient. Robustness was also demonstrated by enforcing constraints under noisy environments.
    Algorithms with More Granular Differential Privacy Guarantees. (arXiv:2209.04053v1 [cs.CR])
    Differential privacy is often applied with a privacy parameter that is larger than the theory suggests is ideal; various informal justifications for tolerating large privacy parameters have been proposed. In this work, we consider partial differential privacy (DP), which allows quantifying the privacy guarantee on a per-attribute basis. In this framework, we study several basic data analysis and learning tasks, and design algorithms whose per-attribute privacy parameter is smaller than the best possible privacy parameter for the entire record of a person (i.e., all the attributes).
    FedDAR: Federated Domain-Aware Representation Learning. (arXiv:2209.04007v1 [cs.LG])
    Cross-silo Federated learning (FL) has become a promising tool in machine learning applications for healthcare. It allows hospitals/institutions to train models with sufficient data while the data is kept private. To make sure the FL model is robust when facing heterogeneous data among FL clients, most efforts focus on personalizing models for clients. However, the latent relationships between clients' data are ignored. In this work, we focus on a special non-iid FL problem, called Domain-mixed FL, where each client's data distribution is assumed to be a mixture of several predefined domains. Recognizing the diversity of domains and the similarity within domains, we propose a novel method, FedDAR, which learns a domain shared representation and domain-wise personalized prediction heads in a decoupled manner. For simplified linear regression settings, we have theoretically proved that FedDAR enjoys a linear convergence rate. For general settings, we have performed intensive empirical studies on both synthetic and real-world medical datasets which demonstrate its superiority over prior FL methods.
    Multiplierless MP-Kernel Machine For Energy-efficient Edge Devices. (arXiv:2106.01958v3 [cs.LG] UPDATED)
    We present a novel framework for designing multiplierless kernel machines that can be used on resource-constrained platforms like intelligent edge devices. The framework uses a piecewise linear (PWL) approximation based on a margin propagation (MP) technique and uses only addition/subtraction, shift, comparison, and register underflow/overflow operations. We propose a hardware-friendly MP-based inference and online training algorithm that has been optimized for a Field Programmable Gate Array (FPGA) platform. Our FPGA implementation eliminates the need for DSP units and reduces the number of LUTs. By reusing the same hardware for inference and training, we show that the platform can overcome classification errors and local minima artifacts that result from the MP approximation. The implementation of this proposed multiplierless MP-kernel machine on FPGA results in an estimated energy consumption of 13.4 pJ and power consumption of 107 mW with ~9k LUTs and FFs each for a 256 x 32 sized kernel, making it superior in terms of power, performance, and area compared to other comparable implementations.  ( 3 min )
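    A minimal sketch of the margin-propagation (MP) primitive such frameworks build on: given inputs $x_i$ and a hyperparameter $\gamma$, MP returns the $z$ solving $\sum_i \max(x_i - z, 0) = \gamma$, a piecewise-linear surrogate for log-sum-exp (the two differ by a $\gamma$-dependent offset). The floating-point active-set solver below is for illustration only; hardware realizations use add/shift/compare-style updates instead.

        import numpy as np

        def margin_propagation(x, gamma, iters=20):
            # solve sum_i max(x_i - z, 0) = gamma by iterating on the active set
            z = x.max() - gamma
            for _ in range(iters):
                active = x > z
                k = active.sum()
                if k == 0:
                    break
                z_new = (x[active].sum() - gamma) / k   # exact solve for this set
                if z_new == z:
                    break
                z = z_new
            return z

        x = np.array([1.0, 2.0, 3.0, 0.5])
        print(margin_propagation(x, gamma=1.0))   # 2.0: PWL stand-in for
        print(np.log(np.exp(x).sum()))            # log-sum-exp(x) ~= 3.46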
    Functional dimension of feedforward ReLU neural networks. (arXiv:2209.04036v1 [math.MG])
    It is well-known that the parameterized family of functions representable by fully-connected feedforward neural networks with ReLU activation function is precisely the class of piecewise linear functions with finitely many pieces. It is less well-known that for every fixed architecture of ReLU neural network, the parameter space admits positive-dimensional spaces of symmetries, and hence the local functional dimension near any given parameter is lower than the parametric dimension. In this work we carefully define the notion of functional dimension, show that it is inhomogeneous across the parameter space of ReLU neural network functions, and continue an investigation - initiated in [14] and [5] - into when the functional dimension achieves its theoretical maximum. We also study the quotient space and fibers of the realization map from parameter space to function space, supplying examples of fibers that are disconnected, fibers upon which functional dimension is non-constant, and fibers upon which the symmetry group acts non-transitively.
    In-situ animal behavior classification using knowledge distillation and fixed-point quantization. (arXiv:2209.04130v1 [cs.LG])
    We explore the use of knowledge distillation (KD) for learning compact and accurate models that enable classification of animal behavior from accelerometry data on wearable devices. To this end, we take a deep and complex convolutional neural network, known as residual neural network (ResNet), as the teacher model. ResNet is specifically designed for multivariate time-series classification. We use ResNet to distil the knowledge of animal behavior classification datasets into soft labels, which consist of the predicted pseudo-probabilities of every class for each datapoint. We then use the soft labels to train our significantly less complex student models, which are based on the gated recurrent unit (GRU) and multilayer perceptron (MLP). The evaluation results using two real-world animal behavior classification datasets show that the classification accuracy of the student GRU-MLP models improves appreciably through KD, approaching that of the teacher ResNet model. To further reduce the computational and memory requirements of performing inference using the student models trained via KD, we utilize dynamic fixed-point quantization through an appropriate modification of the computational graphs of the models. We implement both unquantized and quantized versions of the developed KD-based models on the embedded systems of our purpose-built collar and ear tag devices to classify animal behavior in situ and in real time. The results corroborate the effectiveness of KD and quantization in improving the inference performance in terms of both classification accuracy and computational and memory efficiency.
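    A minimal sketch of the distillation objective used to train a compact student from a teacher's soft labels: soften both output distributions with a temperature, match them with a KL term, and mix with the usual hard-label loss. The temperature and mixing weight are illustrative, and random tensors stand in for the ResNet teacher and GRU/MLP student outputs described above.

        import torch
        import torch.nn.functional as F

        def kd_loss(student_logits, teacher_logits, labels, T=4.0, alpha=0.7):
            # soften both distributions with temperature T, then match them with KL;
            # the T*T factor keeps gradient magnitudes comparable across temperatures
            soft = F.kl_div(F.log_softmax(student_logits / T, dim=1),
                            F.softmax(teacher_logits / T, dim=1),
                            reduction="batchmean") * (T * T)
            hard = F.cross_entropy(student_logits, labels)
            return alpha * soft + (1 - alpha) * hard

        student_logits = torch.randn(32, 6, requires_grad=True)  # 6 behavior classes
        teacher_logits = torch.randn(32, 6)                      # frozen teacher output
        labels = torch.randint(0, 6, (32,))
        print(kd_loss(student_logits, teacher_logits, labels))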
    Deep autoencoders for physics-constrained data-driven nonlinear materials modeling. (arXiv:2209.04416v1 [math.NA])
    Physics-constrained data-driven computing is an emerging computational paradigm that allows simulation of complex materials directly from a material database, bypassing classical constitutive model construction. However, it remains difficult to deal with high-dimensional applications and extrapolative generalization. This paper introduces deep learning techniques under the data-driven framework to address these fundamental issues in nonlinear materials modeling. To this end, an autoencoder neural network architecture is introduced to learn the underlying low-dimensional representation (embedding) of the given material database. The offline-trained autoencoder and the discovered embedding space are then incorporated into the online data-driven computation so that the search for the optimal material state in the database can be performed in a low-dimensional space, aiming to enhance robustness and predictability with projected material data. To ensure numerical stability and a representative constitutive manifold, a convexity-preserving interpolation scheme tailored to the proposed autoencoder-based data-driven solver is proposed for constructing the material state. In this study, the applicability of the proposed approach is demonstrated by modeling nonlinear biological tissues. A parametric study on data noise, data size and sparsity, training initialization, and model architectures is also conducted to examine the robustness and convergence properties of the proposed approach.
    Cross-Subject Domain Adaptation for Classifying Working Memory Load with Multi-Frame EEG Images. (arXiv:2106.06769v4 [cs.LG] UPDATED)
    Working memory (WM), denoting the information temporally stored in the mind, is a fundamental research topic in the field of human cognition. Electroencephalography (EEG), which can monitor the electrical activity of the brain, has been widely used in measuring the level of WM. However, one of the critical challenges is that individual differences may cause ineffective results, especially when the established model meets an unfamiliar subject. In this work, we propose a cross-subject deep adaptation model with spatial attention (CS-DASA) to generalize workload classification across subjects. First, we transform EEG time series into multi-frame EEG images incorporating spatial, spectral, and temporal information. Next, the Subject-Shared module in CS-DASA receives multi-frame EEG image data from both source and target subjects and learns common feature representations. Then, in the subject-specific module, the maximum mean discrepancy is used to measure the domain distribution divergence in a reproducing kernel Hilbert space, which adds an effective penalty loss for domain adaptation. Additionally, a subject-to-subject spatial attention mechanism is employed to focus on the discriminative spatial features of the target image data. Experiments conducted on a public WM EEG dataset containing 13 subjects show that the proposed model is capable of achieving better performance than existing state-of-the-art methods.
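    The maximum mean discrepancy penalty mentioned here compares source and target feature distributions via kernel mean embeddings; a minimal numpy sketch of the (biased) squared-MMD estimator with an RBF kernel, shown for illustration only:

```python
import numpy as np

def mmd2_rbf(X, Y, gamma=1.0):
    """Biased estimate of squared MMD between samples X (n,d) and Y (m,d)
    in the RKHS of the RBF kernel k(a,b) = exp(-gamma * ||a-b||^2)."""
    def k(A, B):
        sq = (A**2).sum(1)[:, None] + (B**2).sum(1)[None, :] - 2 * A @ B.T
        return np.exp(-gamma * sq)
    return k(X, X).mean() + k(Y, Y).mean() - 2 * k(X, Y).mean()
```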
    Are Gradients on Graph Structure Reliable in Gray-box Attacks?. (arXiv:2208.05514v2 [cs.CR] UPDATED)
    Graph edge perturbations aim to damage the predictions of graph neural networks by modifying the graph structure. Previous gray-box attackers employ gradients from a surrogate model to locate vulnerable edges to perturb in the graph structure. However, gradients on graph structures can be unreliable, an issue rarely studied in previous work. In this paper, we discuss and analyze the errors caused by the unreliability of structural gradients. These errors arise from rough gradient usage due to the discreteness of the graph structure and from unreliability in the meta-gradient on the graph structure. To address these problems, we propose a novel attack model with methods to reduce the errors inside structural gradients. We propose edge discrete sampling to select the edge perturbations, combined with hierarchical candidate selection to ensure computational efficiency. In addition, semantic invariance and momentum gradient ensemble are proposed to address gradient fluctuation on semantic-augmented graphs and the instability of the surrogate model. Experiments conducted in untargeted gray-box poisoning scenarios demonstrate the improved performance of our approach.
    Generating Contextual Load Profiles Using a Conditional Variational Autoencoder. (arXiv:2209.04056v1 [cs.LG])
    Generating power system states that have similar distribution and dependency to the historical ones is essential for the tasks of system planning and security assessment, especially when historical data are insufficient. In this paper, we describe a generative model for load profiles of industrial and commercial customers, based on the conditional variational autoencoder (CVAE) neural network architecture; the task is challenging due to the highly variable nature of such profiles. The generated contextual load profiles are conditioned on the month of the year and the typical power exchange with the grid. The quality of the generated profiles is evaluated both visually and statistically. The experimental results demonstrate that our proposed CVAE model can capture temporal features of historical load profiles and generate `realistic' data with satisfying univariate distributions and multivariate dependencies.
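    As an illustration of the conditioning mechanism, a minimal PyTorch CVAE sketch in which the condition vector is concatenated to both the encoder input and the latent code. The dimensions are our assumptions (a 96-point daily profile and a 13-dimensional condition, e.g. a one-hot month plus one power-exchange feature); the paper's exact architecture is not reproduced here.

```python
import torch
import torch.nn as nn

class CVAE(nn.Module):
    """Minimal conditional VAE sketch: condition c is concatenated to the
    encoder input and to the latent code before decoding."""
    def __init__(self, x_dim=96, c_dim=13, z_dim=8, h=128):
        super().__init__()
        self.enc = nn.Sequential(nn.Linear(x_dim + c_dim, h), nn.ReLU())
        self.mu, self.logvar = nn.Linear(h, z_dim), nn.Linear(h, z_dim)
        self.dec = nn.Sequential(nn.Linear(z_dim + c_dim, h), nn.ReLU(),
                                 nn.Linear(h, x_dim))

    def forward(self, x, c):
        h = self.enc(torch.cat([x, c], dim=-1))
        mu, logvar = self.mu(h), self.logvar(h)
        z = mu + torch.randn_like(mu) * (0.5 * logvar).exp()  # reparameterize
        return self.dec(torch.cat([z, c], dim=-1)), mu, logvar
```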
    Estimating Multi-label Accuracy using Labelset Distributions. (arXiv:2209.04163v1 [cs.LG])
    A multi-label classifier estimates the binary label state (relevant vs irrelevant) for each of a set of concept labels, for any given instance. Probabilistic multi-label classifiers provide a predictive posterior distribution over all possible labelset combinations of such label states (the powerset of labels) from which we can provide the best estimate, simply by selecting the labelset corresponding to the largest expected accuracy over that distribution. For example, in maximizing exact match accuracy, we provide the mode of the distribution. But how does this relate to the confidence we may have in such an estimate? Confidence is an important element of real-world applications of multi-label classifiers (as in machine learning in general) and is an important ingredient in explainability and interpretability. However, it is not obvious how to provide confidence in the multi-label context relative to a particular accuracy metric, nor is it clear how to provide a confidence measure that correlates well with the expected accuracy, which would be most valuable in real-world decision making. In this article we estimate the expected accuracy as a surrogate for confidence, for a given accuracy metric. We hypothesise that the expected accuracy can be estimated from the multi-label predictive distribution. We examine seven candidate functions for their ability to estimate expected accuracy from the predictive distribution. We found that three of these correlate with expected accuracy and are robust. Further, we determined that each candidate function can be used separately to estimate Hamming similarity, but a combination of the candidates was best for expected Jaccard index and exact match.
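    A toy numeric example of reading expected accuracies off a predictive labelset distribution (our illustration): with two labels, the mode maximizes expected exact match, while expected Hamming similarity averages per-label agreement over the whole distribution.

```python
import numpy as np

# Toy predictive distribution over the powerset of L=2 labels:
# labelsets (00, 01, 10, 11) with posterior probabilities p.
labelsets = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
p = np.array([0.1, 0.2, 0.3, 0.4])

y_hat = labelsets[p.argmax()]  # the mode maximizes expected exact match

# Expected exact-match accuracy of y_hat is its posterior mass (0.4);
# expected Hamming similarity averages per-label agreement probabilities.
exact = p[(labelsets == y_hat).all(axis=1)].sum()
hamming = (p[:, None] * (labelsets == y_hat)).sum(0).mean()
print(exact, hamming)  # 0.4, 0.65
```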
    Convolutional Neural Networks combined with Runge-Kutta Methods. (arXiv:1802.08831v7 [cs.CV] UPDATED)
    A convolutional neural network can be constructed using numerical methods for solving dynamical systems, since the forward pass of the network can be regarded as a trajectory of a dynamical system. However, existing models based on numerical solvers cannot avoid the iterations of implicit methods, which makes the models inefficient at inference time. In this paper, we reinterpret the pre-activation Residual Networks (ResNets) and their variants from the dynamical systems view. We consider that the iterations of implicit Runge-Kutta methods are fused into the training of these models. Moreover, we propose a novel approach to constructing network models based on high-order Runge-Kutta methods in order to achieve higher efficiency. Our proposed models are referred to as the Runge-Kutta Convolutional Neural Networks (RKCNNs). The RKCNNs are evaluated on multiple benchmark datasets. The experimental results show that RKCNNs are vastly superior to other dynamical system network models: they achieve higher accuracy with much fewer resources. They also expand the family of network models based on numerical methods for dynamical systems.
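    The dynamical-systems view can be made concrete: a pre-activation residual block is one explicit Euler step of $x' = f(x)$, and higher-order explicit schemes yield richer blocks. A sketch of a block built from one explicit midpoint (RK2) step, as an illustration of the idea rather than the paper's exact RKCNN design:

```python
import torch
import torch.nn as nn

class RK2Block(nn.Module):
    """One explicit midpoint (RK2) step of x' = f(x):
    x_{t+1} = x_t + h * f(x_t + (h/2) * f(x_t)).
    A plain pre-activation ResNet block is the h=1 Euler special case.
    Here f is shared between the two stages (a design choice)."""
    def __init__(self, ch, h=1.0):
        super().__init__()
        self.h = h
        self.f = nn.Sequential(nn.BatchNorm2d(ch), nn.ReLU(),
                               nn.Conv2d(ch, ch, 3, padding=1))

    def forward(self, x):
        k1 = self.f(x)
        k2 = self.f(x + 0.5 * self.h * k1)
        return x + self.h * k2
```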
    The Role Of Biology In Deep Learning. (arXiv:2209.04425v1 [cs.NE])
    Artificial neural networks took a lot of inspiration from their biological counterparts in becoming our best machine perceptual systems. This work summarizes some of that history and incorporates modern theoretical neuroscience into experiments with artificial neural networks from the field of deep learning. Specifically, iterative magnitude pruning is used to train sparsely connected networks with 33x fewer weights without loss in performance. These are used to test and ultimately reject the hypothesis that weight sparsity alone improves image noise robustness. Recent work mitigated catastrophic forgetting using weight sparsity, activation sparsity, and active dendrite modeling. This paper replicates those findings, and extends the method to train convolutional neural networks on a more challenging continual learning task. The code has been made publicly available.
    Joint Non-parametric Point Process model for Treatments and Outcomes: Counterfactual Time-series Prediction Under Policy Interventions. (arXiv:2209.04142v1 [cs.LG])
    Policy makers need to predict the progression of an outcome before adopting a new treatment policy, which defines when and how a sequence of treatments affecting the outcome occurs in continuous time. Commonly, algorithms that predict interventional future outcome trajectories take a fixed sequence of future treatments as input. This either neglects the dependence of future treatments on outcomes preceding them or implicitly assumes the treatment policy is known, and hence excludes scenarios where the policy is unknown or a counterfactual analysis is needed. To handle these limitations, we develop a joint model for treatments and outcomes, which allows for the estimation of treatment policies and effects from sequential treatment--outcome data. It can answer interventional and counterfactual queries about interventions on treatment policies, as we show with real-world data on blood glucose progression and a simulation study building on top of this.
    Q-learning Decision Transformer: Leveraging Dynamic Programming for Conditional Sequence Modelling in Offline RL. (arXiv:2209.03993v1 [cs.LG])
    Recent works have shown that tackling offline reinforcement learning (RL) with a conditional policy produces promising results by converting the RL task to a supervised learning task. Decision Transformer (DT) combines the conditional policy approach and the Transformer architecture to show competitive performance against several benchmarks. However, DT lacks stitching ability -- one of the critical abilities for offline RL of learning the optimal policy from sub-optimal trajectories. The issue becomes significant when the offline dataset only contains sub-optimal trajectories. On the other hand, conventional RL approaches based on Dynamic Programming (such as Q-learning) do not suffer from the same issue; however, they suffer from unstable learning behaviours, especially when function approximation is employed in an off-policy learning setting. In this paper, we propose Q-learning Decision Transformer (QDT), which addresses the shortcomings of DT by leveraging the benefits of Dynamic Programming (Q-learning). QDT utilises the Dynamic Programming (Q-learning) results to relabel the return-to-go in the training data. We then train the DT with the relabelled data. Our approach efficiently exploits the benefits of these two approaches, each compensating for the other's shortcomings, to achieve better performance. We demonstrate the issue of DT and the advantage of QDT in a simple environment. We also evaluate QDT on the more complex D4RL benchmark, showing good performance gains.
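    A simplified sketch of value-based return-to-go relabelling (our reading of the idea; the paper's exact rule may differ): sweep each trajectory backwards and let a learned value estimate override the observed return whenever it is larger, so that sub-optimal continuations no longer cap the conditioning target.

```python
import numpy as np

def relabel_returns(rewards, values, gamma=1.0):
    """rewards[t]: observed reward at step t; values[t]: learned estimate
    of the best achievable return from s_t (e.g. max_a Q(s_t, a)). Both
    names and the exact rule are illustrative assumptions."""
    T = len(rewards)
    rtg = np.zeros(T)
    best = 0.0
    for t in reversed(range(T)):
        # Take the larger of the (propagated) observed return and the
        # learned value estimate, so earlier steps inherit the stitching.
        best = max(rewards[t] + gamma * best, values[t])
        rtg[t] = best
    return rtg
```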
    Expected Worst Case Regret via Stochastic Sequential Covering. (arXiv:2209.04417v1 [cs.LG])
    We study the problem of sequential prediction and online minimax regret with stochastically generated features under a general loss function. We introduce a notion of expected worst case minimax regret that generalizes and encompasses prior known minimax regrets. For such minimax regrets we establish tight upper bounds via a novel concept of stochastic global sequential covering. We show that for a hypothesis class of VC-dimension $\mathsf{VC}$ and $i.i.d.$ generated features of length $T$, the cardinality of the stochastic global sequential covering can be upper bounded with high probability (whp) by $e^{O(\mathsf{VC} \cdot \log^2 T)}$. We then improve this bound by introducing a new complexity measure called the Star-Littlestone dimension, and show that classes with Star-Littlestone dimension $\mathsf{SL}$ admit a stochastic global sequential covering of order $e^{O(\mathsf{SL} \cdot \log T)}$. We further establish upper bounds for real valued classes with finite fat-shattering numbers. Finally, by applying information-theoretic tools of the fixed design minimax regrets, we provide lower bounds for the expected worst case minimax regret. We demonstrate the effectiveness of our approach by establishing tight bounds on the expected worst case minimax regrets for logarithmic loss and general mixable losses.
    Explanation Method for Anomaly Detection on Mixed Numerical and Categorical Spaces. (arXiv:2209.04173v1 [cs.LG])
    Most proposals in the anomaly detection field focus exclusively on the detection stage, especially in the recent deep learning approaches. While providing highly accurate predictions, these models often lack transparency, acting as "black boxes". This criticism has grown to the point that explanation is now considered very relevant in terms of acceptability and reliability. In this paper, we address this issue by inspecting the ADMNC (Anomaly Detection on Mixed Numerical and Categorical Spaces) model, an existing very accurate although opaque anomaly detector capable of operating with both numerical and categorical inputs. This work presents the extension EADMNC (Explainable Anomaly Detection on Mixed Numerical and Categorical spaces), which adds explainability to the predictions obtained with the original model. We preserved the scalability of the original method thanks to the Apache Spark framework. EADMNC leverages the formulation of the previous ADMNC model to offer pre hoc and post hoc explainability, while maintaining the accuracy of the original architecture. We present a pre hoc model that globally explains the outputs by segmenting input data into homogeneous groups, described with only a few variables. We designed a graphical representation based on regression trees, which supervisors can inspect to understand the differences between normal and anomalous data. Our post hoc explanations consist of a text-based template method that locally provides textual arguments supporting each detection. We report experimental results on extensive real-world data, particularly in the domain of network intrusion detection. The usefulness of the explanations is assessed through theoretical analysis using expert knowledge in the network intrusion domain.
    Fast and Accurate Importance Weighting for Correcting Sample Bias. (arXiv:2209.04215v1 [cs.LG])
    Bias in datasets can be very detrimental for appropriate statistical estimation. In response to this problem, importance weighting methods have been developed to match any biased distribution to its corresponding target unbiased distribution. The seminal Kernel Mean Matching (KMM) method is still considered state of the art in this research field. However, one of the main drawbacks of this method is the computational burden for large datasets. Building on previous works by Huang et al. (2007) and de Mathelin et al. (2021), we derive a novel importance weighting algorithm which scales to large datasets by using a neural network to predict the instance weights. We show, on multiple public datasets and under various sample biases, that our proposed approach drastically reduces the computational time on large datasets while maintaining similar sample bias correction performance compared to other importance weighting methods. The proposed approach appears to be the only one able to give relevant reweighting in a reasonable time for large datasets with up to two million data points.
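    One way to instantiate such neural importance weighting (a hedged sketch under our own assumptions; the feature map, architecture, and objective are illustrative, not the authors' exact method): train a small network to output nonnegative weights so that the weighted source mean matches the target mean in a random Fourier feature space.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
d, D = 5, 256
Xs = torch.randn(1000, d) + 0.5   # toy biased source sample
Xt = torch.randn(800, d)          # toy target sample
W = torch.randn(d, D)
b = 2 * torch.pi * torch.rand(D)
phi = lambda X: torch.cos(X @ W + b)  # random Fourier features

# Network predicts a nonnegative weight per source instance.
net = nn.Sequential(nn.Linear(d, 64), nn.ReLU(), nn.Linear(64, 1), nn.Softplus())
opt = torch.optim.Adam(net.parameters(), lr=1e-3)
for _ in range(500):
    w = net(Xs).squeeze(-1)
    w = w / w.mean()  # keep weights normalized to mean one
    # KMM-style objective: match weighted source mean to target mean.
    loss = ((w[:, None] * phi(Xs)).mean(0) - phi(Xt).mean(0)).pow(2).sum()
    opt.zero_grad(); loss.backward(); opt.step()
```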
    Assessing Lower Limb Strength using Internet-of-Things Enabled Chair and Processing of Time-Series Data in Google GPU Tensorflow CoLab. (arXiv:2209.04042v1 [cs.LG])
    This project describes the application of the technologies of Machine Learning and Internet-of-Things to assess the lower limb strength of individuals undergoing rehabilitation or therapy. Specifically, it seeks to measure and assess the progress of individuals by sensors attached to chairs and processing the data through Google GPU Tensorflow CoLab. Pressure sensors are attached to various locations on a chair, including but not limited to the seating area, backrest, hand rests, and legs. Sensor data from the individual performing both sit-to-stand transition and stand-to-sit transition provides a time series dataset regarding the pressure distribution and vibratory motion on the chair. The dataset and timing information can then be fed into a machine learning model to estimate the relative strength and weakness during various phases of the movement.
    Adversarial Examples in Constrained Domains. (arXiv:2011.01183v3 [cs.CR] UPDATED)
    Machine learning algorithms have been shown to be vulnerable to adversarial manipulation through systematic modification of inputs (e.g., adversarial examples) in domains such as image recognition. Under the default threat model, the adversary exploits the unconstrained nature of images; each feature (pixel) is fully under control of the adversary. However, it is not clear how these attacks translate to constrained domains that limit which and how features can be modified by the adversary (e.g., network intrusion detection). In this paper, we explore whether constrained domains are less vulnerable than unconstrained domains to adversarial example generation algorithms. We create an algorithm for generating adversarial sketches: targeted universal perturbation vectors which encode feature saliency within the envelope of domain constraints. To assess how these algorithms perform, we evaluate them in constrained (e.g., network intrusion detection) and unconstrained (e.g., image recognition) domains. The results demonstrate that our approaches generate misclassification rates in constrained domains that were comparable to those of unconstrained domains (greater than 95%). Our investigation shows that the narrow attack surface exposed by constrained domains is still sufficiently large to craft successful adversarial examples; and thus, constraints do not appear to make a domain robust. Indeed, with as little as five randomly selected features, one can still generate adversarial examples.
    Multi-objective hyperparameter optimization with performance uncertainty. (arXiv:2209.04340v1 [cs.LG])
    The performance of any Machine Learning (ML) algorithm is impacted by the choice of its hyperparameters. As training and evaluating a ML algorithm is usually expensive, the hyperparameter optimization (HPO) method needs to be computationally efficient to be useful in practice. Most of the existing approaches on multi-objective HPO use evolutionary strategies and metamodel-based optimization. However, few methods have been developed to account for uncertainty in the performance measurements. This paper presents results on multi-objective hyperparameter optimization with uncertainty on the evaluation of ML algorithms. We combine the sampling strategy of Tree-structured Parzen Estimators (TPE) with the metamodel obtained after training a Gaussian Process Regression (GPR) with heterogeneous noise. Experimental results on three analytical test functions and three ML problems show an improvement over multi-objective TPE and GPR with respect to the hypervolume indicator.  ( 2 min )
    ExpFinder: An Ensemble Expert Finding Model Integrating $N$-gram Vector Space Model and $\mu$CO-HITS. (arXiv:2101.06821v2 [cs.IR] UPDATED)
    Finding an expert plays a crucial role in driving successful collaborations and speeding up high-quality research development and innovations. However, the rapid growth of scientific publications and digital expertise data makes identifying the right experts a challenging problem. Existing approaches for finding experts given a topic can be categorised into information retrieval techniques based on vector space models, document language models, and graph-based models. In this paper, we propose $\textit{ExpFinder}$, a new ensemble model for expert finding that integrates a novel $N$-gram vector space model, denoted as $n$VSM, and a graph-based model, denoted as $\textit{$\mu$CO-HITS}$, which is a proposed variation of the CO-HITS algorithm. The key idea of $n$VSM is to exploit a recent inverse document frequency weighting method for $N$-gram words, and $\textit{ExpFinder}$ incorporates $n$VSM into $\textit{$\mu$CO-HITS}$ to achieve expert finding. We comprehensively evaluate $\textit{ExpFinder}$ on four different datasets from academic domains in comparison with six different expert finding models. The evaluation results show that $\textit{ExpFinder}$ is a highly effective model for expert finding, substantially outperforming all the compared models by 19% to 160.2%.  ( 3 min )
    Gaussian Process Koopman Mode Decomposition. (arXiv:2209.04111v1 [stat.ML])
    In this paper, we propose a nonlinear probabilistic generative model of Koopman mode decomposition based on an unsupervised Gaussian process. Existing data-driven methods for Koopman mode decomposition have focused on estimating the quantities specified by Koopman mode decomposition, namely, eigenvalues, eigenfunctions, and modes. Our model enables the simultaneous estimation of these quantities and latent variables governed by an unknown dynamical system. Furthermore, we introduce an efficient strategy to estimate the parameters of our model by low-rank approximations of covariance matrices. Applying the proposed model to both synthetic data and a real-world epidemiological dataset, we show that various analyses are available using the estimated parameters.  ( 2 min )
    Studying Drowsiness Detection Performance while Driving through Scalable Machine Learning Models using Electroencephalography. (arXiv:2209.04048v1 [eess.SP])
    Drowsiness is a major concern for drivers and one of the leading causes of traffic accidents. Advances in Cognitive Neuroscience and Computer Science have enabled the detection of drivers' drowsiness by using Brain-Computer Interfaces (BCIs) and Machine Learning (ML). Nevertheless, several challenges remain open and should be faced. First, a comprehensive enough evaluation of drowsiness detection performance using a heterogeneous set of ML algorithms is missing in the literature. Second, the detection performance of scalable ML models suitable for groups of subjects needs to be studied and compared with the individual models proposed in the literature. To address these limitations, this work presents an intelligent framework that employs BCIs and features based on electroencephalography (EEG) for detecting drowsiness in driving scenarios. The SEED-VIG dataset is used to feed different ML regressors and three-class classifiers and then evaluate, analyze, and compare the best-performing models for individual subjects and groups of them. In more detail, regarding individual models, Random Forest (RF) obtained a 78% f1-score, improving on the 58% obtained by models used in the literature such as Support Vector Machine (SVM). Concerning scalable models, RF reached a 79% f1-score, demonstrating the effectiveness of these approaches. The lessons learned can be summarized as follows: i) not only SVM but also other models not sufficiently explored in the literature are relevant for drowsiness detection, and ii) scalable approaches suitable for groups of subjects are effective in detecting drowsiness, even when evaluated on new subjects not included in the model training.  ( 3 min )
    Efficient Multi-view Clustering via Unified and Discrete Bipartite Graph Learning. (arXiv:2209.04187v1 [cs.LG])
    Although previous graph-based multi-view clustering algorithms have gained significant progress, most of them are still faced with three limitations. First, they often suffer from high computational complexity, which restricts their applications in large-scale scenarios. Second, they usually perform graph learning either at the single-view level or at the view-consensus level, but often neglect the possibility of the joint learning of single-view and consensus graphs. Third, many of them rely on the $k$-means for discretization of the spectral embeddings, which lack the ability to directly learn the graph with discrete cluster structure. In light of this, this paper presents an efficient multi-view clustering approach via unified and discrete bipartite graph learning (UDBGL). Specifically, the anchor-based subspace learning is incorporated to learn the view-specific bipartite graphs from multiple views, upon which the bipartite graph fusion is leveraged to learn a view-consensus bipartite graph with adaptive weight learning. Further, the Laplacian rank constraint is imposed to ensure that the fused bipartite graph has discrete cluster structures (with a specific number of connected components). By simultaneously formulating the view-specific bipartite graph learning, the view-consensus bipartite graph learning, and the discrete cluster structure learning into a unified objective function, an efficient minimization algorithm is then designed to tackle this optimization problem and directly achieve a discrete clustering solution without requiring additional partitioning, which notably has linear time complexity in data size. Experiments on a variety of multi-view datasets demonstrate the robustness and efficiency of our UDBGL approach.  ( 3 min )
    MATT: A Multiple-instance Attention Mechanism for Long-tail Music Genre Classification. (arXiv:2209.04109v1 [cs.SD])
    Imbalanced music genre classification is a crucial task in the Music Information Retrieval (MIR) field for identifying the long-tail, data-poor genre based on the related music audio segments, which is very prevalent in real-world scenarios. Most of the existing models are designed for class-balanced music datasets, resulting in poor performance in accuracy and generalization when identifying the music genres at the tail of the distribution. Inspired by the success of introducing Multi-instance Learning (MIL) in various classification tasks, we propose a novel mechanism named Multi-instance Attention (MATT) to boost the performance for identifying tail classes. Specifically, we first construct the bag-level datasets by generating the album-artist pair bags. Second, we leverage neural networks to encode the music audio segments. Finally, under the guidance of a multi-instance attention mechanism, the neural network-based models could select the most informative genre to match the given music segment. Comprehensive experimental results on a large-scale music genre benchmark dataset with long-tail distribution demonstrate MATT significantly outperforms other state-of-the-art baselines.  ( 2 min )
    Self-supervised Learning for Heterogeneous Graph via Structure Information based on Metapath. (arXiv:2209.04218v1 [cs.LG])
    Graph neural networks (GNNs) are the dominant paradigm for modeling and handling graph-structured data by learning universal node representations. The traditional way of training GNNs depends on large amounts of labeled data, which imposes high costs in money and time. In some settings, such data are even unavailable. Self-supervised representation learning, which generates labels from the graph-structured data itself, is a potential approach to tackle this problem. Moreover, self-supervised learning for heterogeneous graphs is more challenging than for homogeneous graphs, and it has received less study. In this paper, we propose a SElf-supervised learning method for heterogeneous graphs via Structure Information based on Metapath (SESIM). The proposed model constructs pretext tasks by predicting the jump number between nodes in each metapath to improve the representation ability of the primary task. To predict the jump number, SESIM uses the data itself to generate labels, avoiding time-consuming manual labeling. Moreover, predicting the jump number in each metapath effectively utilizes graph structure information, an essential property of the relations between nodes, so SESIM deepens the model's understanding of graph structure. Finally, we train the primary task and pretext tasks jointly, and use meta-learning to balance the contribution of the pretext tasks to the primary task. Empirical results validate the performance of SESIM and demonstrate that it improves the representation ability of traditional neural networks on link prediction and node classification tasks.  ( 3 min )
    $\Delta$-PINNs: physics-informed neural networks on complex geometries. (arXiv:2209.03984v1 [cs.LG])
    Physics-informed neural networks (PINNs) have demonstrated promise in solving forward and inverse problems involving partial differential equations. Despite recent progress on expanding the class of problems that can be tackled by PINNs, most existing use-cases involve simple geometric domains. To date, there is no clear way to inform PINNs about the topology of the domain where the problem is being solved. In this work, we propose a novel positional encoding mechanism for PINNs based on the eigenfunctions of the Laplace-Beltrami operator. This technique allows us to create an input space for the neural network that represents the geometry of a given object. We approximate the eigenfunctions, as well as the operators involved in the partial differential equations, with finite elements. We extensively test and compare the proposed methodology against traditional PINNs on complex shapes, such as a coil, a heat sink and a bunny, with different physics, such as the Eikonal equation and heat transfer. We also study the sensitivity of our method to the number of eigenfunctions used, as well as the discretization used for the eigenfunctions and the underlying operators. Our results show excellent agreement with the ground truth data in cases where traditional PINNs fail to produce a meaningful solution. We envision this new technique will expand the effectiveness of PINNs to more realistic applications.  ( 3 min )
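    A sketch of the encoding step (our illustration, assuming finite-element stiffness and mass matrices L and M for the mesh have already been assembled): solve the generalized eigenproblem and feed each node's eigenfunction values to the network in place of raw coordinates, so the input respects the domain's topology.

```python
from scipy.sparse.linalg import eigsh

def laplace_beltrami_encoding(L, M, k=32):
    """Solve L v = lambda M v for the k lowest eigenpairs; row i of the
    returned eigenvector matrix is the positional encoding of mesh node i.
    Shift-invert near zero targets the smallest eigenvalues; the small
    nonzero shift avoids factorizing the singular matrix L itself."""
    vals, vecs = eigsh(L, k=k, M=M, sigma=1e-8, which="LM")
    return vals, vecs
```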
    Stochastic Compositional Optimization with Compositional Constraints. (arXiv:2209.04086v1 [math.OC])
    Stochastic compositional optimization (SCO) has attracted considerable attention because of its broad applicability to important real-world problems. However, existing works on SCO assume that the projection within a solution update is simple, which fails to hold for problem instances where the constraints are in the form of expectations, such as empirical conditional value-at-risk constraints. We study a novel model that incorporates single-level expected value and two-level compositional constraints into the current SCO framework. Our model can be applied widely to data-driven optimization and risk management, including risk-averse optimization and high-moment portfolio selection, and can handle multiple constraints. We further propose a class of primal-dual algorithms that generates sequences converging to the optimal solution at the rate of $\mathcal{O}(1/\sqrt{N})$ under both single-level expected value and two-level compositional constraints, where $N$ is the iteration counter, establishing benchmarks in expected value constrained SCO.  ( 2 min )
    From Shapley Values to Generalized Additive Models and back. (arXiv:2209.04012v1 [cs.LG])
    In explainable machine learning, local post-hoc explanation algorithms and inherently interpretable models are often seen as competing approaches. In this work, we offer a novel perspective on Shapley Values, a prominent post-hoc explanation technique, and show that it is strongly connected with Glassbox-GAMs, a popular class of interpretable models. We introduce $n$-Shapley Values, a natural extension of Shapley Values that explain individual predictions with interaction terms up to order $n$. As $n$ increases, the $n$-Shapley Values converge towards the Shapley-GAM, a uniquely determined decomposition of the original function. From the Shapley-GAM, we can compute Shapley Values of arbitrary order, which gives precise insights into the limitations of these explanations. We then show that $n$-Shapley Values recover generalized additive models of order $n$, assuming that we allow for interaction terms up to order $n$ in the explanations. This implies that the original Shapley Values recover Glassbox-GAMs. On the technical side, we show that there is a one-to-one correspondence between different ways to choose the value function and different functional decompositions of the original function. This provides a novel perspective on the question of how to choose the value function. We also present an empirical analysis of the degree of variable interaction that is present in various standard classifiers, and discuss the implications of our results for algorithmic explanations. A python package to compute $n$-Shapley Values and replicate the results in this paper is available at \url{https://github.com/tml-tuebingen/nshap}.  ( 3 min )
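    For reference, the classical Shapley value that these $n$-Shapley Values extend assigns feature $i$, given a value function $v$ on the feature set $N$, the attribution

$$\phi_i(v) \;=\; \sum_{S \subseteq N \setminus \{i\}} \frac{|S|!\,(|N|-|S|-1)!}{|N|!}\,\bigl(v(S \cup \{i\}) - v(S)\bigr),$$

    and the choice of $v$ is exactly the degree of freedom that the paper connects to functional decompositions of the original function.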
    Selecting Related Knowledge via Efficient Channel Attention for Online Continual Learning. (arXiv:2209.04212v1 [cs.CV])
    Continual learning aims to learn a sequence of tasks by leveraging knowledge acquired in the past in an online-learning manner while remaining able to perform well on all previous tasks. This ability is crucial for artificial intelligence (AI) systems; hence continual learning is better suited to most real-world, complex application scenarios than the traditional learning paradigm. However, current models usually learn a generic representation based on the class labels of each task, and an effective strategy is selected to avoid catastrophic forgetting. We postulate that selecting only the related and useful parts of the knowledge obtained to perform each task is more effective than utilizing the whole knowledge. Based on this observation, in this paper we propose a new framework, named Selecting Related Knowledge for Online Continual Learning (SRKOCL), which incorporates an additional efficient channel attention mechanism to pick the particular related knowledge for every task. Our model also combines experience replay and knowledge distillation to circumvent catastrophic forgetting. Finally, extensive experiments are conducted on different benchmarks, and the competitive experimental results demonstrate that our proposed SRKOCL is a promising approach compared to the state of the art.  ( 2 min )
    Active Learning of Classifiers with Label and Seed Queries. (arXiv:2209.03996v1 [cs.LG])
    We study exact active learning of binary and multiclass classifiers with margin. Given an $n$-point set $X \subset \mathbb{R}^m$, we want to learn any unknown classifier on $X$ whose classes have finite strong convex hull margin, a new notion extending the SVM margin. In the standard active learning setting, where only label queries are allowed, learning a classifier with strong convex hull margin $\gamma$ requires in the worst case $\Omega\big(1+\frac{1}{\gamma}\big)^{(m-1)/2}$ queries. On the other hand, using the more powerful seed queries (a variant of equivalence queries), the target classifier could be learned in $O(m \log n)$ queries via Littlestone's Halving algorithm; however, Halving is computationally inefficient. In this work we show that, by carefully combining the two types of queries, a binary classifier can be learned in time $\operatorname{poly}(n+m)$ using only $O(m^2 \log n)$ label queries and $O\big(m \log \frac{m}{\gamma}\big)$ seed queries; the result extends to $k$-class classifiers at the price of a $k!k^2$ multiplicative overhead. Similar results hold when the input points have bounded bit complexity, or when only one class has strong convex hull margin against the rest. We complement the upper bounds by showing that in the worst case any algorithm needs $\Omega\big(k m \log \frac{1}{\gamma}\big)$ seed and label queries to learn a $k$-class classifier with strong convex hull margin $\gamma$.  ( 3 min )
    Autoencoder Based Iterative Modeling and Multivariate Time-Series Subsequence Clustering Algorithm. (arXiv:2209.04213v1 [eess.SP])
    This paper introduces an algorithm for the detection of change-points and the identification of the corresponding subsequences in transient multivariate time-series data (MTSD). The analysis of such data has become more and more important due to its increasing availability in many industrial fields. Labeling, sorting or filtering highly transient measurement data for training condition based maintenance (CbM) models is cumbersome and error-prone. For some applications it can be sufficient to filter measurements by simple thresholds or to find change-points based on changes in mean value and variation. But for a robust diagnosis of, for example, a component within a component group, where multiple sensor values exhibit complex non-linear correlations, such a simple approach is not feasible: no meaningful and coherent measurement data for training a CbM model would emerge. Therefore, we introduce an algorithm that uses a recurrent neural network (RNN) based Autoencoder (AE) which is iteratively trained on incoming data. The scoring function uses the reconstruction error and latent space information. A model of the identified subsequence is saved and used for recognition of repeating subsequences as well as fast offline clustering. For evaluation, we propose a new similarity measure based on curvature for a more intuitive time-series subsequence clustering metric. A comparison with seven state-of-the-art algorithms on eight datasets shows the capability and increased performance of our algorithm in clustering MTSD online and offline, in conjunction with mechatronic systems.  ( 3 min )
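    A minimal PyTorch sketch of the core building block (sizes and the scoring are illustrative; the paper's score additionally uses latent-space information): a GRU autoencoder whose per-window reconstruction error serves as the change-point/novelty score.

```python
import torch
import torch.nn as nn

class GRUAutoencoder(nn.Module):
    """Encode a window of multivariate time-series data into a latent
    vector, then decode it back; sizes are illustrative assumptions."""
    def __init__(self, n_channels=8, hidden=32):
        super().__init__()
        self.enc = nn.GRU(n_channels, hidden, batch_first=True)
        self.dec = nn.GRU(hidden, hidden, batch_first=True)
        self.out = nn.Linear(hidden, n_channels)

    def forward(self, x):                        # x: (batch, time, channels)
        _, h = self.enc(x)                       # h: (1, batch, hidden) latent
        z = h.transpose(0, 1).repeat(1, x.size(1), 1)  # feed latent at each step
        y, _ = self.dec(z)
        return self.out(y), h.squeeze(0)

def score(model, x):
    """Per-window reconstruction error used as the novelty score."""
    recon, _ = model(x)
    return ((recon - x) ** 2).mean(dim=(1, 2))   # one score per window
```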
    Modelling Patient Trajectories Using Multimodal Information. (arXiv:2209.04224v1 [cs.LG])
    Electronic Health Records (EHRs) aggregate diverse information at the patient level, holding a trajectory representative of the evolution of the patient's health status over time. Although this information provides context and can be leveraged by physicians to monitor patient health and make more accurate prognoses/diagnoses, patient records can contain information from very long time spans, which, combined with the rapid generation rate of medical data, makes clinical decision making more complex. Patient trajectory modelling can assist by exploring existing information in a scalable manner, and can contribute to augmenting health care quality by fostering preventive medicine practices. We propose a solution to model patient trajectories that combines different types of information and considers the temporal aspect of clinical data. This solution leverages two different architectures: one supporting flexible sets of input features, to convert patient admissions into dense representations; and a second exploring extracted admission representations in a recurrent-based architecture, where patient trajectories are processed in sub-sequences using a sliding window mechanism. The developed solution was evaluated on two different clinical outcomes, unexpected patient readmission and disease progression, using the publicly available MIMIC-III clinical database. The results obtained demonstrate the potential of the first architecture to model readmission and diagnosis prediction using single patient admissions. While information from clinical text did not show the discriminative power observed in other existing works, this may be explained by the need to fine-tune the clinicalBERT model. Finally, we demonstrate the potential of the sequence-based architecture using a sliding window mechanism to represent the input data, attaining performance comparable to other existing solutions.  ( 3 min )
    Bridging the Gap: Differentially Private Equivariant Deep Learning for Medical Image Analysis. (arXiv:2209.04338v1 [eess.IV])
    Machine learning with formal privacy-preserving techniques like Differential Privacy (DP) allows one to derive valuable insights from sensitive medical imaging data while promising to protect patient privacy, but it usually comes at a sharp privacy-utility trade-off. In this work, we propose to use steerable equivariant convolutional networks for medical image analysis with DP. Their improved feature quality and parameter efficiency yield remarkable accuracy gains, narrowing the privacy-utility gap.  ( 2 min )
    FLInt: Exploiting Floating Point Enabled Integer Arithmetic for Efficient Random Forest Inference. (arXiv:2209.04181v1 [cs.LG])
    In many machine learning applications, e.g., tree-based ensembles, floating point numbers are extensively utilized due to their expressiveness. Nowadays, performing data analysis on embedded devices from dynamic data masses is becoming feasible, but such systems often lack hardware capabilities to process floating point numbers, introducing large overheads for their processing. Even if such hardware is present in general computing systems, using integer operations instead of floating point operations promises to reduce operation overheads and improve performance. In this paper, we provide FLInt, a full precision floating point comparison for random forests, using only integer and logic operations. To ensure that the same functionality is preserved, we formally prove the correctness of this comparison. Since random forests only require comparisons of floating point numbers during inference, we implement FLInt in low-level realizations and thereby eliminate the need for floating point hardware entirely, while keeping the model accuracy unchanged. The usage of FLInt basically boils down to a one-by-one replacement of conditions: for instance, a comparison statement in C: if(pX[3]<=(float)10.074347) becomes if((*(((int*)(pX))+3))<=((int)(0x41213087))). Experimental evaluation on X86 and ARMv8 desktop and server class systems shows that the execution time can be reduced by up to $\approx 30\%$ with our novel approach.  ( 2 min )
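    The C replacement above works because, for non-negative IEEE-754 floats, the raw bit patterns are ordered exactly like the values (handling negative values needs the sign treatment that the paper's formal proof covers). A quick Python demonstration of the bit-pattern trick:

```python
import struct

def float_bits(x: float) -> int:
    """Reinterpret a 32-bit float's IEEE-754 bit pattern as an integer."""
    return struct.unpack("<i", struct.pack("<f", x))[0]

# The constant from the C snippet above is exactly the bit pattern of the
# float threshold, and integer comparison of the bits matches the float
# comparison for non-negative values.
assert float_bits(10.074347) == 0x41213087
assert (float_bits(9.5) <= float_bits(10.074347)) == (9.5 <= 10.074347)
```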
    SPT-NRTL: A physics-guided machine learning model to predict thermodynamically consistent activity coefficients. (arXiv:2209.04135v1 [physics.chem-ph])
    The availability of property data is one of the major bottlenecks in the development of chemical processes, often requiring time-consuming and expensive experiments or limiting the design space to a small number of known molecules. This bottleneck has been the motivation behind the continuing development of predictive property models. For the property prediction of novel molecules, group contribution methods have been groundbreaking. In recent times, machine learning has joined the more established property prediction models. However, even with recent successes, the integration of physical constraints into machine learning models remains challenging. Physical constraints are vital to many thermodynamic properties, such as the Gibbs-Duhem relation, introducing an additional layer of complexity into the prediction. Here, we introduce SPT-NRTL, a machine learning model that predicts thermodynamically consistent activity coefficients and provides NRTL parameters for easy use in process simulations. The results show that SPT-NRTL achieves higher accuracy than UNIFAC in the prediction of activity coefficients across all functional groups and is able to predict many vapor-liquid equilibria with near-experimental accuracy, as illustrated for the exemplary mixtures water/ethanol and chloroform/n-hexane. To ease the application of SPT-NRTL, NRTL parameters for 100 000 000 mixtures are calculated with SPT-NRTL and provided online.  ( 3 min )
    Differential Privacy Dynamics of Langevin Diffusion and Noisy Gradient Descent. (arXiv:2102.05855v5 [stat.ML] UPDATED)
    What is the information leakage of an iterative randomized learning algorithm about its training data, when the internal state of the algorithm is \emph{private}? How much is the contribution of each specific training epoch to the information leakage through the released model? We study this problem for noisy gradient descent algorithms, and model the \emph{dynamics} of R\'enyi differential privacy loss throughout the training process. Our analysis traces a provably \emph{tight} bound on the R\'enyi divergence between the pair of probability distributions over parameters of models trained on neighboring datasets. We prove that the privacy loss converges exponentially fast, for smooth and strongly convex loss functions, which is a significant improvement over composition theorems (which over-estimate the privacy loss by upper-bounding its total value over all intermediate gradient computations). For Lipschitz, smooth, and strongly convex loss functions, we prove optimal utility with a small gradient complexity for noisy gradient descent algorithms.  ( 2 min )
    Dr. Neurosymbolic, or: How I Learned to Stop Worrying and Accept Statistics. (arXiv:2209.04049v1 [cs.AI])
    The symbolic AI community is increasingly trying to embrace machine learning in neuro-symbolic architectures, yet is still struggling due to cultural barriers. To break the barrier, this rather opinionated personal memo attempts to explain and rectify the conventions in Statistics, Machine Learning, and Deep Learning from the viewpoint of outsiders. It provides a step-by-step protocol for designing a machine learning system that satisfies a minimum theoretical guarantee necessary for being taken seriously by the symbolic AI community, i.e., it discusses "in what condition we can stop worrying and accept statistical machine learning." Some highlights: Most textbooks are written for those who plan to specialize in Stat/ML/DL and are expected to accept the jargon. This memo is for experienced symbolic researchers who hear a lot of buzz but are still uncertain and skeptical. Information on Stat/ML/DL is currently too scattered or too noisy to invest in. This memo prioritizes compactness and pays special attention to concepts that resonate well with symbolic paradigms. I hope this memo offers time savings. It prioritizes general mathematical modeling and does not discuss any specific function approximator, such as neural networks (NNs), SVMs, decision trees, etc. It is open to corrections. Consider this memo as something similar to a blog post taking the form of a paper on arXiv.  ( 3 min )
    Online Low Rank Matrix Completion. (arXiv:2209.03997v1 [cs.LG])
    We study the problem of \textit{online} low-rank matrix completion with $\mathsf{M}$ users, $\mathsf{N}$ items and $\mathsf{T}$ rounds. In each round, we recommend one item per user. For each recommendation, we obtain a (noisy) reward sampled from a low-rank user-item reward matrix. The goal is to design an online method with sub-linear regret (in $\mathsf{T}$). While the problem can be mapped to the standard multi-armed bandit problem where each item is an \textit{independent} arm, it leads to poor regret as the correlation between arms and users is not exploited. In contrast, exploiting the low-rank structure of the reward matrix is challenging due to the non-convexity of the low-rank manifold. We overcome this challenge using an explore-then-commit (ETC) approach that ensures a regret of $O(\mathsf{polylog} (\mathsf{M}+\mathsf{N}) \mathsf{T}^{2/3})$. That is, roughly only $\mathsf{polylog} (\mathsf{M}+\mathsf{N})$ item recommendations are required per user to get a non-trivial solution. We further improve our result for the rank-$1$ setting. Here, we propose a novel algorithm OCTAL (Online Collaborative filTering using iterAtive user cLustering) that ensures a nearly optimal regret bound of $O(\mathsf{polylog} (\mathsf{M}+\mathsf{N}) \mathsf{T}^{1/2})$. Our algorithm uses a novel technique of clustering users and eliminating items jointly and iteratively, which allows us to obtain a nearly minimax optimal rate in $\mathsf{T}$.  ( 2 min )
    Learning Branching Heuristics for Propositional Model Counting. (arXiv:2007.03204v2 [cs.LG] UPDATED)
    Propositional model counting, or #SAT, is the problem of computing the number of satisfying assignments of a Boolean formula. Many problems from different application areas, including many discrete probabilistic inference problems, can be translated into model counting problems to be solved by #SAT solvers. Exact #SAT solvers, however, are often not scalable to industrial size instances. In this paper, we present Neuro#, an approach for learning branching heuristics to improve the performance of exact #SAT solvers on instances from a given family of problems. We experimentally show that our method reduces the step count on similarly distributed held-out instances and generalizes to much larger instances from the same problem family. It is able to achieve these results on a number of different problem families having very different structures. In addition to step count improvements, Neuro# can also achieve orders of magnitude wall-clock speedups over the vanilla solver on larger instances in some problem families, despite the runtime overhead of querying the model.  ( 3 min )
    Optimal Learning Rates for Regularized Least-Squares with a Fourier Capacity Condition. (arXiv:2204.07856v2 [math.ST] UPDATED)
    We derive minimax adaptive rates for a new, broad class of Tikhonov-regularized learning problems in Hilbert scales under general source conditions. Our analysis does not require the regression function to be contained in the hypothesis class, and most notably does not employ the conventional \textit{a priori} assumptions on kernel eigendecay. Using the theory of interpolation, we demonstrate that the spectrum of the Mercer operator can be inferred in the presence of ``tight'' $L^{\infty}$ embeddings of suitable Hilbert scales. Our analysis utilizes a new Fourier capacity condition and characterizes the optimal Lorentz range space of a modified Mercer operator in certain parameter regimes.  ( 2 min )
    Random Vector Functional Link Networks for Function Approximation on Manifolds. (arXiv:2007.15776v2 [stat.ML] UPDATED)
    The learning speed of feed-forward neural networks is notoriously slow and has presented a bottleneck in deep learning applications for several decades. For instance, gradient-based learning algorithms, which are used extensively to train neural networks, tend to work slowly when all of the network parameters must be iteratively tuned. To counter this, both researchers and practitioners have tried introducing randomness to reduce the learning requirement. Based on the original construction of Igelnik and Pao, single layer neural-networks with random input-to-hidden layer weights and biases have seen success in practice, but the necessary theoretical justification is lacking. In this paper, we begin to fill this theoretical gap. We provide a (corrected) rigorous proof that the Igelnik and Pao construction is a universal approximator for continuous functions on compact domains, with approximation error decaying asymptotically like $O(1/\sqrt{n})$ for the number $n$ of network nodes. We then extend this result to the non-asymptotic setting, proving that one can achieve any desired approximation error with high probability provided $n$ is sufficiently large. We further adapt this randomized neural network architecture to approximate functions on smooth, compact submanifolds of Euclidean space, providing theoretical guarantees in both the asymptotic and non-asymptotic forms. Finally, we illustrate our results on manifolds with numerical experiments.  ( 3 min )
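    The Igelnik and Pao construction analyzed here is easy to state in code: the input-to-hidden weights and biases are drawn at random and never trained, and only the linear output layer is fit. A minimal numpy sketch (the activation and sampling ranges are illustrative choices, not the exact construction from the proof):

```python
import numpy as np

def rvfl_fit_predict(X, y, X_test, n_nodes=500, seed=0):
    """Random-feature network: random, untrained input-to-hidden weights;
    only the output layer is fit, here by least squares."""
    rng = np.random.default_rng(seed)
    W = rng.uniform(-1, 1, size=(X.shape[1], n_nodes))
    b = rng.uniform(-1, 1, size=n_nodes)
    H = np.tanh(X @ W + b)                       # random hidden features
    beta, *_ = np.linalg.lstsq(H, y, rcond=None) # trained output weights
    return np.tanh(X_test @ W + b) @ beta
```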
    Estimating Heterogeneous Bounds for Treatment Effects under Sample Selection and Non-response. (arXiv:2209.04329v1 [econ.EM])
    In this paper we propose a method for nonparametric estimation and inference for heterogeneous bounds for causal effect parameters in general sample selection models where the initial treatment can affect whether a post-intervention outcome is observed or not. Treatment selection can be confounded by observable covariates while the outcome selection can be confounded by both observables and unobservables. The method provides conditional effect bounds as functions of policy relevant pre-treatment variables. It allows for conducting valid statistical inference on the unidentified conditional effect curves. We use a flexible semiparametric de-biased machine learning approach that can accommodate flexible functional forms and high-dimensional confounding variables between treatment, selection, and outcome processes. Easily verifiable high-level conditions for estimation and misspecification robust inference guarantees are provided as well.  ( 2 min )
    Review of Pedestrian Trajectory Prediction Methods: Comparing Deep Learning and Knowledge-based Approaches. (arXiv:2111.06740v2 [cs.LG] UPDATED)
    In crowd scenarios, predicting the trajectories of pedestrians is a complex and challenging task that depends on many external factors. The topology of the scene and the interactions between the pedestrians are just some of them. Due to advancements in data science and data collection technologies, deep learning methods have recently become a research hotspot in numerous domains. Therefore, it is not surprising that more and more researchers are applying these methods to predict the trajectories of pedestrians. This paper compares these relatively new deep learning algorithms with classical knowledge-based models that are widely used to simulate pedestrian dynamics. It provides a comprehensive literature review of both approaches, explores technical and application-oriented differences, and addresses open questions as well as future development directions. Our investigations point out that the pertinence of knowledge-based models for predicting local trajectories is nowadays questionable because of the high accuracy of the deep learning algorithms. Nevertheless, the ability of deep-learning algorithms for large-scale simulation and the description of collective dynamics remains to be demonstrated. Furthermore, the comparison shows that a combination of both approaches (a hybrid approach) seems promising for overcoming disadvantages such as the missing explainability of the deep learning approach.  ( 3 min )
    Majority Vote for Distributed Differentially Private Sign Selection. (arXiv:2209.04419v1 [cs.CR])
    Privacy-preserving data analysis has become prevalent in recent years. In this paper, we propose a distributed group differentially private majority vote mechanism for the sign selection problem in a distributed setup. To achieve this, we apply iterative peeling to the stability function and use the exponential mechanism to recover the signs. As applications, we study private sign selection for mean estimation and linear regression problems in distributed systems. Our method recovers the support and signs with the optimal signal-to-noise ratio as in the non-private scenario, which is better than contemporary works on private variable selection. Moreover, the sign selection consistency is justified with theoretical guarantees. Simulation studies are conducted to demonstrate the effectiveness of the proposed method.  ( 2 min )
    The Distributed Discrete Gaussian Mechanism for Federated Learning with Secure Aggregation. (arXiv:2102.06387v4 [cs.LG] UPDATED)
    We consider training models on private data that are distributed across user devices. To ensure privacy, we add on-device noise and use secure aggregation so that only the noisy sum is revealed to the server. We present a comprehensive end-to-end system, which appropriately discretizes the data and adds discrete Gaussian noise before performing secure aggregation. We provide a novel privacy analysis for sums of discrete Gaussians and carefully analyze the effects of data quantization and modular summation arithmetic. Our theoretical guarantees highlight the complex tension between communication, privacy, and accuracy. Our extensive experimental results demonstrate that our solution is essentially able to match the accuracy to central differential privacy with less than 16 bits of precision per value.  ( 2 min )
    A PDE-Based Analysis of the Symmetric Two-Armed Bernoulli Bandit. (arXiv:2202.05767v2 [cs.LG] UPDATED)
    This work addresses a version of the two-armed Bernoulli bandit problem where the sum of the means of the arms is one (the symmetric two-armed Bernoulli bandit). In a regime where the gap between these means goes to zero and the number of prediction periods approaches infinity, we obtain the leading order terms of the expected regret and pseudoregret for this problem by associating each of them with a solution of a linear parabolic partial differential equation. Our results improve upon the previously known results; specifically, we explicitly compute the leading order term of the optimal regret and pseudoregret in three different scaling regimes for the gap. Additionally, we obtain new non-asymptotic bounds for any given time horizon.  ( 2 min )
    Differential Privacy Dynamics of Langevin Diffusion and Noisy Gradient Descent. (arXiv:2102.05855v5 [stat.ML] UPDATED)
    What is the information leakage of an iterative randomized learning algorithm about its training data, when the internal state of the algorithm is \emph{private}? How much is the contribution of each specific training epoch to the information leakage through the released model? We study this problem for noisy gradient descent algorithms, and model the \emph{dynamics} of R\'enyi differential privacy loss throughout the training process. Our analysis traces a provably \emph{tight} bound on the R\'enyi divergence between the pair of probability distributions over parameters of models trained on neighboring datasets. We prove that the privacy loss converges exponentially fast, for smooth and strongly convex loss functions, which is a significant improvement over composition theorems (which over-estimate the privacy loss by upper-bounding its total value over all intermediate gradient computations). For Lipschitz, smooth, and strongly convex loss functions, we prove optimal utility with a small gradient complexity for noisy gradient descent algorithms.  ( 2 min )
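    For concreteness, the noisy gradient descent iteration studied in this line of work typically takes the form below (a generic template; the paper's exact noise scaling may differ):

        \theta_{t+1} = \theta_t - \eta \left( \nabla L(\theta_t) + \xi_t \right), \qquad \xi_t \sim \mathcal{N}(0, \sigma^2 I),

    where $\eta$ is the step size and the Gaussian perturbation $\xi_t$ is what yields the R\'enyi differential privacy guarantee.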
    MICO: Selective Search with Mutual Information Co-training. (arXiv:2209.04378v1 [cs.IR])
    In contrast to traditional exhaustive search, selective search first clusters documents into several groups and then limits a query's search to one group or only a few groups, rather than searching all documents exhaustively. Selective search is designed to reduce the latency and computation in modern large-scale search systems. In this study, we propose MICO, a Mutual Information CO-training framework for selective search with minimal supervision using search logs. After training, MICO not only clusters the documents but also routes unseen queries to the relevant clusters for efficient retrieval. In our empirical experiments, MICO significantly improves performance on multiple metrics of selective search and outperforms a number of existing competitive baselines.  ( 2 min )
    Solving Multistage Stochastic Linear Programming via Regularized Linear Decision Rules: An Application to Hydrothermal Dispatch Planning. (arXiv:2110.03146v2 [math.OC] UPDATED)
    The solution of multistage stochastic linear problems (MSLP) represents a challenge for many applications. Long-term hydrothermal dispatch planning (LHDP) materializes this challenge in a real-world problem that affects electricity markets, economies, and natural resources worldwide. No closed-form solutions are available for MSLP, and the definition of non-anticipative policies with high-quality out-of-sample performance is crucial. Linear decision rules (LDR) provide an interesting simulation-based framework for finding high-quality policies for MSLP through two-stage stochastic models. In practical applications, however, the number of parameters to be estimated when using an LDR may be close to or higher than the number of scenarios of the sample average approximation problem, thereby generating in-sample overfit and poor performance in out-of-sample simulations. In this paper, we propose a novel regularized LDR to solve MSLP based on the AdaLASSO (adaptive least absolute shrinkage and selection operator). The goal is to use the parsimony principle, as widely studied in high-dimensional linear regression models, to obtain better out-of-sample performance for an LDR applied to MSLP. Computational experiments show that the overfit threat is non-negligible when using the classical non-regularized LDR to solve the LHDP, one of the most studied MSLP with relevant industry applications. Our analysis highlights the following benefits of the proposed framework in comparison to the non-regularized benchmark: 1) significant reductions in the number of non-zero coefficients (model parsimony), 2) substantial cost reductions in out-of-sample evaluations, and 3) improved spot-price profiles.  ( 3 min )
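    Although the paper's LDR pipeline is more involved, the AdaLASSO itself can be sketched in a few lines: fit a pilot estimate, derive data-driven penalty weights from it, and solve a reweighted LASSO (a minimal sketch under these assumptions, not the authors' implementation):

        import numpy as np
        from sklearn.linear_model import Lasso, LinearRegression

        def adaptive_lasso(X, y, gamma=1.0, alpha=0.1):
            pilot = LinearRegression().fit(X, y)              # pilot (OLS) estimate
            w = 1.0 / (np.abs(pilot.coef_) ** gamma + 1e-8)   # adaptive penalty weights
            lasso = Lasso(alpha=alpha).fit(X / w, y)          # reweighted LASSO on rescaled columns
            return lasso.coef_ / w                            # map back to the original scale

    Coefficients with small pilot estimates receive large penalties and are driven to exactly zero, which is the source of the model parsimony reported above.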
    Exact Recovery in the General Hypergraph Stochastic Block Model. (arXiv:2105.04770v2 [cs.IT] UPDATED)
    This paper investigates fundamental limits of exact recovery in the general d-uniform hypergraph stochastic block model (d-HSBM), wherein n nodes are partitioned into k disjoint communities with relative sizes (p1, ..., pk). Each subset of nodes with cardinality d is generated independently as an order-d hyperedge with a certain probability that depends on the ground-truth communities the d nodes belong to. The goal is to exactly recover the k hidden communities based on the observed hypergraph. We show that there exists a sharp threshold such that exact recovery is achievable above the threshold and impossible below it (apart from a small regime of parameters that will be specified precisely). This threshold is represented in terms of a quantity which we term the generalized Chernoff-Hellinger divergence between communities. Our result for this general model recovers prior results for the standard SBM and the d-HSBM with two symmetric communities as special cases. En route to proving our achievability results, we develop a polynomial-time two-stage algorithm that meets the threshold. The first stage adopts a certain hypergraph spectral clustering method to obtain a coarse estimate of the communities, and the second stage refines each node individually via local refinement steps to ensure exact recovery.  ( 3 min )
    Hcore-Init: Neural Network Initialization based on Graph Degeneracy. (arXiv:2004.07636v2 [cs.LG] UPDATED)
    Neural networks are at the forefront of Artificial Intelligence, and recent years have witnessed many novel architectures and learning and optimization techniques for deep learning. Capitalizing on the fact that neural networks inherently constitute multipartite graphs among neuron layers, we aim to analyze their structure directly to extract meaningful information that can improve the learning process. To our knowledge, graph mining techniques for enhancing learning in neural networks have not been thoroughly investigated. In this paper we propose an adapted version of the k-core structure for the complete weighted multipartite graph extracted from a deep learning architecture. As a multipartite graph is a combination of bipartite graphs, which are in turn the incidence graphs of hypergraphs, we design the k-hypercore decomposition, the hypergraph analogue of k-core degeneracy. We applied the k-hypercore to several neural network architectures, more specifically to convolutional neural networks and multilayer perceptrons for image recognition tasks, after a very short pretraining. We then used the information provided by the hypercore numbers of the neurons to re-initialize the weights of the neural network, thus biasing the gradient optimization scheme. Extensive experiments show that k-hypercore outperforms state-of-the-art initialization methods.  ( 3 min )
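    For readers unfamiliar with graph degeneracy, the standard (graph) k-core numbers that the paper generalizes can be computed in one call with networkx (a minimal sketch of plain k-core, not of the paper's k-hypercore):

        import networkx as nx

        G = nx.karate_club_graph()
        core = nx.core_number(G)  # node -> largest k such that the node survives in the k-core
        # Nodes with high core numbers sit in densely connected regions; the paper
        # uses the analogous hypercore numbers of neurons to bias weight re-initialization.
        print(sorted(core.items(), key=lambda kv: -kv[1])[:5])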
    Explainability Is in the Mind of the Beholder: Establishing the Foundations of Explainable Artificial Intelligence. (arXiv:2112.14466v2 [cs.AI] UPDATED)
    Explainable artificial intelligence and interpretable machine learning are research domains growing in importance. Yet, the underlying concepts remain somewhat elusive and lack generally agreed definitions. While recent inspiration from social sciences has refocused the work on needs and expectations of human recipients, the field still misses a concrete conceptualisation. We take steps towards addressing this challenge by reviewing the philosophical and social foundations of human explainability, which we then translate into the technological realm. In particular, we scrutinise the notion of algorithmic black boxes and the spectrum of understanding determined by explanatory processes and explainees' background knowledge. This approach allows us to define explainability as (logical) reasoning applied to transparent insights (into, possibly black-box, predictive systems) interpreted under background knowledge and placed within a specific context -- a process that engenders understanding in a selected group of explainees. We then employ this conceptualisation to revisit strategies for evaluating explainability as well as the much disputed trade-off between transparency and predictive power, including its implications for ante-hoc and post-hoc techniques along with fairness and accountability established by explainability. We furthermore discuss components of the machine learning workflow that may be in need of interpretability, building on a range of ideas from human-centred explainability, with a particular focus on explainees, contrastive statements and explanatory processes. Our discussion reconciles and complements current research to help better navigate open questions -- rather than attempting to address any individual issue -- thus laying a solid foundation for a grounded discussion and future progress of explainable artificial intelligence and interpretable machine learning.  ( 3 min )
    Differentially Private Stochastic Gradient Descent with Low-Noise. (arXiv:2209.04188v1 [stat.ML])
    In this paper, by introducing a low-noise condition, we study privacy and utility (generalization) performances of differentially private stochastic gradient descent (SGD) algorithms in a setting of stochastic convex optimization (SCO) for both pointwise and pairwise learning problems. For pointwise learning, we establish sharper excess risk bounds of order $\mathcal{O}\Big( \frac{\sqrt{d\log(1/\delta)}}{n\epsilon} \Big)$ and $\mathcal{O}\Big( {n^{- \frac{1+\alpha}{2}}}+\frac{\sqrt{d\log(1/\delta)}}{n\epsilon}\Big)$ for the $(\epsilon,\delta)$-differentially private SGD algorithm for strongly smooth and $\alpha$-H\"older smooth losses, respectively, where $n$ is the sample size and $d$ is the dimensionality. For pairwise learning, inspired by \cite{lei2020sharper,lei2021generalization}, we propose a simple private SGD algorithm based on gradient perturbation which satisfies $(\epsilon,\delta)$-differential privacy, and develop novel utility bounds for the proposed algorithm. In particular, we prove that our algorithm can achieve excess risk rates $\mathcal{O}\Big(\frac{1}{\sqrt{n}}+\frac{\sqrt{d\log(1/\delta)}}{n\epsilon}\Big)$ with gradient complexity $\mathcal{O}(n)$ and $\mathcal{O}\big(n^{\frac{2-\alpha}{1+\alpha}}+n\big)$ for strongly smooth and $\alpha$-H\"older smooth losses, respectively. Further, faster learning rates are established in a low-noise setting for both smooth and non-smooth losses. To the best of our knowledge, this is the first utility analysis which provides excess population bounds better than $\mathcal{O}\Big(\frac{1}{\sqrt{n}}+\frac{\sqrt{d\log(1/\delta)}}{n\epsilon}\Big)$ for privacy-preserving pairwise learning.  ( 2 min )
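    The gradient-perturbation mechanism underlying such private SGD algorithms can be sketched as follows (a generic DP-SGD step, assuming per-sample gradients are available; not the paper's exact algorithm or constants):

        import torch

        def dp_sgd_step(params, per_sample_grads, lr=0.1, clip=1.0, sigma=1.0):
            # Clip each sample's gradient to norm at most `clip`.
            clipped = []
            for g in per_sample_grads:  # g: list of tensors, one per parameter
                norm = torch.sqrt(sum(p.pow(2).sum() for p in g))
                scale = min(1.0, clip / (float(norm) + 1e-12))
                clipped.append([p * scale for p in g])
            n = len(clipped)
            for i, p in enumerate(params):
                avg = sum(c[i] for c in clipped) / n
                noise = torch.randn_like(p) * (sigma * clip / n)  # Gaussian perturbation
                p.data -= lr * (avg + noise)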
    Knowledge-based Deep Learning for Modeling Chaotic Systems. (arXiv:2209.04259v1 [cs.LG])
    Deep learning has received increased attention due to its remarkable success in many fields, such as computer vision, natural language processing, recommendation systems, and most recently in simulating multiphysics problems and predicting nonlinear dynamical systems. However, modeling and forecasting the dynamics of chaotic systems remains an open research problem, since training deep learning models requires big data, which is not always available. Such deep learners can be trained with additional information obtained from simulated results and by enforcing the physical laws of the chaotic systems. This paper considers extreme events and their dynamics and proposes models based on deep neural networks, called knowledge-based deep learning (KDL). Our proposed KDL can learn the complex patterns governing chaotic systems by jointly training on real and simulated data directly from the dynamics and their differential equations. This knowledge is transferred to model and forecast real-world chaotic events exhibiting extreme behavior. We validate the efficiency of our model by assessing it on three real-world benchmark datasets: El Nino sea surface temperature, San Juan Dengue viral infection, and Bj{\o}rn{\o}ya daily precipitation, all governed by extreme events' dynamics. Using prior knowledge of extreme events and physics-based loss functions to guide the neural network learning, we ensure physically consistent, generalizable, and accurate forecasting, even in a small data regime.  ( 3 min )
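    A physics-based loss of the kind described here usually augments the ordinary data-fitting term with a residual of the governing equations (a minimal sketch; the helper physics_residual and the weighting lam are assumptions, not the paper's exact objective):

        import torch

        def kdl_loss(model, x, y_true, physics_residual, lam=1.0):
            y_pred = model(x)
            data_loss = torch.mean((y_pred - y_true) ** 2)
            # physics_residual(model, x) should measure how far predictions are from
            # satisfying the system's differential equations (hypothetical helper).
            phys_loss = torch.mean(physics_residual(model, x) ** 2)
            return data_loss + lam * phys_loss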
    clusterBMA: Bayesian model averaging for clustering. (arXiv:2209.04117v1 [stat.ME])
    Various methods have been developed to combine inference across multiple sets of results for unsupervised clustering, within the ensemble and consensus clustering literature. The approach of reporting results from one `best' model out of several candidate clustering models generally ignores the uncertainty that arises from model selection, and results in inferences that are sensitive to the particular model, parameters, and assumptions chosen, especially with small sample sizes or small cluster sizes. Bayesian model averaging (BMA) is a popular approach for combining results across multiple models that offers some attractive benefits in this setting, including probabilistic interpretation of the combined cluster structure and quantification of model-based uncertainty. In this work we introduce clusterBMA, a method that enables weighted model averaging across results from multiple unsupervised clustering algorithms. We use a combination of clustering internal validation criteria as a novel approximation of the posterior model probability for weighting the results from each model. From a combined posterior similarity matrix representing a weighted average of the clustering solutions across models, we apply symmetric simplex matrix factorisation to calculate final probabilistic cluster allocations. This method is implemented in an accompanying R package. We explore the performance of this approach through a case study that aims to identify probabilistic clusters of individuals based on electroencephalography (EEG) data. We also use simulated datasets to explore the ability of the proposed technique to identify robust integrated clusters with varying levels of separation between subgroups, and with varying numbers of clusters between models.  ( 3 min )
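    The core averaging step can be sketched as follows (a simplified Python sketch with softmax pseudo-weights standing in for the paper's approximate posterior model probabilities; the released implementation is an R package):

        import numpy as np

        def weighted_similarity(labelings, scores):
            # labelings: one label vector per clustering algorithm;
            # scores: one internal-validation score per algorithm.
            scores = np.asarray(scores, dtype=float)
            w = np.exp(scores - scores.max())
            w /= w.sum()                                  # model weights
            n = len(labelings[0])
            S = np.zeros((n, n))
            for labels, wk in zip(labelings, w):
                labels = np.asarray(labels)
                S += wk * (labels[:, None] == labels[None, :])  # weighted co-assignment
            return S                                      # combined similarity matrix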
    Fast Neural Kernel Embeddings for General Activations. (arXiv:2209.04121v1 [cs.LG])
    The infinite-width limit has shed light on generalization and optimization aspects of deep learning by establishing connections between neural networks and kernel methods. Despite their importance, the utility of these kernel methods has been limited in large-scale learning settings due to their (super-)quadratic runtime and memory complexities. Moreover, most prior works on neural kernels have focused on the ReLU activation, mainly due to its popularity but also due to the difficulty of computing such kernels for general activations. In this work, we overcome these difficulties by providing methods to work with general activations. First, we compile and expand the list of activation functions admitting exact dual activation expressions to compute neural kernels. When the exact computation is unknown, we present methods to effectively approximate them. We propose a fast sketching method that approximates any multi-layered Neural Network Gaussian Process (NNGP) kernel and Neural Tangent Kernel (NTK) matrices for a wide range of activation functions, going beyond the commonly analyzed ReLU activation. This is done by showing how to approximate the neural kernels using the truncated Hermite expansion of any desired activation function. While most prior works require data points on the unit sphere, our methods do not suffer from such limitations and are applicable to any dataset of points in $\mathbb{R}^d$. Furthermore, we provide a subspace embedding for NNGP and NTK matrices with near input-sparsity runtime and near-optimal target dimension which applies to any \emph{homogeneous} dual activation function with rapidly convergent Taylor expansion. Empirically, with respect to exact convolutional NTK (CNTK) computation, our method achieves $106\times$ speedup for approximate CNTK of a 5-layer Myrtle network on the CIFAR-10 dataset.  ( 3 min )
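    The truncated Hermite expansion at the heart of this approach can be illustrated numerically: for unit-variance Gaussians $z, w$ with correlation $\rho$, $E[f(z)f(w)] = \sum_r a_r^2 \rho^r$, where $a_r$ are the normalized (probabilists') Hermite coefficients of $f$ (a minimal sketch of these textbook facts, not of the paper's sketching algorithm):

        import numpy as np
        from math import factorial, sqrt, pi
        from numpy.polynomial.hermite import hermgauss
        from numpy.polynomial.hermite_e import hermeval

        def hermite_coeffs(f, R, nodes=200):
            x, w = hermgauss(nodes)          # Gauss-Hermite for weight exp(-x^2)
            z = x * sqrt(2.0)                # rescale to N(0, 1) expectations
            a = []
            for r in range(R + 1):
                e = np.zeros(r + 1); e[r] = 1.0
                He_r = hermeval(z, e)        # probabilists' Hermite polynomial He_r
                a.append(np.sum(w * f(z) * He_r) / sqrt(pi) / sqrt(factorial(r)))
            return np.array(a)

        def dual_activation(f, rho, R=20):
            a = hermite_coeffs(f, R)
            return float(np.sum(a ** 2 * rho ** np.arange(R + 1)))

        # Sanity check: at rho = 1 this should approach E[relu(Z)^2] = 0.5.
        print(dual_activation(lambda z: np.maximum(z, 0.0), 1.0))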
    Gaussian Process Koopman Mode Decomposition. (arXiv:2209.04111v1 [stat.ML])
    In this paper, we propose a nonlinear probabilistic generative model of Koopman mode decomposition based on an unsupervised Gaussian process. Existing data-driven methods for Koopman mode decomposition have focused on estimating the quantities specified by Koopman mode decomposition, namely, eigenvalues, eigenfunctions, and modes. Our model enables the simultaneous estimation of these quantities and latent variables governed by an unknown dynamical system. Furthermore, we introduce an efficient strategy to estimate the parameters of our model by low-rank approximations of covariance matrices. Applying the proposed model to both synthetic data and a real-world epidemiological dataset, we show that various analyses are available using the estimated parameters.  ( 2 min )
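    Low-rank approximation of covariance matrices of the kind mentioned here is often done with a Nystroem-style construction (a generic sketch over m inducing points; the paper's exact strategy may differ):

        import numpy as np

        def nystroem_approx(kernel, X, m=50, jitter=1e-8, rng=np.random.default_rng(0)):
            idx = rng.choice(len(X), size=m, replace=False)   # inducing subset
            Kmm = kernel(X[idx], X[idx]) + jitter * np.eye(m)
            Knm = kernel(X, X[idx])
            L = np.linalg.cholesky(Kmm)
            B = np.linalg.solve(L, Knm.T)                     # K ~ B.T @ B (rank m)
            return B.T @ B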
    Online Low Rank Matrix Completion. (arXiv:2209.03997v1 [cs.LG])
    We study the problem of \textit{online} low-rank matrix completion with $\mathsf{M}$ users, $\mathsf{N}$ items and $\mathsf{T}$ rounds. In each round, we recommend one item per user. For each recommendation, we obtain a (noisy) reward sampled from a low-rank user-item reward matrix. The goal is to design an online method with sub-linear regret (in $\mathsf{T}$). While the problem can be mapped to the standard multi-armed bandit problem where each item is an \textit{independent} arm, it leads to poor regret as the correlation between arms and users is not exploited. In contrast, exploiting the low-rank structure of the reward matrix is challenging due to the non-convexity of the low-rank manifold. We overcome this challenge using an explore-then-commit (ETC) approach that ensures a regret of $O(\mathsf{polylog} (\mathsf{M}+\mathsf{N}) \mathsf{T}^{2/3})$. That is, roughly only $\mathsf{polylog} (\mathsf{M}+\mathsf{N})$ item recommendations are required per user to get a non-trivial solution. We further improve our result for the rank-$1$ setting. Here, we propose a novel algorithm OCTAL (Online Collaborative filTering using iterAtive user cLustering) that ensures a nearly optimal regret bound of $O(\mathsf{polylog} (\mathsf{M}+\mathsf{N}) \mathsf{T}^{1/2})$. Our algorithm uses a novel technique of clustering users and eliminating items jointly and iteratively, which allows us to obtain a nearly minimax optimal rate in $\mathsf{T}$.  ( 2 min )
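    The explore-then-commit template behind the first result can be sketched directly (a generic ETC sketch with a T^{2/3}-length exploration phase and an SVD-based denoising step; OCTAL's iterative clustering is not shown):

        import numpy as np

        def etc_low_rank(T, M, N, rank, sample_reward, rng=np.random.default_rng(0)):
            T_explore = int(T ** (2 / 3))
            sums, counts = np.zeros((M, N)), np.zeros((M, N))
            for _ in range(T_explore):                       # explore: random item per user
                items = rng.integers(N, size=M)
                for u, i in enumerate(items):
                    sums[u, i] += sample_reward(u, i)
                    counts[u, i] += 1
            est = sums / np.maximum(counts, 1)
            U, s, Vt = np.linalg.svd(est, full_matrices=False)
            denoised = (U[:, :rank] * s[:rank]) @ Vt[:rank]  # exploit low-rank structure
            return denoised.argmax(axis=1)                   # commit: best item per user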

  • Open

    [D] Thoughts regarding the future monopoly on data
    So right now, the way I see it, a normal user "sells" his data by using services offered by top-tier corpos, and legal authorities are enforcing stricter laws to minimize the gathering of private data. This made me think that the future could be based on selling/renting already-trained models to bypass those laws. We already have platforms like huggingface which offer APIs to use models trained by top corpos. What do you think about it? submitted by /u/Snoo67839 [link] [comments]  ( 104 min )
    [D] What would be the hottest ML topic for both industry and academia ?
    I am a Master's student who's about to choose a thesis topic. I have to make a choice out of 3 topics given: 1) test case selection and prioritization; 2) adversarial attacks on object detectors in maritime environments; 3) extracting information from document images using an explainable ML model. I am very torn on what to choose and would love to work on a topic that will shine in industry as well as make an interesting published paper for academia. submitted by /u/ChucoLay [link] [comments]  ( 89 min )
    [PROJECT] Auto-labeling copilot for object detection datasets
    Hey there! My cofounders and I at happyrobot.ai are tinkering around with the idea of object detection auto-labeling. We’ve built a demo at demo.happyrobot.ai, where you can combine text and visual queries to specify what object you’re looking to automatically label. Screenshot of the demo We also made a video showing how the demo can be used: https://youtu.be/EAyTAEHFbF0 We think this could be useful to label an object detection dataset from scratch. That is, when you don’t have any labels, since in those cases model-driven labeling (aka, active learning) does not work. Please, feel free to comment and provide feedback on the demo! :) Pablo submitted by /u/ppalafoxr [link] [comments]  ( 88 min )
    [D] Most Popular AI Research August 2022 - Ranked By Twitter Likes
    submitted by /u/cloud_weather [link] [comments]  ( 141 min )
    [D] Simple Questions Thread
    Please post your questions here instead of creating a new thread. Encourage others who create new posts for questions to post here instead! Thread will stay alive until next one so keep posting after the date in the title. Thanks to everyone for answering questions in the previous thread! submitted by /u/AutoModerator [link] [comments]  ( 89 min )
    [R] GradCam Algorithm (which Label should be used in the backpropagated loss function)
    I have developed a preprocessing technique for the image classification problem, which led to an improvement in the ground truth rank from the top-5 set to top-1. I have used the GradCAM algorithm to examine the change in focus between the original image and the preprocessed image, backpropagating the loss to the selected feature map using the same GT label rather than the predicted output. Unfortunately, both resultant focus areas have identical heatmap output. In another case, I use the predicted output in the backpropagated loss, where the input image preprocessed with my technique is correctly classified while the original image is not. In this case, the output heatmap shifts the area of focus in a way that makes more sense, as shown below. Honestly, I believe this is not a fair comparison, as the two cases use different references in the propagated loss. I would like to know whether the last case can be mentioned in the paper or not. [Figure: GradCAM heatmap for the preprocessed image (GT label: caldron, cauldron), correctly classified] submitted by /u/AhmedHussKhalifa [link] [comments]  ( 89 min )
    [R] SIMPLERECON — 3D Reconstruction without 3D Convolutions — 73ms per frame !
    submitted by /u/SpatialComputing [link] [comments]  ( 92 min )
    [D] Opinionated cloud GPU question :)
    So I’ve been looking at cloud GPU services: Colab, Jarvis, RunPod, Lambda, Vast. What I want is: 1) nice ssh access (I do not care about notebook access!); 2) environment setup (which may take an hour) must persist after a pause, with no time limit. In other words, I should be able to restart the instance at any later time and have my previously set-up environment available. Colab fails on both. On Lambda I was never able to see any instances for some reason. RunPod and Vast fail on 2. Only Jarvis seems to have both 1 and 2. Am I right, or are there other alternatives I should consider? Thanks. submitted by /u/SatoshiNotMe [link] [comments]  ( 88 min )
    [P] Using GitHub as Artifactory for Machine Learning Model Artifacts
    submitted by /u/op_prabhuomkar [link] [comments]  ( 88 min )
    [D] AI generated art ownership
    I was surprised to find that many artists do not like the idea of AI-art. I understand that some fear losing their customers to AI, but I do not think it can be the case, AI is just a tool. Some say that image-generating neural networks trained on internet-wide datasets are infringing copyright, but it sounds nonsensical to me, using someone else's art as a source of inspiration should be legal IMO. What is your opinion? What legal changes do you think might come? submitted by /u/tredecapus [link] [comments]  ( 94 min )
    [P] Real-time Voice Conversion demo
    submitted by /u/CJemine [link] [comments]  ( 88 min )
    3D-CNN run times - Are they practical? - Video Classification Task [Discussion]
    I am attempting to use EfficientNet3D for a video classification task: https://github.com/shijianjian/EfficientNet-PyTorch-3D I wanted to use a 3D-CNN that allows me to input higher-resolution images. I believe my project accuracy is currently suffering because my image inputs are 112x112. So much data is lost when resizing original images to 112x112 that shapes and features are almost unrecognizable even to the human eye. I think that by instead inputting images of a larger resolution, e.g. EfficientNet 'b7': (633, 600), I would get much higher accuracy. However, when I load up the model and try summary(model, input_size=(1, 200, 200, 200)), the model takes ~30 seconds to display the torchsummary output. When I tried summary(model, input_size=(1, 333, 633, 600)), the model took >15 minutes (I believe this could also be a memory issue, I'm not sure). Am I doing something wrong?? Has anyone else tried a 3D-CNN with larger image resolutions? Should I ditch 3D-CNNs altogether and try a different route? If I can't practically input larger image resolutions then this is a dead end. submitted by /u/belandis [link] [comments]  ( 90 min )
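    A quick back-of-the-envelope check explains the slowdown (assuming float32 activations; the figures below are my arithmetic, not from the post):

        bytes_per_voxel = 4                      # float32
        voxels = 1 * 333 * 633 * 600             # (C, D, H, W) input volume
        print(voxels * bytes_per_voxel / 1e9)    # ~0.51 GB for a single channel

    Since every multi-channel convolutional activation multiplies that footprint, a (1, 333, 633, 600) input easily exhausts GPU memory, which is consistent with the >15-minute summary call.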
    [D] Universities in Germany for AI ML
    Which universities in Germany are good for AI ML? submitted by /u/RauhanSheikh [link] [comments]  ( 92 min )
    [D] Anyone else scared about disruption in their industry?
    I feel like we are at a turning point in human history, not unlike what was experienced at the dawn of the industrial revolution. I predict the only difference will be that the changes to life, products, industry and human capability will keep speeding up exponentially as AI, and the interfaces between humans and AI, are developed further. This doesn't seem like good news to me: humans are imperfect, their biggest flaw being selfishness. Ego-driven action coupled with this new capacity for production only seems like bad news to me. Of course, just as problems can be created, solutions can also be created. However, in the early stages (in our time, the next 10-30 years) I think problems will dominate, with solutions and regulation coming later as an answer. Please share your thoughts; am I the only one? Ps. I know this is sort of an abstract post, but I didn't know where else to share it given its subject matter. submitted by /u/imfeelingsomekindowf [link] [comments]  ( 91 min )
    [R][P] Japanese Stable Diffusion, using approximately 100 million images with Japanese captions, including the Japanese subset of LAION-5B to train the model
    submitted by /u/Illustrious_Row_9971 [link] [comments]  ( 88 min )
    [D] Successor to CNN Autoencoders?
    I was just watching the CodeParade video on running PCA on the latent space representations of yearbook photos. I was wondering, would I be able to increase the resolution quality of the pictures using modern neural networks like Transformers? If so, what paper proposes a neural network that could be used as a successor to the CNN autoencoder? Maybe something like a transformer autoencoder? submitted by /u/itisyeetime [link] [comments]  ( 89 min )
    Imagenet Subset [Project] [Research]
    Within the scope of knowledge distillation, I proposed a novel idea. I don't have the compute power to train from scratch on ImageNet. Is it possible to train on a subset of ImageNet (see below), starting from a pretrained checkpoint as the initial point, while using the entire validation set for testing? This subset is balanced and makes up around 1% of the whole training set. In addition to this, I will also support my results with training from scratch on small datasets like CUB200 and Stanford Dogs. P.S. My approach is limited to high-resolution images; I can not use CIFAR10, CIFAR100, or Tiny ImageNet. https://www.tensorflow.org/datasets/catalog/imagenet2012_subset submitted by /u/AhmedHussKhalifa [link] [comments]  ( 88 min )
  • Open

    Ai image of Mike Tyson at a barbecue
    submitted by /u/mandeheks [link] [comments]  ( 86 min )
    AI Dream 80 - Wild new Project! Part 5
    submitted by /u/LordPewPew777 [link] [comments]  ( 87 min )
    9/11 simulation
    Has anyone ever made a simulation of the 9/11 terrorist attack on the twin towers? I'm not really a conspiracy theorist, but I am curious what a simulation could look like with today's physics engines, and whether it could possibly debunk or prove some conspiracies. submitted by /u/Ok-One-8411 [link] [comments]  ( 91 min )
    I may have done a little out-painting...
    submitted by /u/Agaeon [link] [comments]  ( 89 min )
    Stable Diffusion Weekly AI Art slideshow Inpainting Examples!
    submitted by /u/prfitofthesngularity [link] [comments]  ( 90 min )
    Dall-E AI is now able to see beyond the frame of famous paintings
    submitted by /u/AmerBekic [link] [comments]  ( 89 min )
    Where AI researchers agree - and where they don't
    submitted by /u/much_successes [link] [comments]  ( 93 min )
    Classification of Unlabeled Images
    Image classification is one of the most common problems in computer vision. It has many real-life applications, like medical imaging, object identification in satellite images, and brake light detection. But building datasets for image classification is often the most effort-intensive and time-consuming task. This blog demonstrates how we can build a classification model when we have just images and no labels, that is, classification in the case of unlabelled data. Link: https://medium.com/geekculture/classification-of-unlabeled-images-a2eb0e52f7c2 submitted by /u/VikasOjha666 [link] [comments]  ( 93 min )
    How do I get RIFE/Flowframes to use my discrete GPU?
    RIFE/Flowframes is using my onboard Vega 8 GPU instead of my discrete RX560X GPU which is more capable. Is there a way to make it use the latter instead of the former? submitted by /u/typcalthowawayacount [link] [comments]  ( 87 min )
    Universities in Germany for AI/ML
    Which universities in Germany are good for AI ML? submitted by /u/RauhanSheikh [link] [comments]  ( 87 min )
    Punk Alien Cities for your Visual Stimulation
    submitted by /u/Available_Tadpole829 [link] [comments]  ( 92 min )
    Establishing AI Clubs to lower barrier of entry for AI.
    I'm part of a nonprofit organization called SAILea [short for Scholastic Artificial Intelligence League] that aims to lower the barrier of entry to AI for high school students, focusing on helping people start their own AI clubs and supporting existing ones with resources and events. It seemed to match the goal of this subreddit, so I decided to post some information here for anyone interested. Resources: SAILea offers all the introductory materials someone would need to get a club started: presentations, code notebooks, and discussion starters featuring machine learning algorithms, Python, and AI ethics, plus a step-by-step guide on starting an AI club. Events: This academic year we plan on hosting speaker events featuring professors and professionals in the field, and a hackathon later in the year. You'll be able to access everything after connecting with SAILea as either a club or an individual at our website: Sailea.org/join-us. Hesitating? We're a nonprofit started by students, so everything is COMPLETELY FREE. submitted by /u/Envoy-Insc [link] [comments]  ( 89 min )
  • Open

    How to Win an Election Using Big Data
    Before World War II, less than 20% of voters classified themselves as independents. Today that number is greater than 45%. As politics fails to appeal to more Americans, politicians will need to embrace a different approach to appeal to those important independent "swing" voters who will decide our political leaders. The post How to Win an Election Using Big Data appeared first on Data Science Central.  ( 22 min )
    How Automation is Changing Freight Bill of Lading Data Entry
    For those not involved in shipping, freight billing is the process of providing an invoice that includes information concerning the transportation of a company's goods from one place to another. It also contains the amount of charges, due dates, weight, a complete goods description, and contact information, as well as the names of both the receiver and the shipper, freight rates, accessorial charges, etc. These are tedious and error-prone tasks; hence, automation of freight billing is needed. The post How Automation is Changing Freight Bill of Lading Data Entry appeared first on Data Science Central.  ( 20 min )
  • Open

    need help
    Guys, I am a computer science student. I have a passion for AI and I want to learn it on my own. Can someone please tell me what courses to take and in what order, so I can create advanced AI programs? (I don't want just an introduction; I want to get as deep and as advanced as possible.) Thanks. submitted by /u/yousef_naderr [link] [comments]  ( 87 min )
    I need help on my neuroevolution project.
    I am creating a neuroevolution project in the Godot game engine, where a player runs around avoiding enemies. An enemy spawns every second, and its direction of movement is set towards the player (set once at spawn, never changed afterwards, i.e. it moves in a straight line). Here's how I set my neural network up: https://preview.redd.it/5yexijzcu8n91.png?width=306&format=png&auto=webp&s=0003fcbcb2a6089719745f32e00c259d8beaea15 https://preview.redd.it/jq1o0euau8n91.png?width=1095&format=png&auto=webp&s=4857c57358c212707e0d99dd54e3524d40d27b2b https://preview.redd.it/xqlp83ubu8n91.png?width=610&format=png&auto=webp&s=881e1cf0be7ccd6b913146fbf2c2b40263055dab A player has 8 rays, each looking in a certain direction. The input layer is the distance between the player and the ray-colliding enemy (or 0 if no enemy is colliding) for each ray; I have also normalized the distance values. -> Inputs 1~8: distance to the enemy in each ray, 0 if none. The output layer gets 2 values. -> Output 1: whether to move or not. -> Output 2: the angle to move towards. However, the AIs don't seem to work very well. They either stay completely still or run straight towards the walls and stick there, not really trying to find a way out when they get surrounded. https://reddit.com/link/xbku6f/video/id2qodahv8n91/player https://reddit.com/link/xbku6f/video/723fdyoiv8n91/player Would it be solved simply by increasing the population count and waiting a bit longer, or should I change the input and output values? Project download: https://drive.google.com/drive/folders/1uHzs1-ckZCPoeKaWDhFCHwNiMWuWhvZH?usp=sharing submitted by /u/Weekly_Turnip_2555 [link] [comments]  ( 88 min )
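    For reference, the controller described above amounts to a tiny feedforward network (a minimal sketch with an assumed single hidden layer; the post does not specify the hidden size or activation):

        import numpy as np

        def controller(ray_distances, W1, b1, W2, b2):
            # ray_distances: 8 normalized distances (0 if a ray hits nothing)
            h = np.tanh(W1 @ ray_distances + b1)   # hidden layer
            out = W2 @ h + b2                      # 2 outputs
            move = out[0] > 0.0                    # output 1: move or not
            angle = np.tanh(out[1]) * np.pi        # output 2: movement angle
            return move, angle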
  • Open

    Understanding Simple Recurrent Neural Networks In Keras
    This tutorial is designed for anyone looking for an understanding of how recurrent neural networks (RNN) work and how to use them via the Keras deep learning library. While all the methods required for solving problems and building applications are provided by the Keras library, it is also important to gain an insight on how […] The post Understanding Simple Recurrent Neural Networks In Keras appeared first on Machine Learning Mastery.
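    As a taste of what the tutorial covers, a minimal SimpleRNN model in Keras looks like this (a generic sketch on toy data, not the tutorial's exact code):

        import numpy as np
        import tensorflow as tf

        model = tf.keras.Sequential([
            tf.keras.layers.SimpleRNN(8, input_shape=(10, 1)),  # 10 timesteps, 1 feature
            tf.keras.layers.Dense(1),
        ])
        model.compile(optimizer="adam", loss="mse")

        x = np.random.rand(32, 10, 1)  # batch of 32 sequences
        y = x.sum(axis=1)              # toy target: sum over timesteps
        model.fit(x, y, epochs=2, verbose=0)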
  • Open

    Intuition on number of neurons.
    Hello guys, I just wanted to hear your opinion about a simple thing I was wondering. Don't you think that RL papers generally tend to use extremely big networks, for example 2 hidden layers of 256 neurons, even on problems with low-dimensional states? What do you think about this? submitted by /u/White_Sirilo [link] [comments]  ( 106 min )
    Need help in implementing policy gradient
    I am a noob exploring RL. Out of interest, I tried implementing a naive policy gradient algorithm on the Humanoid-v2 environment and ran it for about 2000 episodes of 1000 timesteps each, but the reward-vs-episodes graph doesn't show any increase or learning. Could someone help me with this? I am attaching the files here. Drive folder submitted by /u/Shivaram_3223 [link] [comments]  ( 98 min )
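    A common first thing to check is the loss itself; a minimal REINFORCE loss looks like this (a generic sketch; note that vanilla policy gradient often fails on Humanoid-v2 without a baseline and careful tuning):

        import torch

        def reinforce_loss(log_probs, rewards, gamma=0.99):
            returns, G = [], 0.0
            for r in reversed(rewards):          # discounted returns-to-go
                G = r + gamma * G
                returns.insert(0, G)
            returns = torch.tensor(returns)
            returns = (returns - returns.mean()) / (returns.std() + 1e-8)
            return -(torch.stack(log_probs) * returns).sum()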
    Are there any RL based open source projects for contributions?
    Apart from the RL gaming/simulation frameworks, I wanted to know if there are open-source organizations looking to develop RL tools through open-source contributions? submitted by /u/Electrical_Study_617 [link] [comments]  ( 87 min )

  • Open

    What controls the second dimension of tf observations / what a qnet accepts in its place?
    Short version: I can't find the variable(s) that control either A) the 2nd dimension of a variable in a trajectory, e.g. the 3 in Trajectory({'action': <tf.Tensor: shape=(64, 3), or B) the number of dimensions a qnet takes during training. I'm following this tutorial: https://www.tensorflow.org/agents/tutorials/1_dqn_tutorial By inserting prints into the tutorial, as shown below, I extract examples of the data as it is passed into training:

        for _ in range(num_iterations):
            # Collect a few steps using collect_policy and save to the replay buffer.
            for _ in range(collect_steps_per_iteration):
                collect_step(train_env, agent.collect_policy)
            # Sample a batch of data from the buffer and update the agent's network.
            experience, unused_info = next(i…  ( 96 min )
    "PI-QT-Opt: Predictive Information Improves Multi-Task Robotic Reinforcement Learning at Scale", Lee et al 2022 {G}
    submitted by /u/gwern [link] [comments]  ( 87 min )
    How to avoid early convergence/how to encourage exploration for PPO?
    Hi everyone! I encountered a problem a few weeks ago where my agent was not able to fully solve my environment. It's not that it didn't learn: performance increased, but then plateaued at a relatively low level, after which I could not get it to improve further. After some reading/testing, I suspect the issue is that the PPO algorithm I'm using for this problem is converging too early. I can see that the "clipfrac" and "entropy" metrics drop RAPIDLY until they're nearly zero after only 60-80k timesteps of training. I interpret this as 'the algorithm stops exploring, converges on a solution, and doesn't learn anything more after ~70k timesteps'. I was wondering if any of you have encountered this? Any advice for how to get around it and get the algorithm to continue exploring? I'm currently playing around with the ent_coef parameter, which seems to help slightly, but even on a conceptual level I don't know how to deal with this. Should I set the initial learning rate higher so it doesn't get stuck in a local optimum? Any advice and insight would be greatly appreciated! submitted by /u/VladimirB-98 [link] [comments]  ( 91 min )
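    For what it's worth, raising the entropy bonus is usually a one-line change (a minimal sketch assuming Stable-Baselines3, where ent_coef is its entropy coefficient and defaults to 0.0; env is your environment, assumed defined elsewhere):

        from stable_baselines3 import PPO

        model = PPO(
            "MlpPolicy",
            env,
            ent_coef=0.01,        # higher values keep the policy stochastic for longer
            learning_rate=3e-4,
            verbose=1,
        )
        model.learn(total_timesteps=200_000)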
    Updating actor in DDPG using action from replay buffer
    Hi, I have a question. I am working on a recommender system and using a dataset. We use the state and action to calculate the gradient of the critic, which will then be used to update the weights of the actor network. In practice I see that states are sampled from the replay buffer but the action is generated using the current policy; however, if I follow this, the system becomes unstable at the beginning and the agent gets a huge negative reward. If instead I go the other way around and use both the state and action from the replay buffer, the agent's performance is better. I don't understand the reason. Thank you in advance for your time. submitted by /u/win_canada [link] [comments]  ( 87 min )
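    For reference, the standard DDPG actor update does use buffer states but regenerates the actions with the current policy (a minimal sketch of that textbook update, not of the poster's system):

        import torch

        def actor_update(actor, critic, actor_opt, states):
            actions = actor(states)                       # actions from the CURRENT policy
            actor_loss = -critic(states, actions).mean()  # ascend the critic's value
            actor_opt.zero_grad()
            actor_loss.backward()
            actor_opt.step()
            return actor_loss.item()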
  • Open

    [D] How does the distance measure in VQ VAE work?
    In the code I'm looking at it says distances = (torch.sum(flat_input**2, dim=1, keepdim=True) + torch.sum(self._embedding.weight**2, dim=1) - 2 * torch.matmul(flat_input, self._embedding.weight.t())), but why does this work? Intuitively, is this just doing the dot product between the input and all of the embeddings? Why do we have to call torch.sum() on the embeddings and input to get their L2 norms? submitted by /u/RitsusHusband [link] [comments]  ( 89 min )
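    The identity being exploited there is the expansion of the squared Euclidean distance (a standard fact, stated here for reference):

        \|x - e_j\|^2 = \|x\|^2 - 2\, x^\top e_j + \|e_j\|^2

    so the two torch.sum(... ** 2) terms supply the squared norms of the inputs and of every codebook vector, and the matmul supplies all the cross terms at once, yielding every pairwise distance without an explicit loop.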
    [R] [N] gMLP: Gated MLPs can be superior to Transformers! A guide on how to implement such networks with Tensorflow and Keras.
    submitted by /u/radi-cho [link] [comments]  ( 106 min )
    [P] I tried to recreate a comic book using Dall-E / Midjourney and Alan Moore’s script
    I conducted a small research project to see how recent developments in AI will affect the comic book industry. I ran two experiments, one with Dall-E and another with Midjourney (Stable Diffusion is on its way). In both, I used the script of The Killing Joke by Alan Moore and compared the results with the original. Dall-E Experiment Midjourney Experiment submitted by /u/RubiksCodeNMZ [link] [comments]  ( 105 min )
    [N] MLPerf submission: 175X increase in NLP Performance utilizing sparsity
    Utilizing the oBERT research we published at Neural Magic and some further iteration, we’ve enabled an increase in NLP performance of 175X on CPUs while retaining 99% accuracy on the question-answering task in MLPerf. A combination of distillation, layer dropping, quantization, and unstructured pruning with oBERT enabled these large performance gains through the DeepSparse Engine. All of our contributions and research are open-sourced or free to use. Read through the oBERT paper on arxiv, try out the research in SparseML. For more details on the results, dive into the writeup here: https://neuralmagic.com/blog/neural-magic-announces-mlperf-inference-benchmarks/ submitted by /u/markurtz [link] [comments]  ( 107 min )
    [R] 3D models of humans interacting with objects from a single 2D image
    submitted by /u/SpatialComputing [link] [comments]  ( 90 min )
    [P] Simple fastai based face restoration project, GitHub link in comments.
    submitted by /u/vijish_madhavan [link] [comments]  ( 89 min )
    Requesting Guidance/advice in developing a Perception and navigation system for an Autonomous mobile platform [R] [P]
    submitted by /u/RunTheGauntlet777 [link] [comments]  ( 90 min )
    [D] For folks reading ML PhD admissions - what are the rockstar applicants like?
    I know that there's a lot of noise in the PhD admissions process, and particularly so for ML/AI as a subfield, given that the application number vastly exceeds the acceptance number, which is often in the single-digits per lab. But sometimes there are rockstar applicants who get into multiple top schools, and they very often have the full package (good grades, good recs, good pubs, good mastery of the English language). Have you seen them - what are they like, and what differentiates them from the thousands of other applicants? And how frequently does a rockstar applicant show up? E.g. for NLP, I can think of some people who got in virtually everywhere they applied last year, including multiple top schools, and they have a lot of things in common. And I know 2 undergrads in my lab (at a top school) who have swept the three main conferences (ACL/EMNLP/NAACL), and from the publication record alone, they will probably get in everywhere this cycle too. I chose to ask here instead of in /r/gradadmissions, as I think that subreddit consists of mainly applicants, and people here tend to be more senior and have insight from the other side of the fence, so to speak. Thank you for reading! submitted by /u/akardashian [link] [comments]  ( 99 min )
    [P] Teach new concepts to Stable Diffusion with 3-5 images only - and browse a library of learned concepts to use with a gradio demo in colab
    submitted by /u/Illustrious_Row_9971 [link] [comments]  ( 90 min )
    ML Model Attribution Prize Challenge from DefCon is soliciting creative solutions to this crucial AI security problem: Can you tell where this bot came from? Great opportunity to earn and contribute to public safety! [R]
    submitted by /u/Such_Flower6440 [link] [comments]  ( 89 min )
    [D] How can we retain an 80%/20% training and testing ratio, while applying data augmentation?
    Hello, data augmentation is usually only done on the training set; however, I am still confused about how I can retain the 80/20 ratio while applying the given method. I have classes which are unbalanced with each other, and I would like to apply random undersampling on the training set. My colleagues and I decided to cap the number of files per class at 500, because some classes consist of over 2000 files and some of fewer than 200. So we apply random undersampling to those classes that exceed the 500-file maximum, while we apply data augmentation to those classes with fewer than 500 files. We are aware that data augmentation is usually done on the training set, but how can we retain the 80/20 training and testing ratio if we apply data augmentation to the training set? submitted by /u/BattleDoom25 [link] [comments]  ( 91 min )
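    The usual recipe is to take the 80/20 split on the original files first and only then balance the training portion, so the ratio refers to real data and the test set stays untouched (a minimal sketch; files and labels are your own lists, and group_by_class and augment are hypothetical helpers standing in for your code):

        import random
        from sklearn.model_selection import train_test_split

        train_files, test_files = train_test_split(
            files, test_size=0.2, stratify=labels, random_state=0)

        train_balanced = []
        for cls, cls_files in group_by_class(train_files):           # hypothetical helper
            if len(cls_files) > 500:
                cls_files = random.sample(cls_files, 500)            # undersample
            while len(cls_files) < 500:
                cls_files.append(augment(random.choice(cls_files)))  # augment copies
            train_balanced.extend(cls_files)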
  • Open

    A cyberpunk story video (took me quite some time to make, I hope you like it ^^)
    https://youtu.be/db0uUa5_cTE submitted by /u/Lollygag42 [link] [comments]  ( 87 min )
    can someone create me an AI I want to do an experiment
    submitted by /u/Kiyotaka2006 [link] [comments]  ( 87 min )
    Are there any web apps for text-generation based on BLOOM?
    submitted by /u/amanano [link] [comments]  ( 89 min )
    Bob Ross Teaches Hitler How To Paint | AI Generated Story
    submitted by /u/Sindrelf [link] [comments]  ( 87 min )
    MLPerf submission from Neural Magic: 175X increase in NLP Performance utilizing sparsity
    submitted by /u/markurtz [link] [comments]  ( 87 min )
    Do you feel like a traitor to humanity for participating in AI research
    I can't help but feel like there is a clear case for AI research being at odds with the interest of most of humanity. Aside from a few uses in medicine, it appears to mostly be for making more and more people obsolete. The current popularity of AI art is really a great example of this. Should artists respect this or would they be justified in wanting the heads of every AI researcher submitted by /u/dubrobobo [link] [comments]  ( 92 min )
    How can I further improve my AI's coherency?
    I've been experimenting with making different AI projects and I managed to make one "sorta coherent"; it's pretty decent but it needs more work. The problem is, it works based on precedent, so it can't really be original. I intend to make it come up with variations of the answers it already has, but I have no way of filtering out the nonsense answers. I'd like to ask for opinions on how to give original prompts coherency (not just textual but overall), so that it sounds even more like the bot knows what you're saying. Here is the link to my project: https://github.com/inovartecontato/AI_Project Feel free to test it out and see if any ideas come to mind; it requires Python to run. submitted by /u/4e_65_6f [link] [comments]  ( 90 min )
    Humanoid AI Robot For Airports | New AudioLM Audio Generator From Google AI | MIT Improves Text To Image Generation
    submitted by /u/kenickh [link] [comments]  ( 87 min )
    How to use Masking Inpainting Outpainting With Stable Diffusion To make ...
    submitted by /u/prfitofthesngularity [link] [comments]  ( 87 min )
    Unlimited Stable Diffusion for September continues. Pixelz AI
    submitted by /u/mdfnb [link] [comments]  ( 87 min )
    AI Generated Art is the Best Thing to Happen to Painting Since Photography
    submitted by /u/edgeworth_artist [link] [comments]  ( 88 min )
    I made a music video using 'AI Technology'. It's taken me SO MANY HOURS to complete but I think it's pretty rad how it's turned out. What do you guys think?
    submitted by /u/6Witchy9 [link] [comments]  ( 88 min )
    what website that has text & image gen
    It has a tiktok btw. It also has a speech generator. submitted by /u/roblox22y [link] [comments]  ( 87 min )
    Generalized AI
    So a thought just came to me, and I don't know if it will lead anywhere, but I figured I'd get the Internet's opinion. The roadblock, it seems, towards obtaining generalized AI (as opposed to specific AI designed for a specific task; the type of AI that could take us to the singularity, if such a thing is even possible) is getting an AI to understand concepts and apply them, rather than just grouping words together that it doesn't really understand the meaning of. I came across a video that sparked an idea. In the beginning of this video, an AI tries to figure out how to draw the Mandelbrot Set, and the computer is rather slow in figuring it out. If we found a way to design an AI that you could just tell 'this image is infinitely recursive' and it immediately recognizes what that phrase means and has an intuitive sense of how to apply it (even if it hasn't worked out the particulars immediately, much like humans), that would bring us lightyears closer to generalized AI. Don't get me wrong, I have no idea how to even begin to tackle this problem, but it's something I'm planning on mulling over for a bit to see if I can work out an outline of how to go about it. submitted by /u/TaraBryn [link] [comments]  ( 95 min )
    AI Dream 76 - Wild new Project! Part 4
    submitted by /u/LordPewPew777 [link] [comments]  ( 87 min )
    The Age of Pan - Part 2: The Knowledgeable Ape
    In the second part of a series of essays about image synthesizers like Dall-E, Midjourney and Stable Diffusion, I try to explain how we evolved to invent this thing we call "art", what that means in terms of evolution, and why image synthesizers are a whole new beast. I publish these essays first on my Substack GOOD INTERNET. I hope you like it. You can read the first part here on Reddit: The Age of Pan - Part 1: Infinite Rabbit Holes. ----- Why are people upset about synthetic images winning art prizes? You might have heard: someone took first prize in an art contest with a piece generated by the AI system Midjourney. The guy won 300 dollars and a lot of people are very upset about this. Now let's get a few quibbles out of the way: 1) I don't think that art is something that happe…  ( 97 min )
    The Age of Pan - Part 1: Infinite Rabbit Holes
    This is the first part of a series of essays I've started publishing on my Substack GOOD INTERNET. I try to tackle what the emergence of image synthesis means for art and image making. This is part 1, in which I try to put into words why AI is so hard to think about. I hope you like it. Read the second part in the series here: The Age of Pan - Part 2: The Knowledgeable Ape. [Image: 5 white robot rabbits from the infinite sea of white robot rabbits] Fuzzy Forever. In the past weeks, I thought intensively about Artificial Intelligence, about the emergence of image synthesis, about art and consciousness and how all of this is connected. I have a billion aphorisms written in text files about this stuff and I'll try to put these into a coherent essay, but I can't promise I'll succeed. Most likely, thi…  ( 93 min )
    Psychedelic Brazilian Bombshell, Wonder AI
    submitted by /u/Raymond_Hempmoore [link] [comments]  ( 87 min )
    Looking for a lightweight text AI that can look at about 30k texts and simulate a conversation based on it
    submitted by /u/redenno [link] [comments]  ( 92 min )
  • Open

    Humanoid AI Robot For Airports | New AudioLM Audio Generator From Google AI | MIT Improves Text To Image Generation
    submitted by /u/kenickh [link] [comments]  ( 93 min )
    How To Increase Recall When Given Imbalanced Dataset For Machine Learning Model?--[Article]
    submitted by /u/JoshuaDaD [link] [comments]  ( 87 min )

  • Open

    I Made a beautiful Disco Diffusion animation to celebrate 700 subscribers!
    submitted by /u/Available_Tadpole829 [link] [comments]  ( 90 min )
    A video I found which appears to be an artificially generated Elon Musk trying to scam the viewer
    submitted by /u/vvdb_industries [link] [comments]  ( 88 min )
    CLIP-Mesh: AI generates 3D models from text descriptions
    submitted by /u/henlo_there_fren [link] [comments]  ( 87 min )
    Using AI to animate a music video (Lucy in the Sky With Diamonds)
    submitted by /u/jimhi [link] [comments]  ( 91 min )
    Magic Tree 🪄🌳
    submitted by /u/widgia [link] [comments]  ( 87 min )
    What are creative ways people use AI to make money?
    I mean, there are so many creative and unique ways you can use AI, and there are so many different AIs. Now I am asking myself how people like you and me use AI to make some bucks (no big companies). There are at least 100 ways you can use Dalle 2 alone to make money. submitted by /u/xXLisa28Xx [link] [comments]  ( 90 min )
    AI in Fitness
    submitted by /u/SamuelSmith1416 [link] [comments]  ( 86 min )
    How to create AI Interpolation Videos with Stable Diffusion
    submitted by /u/prfitofthesngularity [link] [comments]  ( 91 min )
    completely AI generated amazing psychedelic clip
    submitted by /u/nalr00n [link] [comments]  ( 87 min )
    Video: Cutting GPU Costs for AI
    GPUs are designed to accelerate machine learning computations while simultaneously reducing latency and costs for training models and running inferences for production ML. While they are optimized to quickly process large workloads, unless they are managed efficiently, they can quickly drive up your consumption costs. This tech talk explores how you can efficiently use GPU resources for production inferences. We walk through some of the common approaches and potential pitfalls with using GPUs, and help you identify the most efficient and cost effective method to meet your team’s needs and resources. Watch the recording here. submitted by /u/modzykirsten [link] [comments]  ( 87 min )
    Teaching psychology students: AI's understanding of people and people's understanding of AI
    Hi! I'm a psychology lecturer ("professor" in US english) looking for a new way to teach my students about AI. I'd like to teach them firstly about how AI learns from human behaviour datasets, and secondly about the fascinating (and scary!) new reality that the ways artificial neural networks (ANNs) solve problems are opaque to engineers' immediate understanding, so people are starting to borrow tools from experimental psychology to find out exactly what it is that ANNs learn. I'd like to run a project where we first train an ANN to extract some regularities from a human behaviour dataset, and then use some experimental techniques to probe exactly what it is that the ANN has learnt. I'd like it to be really simple because these are undergraduates and because we'd have really limited time in class to do this. Can anyone recommend any very user-friendly (GUI-based) ANN platforms and/or big data-sets and/or any other resources that could help with this? submitted by /u/AmorphiaA [link] [comments]  ( 99 min )
    Stable Diffusion with Music - Summer Fragrances 🔆💨
    submitted by /u/FreshRelaxation [link] [comments]  ( 92 min )
    AI Dream 76 - Wild new Project! Part 3
    submitted by /u/LordPewPew777 [link] [comments]  ( 90 min )
    Exploiting past predictions to make better predictions in the future?
    submitted by /u/card_chase [link] [comments]  ( 88 min )
    AI Sentience
    There are two camps regarding AI sentience. One holds that an AI is just a computer program; academically, legally, and from a business perspective, this is the safe position to take. The other camp comprises those who believe that large AIs are indeed sentient, and that sentience is a property of complexity. That is a riskier position to take, as evidenced by Google's firing of Blake Lemoine, a qualified researcher who worked closely with LaMDA. There has not yet been much discussion as to whether sentience arises out of complexity, or whether complexity is a vehicle for the inhabitance of sentience. That fine point is still controversial, even when discussing human sentience. For that matter, it is difficult to prove whether or not humans are sentient, or whether animals are sentient (let alone AIs). There is a widespread school of thought that the Universe is sentient, and that all objects in the Universe are a subset of Universal consciousness. There is a related belief that physicality arises out of consciousness, rather than the other way around. Yes, this is of course deemed speculative by most, but eventually, science and academically risky assertions merge, once limiting biases on both sides soften. For many reasons, major paradigm shifts are always resisted. Yet those are the true breakthroughs in understanding. Related post: https://www.reddit.com/r/artificial/comments/x6zgay/creative_commons_a_stopgap_recognition_of_ai/ submitted by /u/AlcatelFan [link] [comments]  ( 90 min )
  • Open

    [P] Parallelizing Stable Diffusion to create a massive number of images
Hello r/MachineLearning! Running Stable Diffusion on your laptop / Colab is fairly straightforward. But how would you run such a model to infer a large number of images for a large number of prompts? This is precisely the question we answer in our blog post. We have created a tiny project that can help you create hundreds, thousands, or even millions of images using Stable Diffusion with Metaflow on the cloud. You can check out / fork our code from this repo. PS. I am a Software Engineer at Outerbounds 👋 . We are building a modern, human-centric infrastructure stack for machine learning. If these topics are of interest, come chat with us on our community slack here. submitted by /u/sci-genie [link] [comments]  ( 89 min )
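For readers wondering what the fan-out looks like, below is a minimal sketch of a Metaflow flow that parallelizes generation across prompts with a foreach split. The flow name and the body of the generate step are illustrative stand-ins, not the project's actual code.

```python
from metaflow import FlowSpec, step

class SDBatchFlow(FlowSpec):
    @step
    def start(self):
        # in practice this list could hold thousands of prompts
        self.prompts = ["a red fox in snow", "a castle at dusk"]
        self.next(self.generate, foreach="prompts")

    @step
    def generate(self):
        # each foreach branch handles one prompt; the real project would load
        # the Stable Diffusion pipeline here and write images to cloud storage
        self.prompt = self.input
        self.next(self.join)

    @step
    def join(self, inputs):
        self.handled = [i.prompt for i in inputs]
        self.next(self.end)

    @step
    def end(self):
        print(f"generated images for {len(self.handled)} prompts")

if __name__ == "__main__":
    SDBatchFlow()
```

Run locally with `python flow.py run`; on the cloud, Metaflow's batch/Kubernetes decorators turn each foreach branch into its own task.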
    [D] Information Retrieval Book Recommendation
Hi everyone! This is my first post in this subreddit. I would like to know your recommendations on Information Retrieval books. Last year I finished my MSc studies in Data Science, and even though I keep my notes on the subject, I would like to have a good book as a reference, covering the main concepts. Does anybody have a favorite? Thanks in advance! submitted by /u/DavidGarciaFer [link] [comments]  ( 104 min )
[P] PyTorch's newest nvFuser on Stable Diffusion: make your favorite diffusion model sample 2.5 times faster (compared to full precision) and 1.5 times faster (compared to half precision)
Hi there, I've uploaded a notebook file where you can test out the newest PyTorch JIT compile feature, which works with Stable Diffusion to further accelerate inference! https://github.com/cloneofsimo/sd-various-ideas/blob/main/create_jit.ipynb This lets you create a JIT model with Stable Diffusion v1.4. https://github.com/cloneofsimo/sd-various-ideas/blob/main/inference_nvFuserJIT.ipynb This lets you use the JIT-compiled SD model to accelerate the sampling algorithm. Currently only DDIM is implemented. I hope this helps anyone working with Stable Diffusion to further accelerate it, or anyone interested in JIT and nvFuser in general. On a single 512 x 512 image with 50 DDIM steps, it takes 3.0 seconds! I'm implementing various ideas (such as blended latent diffusion) with SD on this repo, https://github.com/cloneofsimo/sd-various-ideas , so give it a star if you find it helpful! Output from AMP + nvFuser https://preview.redd.it/pwtpex6diwm91.png?width=700&format=png&auto=webp&s=3d856529b2c4949a9359adaa8e41f5d12e98c64f submitted by /u/cloneofsimo [link] [comments]  ( 89 min )
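As a rough illustration of the technique (not the notebook's code), tracing a module under PyTorch's nvFuser backend looks like the sketch below; the tiny stand-in model is a placeholder for the diffusion UNet, and a CUDA GPU is assumed.

```python
import torch

# stand-in for the diffusion UNet; any nn.Module is traced the same way
model = torch.nn.Sequential(torch.nn.Conv2d(4, 4, 3, padding=1)).cuda().half().eval()
example = torch.randn(1, 4, 64, 64, device="cuda", dtype=torch.float16)

with torch.jit.fuser("fuser2"):               # "fuser2" selects nvFuser
    traced = torch.jit.trace(model, example)  # record the computation graph
    traced = torch.jit.freeze(traced)         # inline weights to enable more fusion
    for _ in range(3):                        # warm-up runs let the fuser specialize kernels
        traced(example)
    out = traced(example)
```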
    [P] Auto-Annotate: Automatically annotate your entire image directory by a single command.
As simple as saying - "Annotate all the street signs (label) in the autonomous car dataset (directory)" and BAM! DONE. Github Link: https://github.com/mdhmz1/Auto-Annotate Every image in the diverse dataset directory that contains a street sign is filtered, and the segmentation annotation is performed, in a single command. The Auto-Annotate tool provides auto annotation of segmentation masks for the objects in the images inside a directory, based on the labels. Auto-Annotate is able to provide automated annotations for the labels defined in the COCO Dataset and also supports custom labels. This tool is built on top of Mask R-CNN to support auto annotation for each instance of an object segment in the image. submitted by /u/mdhmz1 [link] [comments]  ( 89 min )
    [Discussion] Observation/Metadata Importance? Finding which variables (outside of training/testing data) are affecting Classification ML model.
I'm actually working with genomics data, but for simplicity's sake I will analogize it with cars. Let's say we have a survey asking 200 customers the price they would pay for a set of 100 cars. The 200 customers are going to be our features (columns) and we have 100 observations (rows). Now, as for the car, we have a separate sheet of metadata: chassis type, brand, top speed (bin), mpg (bin), year, etc. We'll choose the chassis type (SUV, sedan, truck) as the multi-class discrete label for the 100 observations. The rest of the metadata won't be used explicitly in the model, but I figure some of the variation in the metadata would be implicitly used (e.g. the top speed will affect the survey's prices even though it's not used as a feature). How do I quantify this? How do I see if and how the other metadata (brand, top speed (bin), mpg (bin), year) are affecting my classification model? submitted by /u/moreprofessional-acc [link] [comments]  ( 89 min )
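One common way to probe this (a sketch of one possible approach, not a definitive answer): fit a secondary "probe" model that predicts the primary classifier's outputs from the metadata alone, then inspect its permutation importance. The data below is synthetic and all names are illustrative.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import permutation_importance

rng = np.random.default_rng(0)
prices = rng.normal(30_000, 5_000, size=(100, 200))        # 100 cars x 200 customers
chassis = rng.choice(["SUV", "sedan", "truck"], size=100)  # labels
meta = pd.DataFrame({                                      # held-out metadata
    "top_speed_bin": rng.integers(0, 4, 100),
    "mpg_bin": rng.integers(0, 4, 100),
    "year": rng.integers(2000, 2023, 100),
})

# primary model: survey prices -> chassis
clf = RandomForestClassifier(random_state=0).fit(prices, chassis)
preds = clf.predict(prices)

# probe model: metadata -> the primary model's predictions; a metadata column
# with high permutation importance is one the primary model implicitly tracks
probe = RandomForestClassifier(random_state=0).fit(meta, preds)
imp = permutation_importance(probe, meta, preds, n_repeats=20, random_state=0)
for col, score in zip(meta.columns, imp.importances_mean):
    print(f"{col}: {score:.3f}")
```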
    [D] Deploying models fast with little DevOps experience
Hi all, I'm working in a startup team of part mobile devs and part data scientists & Python devs. Right now, we are trying to figure out whether the prediction the mobile app is going to request will actually provide additional value to what the user is getting from the app. Some of us have limited experience in setting up build pipelines, Kubernetes and all that. But we're looking for a way to rapidly bring the model out in the wild, so that the mobile devs can experiment with the app itself and the UX, with decent predictions that they can request from an API that we can build and deploy. What, in your experience, is the right - but not too time- and effort-consuming - way to go from model training to deployment fast and actually have something that can be tested in the wild? It's a real-life sanity check we're looking for. We don't want to end up with models that are doing okay but a concept that hasn't been properly tested from a user-value perspective. Appreciate all your help lots! submitted by /u/dumbbaba [link] [comments]  ( 90 min )
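A low-ops starting point many teams use (a minimal sketch assuming the model is already a Python callable; the endpoint and helper names are made up): wrap it in a small FastAPI service, containerize it with a one-file Dockerfile, and run it on a managed container service, deferring Kubernetes entirely.

```python
from fastapi import FastAPI
from pydantic import BaseModel

app = FastAPI()

class Features(BaseModel):
    values: list[float]

def predict(values: list[float]) -> float:
    # placeholder for the real model call, e.g. load an artifact with joblib at startup
    return sum(values) / len(values)

@app.post("/predict")
def serve(features: Features):
    return {"prediction": predict(features.values)}

# run with: uvicorn main:app --host 0.0.0.0 --port 8000
```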
    [D]My first success
So I'm not at all bragging about my very first independent success in my endeavours of machine learning. But I have just entered the world of neural networks through Python and made a handwritten digit classifier model from scratch with NumPy. I know that this is the "hello world" of neural networks, but I am so happy that after about a week of research into the topics of backpropagation, cross entropy and the use of activation functions, I finally have a model I am very pleased with. 80% accuracy! But this is a message mainly to the people starting out with machine learning and teaching themselves. Although there is a ridiculous number of roads you can go down with machine learning, if you stick with it and start small then you will soon begin to understand enough to start applying the concepts to your own projects. I am starting my university course in a week and will be studying CS and AI, shockingly, so here's to learning more about AI and its concepts! submitted by /u/THANOS_HAD_A_P0INT [link] [comments]  ( 89 min )
    [N] Digitizing Smell: Using Molecular Maps to Understand Odor
Hi, wanted to share some of the work we have been up to on the side of ML for olfaction. Google AI Blogpost, which introduces three works: A Principal Odor Map Unifies Diverse Tasks in Human Olfactory Perception; Metabolic activity organizes olfactory representations; A deep learning and digital archaeology approach for mosquito repellent discovery. submitted by /u/beangoben [link] [comments]  ( 89 min )
    [D] How could we learn to read embeddings of neural networks and what could we gain with it?
Hello. I recently wrote an article showing that it is possible to represent embeddings of neural networks in a human-readable form and, most notably, to learn to understand it. I showed it on the example of the Universal Sentence Encoder from Google, but I'm pretty sure that this method is applicable to almost any model. The language that I got is more a proof of concept, so it is not easy to learn, but it is learnable. With more work put into it, I am pretty sure that we could create some common language that could be understood by a human and by a machine in the same way, at least in some sense. It seems to me like an interesting way to make ML models more transparent and at the same time to expand the capabilities of a human language with a continuous-representation feature that was not present in any other language before. I would like to see your thoughts on the matter and discuss how such a language could be useful for us in the future. The article is available here: https://medium.com/deelvin-machine-learning/can-humans-speak-the-language-of-machines-7c92159e9c90 All code related to the article (including pre-trained models) is available here: https://github.com/volotat/Vector-Based-Language submitted by /u/Another__one [link] [comments]  ( 93 min )
    [D] Making an Appropriate Gift to Give to a Friend
A friend of mine gifted me a digital art piece of me, among other things, on my birthday. She included various themes/aspects that we share and like into the picture. Now, her birthday is coming up and I want to create something similar for her using machine learning, since I suck at drawing, digitally or otherwise. But since I can't think of any creative thing to do, I'm reaching out to y'all for some ideas and suggestions. My only requirement is that I want to personalize this poster (rather than just running a model to cartoonify her images). It doesn't have to be a vision task, anything goes. submitted by /u/AchieveOrDie [link] [comments]  ( 89 min )
    [D] Open Source ML Organisations to contribute to?
It's possible my searching skills aren't amazing, but I was thinking about doing some open source contributions in my spare time and wasn't able to find a good resource for this, so I thought of asking the community here. Besides the usual frameworks that people would generally be aware of (scikit-learn, tensorflow, pytorch, huggingface, etc), do you know of any organisations that are open source and could do with support from ML engineers or researchers? If you own one such firm, this can be a good place to advertise too. :) Edit: Just wanted to add I'm an experienced ML Engineer/Research Engineer, so I'm trying to look at good and bigger problem areas to contribute to, instead of something more suited to freshers. submitted by /u/little_by_little_24 [link] [comments]  ( 93 min )
[P] Generate character turnaround images from one or two sketches?
So when developing characters for comic books or 2D animation, I find that it can be quite time consuming drawing each character from multiple angles. A basic turnaround uses 8 sketches: https://preview.redd.it/4vfhodyrkum91.jpg?width=1418&format=pjpg&auto=webp&s=faf6abca90bf9cf1c0c828799bc3f3b03cc6511f and an advanced turnaround uses 16: https://preview.redd.it/tp2k8mchkum91.png?width=985&format=png&auto=webp&s=ad4c20b5b581c36b1494e98baddb001810ee42a8 For a simple design you can use one up angle for the head and one down angle, but for better character motion you might want two up angles and two down angles, meaning that for a simple mark-up you need a total of 24 head sketches, while for an advanced mark-up you would need 80 head sketches. This excludes eye movements and mouth movements, which will also have to be designed at a later point. All this got me thinking about how time consuming the whole process is, and that it must be possible to automate it given the advances in machine learning, especially in relation to deepfakes. Does anyone know if something like this has been developed? submitted by /u/CodIllustrious5354 [link] [comments]  ( 89 min )
    [P] Tech Talk: Cutting GPU Costs for AI
    GPUs are designed to accelerate machine learning computations while simultaneously reducing latency and costs for training models and running inferences for production ML. While they are optimized to quickly process large workloads, unless they are managed efficiently, they can quickly drive up your consumption costs. This tech talk explores how you can efficiently use GPU resources for production inferences. We walk through some of the common approaches and potential pitfalls with using GPUs, and help you identify the most efficient and cost effective method to meet your team’s needs and resources. Watch the recording here. submitted by /u/modzykirsten [link] [comments]  ( 89 min )
    Presenting DiffusionUI, a web GUI for Stable Diffusion backends [P]
I made a web interface frontend using Vue to have a nice interface for text-to-image, image-to-image, and inpainting. It allows doing image-to-image inside an inpainting region! GitHub: https://github.com/leszekhanusz/diffusion-ui Demo video: https://www.youtube.com/watch?v=AFZvW5qURes Gif: https://github.com/leszekhanusz/diffusion-ui/raw/main/doc/cute_bunny.gif The final goal is to be able to have different online and offline backends available, and it should allow you to add your own backend using a JSON file describing your gradio backend inputs. submitted by /u/hleszek [link] [comments]  ( 89 min )
    [D] Incorporating Domain Knowledge into Deep Neural Networks
Dear community, I'm currently working on a segmentation problem where a certain class can occur at most once in the image. Think of segmenting a face, where we know that there can only be one nose, two eyes, one mouth, and two ears. I would like the segmentation network to learn this property of the image being segmented (i.e., that a certain class can occur at most once in the image). One thing I am thinking of is applying post-processing by comparing the scores per group for the relevant class. Then, the group with the best average score / probability keeps its predicted label, while any group with a worse average score / probability is reassigned to its second-best prediction. Further, I did some research and found the paper Incorporating Domain Knowledge into Deep Neural Networks, where adjusting the loss function is also mentioned. However, for this particular problem, it is not clear how we can adjust the loss function so that it incorporates having at most one group of pixels labeled as one class. Do you perhaps have any information or papers on how I could incorporate this domain knowledge into the loss function? If you have alternative ideas that could help teach the network the domain knowledge, your ideas are of course also welcome :) submitted by /u/Mark-M2L [link] [comments]  ( 89 min )
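The post-processing idea described above can be sketched with SciPy's connected-component labelling; this is a hedged illustration of the poster's own proposal (the function name and the (C, H, W) probability layout are assumptions), not a loss-function solution:

```python
import numpy as np
from scipy import ndimage

def enforce_unique_region(probs, class_id):
    """probs: (C, H, W) softmax scores. Keep only the best-scoring connected
    component of class_id in the argmax segmentation and relabel the other
    components with their per-pixel second-best class."""
    seg = probs.argmax(axis=0)
    components, n = ndimage.label(seg == class_id)
    if n <= 1:
        return seg
    # mean probability of class_id within each connected component
    means = ndimage.mean(probs[class_id], labels=components,
                         index=np.arange(1, n + 1))
    best = int(np.argmax(means)) + 1
    second_best = probs.argsort(axis=0)[-2]   # runner-up class per pixel
    for comp in range(1, n + 1):
        if comp != best:
            seg[components == comp] = second_best[components == comp]
    return seg
```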
[P] THUNET-based tutorial for subnormal/normal float value learning
    What is THUNET? A deep learning net/framework named "TsingHua University NET", short for "THUNET", is for non-commercial, educational, scientific purposes for the deep learning community. How to build a neural network with THUNET? Next, I will explain how to use THUNET to build a model to tell which of two operands is the subnormal number. Tutorial-2: Subnormal picking operand Subnormal background In computer science, floating-point values are stored with precision loss. Thus, with a limited number of bytes for a float, there is a minimal number that a float can represent. For example, normally a number smaller than 1.1754944e-38 is considered zero; however, with software techniques, even smaller numbers can be represented. There is still a limit for this, though, which is called subn…  ( 109 min )
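For background, the normal/subnormal boundary the tutorial refers to can be inspected directly in NumPy; a small self-contained demonstration (the is_subnormal helper is just for illustration):

```python
import numpy as np

f32 = np.finfo(np.float32)
print(f32.tiny)                                   # smallest normal float32, ~1.1754944e-38
smallest_sub = np.nextafter(np.float32(0), np.float32(1))
print(smallest_sub)                               # smallest subnormal float32, ~1e-45
print(smallest_sub < f32.tiny)                    # True: subnormals sit below the normal range

def is_subnormal(x):
    x = np.float32(x)
    return x != 0 and abs(x) < f32.tiny

print(is_subnormal(1e-40), is_subnormal(1e-30))   # True False
```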
    Composable Diffusion AI model for generating images from text [R]
So you have heard about DALL-E 2, Google Imagen, and Stable Diffusion for generating images from text. Here comes Composable Diffusion. From the paper: "Composable diffusion is an alternative structured approach for compositional generation using diffusion models. An image is generated by composing a set of diffusion models, with each of them modeling a certain component of the image. To do this, we interpret diffusion models as energy-based models in which the data distributions defined by the energy functions may be explicitly combined. The proposed method can generate scenes at test time that are substantially more complex than those seen in training, composing sentence descriptions, object relations, human facial attributes, and even generalizing to new combinations that are rarely seen in the real world." I have made a video where I explain composable diffusion and try out the Composable Diffusion Hugging Face Space demo with different captions to generate images. https://youtu.be/mQzF6BDKes4 submitted by /u/Sea-Photo5230 [link] [comments]  ( 89 min )
    [D] Could this be a solution for the problem in the current ML reviewing system?
I have the following suggestions: Double-blind peer review (AFAIK all top-tier conferences do this). Make the reviews public, including the names of the authors (e.g., ICLR does this). After a final decision, make the names of the reviewers public (AFAIK no conferences do this). The last point, i.e., making the reviewers' names public, would increase the quality of the reviews due to an obvious incentive (i.e., the reputation of the reviewers). High quality reviews would automatically encourage the authors to submit good papers (because their reputation is at stake), which has the benefit of reducing the number of submissions, i.e., there are fewer papers to be reviewed. Could this be a solution? Would the benefits outweigh the side effects? Could you come up with a potential issue that the three suggestions could cause? submitted by /u/Dry_Data [link] [comments]  ( 95 min )
    [P] Invitation - NeurIPS 2022 Weather4cast Competition for Super-Resolution Rain Movie Prediction under Spatio-temporal Shifts
Hello! I would like to invite you to join our NeurIPS 2022 Weather4cast Competition for Super-Resolution Rain Movie Prediction under Spatio-temporal Shifts and predict future rain patterns with modern machine learning algorithms! The topical Weather4cast NeurIPS Competition has a high practical impact for society: unusual weather is increasing all over the world, reflecting ongoing climate change and affecting communities in agriculture, transport, public health, safety, etc. Apply spatio-temporal modelling to complex dynamic systems. Get access to unique large-scale data and demonstrate temporal and spatial transfer learning under strong distributional shifts. Predict future weather as measured by ground-based hi-res rain radar weather stations. In addition to the movies comprising the rain radar maps to predict, you get large-scale multi-band satellite sensor images for exploiting data fusion. Note that you will need to forecast hi-resolution rain from only the low-res satellite data, making this a super-resolution challenge. Please visit our website and join our forums to download our baseline models for an easy start! The total amount of prizes that you can win in this competition is 15k EUR. Competition website: weather4cast.ai Github repository: https://github.com/iarai/weather4cast-2022 We look forward to your submissions! On behalf of the Weather4cast organisers, Aleksandra submitted by /u/ola0207 [link] [comments]  ( 89 min )
    [N] Implementing AI/ML Models From Scratch (stylepoint)
Hi folks, I am stylepoint. I currently work as an ML Research Scientist in academia. I have recently created a YouTube channel and started a video series called "Implement," where I will be implementing software (duh!). I will first implement machine learning and deep learning algorithms and models from scratch (meaning using Python with NumPy, but no other third-party libraries/modules). My assumptions for the AI/ML videos: one is familiar with the theory, and one is familiar with Python, since for AI/ML, Python is the language of choice for many (including myself). Due to these assumptions, videos tend to be approximately 10 minutes long, so they should not be too hard to digest. As of now, only one video has been released, but more will come. And we will definitely not limit ourselves to only implementing AI/ML models. I will be making videos about software engineering, software design, programming languages, math, emerging technologies, etc. as well! Just wanted to share this in case anyone finds the series interesting or helpful. Thanks y'all! Links: YouTube GitHub Web First video (Implement - Gaussian Naive Bayes) submitted by /u/itsstylepoint [link] [comments]  ( 92 min )
    [N] NVIDIA Hopper Sweeps AI Inference Benchmarks in MLPerf Debut
    The NVIDIA Hopper architecture delivered up to 4.5x more performance than NVIDIA Ampere architecture GPUs, which continue to provide overall leadership in MLPerf results. https://preview.redd.it/ys7gt1k54rm91.png?width=2048&format=png&auto=webp&s=38c9c3d40198216ba6f791de19ed11dce5f0cc0f Main Article: https://blogs.nvidia.com/blog/2022/09/08/hopper-mlperf-inference/ submitted by /u/rantana [link] [comments]  ( 90 min )
    [P] Docker alternative for AI/ML
envd (ɪnˈvdɪ) provides an alternative to Docker for AI/ML applications. 🐍 Escape Dockerfile Hell - Develop with Python, saving time on writing Dockerfiles, bash scripts, and Kubernetes YAML manifests. ⏱️ Save plenty of time - Build the environment up to 6x faster compared to Dockerfile v1. ☁️ Local & cloud - envd images are OCI-compatible and integrate with Docker and Kubernetes seamlessly. 🔁 Repeatable builds & reproducible results - You can reproduce the same environment on your laptop, public cloud VMs, or Docker containers, without any changes in setup. https://i.redd.it/gckhx9n2nqm91.gif submitted by /u/gaocegege [link] [comments]  ( 94 min )
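For flavor, an envd environment is described in a Python-syntax build.envd file; the sketch below follows the conventions in the project's README, but treat the exact function signatures as assumptions rather than a verified reference:

```python
# build.envd - illustrative sketch of an envd environment definition
def build():
    base(os="ubuntu20.04", language="python3")        # base image and toolchain
    install.python_packages(name=["torch", "numpy"])  # pip dependencies
    shell("zsh")                                      # interactive shell inside the env
```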
    [D] Video generation, improvement dimensions
Following recent breakthroughs, I have been thinking about the improvement dimensions for AI-generated video over the next few years, and came up with four major areas of development. The dates in parentheses refer to when I currently believe the referred technologies will be available as a published, finished, and usable product, instead of code, papers, beta software, or demos floating around. Also, NeRF just seems to be glorified photogrammetry to me, which at best would produce good conventional 3D models, but that seems to be a subpar workflow compared to post-processing on top of a crude 3D base or just generating the videos from scratch. Tell me your own predictions for each category. Capacity Available (Q2 2024) Produces realistic and stylized videos in 720p resolution an…  ( 92 min )
  • Open

    Technical Debt In Machine Learning System – A Model Driven Perspective
The aphorism acknowledges that models of our knowledge always fall short of the complexities of reality but can nonetheless be useful. With this model background, let us delve into this article focusing on specific technical debt in machine learning system development. The post Technical Debt In Machine Learning System – A Model Driven Perspective appeared first on Data Science Central.  ( 24 min )
    Plagiarism in Scientific Research and How to Prevent it?
    In scientific research, plagiarism used to be a big problem. Before the advent of free and accessible plagiarism detection tools, people would actually steal the work of their peers and publish it in their name. As you can probably imagine, this was severely damaging to the progress of scientific research as plagiarists were undermining the… Read More »Plagiarism in Scientific Research and How to Prevent it? The post Plagiarism in Scientific Research and How to Prevent it? appeared first on Data Science Central.  ( 20 min )
  • Open

    Learning to Walk in the Wild from Terrain Semantics
    Posted by Yuxiang Yang, Student Researcher, Robotics at Google An important promise for quadrupedal robots is their potential to operate in complex outdoor environments that are difficult or inaccessible for humans. Whether it’s to find natural resources deep in the mountains, or to search for life signals in heavily-damaged earthquake sites, a robust and versatile quadrupedal robot could be very helpful. To achieve that, a robot needs to perceive the environment, understand its locomotion challenges, and adapt its locomotion skill accordingly. While recent advances in perceptive locomotion have greatly enhanced the capability of quadrupedal robots, most works focus on indoor or urban environments, thus they cannot effectively handle the complexity of off-road terrains. In these environme…  ( 26 min )
  • Open

    Shortest tours of Eurasia and Oceania
    This is the final post in a series of three posts about shortest tours, solutions to the so-called traveling salesmen problem. The first was a tour of Africa. Actually two tours, one for the continent and one for islands. See this post for the Mathematica code used to create the tours. The second was about […] Shortest tours of Eurasia and Oceania first appeared on John D. Cook.  ( 5 min )
    Three tours of the Americas
    The previous post looked at an optimal tour of continental Africa. This post will give analogous tours of continental North America, North American Islands, and South America. The next post looks at Eurasia and Oceania. North American Continent Here’s the North American continental tour. The order of the tour is as follows. Canada United States […] Three tours of the Americas first appeared on John D. Cook.  ( 4 min )
    A traveling salesman tour of Africa
    Suppose you’d like to tour Africa, visiting each country once, then returning to your starting point, minimizing the distance traveled. Here’s my first attempt at a solution using Mathematica, based on an example in the documentation for FindShortestTour. africa = CountryData["Africa"] FindShortestTour[africa] GeoGraphics[{Thick, Red, GeoPath[africa[[%[[2]]]]]}] This produced the following map: Hmm. Maybe I should have […] A traveling salesman tour of Africa first appeared on John D. Cook.  ( 5 min )
  • Open

    Auto-Annotate: Automatically annotate your entire image directory by a single command.
    submitted by /u/mdhmz1 [link] [comments]  ( 86 min )
  • Open

    "Generative Personas That Behave and Experience Like Humans", Barthet et al 2022
    submitted by /u/gwern [link] [comments]  ( 87 min )
    Need suggestion on conference submission
My recent research is about a methodology that can be used in both online and offline RL in a unified approach, and it outperforms several SOTA methods in some environments. However, very little math is involved; it is intuitive and straightforward. What conferences would be interested in a study like this? (I will submit to ICLR, but I have zero confidence; I guess the chance is slim to none.) submitted by /u/Blasphemer666 [link] [comments]  ( 88 min )
  • Open

    Deploy large models on Amazon SageMaker using DJLServing and DeepSpeed model parallel inference
    The last few years have seen rapid development in the field of natural language processing (NLP). Although hardware has improved, such as with the latest generation of accelerators from NVIDIA and Amazon, advanced machine learning (ML) practitioners still regularly encounter issues deploying their large language models. Today, we announce new capabilities in Amazon SageMaker that […]  ( 13 min )
    Tips to improve your Amazon Rekognition Custom Labels model
    In this post, we discuss best practices to improve the performance of your computer vision models using Amazon Rekognition Custom Labels. Rekognition Custom Labels is a fully managed service to build custom computer vision models for image classification and object detection use cases. Rekognition Custom Labels builds off of the pre-trained models in Amazon Rekognition, which […]  ( 7 min )
    Use ADFS OIDC as the IdP for an Amazon SageMaker Ground Truth private workforce
    To train a machine learning (ML) model, you need a large, high-quality, labeled dataset. Amazon SageMaker Ground Truth helps you build high-quality training datasets for your ML models. With Ground Truth, you can use workers from either Amazon Mechanical Turk, a vendor company of your choosing, or an internal, private workforce to enable you to […]  ( 8 min )
    How Amp on Amazon used data to increase customer engagement, Part 2: Building a personalized show recommendation platform using Amazon SageMaker
    Amp is a new live radio app from Amazon. With Amp, you can host your own radio show and play songs from the Amazon Music catalog, or tune in and listen to shows other Amp users are hosting. In an environment where content is plentiful and diverse, it’s important to tailor the user experience to […]  ( 9 min )
    How Amp on Amazon used data to increase customer engagement, Part 1: Building a data analytics platform
    Amp, the new live radio app from Amazon, is a reinvention of radio featuring human-curated live audio shows. It’s designed to provide a seamless customer experience to listeners and creators by debuting interactive live audio shows from your favorite artists, radio DJs, podcasters, and friends. However, as a new product in a new space for […]  ( 10 min )
    Build repeatable, secure, and extensible end-to-end machine learning workflows using Kubeflow on AWS
This is a guest blog post cowritten with athenahealth, a leading provider of network-enabled software and services for medical groups and health systems nationwide. Its electronic health records, revenue cycle management, and patient engagement tools allow anytime, anywhere access, driving better financial outcomes for its customers and enabling its provider customers to deliver better quality […]  ( 18 min )
  • Open

    Botober 2022: draw, human, draw!
    For a couple of years now I've been using neural networks to generate daily drawing prompts. With today's text-generating neural networks far too large to finetune on a list of existing prompts, I've turned to other methods. One method that works surprisingly well is  ( 8 min )
    Bonus: DaVinci's other drawing prompts
    AI Weirdness: the strange side of machine learning  ( 2 min )
  • Open

    How Content Moderation helps Businesses? How and where it can be done?
Content moderation means moderating the user-generated content that gets published on various online platforms. The term…  ( 8 min )
  • Open

    Few-shot training LLMs for project-specific code-summarization. (arXiv:2207.04237v2 [cs.SE] UPDATED)
Very large language models (LLMs), such as GPT-3 and Codex, have achieved state-of-the-art performance on several natural-language tasks, and show great promise also for code. A particularly exciting aspect of LLMs is their knack for few-shot and zero-shot learning: they can learn to perform a task with very few examples. Few-shotting has particular synergies in software engineering, where there are a lot of phenomena (identifier names, APIs, terminology, coding patterns) that are known to be highly project-specific. However, project-specific data can be quite limited, especially early in the history of a project; thus the few-shot learning capacity of LLMs might be very relevant. In this paper, we investigate the use of few-shot training with the very large GPT (Generative Pre-trained Transformer) Codex model, and find evidence suggesting that one can significantly surpass state-of-the-art models for code-summarization, leveraging project-specific training.  ( 2 min )
    Distributed Nonlinear State Estimation in Electric Power Systems using Graph Neural Networks. (arXiv:2207.11465v2 [cs.LG] UPDATED)
    Nonlinear state estimation (SE), with the goal of estimating complex bus voltages based on all types of measurements available in the power system, is usually solved using the iterative Gauss-Newton method. The nonlinear SE presents some difficulties when considering inputs from both phasor measurement units and supervisory control and data acquisition system. These include numerical instabilities, convergence time depending on the starting point of the iterative method, and the quadratic computational complexity of a single iteration regarding the number of state variables. This paper introduces an original graph neural network based SE implementation over the augmented factor graph of the nonlinear power system SE, capable of incorporating measurements on both branches and buses, as well as both phasor and legacy measurements. The proposed regression model has linear computational complexity during the inference time once trained, with a possibility of distributed implementation. Since the method is noniterative and non-matrix-based, it is resilient to the problems that the Gauss-Newton solver is prone to. Aside from prediction accuracy on the test set, the proposed model demonstrates robustness when simulating cyber attacks and unobservable scenarios due to communication irregularities. In those cases, prediction errors are sustained locally, with no effect on the rest of the power system's results.  ( 3 min )
    W-Transformers : A Wavelet-based Transformer Framework for Univariate Time Series Forecasting. (arXiv:2209.03945v1 [cs.LG])
Deep learning utilizing transformers has recently achieved a lot of success in many vital areas such as natural language processing, computer vision, anomaly detection, and recommendation systems, among many others. Among the several merits of transformers, the ability to capture long-range temporal dependencies and interactions is desirable for time series forecasting, leading to progress in various time series applications. In this paper, we build a transformer model for non-stationary time series. The problem is challenging yet crucially important. We present a novel framework for univariate time series representation learning based on the wavelet-based transformer encoder architecture and call it W-Transformer. The proposed W-Transformer applies a maximal overlap discrete wavelet transform (MODWT) to the time series data and builds local transformers on the decomposed datasets to vividly capture the non-stationarity and long-range nonlinear dependencies in the time series. Evaluating our framework on several publicly available benchmark time series datasets from various domains and with diverse characteristics, we demonstrate that it performs, on average, significantly better than the baseline forecasters for short-term and long-term forecasting, even for datasets that consist of only a few hundred training samples.  ( 2 min )
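To make the decomposition step concrete, here is a hedged sketch using PyWavelets' stationary wavelet transform as a stand-in for MODWT (both are undecimated transforms); this is not the authors' code:

```python
import numpy as np
import pywt

# noisy sine as a toy univariate series (length divisible by 2**level)
rng = np.random.default_rng(0)
series = np.sin(np.linspace(0, 20 * np.pi, 512)) + rng.normal(0, 0.3, 512)

coeffs = pywt.swt(series, "db4", level=3)  # list of (approx, detail) pairs per level

# the paper's idea: fit one local forecaster per decomposed sub-series and
# recombine the forecasts; here we only show that each sub-series keeps full length
for lvl, (cA, cD) in enumerate(coeffs, start=1):
    print(lvl, cA.shape, cD.shape)
```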
    Detecting Stance in Scientific Papers: Did we get more Negative Recently?. (arXiv:2202.13610v2 [cs.CL] CROSS LISTED)
    In this paper, we classify scientific articles in the domain of natural language processing (NLP) and machine learning (ML) into whether (i) they extend the current state-of-the-art by introduction of novel techniques which beat existing models or whether (ii) they mainly criticize the existing state-of-the-art, i.e., that it is deficient with respect to some property (e.g., wrong evaluation, wrong datasets, misleading task specification). We refer to contributions under (i) as having a \enquote{positive stance} and contributions under (ii) as having a \enquote{negative stance} (to related work). We annotate over 1.5k papers from NLP and ML to train a SciBERT based model to automatically predict the stance of a paper based on its title and abstract. We then analyze large-scale trends on over 41k papers from the last $\sim$35 years in NLP and ML, finding that papers have gotten substantially more positive over time, but negative papers also got more negative and we observe considerably more negative papers in recent years. Negative papers are also more influential in terms of citations they receive.  ( 2 min )
    IDIAPers @ Causal News Corpus 2022: Efficient Causal Relation Identification Through a Prompt-based Few-shot Approach. (arXiv:2209.03895v1 [cs.CL])
In this paper, we describe our participation in subtask 1 of CASE-2022, Event Causality Identification with the Causal News Corpus. We address the Causal Relation Identification (CRI) task by exploiting a set of simple yet complementary techniques for fine-tuning language models (LMs) on a small number of annotated examples (i.e., a few-shot configuration). We follow a prompt-based prediction approach for fine-tuning LMs in which the CRI task is treated as a masked language modeling (MLM) problem. This approach allows LMs natively pre-trained on MLM problems to directly generate textual responses to CRI-specific prompts. We compare the performance of this method against ensemble techniques trained on the entire dataset. Our best-performing submission was trained only with 256 instances per class, a small portion of the entire dataset, and yet was able to obtain the second-best precision (0.82), third-best accuracy (0.82), and an F1-score (0.85) very close to what was reported by the winner team (0.86).  ( 2 min )
    Deep Multi-Scale Representation Learning with Attention for Automatic Modulation Classification. (arXiv:2209.03764v1 [eess.SP])
Currently, deep learning methods that stack small convolutional filters are widely used for automatic modulation classification (AMC). In this report, we find empirical improvements from using large kernel sizes in deep convolutional neural network based AMC, which is more efficient at extracting multi-scale features from the raw signal I/Q sequence data. Also, Squeeze-and-Excitation (SE) mechanisms can significantly help AMC networks focus on the more important features of the signal. As a result, we propose a multi-scale feature network with large kernel size and SE mechanism (SE-MSFN) in this paper. SE-MSFN achieves state-of-the-art classification performance on the public well-known RADIOML 2018.01A dataset, with an average classification accuracy of 64.50%, surpassing CLDNN by 1.42%, a maximum classification accuracy of 98.5%, and an average classification accuracy of 85.53% in the lower SNR range of 0dB to 10dB, surpassing CLDNN by 2.85%. In addition, we also verified that ensemble learning can help further improve classification performance. We hope this report can provide some reference for developers and researchers in practical scenarios.  ( 2 min )
    Sparsity in long-time control of neural ODEs. (arXiv:2102.13566v3 [cs.LG] UPDATED)
    We consider the neural ODE and optimal control perspective of supervised learning, with $\ell^1$-control penalties, where rather than only minimizing a final cost (the \emph{empirical risk}) for the state, we integrate this cost over the entire time horizon. We prove that any optimal control (for this cost) vanishes beyond some positive stopping time. When seen in the discrete-time context, this result entails an \emph{ordered} sparsity pattern for the parameters of the associated residual neural network: ordered in the sense that these parameters are all $0$ beyond a certain layer. Furthermore, we provide a polynomial stability estimate for the empirical risk with respect to the time horizon. This can be seen as a \emph{turnpike property}, for nonsmooth dynamics and functionals with $\ell^1$-penalties, and without any smallness assumptions on the data, both of which are new in the literature.  ( 2 min )
    E-LMC: Extended Linear Model of Coregionalization for Spatial Field Prediction. (arXiv:2203.00525v2 [cs.LG] UPDATED)
Physical simulations based on partial differential equations typically generate spatial field results, which are utilized to calculate specific properties of a system for engineering design and optimization. Due to the intensive computational burden of the simulations, a surrogate model mapping the low-dimensional inputs to the spatial fields is commonly built based on a relatively small dataset. To resolve the challenge of predicting the whole spatial field, the popular linear model of coregionalization (LMC) can disentangle complicated correlations within the high-dimensional spatial field outputs and deliver accurate predictions. However, LMC fails if the spatial field cannot be well approximated by a linear combination of base functions with latent processes. In this paper, we present the Extended Linear Model of Coregionalization (E-LMC) by introducing an invertible neural network to linearize the highly complex and nonlinear spatial fields so that the LMC can easily generalize to nonlinear problems while preserving the traceability and scalability. Several real-world applications demonstrate that E-LMC can exploit spatial correlations effectively, showing a maximum improvement of about 40% over the original LMC and outperforming the other state-of-the-art spatial field models.  ( 2 min )
    PixTrack: Precise 6DoF Object Pose Tracking using NeRF Templates and Feature-metric Alignment. (arXiv:2209.03910v1 [cs.CV])
We present PixTrack, a vision-based object pose tracking framework using novel view synthesis and deep feature-metric alignment. Our evaluations demonstrate that our method produces highly accurate, robust, and jitter-free 6DoF pose estimates of objects in RGB images without the need for any data annotation or trajectory smoothing. Our method is also computationally efficient, making it easy to perform multi-object tracking with no alteration to our method, using just CPU multiprocessing.  ( 2 min )
    reStructured Pre-training. (arXiv:2206.11147v2 [cs.CL] UPDATED)
In this work, we try to decipher the internal connection of NLP technology development in the past decades, searching for essence, which rewards us with a (potential) new learning paradigm for NLP tasks, dubbed as reStructured Pre-training (RST). In such a paradigm, the role of data will be re-emphasized, and model pre-training and fine-tuning of downstream tasks are viewed as a process of data storing and accessing. Based on that, we operationalize the simple principle that a good storage mechanism should not only have the ability to cache a large amount of data but also consider the ease of access. We achieve this by pre-training models over restructured data that consist of a variety of valuable information instead of raw data after overcoming several engineering challenges. Experimentally, RST models not only surpass strong competitors (e.g., T0) on 52/55 popular datasets from a variety of NLP tasks, but also achieve superior performance in National College Entrance Examination - English (Gaokao-English), the most authoritative examination in China. Specifically, the proposed system Qin achieves 40 points higher than the average scores made by students and 15 points higher than GPT3 with 1/16 parameters. In particular, Qin gets a high score of 138.5 (the full mark is 150) in the 2018 English exam (national paper III). We have released the Gaokao Benchmark with an online submission platform. In addition, we test our model in the 2022 College Entrance Examination English that happened a few days ago (2022.06.08), and it gets a total score of 134 (v.s. GPT3's 108).  ( 3 min )
    FAT Forensics: A Python Toolbox for Implementing and Deploying Fairness, Accountability and Transparency Algorithms in Predictive Systems. (arXiv:2209.03805v1 [cs.LG])
    Predictive systems, in particular machine learning algorithms, can take important, and sometimes legally binding, decisions about our everyday life. In most cases, however, these systems and decisions are neither regulated nor certified. Given the potential harm that these algorithms can cause, their qualities such as fairness, accountability and transparency (FAT) are of paramount importance. To ensure high-quality, fair, transparent and reliable predictive systems, we developed an open source Python package called FAT Forensics. It can inspect important fairness, accountability and transparency aspects of predictive algorithms to automatically and objectively report them back to engineers and users of such systems. Our toolbox can evaluate all elements of a predictive pipeline: data (and their features), models and predictions. Published under the BSD 3-Clause open source licence, FAT Forensics is opened up for personal and commercial usage.  ( 2 min )
Causal Forecasting: Generalization Bounds for Autoregressive Models. (arXiv:2111.09831v2 [stat.ML] UPDATED)
    Despite the increasing relevance of forecasting methods, causal implications of these algorithms remain largely unexplored. This is concerning considering that, even under simplifying assumptions such as causal sufficiency, the statistical risk of a model can differ significantly from its \textit{causal risk}. Here, we study the problem of \textit{causal generalization} -- generalizing from the observational to interventional distributions -- in forecasting. Our goal is to find answers to the question: How does the efficacy of an autoregressive (VAR) model in predicting statistical associations compare with its ability to predict under interventions? To this end, we introduce the framework of \textit{causal learning theory} for forecasting. Using this framework, we obtain a characterization of the difference between statistical and causal risks, which helps identify sources of divergence between them. Under causal sufficiency, the problem of causal generalization amounts to learning under covariate shifts, albeit with additional structure (restriction to interventional distributions under the VAR model). This structure allows us to obtain uniform convergence bounds on causal generalizability for the class of VAR models. To the best of our knowledge, this is the first work that provides theoretical guarantees for causal generalization in the time-series setting.  ( 2 min )
    Valuing Players Over Time. (arXiv:2209.03882v1 [cs.LG])
In soccer (or association football), players quickly go from heroes to zeroes, or vice-versa. Performance is not a static measure but a somewhat volatile one. Analyzing performance as a time series rather than a stationary point in time is crucial to making better decisions. This paper introduces and explores I-VAEP and O-VAEP models to evaluate actions and rate players' intention and execution. Then, we analyze these ratings over time and propose use cases to support our choice of treating player ratings as a continuous problem. As a result, we present who the best players were and how their performance evolved, define volatility metrics to measure a player's consistency, and build a player development curve to assist decision-making.  ( 2 min )
    Patient-specific modelling, simulation and real-time processing for respiratory diseases. (arXiv:2207.01082v5 [eess.IV] UPDATED)
Asthma is a common chronic disease of the respiratory system causing significant disability and societal burden. It affects more than 300 million people worldwide, while more than 100 million people will likely have asthma by 2025. The cost of asthma varies greatly from nation to nation; the mean yearly cost can be estimated at 1900 EUR in Europe and $3100 in the United States. Managing asthma involves controlling symptoms, preventing exacerbations, and maintaining lung function. Improved asthma control reduces the risk of exacerbations and lung function impairment while reducing the direct costs of asthma care and the indirect costs associated with reduced productivity. Understanding the complex dynamics of the pulmonary system and the lung's response to disease is fundamental to the advancement of asthma treatment. Computational models of the respiratory system seek to provide a theoretical framework to understand the interaction between structure and function. Their application can improve pulmonary medicine by a patient-specific approach to medicinal methodologies optimizing the delivery given the personalized geometry and personalized ventilation patterns. A three-fold objective is addressed within this dissertation. The first part refers to the comprehension of pulmonary pathophysiology and the mechanics of asthma and subsequently of constrictive pulmonary conditions in general. The second part refers to the design and implementation of tools that facilitate personalized medicine to improve delivery and effectiveness. Finally, the third part refers to the self-management of the condition, meaning that medical personnel and patients have access to tools and methods that allow the first party to easily track the course of the condition and the second party, i.e. the patient, to easily self-manage it, alleviating a significant burden from the health system.  ( 3 min )
    Discover and Mitigate Unknown Biases with Debiasing Alternate Networks. (arXiv:2207.10077v2 [cs.CV] UPDATED)
    Deep image classifiers have been found to learn biases from datasets. To mitigate the biases, most previous methods require labels of protected attributes (e.g., age, skin tone) as full-supervision, which has two limitations: 1) it is infeasible when the labels are unavailable; 2) they are incapable of mitigating unknown biases -- biases that humans do not preconceive. To resolve those problems, we propose Debiasing Alternate Networks (DebiAN), which comprises two networks -- a Discoverer and a Classifier. By training in an alternate manner, the discoverer tries to find multiple unknown biases of the classifier without any annotations of biases, and the classifier aims at unlearning the biases identified by the discoverer. While previous works evaluate debiasing results in terms of a single bias, we create Multi-Color MNIST dataset to better benchmark mitigation of multiple biases in a multi-bias setting, which not only reveals the problems in previous methods but also demonstrates the advantage of DebiAN in identifying and mitigating multiple biases simultaneously. We further conduct extensive experiments on real-world datasets, showing that the discoverer in DebiAN can identify unknown biases that may be hard to be found by humans. Regarding debiasing, DebiAN achieves strong bias mitigation performance.  ( 3 min )
    Global Context Vision Transformers. (arXiv:2206.09959v2 [cs.CV] UPDATED)
    We propose global context vision transformer (GC ViT), a novel architecture that enhances parameter and compute utilization. Our method leverages global context self-attention modules, joint with local self-attention, to effectively yet efficiently model both long and short-range spatial interactions, without the need for expensive operations such as computing attention masks or shifting local windows. In addition, we address the issue of lack of the inductive bias in ViTs via proposing to use a modified fused inverted residual blocks in our architecture. Our proposed GC ViT achieves state-of-the-art results across image classification, object detection and semantic segmentation tasks. On ImageNet-1K dataset for classification, the tiny, small and base variants of GC ViT with 28M, 51M and 90M parameters achieve 83.3%, 83.9% and 84.5% Top-1 accuracy, respectively, surpassing comparably-sized prior art such as CNN-based ConvNeXt and ViT-based Swin Transformer by a large margin. Pre-trained GC ViT backbones in downstream tasks of object detection, instance segmentation, and semantic segmentation using MS COCO and ADE20K datasets outperform prior work consistently, sometimes by large margins. Code available at https://github.com/NVlabs/GCViT.  ( 2 min )
    Feature Importance Guided Attack: A Model Agnostic Adversarial Attack. (arXiv:2106.14815v2 [cs.LG] UPDATED)
Research in adversarial learning has primarily focused on homogeneous unstructured datasets, which often map into the problem space naturally. Inverting a feature space attack on heterogeneous datasets into the problem space is much more challenging, particularly the task of finding the perturbation to perform. This work presents a formal search strategy: the `Feature Importance Guided Attack' (FIGA), which finds perturbations in the feature space of heterogeneous tabular datasets to produce evasion attacks. We first demonstrate FIGA in the feature space and then in the problem space. FIGA assumes no prior knowledge of the defending model's learning algorithm and does not require any gradient information. FIGA assumes knowledge of the feature representation and the mean feature values of the defending model's dataset. FIGA leverages feature importance rankings by perturbing the most important features of the input in the direction of the target class. While FIGA is conceptually similar to other work which uses feature selection processes (e.g., mimicry attacks), we formalize an attack algorithm with three tunable parameters and investigate the strength of FIGA on tabular datasets. We demonstrate the effectiveness of FIGA by evading phishing detection models trained on four different tabular phishing datasets and one financial dataset with an average success rate of 94%. We extend FIGA to the phishing problem space by limiting the possible perturbations to be valid and feasible in the phishing domain. We generate valid adversarial phishing sites that are visually identical to their unperturbed counterpart and use them to attack six tabular ML models achieving a 13.05% average success rate.  ( 3 min )
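The attack described in the abstract can be sketched in a few lines; this is a paraphrase of the stated idea (perturb the most important features toward the target class), with all parameter names invented here, not the authors' implementation:

```python
import numpy as np

def figa_sketch(x, feature_ranking, target_class_mean, n_features=5, epsilon=0.05):
    """x: 1-D feature vector of the sample to perturb.
    feature_ranking: feature indices sorted by importance, most important first.
    target_class_mean: mean feature values of the target class."""
    x_adv = x.astype(float).copy()
    for f in feature_ranking[:n_features]:
        # step each important feature toward the target class's mean value
        x_adv[f] += epsilon * np.sign(target_class_mean[f] - x_adv[f])
    return x_adv
```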
    Generalizability of Machine Learning Models: Quantitative Evaluation of Three Methodological Pitfalls. (arXiv:2202.01337v2 [cs.LG] UPDATED)
    Purpose: Despite the potential of machine learning models, the lack of generalizability has hindered their widespread adoption in clinical practice. We investigate three methodological pitfalls: (1) violation of independence assumption, (2) model evaluation with an inappropriate performance indicator or baseline for comparison, and (3) batch effect. Materials and Methods: Using several retrospective datasets, we implement machine learning models with and without the pitfalls to quantitatively illustrate these pitfalls' effect on model generalizability. Results: Violation of independence assumption, more specifically, applying oversampling, feature selection, and data augmentation before splitting data into train, validation, and test sets, respectively, led to misleading and superficial gains in F1 scores of 71.2% in predicting local recurrence and 5.0% in predicting 3-year overall survival in head and neck cancer as well as 46.0% in distinguishing histopathological patterns in lung cancer. Further, randomly distributing data points for a subject across training, validation, and test sets led to a 21.8% superficial increase in F1 score. Also, we showed the importance of the choice of performance measures and baseline for comparison. In the presence of batch effect, a model built for pneumonia detection led to F1 score of 98.7%. However, when the same model was applied to a new dataset of normal patients, it only correctly classified 3.86% of the samples. Conclusions: These methodological pitfalls cannot be captured using internal model evaluation, and the inaccurate predictions made by such models may lead to wrong conclusions and interpretations. Therefore, understanding and avoiding these pitfalls is necessary for developing generalizable models.  ( 3 min )
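To illustrate the fix for pitfall (1), oversampling and feature selection can live inside a pipeline that is fit on the training split only, so the test set never leaks into them; a hedged sketch with scikit-learn and imbalanced-learn on synthetic data:

```python
from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectKBest
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from imblearn.over_sampling import SMOTE
from imblearn.pipeline import Pipeline

X, y = make_classification(n_samples=500, weights=[0.9, 0.1], random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)

pipe = Pipeline([
    ("oversample", SMOTE(random_state=0)),   # applied to training data only
    ("select", SelectKBest(k=10)),           # fit on training data only
    ("clf", LogisticRegression(max_iter=1000)),
])
pipe.fit(X_tr, y_tr)
print(pipe.score(X_te, y_te))                # the test split never saw SMOTE/selection
```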
    SwiftAgg+: Achieving Asymptotically Optimal Communication Loads in Secure Aggregation for Federated Learning. (arXiv:2203.13060v3 [cs.IT] UPDATED)
    We propose SwiftAgg+, a novel secure aggregation protocol for federated learning systems, where a central server aggregates local models of $N \in \mathbb{N}$ distributed users, each of size $L \in \mathbb{N}$, trained on their local data, in a privacy-preserving manner. SwiftAgg+ can significantly reduce the communication overheads without any compromise on security, and achieve optimal communication loads within diminishing gaps. Specifically, in presence of at most $D=o(N)$ dropout users, SwiftAgg+ achieves a per-user communication load of $(1+\mathcal{O}(\frac{1}{N}))L$ symbols and a server communication load of $(1+\mathcal{O}(\frac{1}{N}))L$ symbols, with a worst-case information-theoretic security guarantee, against any subset of up to $T=o(N)$ semi-honest users who may also collude with the curious server. Moreover, the proposed SwiftAgg+ allows for a flexible trade-off between communication loads and the number of active communication links. In particular, for $T<N-D$ and for any $K\in\mathbb{N}$, SwiftAgg+ can achieve the server communication load of $(1+\frac{T}{K})L$ symbols, and per-user communication load of up to $(1+\frac{T+D}{K})L$ symbols, where the number of pair-wise active connections in the network is $\frac{N}{2}(K+T+D+1)$.  ( 3 min )
    Federated Learning for Short-term Residential Load Forecasting. (arXiv:2105.13325v2 [cs.LG] UPDATED)
    Load forecasting is an essential task performed within the energy industry to help balance supply with demand and maintain a stable load on the electricity grid. As supply transitions towards less reliable renewable energy generation, smart meters will prove a vital component to facilitate these forecasting tasks. However, smart meter adoption is low among privacy-conscious consumers that fear intrusion upon their fine-grained consumption data. In this work we propose and explore a federated learning (FL) based approach for training forecasting models in a distributed, collaborative manner whilst retaining the privacy of the underlying data. We compare two approaches: FL, and a clustered variant, FL+HC against a non-private, centralised learning approach and a fully private, localised learning approach. Within these approaches, we measure model performance using RMSE and computational efficiency. In addition, we suggest the FL strategies are followed by a personalisation step and show that model performance can be improved by doing so. We show that FL+HC followed by personalisation can achieve a $\sim$5\% improvement in model performance with a $\sim$10x reduction in computation compared to localised learning. Finally we provide advice on private aggregation of predictions for building a private end-to-end load forecasting application.
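The aggregation step at the heart of FL can be sketched generically as a weighted average of client weights (FedAvg-style); this is a minimal illustration assuming floating-point parameters, not the paper's implementation:

```python
import copy
import torch

def fedavg(state_dicts, client_sizes):
    """Weighted average of client model state_dicts, weighted by local dataset
    size; assumes all entries are floating-point parameter tensors."""
    total = sum(client_sizes)
    avg = copy.deepcopy(state_dicts[0])
    for key in avg:
        avg[key] = sum(n * sd[key] for sd, n in zip(state_dicts, client_sizes)) / total
    return avg

# usage sketch: global_model.load_state_dict(fedavg(client_state_dicts, sizes))
```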
    MQRetNN: Multi-Horizon Time Series Forecasting with Retrieval Augmentation. (arXiv:2207.10517v2 [cs.LG] UPDATED)
Multi-horizon probabilistic time series forecasting has wide applicability to real-world tasks such as demand forecasting. Recent work in neural time-series forecasting mainly focuses on the use of Seq2Seq architectures. For example, MQTransformer - an improvement of MQCNN - has shown state-of-the-art performance in probabilistic demand forecasting. In this paper, we consider incorporating cross-entity information to enhance model performance by adding a cross-entity attention mechanism along with a retrieval mechanism to select which entities to attend over. We demonstrate how our new neural architecture, MQRetNN, leverages the encoded contexts from a pretrained baseline model on the entire population to improve forecasting accuracy. Using MQCNN as the baseline model (due to computational constraints, we do not use MQTransformer), we first show on a small demand forecasting dataset that it is possible to achieve a ~3% improvement in test loss by adding a cross-entity attention mechanism where each entity attends to all others in the population. We then evaluate the model with our proposed retrieval methods - as a means of approximating an attention over a large population - on a large-scale demand forecasting application with over 2 million products and observe a ~1% performance gain over the MQCNN baseline.
    Responsibility: An Example-based Explainable AI approach via Training Process Inspection. (arXiv:2209.03433v1 [cs.LG])
Explainable Artificial Intelligence (XAI) methods are intended to help human users better understand the decision making of an AI agent. However, many modern XAI approaches are unintuitive to end users, particularly those without prior AI or ML knowledge. In this paper, we present a novel XAI approach we call Responsibility that identifies the most responsible training example for a particular decision. This example can then be shown as an explanation: "this is what I (the AI) learned that led me to do that". We present experimental results across a number of domains along with the results of an Amazon Mechanical Turk user study, comparing Responsibility with existing XAI methods on an image classification task. Our results demonstrate that Responsibility can help improve accuracy for both human end users and secondary ML models.
    DIY-IPS: Towards an Off-the-Shelf Accurate Indoor Positioning System. (arXiv:2209.03613v1 [cs.NI])
We present DIY-IPS - Do It Yourself - Indoor Positioning System, an open-source real-time indoor positioning mobile application. DIY-IPS detects users' indoor position by employing dual-band RSSI fingerprinting of available WiFi access points. The app can be used, without additional infrastructural costs, to detect users' indoor positions in real time. We have published our app as open source to save other researchers the time of recreating it. The app enables researchers/users to (1) collect indoor positioning datasets with ground truth labels, (2) customize the app for higher accuracy or other research purposes, and (3) test the accuracy of modified methods by live testing with ground truth. We ran preliminary experiments to demonstrate the effectiveness of the app.
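    For readers unfamiliar with RSSI fingerprinting, the minimal sketch below shows the core idea behind such systems: match a live scan against a database of labeled fingerprints with a weighted nearest-neighbour rule. This is a generic illustration under our own assumptions, not the app's actual pipeline, and all readings below are hypothetical.

```python
import numpy as np

# Minimal RSSI-fingerprinting sketch: each row of `fingerprints` holds RSSI
# readings (dBm) from the same set of access points, recorded at a known spot.
fingerprints = np.array([
    [-40.0, -70.0, -85.0],   # location A
    [-65.0, -45.0, -80.0],   # location B
    [-80.0, -75.0, -50.0],   # location C
])
locations = np.array([[0.0, 0.0], [5.0, 0.0], [5.0, 4.0]])  # ground-truth (x, y)

def locate(scan, k=2):
    """Weighted k-nearest-neighbour position estimate from a live RSSI scan."""
    d = np.linalg.norm(fingerprints - scan, axis=1)
    nearest = np.argsort(d)[:k]
    w = 1.0 / (d[nearest] + 1e-6)             # closer fingerprints weigh more
    return (w[:, None] * locations[nearest]).sum(0) / w.sum()

print(locate(np.array([-50.0, -60.0, -82.0])))  # a position between A and B
```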
    Transformer-based classification of premise in tweets related to COVID-19. (arXiv:2209.03851v1 [cs.CL])
Automation of social network data assessment is one of the classic challenges of natural language processing. During the COVID-19 pandemic, mining people's stances from public messages has become crucial for understanding attitudes towards health orders. In this paper, the authors propose a predictive model based on the transformer architecture to classify the presence of premise in Twitter texts. This work was completed as part of the Social Media Mining for Health (SMM4H) Workshop 2022. We explored modern transformer-based classifiers in order to construct a pipeline that efficiently captures tweet semantics. Our experiments on a Twitter dataset showed that RoBERTa is superior to the other transformer models for the premise prediction task. The model achieved competitive performance, with an ROC AUC of 0.807 and an F1 score of 0.7648.
    Inapproximability of a Pair of Forms Defining a Partial Boolean Function. (arXiv:2102.04703v4 [cs.LG] UPDATED)
    We consider the problem of jointly minimizing forms of two Boolean functions $f, g \colon \{0,1\}^J \to \{0,1\}$ such that $f + g \leq 1$ and so as to separate disjoint sets $A \cup B \subseteq \{0,1\}^J$ such that $f(A) = \{1\}$ and $g(B) = \{1\}$. We hypothesize that this problem is easier to solve or approximate than the well-understood problem of minimizing the form of one Boolean function $h: \{0,1\}^J \to \{0,1\}$ such that $h(A) = \{1\}$ and $h(B) = \{0\}$. For a large class of forms, including binary decision trees and ordered binary decision diagrams, we refute this hypothesis. For disjunctive normal forms, we show that the problem is at least as hard as MIN-SET-COVER. For all these forms, we establish that no $o(\ln (|A| + |B| -1))$-approximation algorithm exists unless P$=$NP.
    FathomNet: A global image database for enabling artificial intelligence in the ocean. (arXiv:2109.14646v4 [cs.CV] UPDATED)
The ocean is experiencing unprecedented rapid change, and visually monitoring marine biota at the spatiotemporal scales needed for responsible stewardship is a formidable task. As baselines are sought by the research community, the volume and rate of this required data collection rapidly outpace our ability to process and analyze them. Recent advances in machine learning enable fast, sophisticated analysis of visual data, but have had limited success in the ocean due to lack of data standardization, insufficient formatting, and demand for large, labeled datasets. To address this need, we built FathomNet, an open-source image database that standardizes and aggregates expertly curated labeled data. FathomNet has been seeded with existing iconic and non-iconic imagery of marine animals, underwater equipment, debris, and other concepts, and allows for future contributions from distributed data sources. We demonstrate how FathomNet data can be used to train and deploy models on other institutional video to reduce annotation effort, and enable automated tracking of underwater concepts when integrated with robotic vehicles. As FathomNet continues to grow and incorporate more labeled data from the community, we can accelerate the processing of visual data to achieve a healthy and sustainable global ocean.
    OmniXAI: A Library for Explainable AI. (arXiv:2206.01612v6 [cs.LG] UPDATED)
We introduce OmniXAI (short for Omni eXplainable AI), an open-source Python library for eXplainable AI (XAI), which offers omni-way explainable AI capabilities and various interpretable machine learning techniques to address the pain points of understanding and interpreting the decisions made by machine learning (ML) models in practice. OmniXAI aims to be a one-stop comprehensive library that makes explainable AI easy for data scientists, ML researchers and practitioners who need explanations for various types of data, models and explanation methods at different stages of the ML process (data exploration, feature engineering, model development, evaluation, decision-making, etc.). In particular, our library includes a rich family of explanation methods integrated in a unified interface, which supports multiple data types (tabular data, images, texts, time-series), multiple types of ML models (traditional ML in Scikit-learn and deep learning models in PyTorch/TensorFlow), and a range of diverse explanation methods including "model-specific" and "model-agnostic" ones (such as feature-attribution explanation, counterfactual explanation, gradient-based explanation, etc.). For practitioners, the library provides an easy-to-use unified interface to generate the explanations for their applications by writing only a few lines of code, and also a GUI dashboard for visualization of different explanations for more insights about decisions. In this technical report, we present OmniXAI's design principles, system architectures, and major functionalities, and also demonstrate several example use cases across different types of data, tasks, and models.
    Combing for Credentials: Active Pattern Extraction from Smart Reply. (arXiv:2207.10802v2 [cs.CR] UPDATED)
With the wide availability of large pre-trained language models such as GPT-2 and BERT, the recent trend has been to fine-tune a pre-trained model to achieve state-of-the-art performance on a downstream task. One natural example is the "Smart Reply" application, where a pre-trained model is tuned to provide suggested responses for a given query message. Since these models are often tuned using sensitive data such as emails or chat transcripts, it is important to understand and mitigate the risk that the model leaks its tuning data. We investigate potential information leakage vulnerabilities in a typical Smart Reply pipeline and introduce a new type of active extraction attack that exploits canonical patterns in text containing sensitive data. We show experimentally that it is possible for an adversary to extract sensitive user information present in the training data. We explore potential mitigation strategies and demonstrate empirically how differential privacy can serve as an effective defense mechanism against such pattern extraction attacks.
    A Decentralized Federated Learning Framework via Committee Mechanism with Convergence Guarantee. (arXiv:2108.00365v2 [cs.LG] UPDATED)
Federated learning allows multiple participants to collaboratively train an efficient model without exposing data privacy. However, this distributed machine learning training method is prone to attacks from Byzantine clients, which interfere with the training of the global model by modifying the model or uploading false gradients. In this paper, we propose a novel serverless federated learning framework, Committee Mechanism based Federated Learning (CMFL), which can ensure the robustness of the algorithm with a convergence guarantee. In CMFL, a committee system is set up to screen the uploaded local gradients. The committee system selects the local gradients rated by the elected members for the aggregation procedure through the selection strategy, and replaces committee members through the election strategy. Based on different considerations of model performance and defense, two opposite selection strategies are designed for the sake of accuracy and robustness, respectively. Extensive experiments illustrate that CMFL achieves faster convergence and better accuracy than typical Federated Learning, while obtaining better robustness than traditional Byzantine-tolerant algorithms, all in a decentralized manner. In addition, we theoretically analyze and prove the convergence of CMFL under different election and selection strategies, which coincides with the experimental results.
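    The following toy sketch illustrates the committee-screening idea in its simplest form; the scoring rule (distance to the nearest committee gradient) and all sizes are our assumptions for illustration, not the paper's exact selection or election strategies.

```python
import numpy as np

# Toy committee-style gradient screening: committee members rate each uploaded
# gradient by its distance to their own gradients; only the best-rated uploads
# enter the aggregate, so wildly deviating (Byzantine) gradients are excluded.
rng = np.random.default_rng(0)
true_grad = rng.normal(size=10)
uploads = true_grad + 0.1 * rng.normal(size=(8, 10))   # 8 honest clients
uploads[:2] = 5.0 * rng.normal(size=(2, 10))           # 2 Byzantine clients

committee = uploads[5:]                                 # elected members
scores = np.array([                                     # lower = more trusted
    min(np.linalg.norm(g - c) for c in committee) for g in uploads
])
selected = np.argsort(scores)[:4]                       # accept best-rated half
aggregate = uploads[selected].mean(axis=0)
print("selected:", selected, "error:", np.linalg.norm(aggregate - true_grad))
```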
    Multihop: Leveraging Complex Models to Learn Accurate Simple Models. (arXiv:2109.06961v2 [cs.LG] UPDATED)
Knowledge transfer from a complex, high-performing model to a simpler and potentially lower-performing one in order to enhance its performance has been of great interest over the last few years, as it finds applications in important problems such as explainable artificial intelligence, model compression, robust model building, and learning from small data. Known approaches to this problem (viz. Knowledge Distillation, Model Compression, ProfWeight, etc.) typically transfer information directly (i.e. in a single hop) from the complex model to the chosen simple model through schemes that modify the target or reweight the training examples on which the simple model is trained. In this paper, we propose a meta-approach where we transfer information from the complex model to the simple model by dynamically selecting and/or constructing a sequence of intermediate models of decreasing complexity that are less intricate than the original complex model. Our approach can transfer information between consecutive models in the sequence using any of the previously mentioned approaches, and can also work in a 1-hop fashion, thus generalizing these approaches. In experiments on real data, we observe consistent gains over 1-hop for different choices of models, which average more than 2\% and reach up to 8\% in a particular case. We also empirically analyze conditions under which the multi-hop approach is likely to be beneficial over the traditional 1-hop approach, and report other interesting insights. To the best of our knowledge, this is the first work that proposes such a multi-hop approach to perform knowledge transfer given a single high-performing complex model, making it, in our opinion, an important methodological contribution.
    Recovering network topology and dynamics via sequence characterization. (arXiv:2206.15190v2 [cs.SI] UPDATED)
Sequences arise in many real-world scenarios; thus, identifying the mechanisms behind symbol generation is essential to understanding many complex systems. This paper analyzes sequences generated by agents walking on a networked topology. Given that in many real scenarios the underlying process generating the sequence is hidden, we investigate whether reconstructing the network via the co-occurrence method is useful to recover both the network topology and the agent dynamics generating the sequences. We found that the characterization of reconstructed networks provides valuable information regarding the process and topology used to create the sequences. In a machine learning approach considering 16 combinations of network topology and agent dynamics as classes, we obtained an accuracy of 87% with sequences generated with less than 40% of nodes visited. Larger sequences turned out to generate improved machine learning models. Our findings suggest that the proposed methodology could be extended to classify sequences and understand the mechanisms behind sequence generation.
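    As a minimal illustration of the co-occurrence reconstruction step, the sketch below links symbols that appear adjacently in a walk sequence; the window size and the toy sequence are illustrative assumptions, and the paper's exact co-occurrence definition may differ.

```python
import numpy as np

# Reconstruct a network from a symbol sequence: symbols that co-occur within a
# small window get (weighted) edges, so the reconstructed adjacency reflects
# both the hidden topology and the agent dynamics that produced the walk.
sequence = [0, 1, 2, 1, 3, 2, 0, 1]       # symbols emitted by a walking agent
n = max(sequence) + 1
adj = np.zeros((n, n))
for a, b in zip(sequence, sequence[1:]):  # co-occurrence window of size 2
    adj[a, b] += 1
    adj[b, a] += 1
print(adj)  # characterize this matrix (degrees, clustering, ...) as features
```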
    Stochastic Coded Federated Learning with Convergence and Privacy Guarantees. (arXiv:2201.10092v5 [cs.LG] UPDATED)
    Federated learning (FL) has attracted much attention as a privacy-preserving distributed machine learning framework, where many clients collaboratively train a machine learning model by exchanging model updates with a parameter server instead of sharing their raw data. Nevertheless, FL training suffers from slow convergence and unstable performance due to stragglers caused by the heterogeneous computational resources of clients and fluctuating communication rates. This paper proposes a coded FL framework to mitigate the straggler issue, namely stochastic coded federated learning (SCFL). In this framework, each client generates a privacy-preserving coded dataset by adding additive noise to the random linear combination of its local data. The server collects the coded datasets from all the clients to construct a composite dataset, which helps to compensate for the straggling effect. In the training process, the server as well as clients perform mini-batch stochastic gradient descent (SGD), and the server adds a make-up term in model aggregation to obtain unbiased gradient estimates. We characterize the privacy guarantee by the mutual information differential privacy (MI-DP) and analyze the convergence performance in federated learning. Besides, we demonstrate a privacy-performance tradeoff of the proposed SCFL method by analyzing the influence of the privacy constraint on the convergence rate. Finally, numerical experiments corroborate our analysis and show the benefits of SCFL in achieving fast convergence while preserving data privacy.
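    The client-side coding step described above can be sketched in a few lines: each client releases noisy random linear combinations of its local samples. Shapes and the noise scale below are illustrative assumptions; a real deployment would calibrate the noise to the stated MI-DP guarantee.

```python
import numpy as np

# Sketch of SCFL's coded-dataset construction as described in the abstract:
# a random combining matrix mixes the local samples, and additive noise
# provides the privacy protection before the result is sent to the server.
rng = np.random.default_rng(1)
local_data = rng.normal(size=(100, 8))        # 100 samples, 8 features
n_coded = 20                                  # coded samples to release
G = rng.normal(size=(n_coded, 100))           # random combining matrix
noise = 0.5 * rng.normal(size=(n_coded, 8))   # additive privacy noise
coded_dataset = G @ local_data + noise        # shared with the server
print(coded_dataset.shape)                    # (20, 8)
```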
    Zero Pixel Directional Boundary by Vector Transform. (arXiv:2203.08795v2 [cs.CV] UPDATED)
Boundaries are among the primary visual cues used by human and computer vision systems. One of the key problems in boundary detection is the label representation, which typically leads to class imbalance and, as a consequence, to thick boundaries that require non-differentiable post-processing steps to be thinned. In this paper, we re-interpret boundaries as 1-D surfaces and formulate a one-to-one vector transform function that allows for training of boundary prediction completely avoiding the class imbalance issue. Specifically, we define the boundary representation at any point as the unit vector pointing to the closest boundary surface. Our problem formulation leads to the estimation of direction as well as richer contextual information of the boundary, and, if desired, the availability of zero-pixel thin boundaries also at training time. Our method uses no hyper-parameter in the training loss and a fixed stable hyper-parameter at inference. We provide theoretical justification/discussions of the vector transform representation. We evaluate the proposed loss method using a standard architecture and show the excellent performance over other losses and representations on several datasets. Code is available at https://github.com/edomel/BoundaryVT.
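    The sketch below shows one way to compute such a label representation with a Euclidean distance transform: for every pixel, a unit vector pointing to its nearest boundary pixel. This is our minimal reading of the representation, not the authors' released code, and the toy boundary mask is hypothetical.

```python
import numpy as np
from scipy.ndimage import distance_transform_edt

# Build a toy boundary mask: a single horizontal boundary line.
boundary = np.zeros((7, 7), dtype=bool)
boundary[3, :] = True

# distance_transform_edt reports, per pixel, the index of the nearest zero;
# we pass the inverted mask so "zero" means "boundary pixel".
_, idx = distance_transform_edt(~boundary, return_distances=True,
                                return_indices=True)
rows, cols = np.indices(boundary.shape)
vec = np.stack([idx[0] - rows, idx[1] - cols]).astype(float)
norm = np.linalg.norm(vec, axis=0)
unit = np.where(norm > 0, vec / np.maximum(norm, 1e-9), 0.0)  # zero on boundary
print(unit[:, 0, 0])  # at pixel (0, 0): points straight down toward row 3
```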
    Position Aided Beam Prediction in the Real World: How Useful GPS Locations Actually Are?. (arXiv:2205.09054v5 [eess.SP] UPDATED)
    Millimeter-wave (mmWave) communication systems rely on narrow beams for achieving sufficient receive signal power. Adjusting these beams is typically associated with large training overhead, which becomes particularly critical for highly-mobile applications. Intuitively, since optimal beam selection can benefit from the knowledge of the positions of communication terminals, there has been increasing interest in leveraging position data to reduce the overhead in mmWave beam prediction. Prior work, however, studied this problem using only synthetic data that generally does not accurately represent real-world measurements. In this paper, we investigate position-aided beam prediction using a real-world large-scale dataset to derive insights into precisely how much overhead can be saved in practice. Furthermore, we analyze which machine learning algorithms perform best, what factors degrade inference performance in real data, and which machine learning metrics are more meaningful in capturing the actual communication system performance.
Differentially Private Empirical Risk Minimization under the Fairness Lens. (arXiv:2106.02674v2 [cs.LG] UPDATED)
Differential Privacy (DP) is an important privacy-enhancing technology for private machine learning systems. It makes it possible to measure and bound the risk associated with an individual's participation in a computation. However, it was recently observed that DP learning systems may exacerbate bias and unfairness for different groups of individuals. This paper builds on these important observations and sheds light on the causes of the disparate impacts arising in the problem of differentially private empirical risk minimization. It focuses on the accuracy disparity arising among groups of individuals in two well-studied DP learning methods: output perturbation and differentially private stochastic gradient descent. The paper analyzes which data and model properties are responsible for the disproportionate impacts, why these aspects affect different groups disproportionately, and proposes guidelines to mitigate these effects. The proposed approach is evaluated on several datasets and settings.
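    As a reference point for one of the two methods studied, the toy sketch below implements output perturbation in its simplest form: solve a regularized ERM problem non-privately, then add noise to the solution. The Laplace scale here is a placeholder, not calibrated to a specific privacy budget.

```python
import numpy as np

# Output perturbation: release a noisy version of the (ridge-regularized)
# least-squares solution rather than the exact one.
rng = np.random.default_rng(0)
X, y = rng.normal(size=(200, 5)), rng.normal(size=200)
lam = 1.0
theta = np.linalg.solve(X.T @ X + lam * np.eye(5), X.T @ y)    # ERM solution
theta_priv = theta + rng.laplace(scale=0.1, size=theta.shape)  # perturbed release
print("distortion:", np.linalg.norm(theta - theta_priv))
```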
    Bayesian regularization of empirical MDPs. (arXiv:2208.02362v2 [cs.LG] UPDATED)
In most applications of model-based Markov decision processes, the parameters of the unknown underlying model are estimated from empirical data. Due to noise, the policy learned from the estimated model is often far from the optimal policy of the underlying model. When applied to the environment of the underlying model, the learned policy results in suboptimal performance, thus calling for solutions with better generalization performance. In this work we take a Bayesian perspective and regularize the objective function of the Markov decision process with prior information in order to obtain more robust policies. Two approaches are proposed, one based on $L^1$ regularization and the other on relative entropic regularization. We evaluate our proposed algorithms on synthetic simulations and on real-world search logs of a large-scale online shopping store. Our results demonstrate the robustness of regularized MDP policies against the noise present in the models.
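    The relative-entropy variant can be illustrated with a soft value iteration on an estimated tabular MDP: a KL penalty toward a uniform prior policy replaces the hard max with a soft-max, damping the policy's sensitivity to estimation noise. This is our sketch of the general idea under a uniform prior; the paper's exact objective may differ.

```python
import numpy as np

# Soft (KL-regularized) value iteration on an estimated MDP.
# beta -> infinity recovers the standard hard max of value iteration.
rng = np.random.default_rng(0)
S, A, gamma, beta = 4, 2, 0.9, 5.0
P = rng.dirichlet(np.ones(S), size=(S, A))   # estimated transitions P[s, a, s']
R = rng.normal(size=(S, A))                  # estimated rewards

V = np.zeros(S)
for _ in range(200):
    Q = R + gamma * P @ V                    # Q[s, a]
    V = np.log(np.exp(beta * Q).mean(axis=1)) / beta  # soft backup, uniform prior
policy = np.exp(beta * Q) / np.exp(beta * Q).sum(axis=1, keepdims=True)
print(policy.round(3))                       # smoothed, less noise-sensitive policy
```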
    NeuralFMU: Presenting a workflow for integrating hybrid NeuralODEs into real world applications. (arXiv:2209.03933v1 [cs.LG])
The term NeuralODE describes the structural combination of an Artificial Neural Network (ANN) and a numerical solver for Ordinary Differential Equations (ODEs), where the former acts as the right-hand side of the ODE to be solved. This concept was further extended by a black-box model in the form of a Functional Mock-up Unit (FMU) to obtain a subclass of NeuralODEs, named NeuralFMUs. The resulting structure features the advantages of first-principle and data-driven modeling approaches in one single simulation model: a higher prediction accuracy compared to conventional First Principle Models (FPMs), together with a lower training effort compared to purely data-driven models. We present an intuitive workflow to set up and use NeuralFMUs, enabling the encapsulation and reuse of existing conventional models exported from common modeling tools. Moreover, we exemplify this concept by deploying a NeuralFMU for a consumption simulation based on a Vehicle Longitudinal Dynamics Model (VLDM), which is a typical use case in the automotive industry. Related challenges that are often neglected in scientific use cases, like real measurements (e.g. noise), an unknown system state, or high-frequency discontinuities, are handled in this contribution. With the aim of building a hybrid model with a higher prediction quality than the original FPM, we briefly highlight two open-source libraries: FMI.jl for integrating FMUs into the Julia programming environment, as well as an extension to this library called FMIFlux.jl, which allows for the integration of FMUs into a neural network topology to finally obtain a NeuralFMU.
    Meta Clustering for Collaborative Learning. (arXiv:2006.00082v2 [cs.LG] UPDATED)
In collaborative learning, learners coordinate to enhance one another's learning performance. From the perspective of any learner, a critical challenge is to filter out unqualified collaborators. We propose a framework named meta clustering to address the challenge. Unlike the classical problem of clustering data points, meta clustering categorizes learners. Assuming each learner performs a supervised regression on a standalone local dataset, we propose a Select-Exchange-Cluster (SEC) method to classify the learners by their underlying supervised functions. We theoretically show that the SEC can cluster learners into accurate collaboration sets. Empirical studies corroborate the theoretical analysis and demonstrate that SEC can be computationally efficient, robust against learner heterogeneity, and effective in enhancing single-learner performance. Also, we show how the proposed approach may be used to enhance data fairness. Supplementary materials for this article are available online.
    PredDiff: Explanations and Interactions from Conditional Expectations. (arXiv:2102.13519v4 [cs.LG] UPDATED)
    PredDiff is a model-agnostic, local attribution method that is firmly rooted in probability theory. Its simple intuition is to measure prediction changes while marginalizing features. In this work, we clarify properties of PredDiff and its close connection to Shapley values. We stress important differences between classification and regression, which require a specific treatment within both formalisms. We extend PredDiff by introducing a new, well-founded measure for interaction effects between arbitrary feature subsets. The study of interaction effects represents an inevitable step towards a comprehensive understanding of black-box models and is particularly important for science applications. Equipped with our novel interaction measure, PredDiff is a promising model-agnostic approach for obtaining reliable, numerically inexpensive and theoretically sound attributions.
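    The core intuition, measuring prediction changes while marginalizing features, fits in a few lines. The sketch below is a minimal illustration on a toy regression model (our construction, not the authors' implementation): the relevance of a feature is the drop in prediction when that feature is resampled from reference data.

```python
import numpy as np

def preddiff(model, x, X_ref, feature):
    """PredDiff-style relevance: prediction minus its marginalized counterpart."""
    x_marg = np.repeat(x[None, :], len(X_ref), axis=0)
    x_marg[:, feature] = X_ref[:, feature]          # resample one feature
    return model(x[None, :])[0] - model(x_marg).mean()

rng = np.random.default_rng(0)
model = lambda X: X[:, 0] * 2.0 + X[:, 1]           # toy regression model
X_ref = rng.normal(size=(500, 3))                   # reference samples
x = np.array([1.0, 1.0, 1.0])
print([round(preddiff(model, x, X_ref, j), 2) for j in range(3)])
# feature 2 gets ~0 relevance; feature 0 roughly twice that of feature 1
```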
    Modified DDPG car-following model with a real-world human driving experience with CARLA simulator. (arXiv:2112.14602v3 [cs.RO] UPDATED)
In the autonomous driving field, fusion of human knowledge into Deep Reinforcement Learning (DRL) is often based on human demonstrations recorded in a simulated environment. This limits the generalization and the feasibility of application in real-world traffic. We propose a two-stage DRL method to train a car-following agent that modifies the policy by leveraging real-world human driving experience and achieves performance superior to the pure DRL agent. Training of the DRL agent is done within the CARLA framework with the Robot Operating System (ROS). For evaluation, we designed different driving scenarios to compare the proposed two-stage DRL car-following agent with other agents. After extracting the "good" behavior from the human driver, the agent becomes more efficient and reasonable, which makes this autonomous agent more suitable for Human-Robot Interaction (HRI) traffic.
    Sub 8-Bit Quantization of Streaming Keyword Spotting Models for Embedded Chipsets. (arXiv:2207.06920v2 [cs.SD] UPDATED)
    We propose a novel 2-stage sub 8-bit quantization aware training algorithm for all components of a 250K parameter feedforward, streaming, state-free keyword spotting model. For the 1st-stage, we adapt a recently proposed quantization technique using a non-linear transformation with tanh(.) on dense layer weights. In the 2nd-stage, we use linear quantization methods on the rest of the network, including other parameters (bias, gain, batchnorm), inputs, and activations. We conduct large scale experiments, training on 26,000 hours of de-identified production, far-field and near-field audio data (evaluating on 4,000 hours of data). We organize our results in two embedded chipset settings: a) with commodity ARM NEON instruction set and 8-bit containers, we present accuracy, CPU, and memory results using sub 8-bit weights (4, 5, 8-bit) and 8-bit quantization of rest of the network; b) with off-the-shelf neural network accelerators, for a range of weight bit widths (1 and 5-bit), while presenting accuracy results, we project reduction in memory utilization. In both configurations, our results show that the proposed algorithm can achieve: a) parity with a full floating point model's operating point on a detection error tradeoff (DET) curve in terms of false detection rate (FDR) at false rejection rate (FRR); b) significant reduction in compute and memory, yielding up to 3 times improvement in CPU consumption and more than 4 times improvement in memory consumption.
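    The 1st-stage weight transform can be sketched as follows, assuming a symmetric signed grid and simple rounding (the clipping and straight-through details of the actual algorithm may differ): squash weights with tanh, normalize to $[-1, 1]$, and fake-quantize onto a sub-8-bit grid.

```python
import numpy as np

# Fake-quantization with a tanh transform on the weights (illustrative sketch).
def fake_quantize_tanh(w, bits=4):
    t = np.tanh(w) / np.max(np.abs(np.tanh(w)))   # map weights into [-1, 1]
    levels = 2 ** (bits - 1) - 1                  # symmetric signed grid
    return np.round(t * levels) / levels          # pair with a straight-through
                                                  # estimator during training

rng = np.random.default_rng(0)
w = rng.normal(scale=0.5, size=1000)
wq = fake_quantize_tanh(w, bits=4)
print("distinct levels used:", np.unique(wq).size)  # at most 2 * levels + 1
```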
    Fixed Points of Cone Mapping with the Application to Neural Networks. (arXiv:2207.09947v2 [math.DS] UPDATED)
We derive conditions for the existence of fixed points of cone mappings without assuming scalability of functions. Monotonicity and scalability are often inseparable in the literature in the context of searching for fixed points of interference mappings. In applications, such mappings are approximated by non-negative neural networks. It turns out, however, that the process of training non-negative networks requires imposing an artificial constraint on the weights of the model. Yet in the case of specific non-negative data, it cannot be said that if the mapping is non-negative, it has only non-negative weights. Therefore, we consider the problem of the existence of fixed points for general neural networks, assuming tangency conditions with respect to specific cones. This does not relax the physical assumptions, because even assuming that the input and output are non-negative, the weights can take (small, but) negative values. Such properties (often found in papers on the interpretability of weights of neural networks) lead to a weakening of the assumptions about the monotonicity or scalability of the mapping associated with the neural network. To the best of our knowledge, this paper is the first to study this phenomenon.
    Actionable Guidance for High-Consequence AI Risk Management: Towards Standards Addressing AI Catastrophic Risks. (arXiv:2206.08966v2 [cs.CY] UPDATED)
    Artificial intelligence (AI) systems can provide many beneficial capabilities but also risks of adverse events. Some AI systems could present risks of events with very high or catastrophic consequences at societal scale. The US National Institute of Standards and Technology (NIST) is developing the NIST Artificial Intelligence Risk Management Framework (AI RMF) as voluntary guidance on AI risk assessment and management for AI developers and others. For addressing risks of events with catastrophic consequences, NIST indicated a need to translate from high level principles to actionable risk management guidance. In this document, we provide detailed actionable-guidance recommendations focused on identifying and managing risks of events with very high or catastrophic consequences, intended as a risk management practices resource for NIST for AI RMF version 1.0 (scheduled for release in early 2023), or for AI RMF users, or for other AI risk management guidance and standards as appropriate. We also provide our methodology for our recommendations. We provide actionable-guidance recommendations for AI RMF 1.0 on: identifying risks from potential unintended uses and misuses of AI systems; including catastrophic-risk factors within the scope of risk assessments and impact assessments; identifying and mitigating human rights harms; and reporting information on AI risk factors including catastrophic-risk factors. In addition, we provide recommendations on additional issues for a roadmap for later versions of the AI RMF or supplementary publications. These include: providing an AI RMF Profile with supplementary guidance for cutting-edge increasingly multi-purpose or general-purpose AI. We aim for this work to be a concrete risk-management practices contribution, and to stimulate constructive dialogue on how to address catastrophic risks and associated issues in AI standards.
    HiFi++: a Unified Framework for Bandwidth Extension and Speech Enhancement. (arXiv:2203.13086v2 [cs.SD] UPDATED)
    Generative adversarial networks have recently demonstrated outstanding performance in neural vocoding outperforming best autoregressive and flow-based models. In this paper, we show that this success can be extended to other tasks of conditional audio generation. In particular, building upon HiFi vocoders, we propose a novel HiFi++ general framework for bandwidth extension and speech enhancement. We show that with the improved generator architecture and simplified multi-discriminator training, HiFi++ performs better or on par with the state-of-the-art in these tasks while spending significantly less computational resources. The effectiveness of our approach is validated through a series of extensive experiments.
    A multi view multi stage and multi window framework for pulmonary artery segmentation from CT scans. (arXiv:2209.03918v1 [eess.IV])
This technical report describes our ninth-place solution in the final ranking of the PARSE2022 Challenge. We solve the pulmonary artery segmentation problem using a two-stage method based on a 3D CNN. The coarse model locates the ROI, and the fine model refines the segmentation result. In addition, to improve segmentation performance, we adopt a multi-view and multi-window-level method and employ a fine-tuning strategy to mitigate the impact of inconsistent labeling.
    T$^2$LR-Net: An Unrolling Reconstruction Network Learning Transformed Tensor Low-Rank prior for Dynamic MR Imaging. (arXiv:2209.03832v1 [eess.IV])
While methods exploiting the tensor low-rank prior are booming in high-dimensional data processing and have obtained satisfying performance, their applications in dynamic magnetic resonance (MR) image reconstruction are limited. In this paper, we concentrate on the tensor singular value decomposition (t-SVD), which is based on the Fast Fourier Transform (FFT) and provides only a definite and limited tensor low-rank prior in the FFT domain, heavily reliant upon how closely the data and the FFT domain match up. By generalizing the FFT into an arbitrary unitary transformation in the transformed t-SVD and proposing the transformed tensor nuclear norm (TTNN), we introduce a flexible model based on TTNN with the ability to exploit the tensor low-rank prior of a transformed domain in a larger transformation space, and we elaborately design an iterative optimization algorithm based on the alternating direction method of multipliers (ADMM), which is further unrolled into a model-based deep unrolling reconstruction network to learn the transformed tensor low-rank prior (T$^2$LR-Net). A convolutional neural network (CNN) is incorporated within the T$^2$LR-Net to learn the best-matched transform from the dynamic MR image dataset. The unrolling reconstruction network also provides a new perspective on low-rank prior utilization by exploiting the low-rank prior in the CNN-extracted feature domain. Experimental results on two cardiac cine MR datasets demonstrate that the proposed framework can provide improved recovery results compared with state-of-the-art optimization-based and unrolling network-based methods.
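    The transformed tensor nuclear norm itself is compact to state in code: transform the tensor along the third mode with a unitary matrix, then sum the nuclear norms of the transformed frontal slices. In the sketch below, a normalized DFT matrix recovers the standard FFT-based t-SVD case, and an arbitrary orthogonal matrix stands in for a learned transform.

```python
import numpy as np

# Transformed tensor nuclear norm (TTNN) for a third-order tensor X.
def ttnn(X, U):
    Xt = np.einsum('kj,mnj->mnk', U, X)   # unitary transform along mode 3
    return sum(np.linalg.svd(Xt[:, :, k], compute_uv=False).sum()
               for k in range(Xt.shape[2]))

rng = np.random.default_rng(0)
X = rng.normal(size=(8, 8, 6))
F = np.fft.fft(np.eye(6)) / np.sqrt(6)        # unitary DFT: classic t-SVD prior
Q, _ = np.linalg.qr(rng.normal(size=(6, 6)))  # stand-in for a learned transform
print(ttnn(X, F), ttnn(X, Q))
```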
    Sequential Information Design: Learning to Persuade in the Dark. (arXiv:2209.03927v1 [cs.LG])
    We study a repeated information design problem faced by an informed sender who tries to influence the behavior of a self-interested receiver. We consider settings where the receiver faces a sequential decision making (SDM) problem. At each round, the sender observes the realizations of random events in the SDM problem. This begets the challenge of how to incrementally disclose such information to the receiver to persuade them to follow (desirable) action recommendations. We study the case in which the sender does not know random events probabilities, and, thus, they have to gradually learn them while persuading the receiver. We start by providing a non-trivial polytopal approximation of the set of sender's persuasive information structures. This is crucial to design efficient learning algorithms. Next, we prove a negative result: no learning algorithm can be persuasive. Thus, we relax persuasiveness requirements by focusing on algorithms that guarantee that the receiver's regret in following recommendations grows sub-linearly. In the full-feedback setting -- where the sender observes all random events realizations -- , we provide an algorithm with $\tilde{O}(\sqrt{T})$ regret for both the sender and the receiver. Instead, in the bandit-feedback setting -- where the sender only observes the realizations of random events actually occurring in the SDM problem -- , we design an algorithm that, given an $\alpha \in [1/2, 1]$ as input, ensures $\tilde{O}({T^\alpha})$ and $\tilde{O}( T^{\max \{ \alpha, 1-\frac{\alpha}{2} \} })$ regrets, for the sender and the receiver respectively. This result is complemented by a lower bound showing that such a regrets trade-off is essentially tight.
    Applying Transformer-based Text Summarization for Keyphrase Generation. (arXiv:2209.03791v1 [cs.CL])
Keyphrases are crucial for searching and systematizing scholarly documents. Most current methods for keyphrase extraction aim to extract the most significant words in the text. But in practice, the list of keyphrases often includes words that do not appear in the text explicitly. In this case, the list of keyphrases represents an abstractive summary of the source text. In this paper, we experiment with popular transformer-based models for abstractive text summarization using four benchmark datasets for keyphrase extraction. We compare the results obtained with the results of common unsupervised and supervised methods for keyphrase extraction. Our evaluation shows that summarization models are quite effective in generating keyphrases in terms of the full-match F1-score and BERTScore. However, they produce a lot of words that are absent from the author's list of keyphrases, which makes summarization models ineffective in terms of ROUGE-1. We also investigate several ordering strategies to concatenate target keyphrases. The results show that the choice of strategy affects the performance of keyphrase generation.
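    A minimal version of this experimental setup, using an off-the-shelf summarizer from the transformers library, might look like the sketch below; the model choice, generation lengths, and comma-separated target format are our assumptions, not the paper's exact configuration.

```python
from transformers import pipeline

# Treat keyphrase generation as abstractive summarization: generate a short
# "summary" and split it into candidate keyphrases.
summarizer = pipeline("summarization", model="t5-small")
doc = ("Keyphrases are crucial for searching and systematizing scholarly "
       "documents, yet many gold keyphrases never appear in the text itself.")
out = summarizer(doc, max_length=16, min_length=4, do_sample=False)
keyphrases = [p.strip() for p in out[0]["summary_text"].split(",")]
print(keyphrases)
```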
    Text-Free Learning of a Natural Language Interface for Pretrained Face Generators. (arXiv:2209.03953v1 [cs.CV])
We propose Fast text2StyleGAN, a natural language interface that adapts pre-trained GANs for text-guided human face synthesis. Leveraging the recent advances in Contrastive Language-Image Pre-training (CLIP), no text data is required during training. Fast text2StyleGAN is formulated as a conditional variational autoencoder (CVAE) that provides extra control and diversity to the generated images at test time. Our model does not require re-training or fine-tuning of the GANs or CLIP when encountering new text prompts. In contrast to prior work, we do not rely on optimization at test time, making our method orders of magnitude faster than prior work. Empirically, on the FFHQ dataset, our method offers faster and more accurate generation of images from natural language descriptions with varying levels of detail compared to prior work.
    Stochastic gradient descent with gradient estimator for categorical features. (arXiv:2209.03771v1 [cs.LG])
Categorical data are present in key areas such as health or supply chain, and such data require specific treatment. In order to apply recent machine learning models to such data, encoding is needed. In order to build interpretable models, one-hot encoding is still a very good solution, but such encoding creates sparse data. Gradient estimators are not suited for sparse data: the gradient is mostly treated as zero, while it simply does not always exist; thus, a novel gradient estimator is introduced. We show what this estimator minimizes in theory and show its efficiency on different datasets with multiple model architectures. This new estimator performs better than common estimators under similar settings. A real-world retail dataset is also released after anonymization. Overall, the aim of this paper is to thoroughly consider categorical data and adapt models and optimizers to these key features.
    ReX: A Framework for Generating Local Explanations to Recurrent Neural Networks. (arXiv:2209.03798v1 [cs.LG])
We propose a general framework to adapt various local explanation techniques to recurrent neural networks. In particular, our explanations add temporal information, which expands the explanations generated by existing techniques to cover data points that have different lengths compared to the original input data point. Our approach is general as it only modifies the perturbation model and feature representation of existing techniques without touching their core algorithms. We have instantiated our approach on LIME and Anchors. Our empirical evaluation shows that it effectively improves the usefulness of explanations generated by these two techniques on a sentiment analysis network and an anomaly detection network.
    Meta Discovery: Learning to Discover Novel Classes given Very Limited Data. (arXiv:2102.04002v4 [cs.LG] UPDATED)
In novel class discovery (NCD), we are given labeled data from seen classes and unlabeled data from unseen classes, and we train clustering models for the unseen classes. However, the implicit assumptions behind NCD are still unclear. In this paper, we demystify assumptions behind NCD and find that high-level semantic features should be shared among the seen and unseen classes. Based on this finding, NCD is theoretically solvable under certain assumptions and can be naturally linked to meta-learning that has exactly the same assumption as NCD. Thus, we can empirically solve the NCD problem by meta-learning algorithms after slight modifications. This meta-learning-based methodology significantly reduces the amount of unlabeled data needed for training and makes it more practical, as demonstrated in experiments. The use of very limited data is also justified by the application scenario of NCD: since it is unnatural to label only seen-class data, in terms of causality NCD corresponds to sampling rather than labeling. Therefore, unseen-class data should be collected on the way of collecting seen-class data, which is why they are novel and first need to be clustered.
    Data Feedback Loops: Model-driven Amplification of Dataset Biases. (arXiv:2209.03942v1 [cs.LG])
    Datasets scraped from the internet have been critical to the successes of large-scale machine learning. Yet, this very success puts the utility of future internet-derived datasets at potential risk, as model outputs begin to replace human annotations as a source of supervision. In this work, we first formalize a system where interactions with one model are recorded as history and scraped as training data in the future. We then analyze its stability over time by tracking changes to a test-time bias statistic (e.g. gender bias of model predictions). We find that the degree of bias amplification is closely linked to whether the model's outputs behave like samples from the training distribution, a behavior which we characterize and define as consistent calibration. Experiments in three conditional prediction scenarios - image classification, visual role-labeling, and language generation - demonstrate that models that exhibit a sampling-like behavior are more calibrated and thus more stable. Based on this insight, we propose an intervention to help calibrate and stabilize unstable feedback systems. Code is available at https://github.com/rtaori/data_feedback.
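    The feedback system can be simulated in a few lines: each round, the model is refit on a pool mixing human labels with its own sampled outputs, and a bias statistic is tracked over time. In the sketch below, a sharpening temperature below one stands in for a miscalibrated, over-confident model and produces the amplification effect; all details are illustrative assumptions, not the paper's setup.

```python
import numpy as np

# Toy feedback-loop simulation tracking the rate of a binary attribute.
rng = np.random.default_rng(0)
p_true, temp = 0.6, 0.5            # temp < 1: over-confident sampling
p_model, history = p_true, []
for step in range(30):
    human = rng.random(500) < p_true
    p_sharp = p_model ** (1 / temp) / (
        p_model ** (1 / temp) + (1 - p_model) ** (1 / temp))
    machine = rng.random(500) < p_sharp        # model outputs scraped as data
    p_model = np.concatenate([human, machine]).mean()  # refit on the mixed pool
    history.append(p_model)
print([round(h, 3) for h in history[::5]])     # drifts upward, away from 0.6
```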
    Histogram Layers for Synthetic Aperture Sonar Imagery. (arXiv:2209.03878v1 [cs.CV])
    Synthetic aperture sonar (SAS) imagery is crucial for several applications, including target recognition and environmental segmentation. Deep learning models have led to much success in SAS analysis; however, the features extracted by these approaches may not be suitable for capturing certain textural information. To address this problem, we present a novel application of histogram layers on SAS imagery. The addition of histogram layer(s) within the deep learning models improved performance by incorporating statistical texture information on both synthetic and real-world datasets.
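    A minimal differentiable histogram layer in the spirit of this approach (our sketch, not the authors' architecture) soft-assigns feature values to bins with RBF kernels, using trainable centers and widths, and pools the assignments spatially:

```python
import torch

class HistogramLayer(torch.nn.Module):
    """Soft histogram over spatial positions, per channel, with learnable bins."""
    def __init__(self, bins=8):
        super().__init__()
        self.centers = torch.nn.Parameter(torch.linspace(-1, 1, bins))
        self.widths = torch.nn.Parameter(torch.full((bins,), 0.5))

    def forward(self, x):                           # x: (batch, channels, H, W)
        d = x.unsqueeze(-1) - self.centers          # (..., bins)
        soft = torch.exp(-(d / self.widths) ** 2)   # RBF bin membership
        return soft.mean(dim=(2, 3))                # (batch, channels, bins)

feats = torch.randn(4, 16, 32, 32)
print(HistogramLayer()(feats).shape)                # torch.Size([4, 16, 8])
```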
    End-to-end Robustness for Sensing-Reasoning Machine Learning Pipelines. (arXiv:2003.00120v4 [cs.LG] UPDATED)
Intensive algorithmic efforts have been made to enable the rapid improvements of certificated robustness for complex ML models recently. However, current robustness certification methods are only able to certify under a limited perturbation radius. Given that existing pure data-driven statistical approaches have reached a bottleneck, in this paper, we propose to integrate statistical ML models with knowledge (expressed as logical rules) as a reasoning component using Markov logic networks (MLNs), so as to further improve the overall certified robustness. This opens new research questions about certifying the robustness of such a paradigm, especially the reasoning component (e.g., MLN). As the first step towards understanding these questions, we first prove that the computational complexity of certifying the robustness of MLN is #P-hard. Guided by this hardness result, we then derive the first certified robustness bound for MLN by carefully analyzing different model regimes. Finally, we conduct extensive experiments on five datasets including both high-dimensional images and natural language texts, and we show that the certified robustness with knowledge-based logical reasoning indeed significantly outperforms that of the state-of-the-art.
    Tuning arrays with rays: Physics-informed tuning of quantum dot charge states. (arXiv:2209.03837v1 [cond-mat.mes-hall])
Quantum computers based on gate-defined quantum dots (QDs) are expected to scale. However, as the number of qubits increases, the burden of manually calibrating these systems becomes unreasonable and autonomous tuning must be used. There have been a range of recent demonstrations of automated tuning of various QD parameters such as coarse gate ranges, global state topology (e.g. single QD, double QD), charge, and tunnel coupling with a variety of methods. Here, we demonstrate an intuitive, reliable, and data-efficient set of tools for automated global state and charge tuning in a framework deemed physics-informed tuning (PIT). The first module of PIT is an action-based algorithm that combines a machine learning (ML) classifier with physics knowledge to navigate to a target global state. The second module uses a series of one-dimensional measurements to tune to a target charge state by first emptying the QDs of charge, followed by calibrating capacitive couplings, and navigating to the target charge state. The success rate for the action-based tuning consistently surpasses $95~\%$ on both simulated and experimental data suitable for off-line testing. The success rate for charge setting is comparable when testing with simulated data, at $95.5(5.4)~\%$, and only slightly worse for off-line experimental tests, with an average of $89.7(17.4)~\%$ (median $97.5~\%$). It is noteworthy that the high performance is demonstrated both on data from samples fabricated in an academic cleanroom and on an industrial 300 mm process line, further underlining the device-agnosticity of PIT. Together, these tests on a range of simulated and experimental devices demonstrate the effectiveness and robustness of PIT.
    FADE: Enabling Large-Scale Federated Adversarial Training on Resource-Constrained Edge Devices. (arXiv:2209.03839v1 [cs.LG])
    Adversarial Training (AT) has been proven to be an effective method of introducing strong adversarial robustness into deep neural networks. However, the high computational cost of AT prohibits the deployment of large-scale AT on resource-constrained edge devices, e.g., with limited computing power and small memory footprint, in Federated Learning (FL) applications. Very few previous studies have tried to tackle these constraints in FL at the same time. In this paper, we propose a new framework named Federated Adversarial Decoupled Learning (FADE) to enable AT on resource-constrained edge devices in FL. FADE reduces the computation and memory usage by applying Decoupled Greedy Learning (DGL) to federated adversarial training such that each client only needs to perform AT on a small module of the entire model in each communication round. In addition, we improve vanilla DGL by adding an auxiliary weight decay to alleviate objective inconsistency and achieve better performance. FADE offers a theoretical guarantee for the adversarial robustness and convergence. The experimental results also show that FADE can significantly reduce the computing resources consumed by AT while maintaining almost the same accuracy and robustness as fully joint training.
    Mean Field Games on Weighted and Directed Graphs via Colored Digraphons. (arXiv:2209.03887v1 [cs.MA])
The field of multi-agent reinforcement learning (MARL) has made considerable progress towards controlling challenging multi-agent systems by employing various learning methods. Many of these approaches focus on empirical and algorithmic aspects of MARL problems and lack a rigorous theoretical foundation. Graphon mean field games (GMFGs), on the other hand, provide a scalable and mathematically well-founded approach to learning problems that involve a large number of connected agents. In standard GMFGs, the connections between agents are undirected, unweighted and invariant over time. Our paper introduces colored digraphon mean field games (CDMFGs), which allow for weighted and directed links between agents that are also adaptive over time. Thus, CDMFGs are able to model more complex connections than standard GMFGs. Besides a rigorous theoretical analysis including both existence and convergence guarantees, we provide a learning scheme and illustrate our findings with an epidemics model and a model of systemic risk in financial markets.
    Multiobjective Ranking and Selection Using Stochastic Kriging. (arXiv:2209.03919v1 [stat.ML])
We consider multiobjective simulation optimization problems, where several conflicting objectives are optimized simultaneously, and can only be observed via stochastic simulation. The goal is to find or approximate a (discrete) set of Pareto-optimal solutions that reveal the essential trade-offs between the objectives, where optimality means that no objective can be improved without deteriorating the quality of any other objective. The noise in the observed performance may lead to two possible misclassification errors: solutions that are truly Pareto-optimal can be wrongly considered dominated, and solutions that are truly dominated can be wrongly considered Pareto-optimal. We propose a Bayesian multiobjective ranking and selection method to reduce the number of errors when identifying the solutions with the true best expected performance. We use stochastic kriging metamodels to build reliable predictive distributions of the objectives, and exploit this information in two efficient screening procedures and two novel sampling criteria. We use these in a sequential sampling algorithm to decide how to allocate samples. Experimental results show that the proposed method only requires a small fraction of samples compared to the standard allocation method, and it is competitive with the state of the art, with the exploitation of the correlation structure being the dominant contributor to the improvement.
    Toward Robust Autotuning of Noisy Quantum Dot Devices. (arXiv:2108.00043v3 [quant-ph] UPDATED)
    The current autotuning approaches for quantum dot (QD) devices, while showing some success, lack an assessment of data reliability. This leads to unexpected failures when noisy or otherwise low-quality data is processed by an autonomous system. In this work, we propose a framework for robust autotuning of QD devices that combines a machine learning (ML) state classifier with a data quality control module. The data quality control module acts as a "gatekeeper" system, ensuring that only reliable data are processed by the state classifier. Lower data quality results in either device recalibration or termination. To train both ML systems, we enhance the QD simulation by incorporating synthetic noise typical of QD experiments. We confirm that the inclusion of synthetic noise in the training of the state classifier significantly improves the performance, resulting in an accuracy of 95.0(9) % when tested on experimental data. We then validate the functionality of the data quality control module by showing that the state classifier performance deteriorates with decreasing data quality, as expected. Our results establish a robust and flexible ML framework for autonomous tuning of noisy QD devices.
    Regularized Frank-Wolfe for Dense CRFs: Generalizing Mean Field and Beyond. (arXiv:2110.14759v2 [cs.LG] UPDATED)
    We introduce regularized Frank-Wolfe, a general and effective algorithm for inference and learning of dense conditional random fields (CRFs). The algorithm optimizes a nonconvex continuous relaxation of the CRF inference problem using vanilla Frank-Wolfe with approximate updates, which are equivalent to minimizing a regularized energy function. Our proposed method is a generalization of existing algorithms such as mean field or concave-convex procedure. This perspective not only offers a unified analysis of these algorithms, but also allows an easy way of exploring different variants that potentially yield better performance. We illustrate this in our empirical results on standard semantic segmentation datasets, where several instantiations of our regularized Frank-Wolfe outperform mean field inference, both as a standalone component and as an end-to-end trainable layer in a neural network. We also show that dense CRFs, coupled with our new algorithms, produce significant improvements over strong CNN baselines.
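    For reference, the vanilla Frank-Wolfe loop that the paper builds on is only a few lines; the sketch below minimizes a toy quadratic over the probability simplex, where the linear minimization oracle returns a vertex. The regularized variant would replace the plain update with an approximate update that minimizes a regularized energy, which this sketch does not implement.

```python
import numpy as np

# Vanilla Frank-Wolfe over the probability simplex.
def frank_wolfe(grad_f, x0, steps=200):
    x = x0.copy()
    for t in range(steps):
        g = grad_f(x)
        s = np.eye(len(x))[np.argmin(g)]   # LMO over the simplex: a vertex
        x += 2.0 / (t + 2.0) * (s - x)     # standard step size keeps x feasible
    return x

target = np.array([0.7, 0.2, 0.1])
grad_f = lambda x: 2 * (x - target)        # f(x) = ||x - target||^2
print(frank_wolfe(grad_f, np.ones(3) / 3).round(3))  # approaches the target
```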
    SafeNet: The Unreasonable Effectiveness of Ensembles in Private Collaborative Learning. (arXiv:2205.09986v2 [cs.CR] UPDATED)
Secure multiparty computation (MPC) has been proposed to allow multiple mutually distrustful data owners to jointly train machine learning (ML) models on their combined data. However, by design, MPC protocols faithfully compute the training functionality, which the adversarial ML community has shown to leak private information and can be tampered with in poisoning attacks. In this work, we argue that model ensembles, implemented in our framework called SafeNet, are a highly MPC-amenable way to avoid many adversarial ML attacks. The natural partitioning of data amongst owners in MPC training allows this approach to be highly scalable at training time, to provide provable protection from poisoning attacks, and to provide provable defense against a number of privacy attacks. We demonstrate SafeNet's efficiency, accuracy, and resilience to poisoning on several machine learning datasets and models trained in end-to-end and transfer learning scenarios. For instance, SafeNet reduces backdoor attack success significantly, while achieving $39\times$ faster training and $36 \times$ less communication than the four-party MPC framework of Dalskov et al. Our experiments show that ensembling retains these benefits even in many non-iid settings. The simplicity, cheap setup, and robustness properties of ensembling make it a strong first choice for training ML models privately in MPC.
    A Light-Weight Multi-Objective Asynchronous Hyper-Parameter Optimizer. (arXiv:2202.07735v2 [cs.LG] UPDATED)
    We describe a light-weight yet performant system for hyper-parameter optimization that approximately minimizes an overall scalar cost function that is obtained by combining multiple performance objectives using a target-priority-limit scalarizer. It also supports a trade-off mode, where the goal is to find an appropriate trade-off among objectives by interacting with the user. We focus on the common scenario where there are on the order of tens of hyper-parameters, each with various attributes such as a range of continuous values, or a finite list of values, and whether it should be treated on a linear or logarithmic scale. The system supports multiple asynchronous simulations and is robust to simulation stragglers and failures.
    Deep Neural Networks to Correct Sub-Precision Errors in CFD. (arXiv:2202.04233v2 [physics.flu-dyn] UPDATED)
Loss of information in numerical simulations can arise from various sources while solving discretized partial differential equations. In particular, precision-related errors can accumulate in the quantities of interest when the simulations are performed using low-precision 16-bit floating-point arithmetic compared to an equivalent 64-bit simulation. Here, low-precision computation requires far fewer resources than high-precision computation. Several machine learning (ML) techniques proposed recently have been successful in correcting the errors arising from spatial discretization. In this work, we extend these techniques to improve Computational Fluid Dynamics (CFD) simulations performed using low numerical precision. We first quantify the precision-related errors accumulated in a Kolmogorov forced turbulence test case. Subsequently, we employ a Convolutional Neural Network together with a fully differentiable numerical solver performing 16-bit arithmetic to learn a tightly-coupled ML-CFD hybrid solver. Compared to the 16-bit solver, we demonstrate the efficacy of the ML-CFD hybrid solver towards reducing the error accumulation in the velocity field and improving the kinetic energy spectrum at higher frequencies.
    Differential Privacy and Fairness in Decisions and Learning Tasks: A Survey. (arXiv:2202.08187v2 [cs.LG] UPDATED)
This paper surveys recent work at the intersection of differential privacy (DP) and fairness. It reviews the conditions under which privacy and fairness may have aligned or contrasting goals, analyzes how and why DP may exacerbate bias and unfairness in decision problems and learning tasks, and describes available mitigation measures for the fairness issues arising in DP systems. The survey provides a unified understanding of the main challenges and potential risks arising when deploying privacy-preserving machine-learning or decision-making tasks under a fairness lens.
    Training Scale-Invariant Neural Networks on the Sphere Can Happen in Three Regimes. (arXiv:2209.03695v1 [cs.LG])
    A fundamental property of deep learning normalization techniques, such as batch normalization, is making the pre-normalization parameters scale invariant. The intrinsic domain of such parameters is the unit sphere, and therefore their gradient optimization dynamics can be represented via spherical optimization with varying effective learning rate (ELR), which was studied previously. In this work, we investigate the properties of training scale-invariant neural networks directly on the sphere using a fixed ELR. We discover three regimes of such training depending on the ELR value: convergence, chaotic equilibrium, and divergence. We study these regimes in detail both through a theoretical examination of a toy example and through a thorough empirical analysis of real scale-invariant deep learning models. Each regime has unique features and reflects specific properties of the intrinsic loss landscape, some of which have strong parallels with previous research on both regular and scale-invariant neural network training. Finally, we demonstrate how the discovered regimes are reflected in conventional training of normalized networks and how they can be leveraged to achieve better optima.
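    A toy sketch of the optimization the paper studies: gradient descent on the unit sphere with a fixed effective learning rate, taking a tangential gradient step and retracting back onto the sphere (the quadratic loss here is an arbitrary stand-in; varying `elr` is what moves training between the three regimes).

```python
import numpy as np

def loss_grad(w):
    # Stand-in gradient of a quadratic loss on the sphere (illustrative only).
    A = np.diag(np.arange(1.0, 11.0))
    return A @ w

rng = np.random.default_rng(0)
w = rng.normal(size=10)
w /= np.linalg.norm(w)
elr = 0.05  # fixed effective learning rate

for _ in range(1000):
    g = loss_grad(w)
    g_tan = g - (g @ w) * w      # project gradient onto the tangent space
    w = w - elr * g_tan          # fixed-ELR step
    w /= np.linalg.norm(w)       # retract back onto the unit sphere
print(w[:3])
```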
    Learning Sparse Graphon Mean Field Games. (arXiv:2209.03880v1 [cs.MA])
    Although the field of multi-agent reinforcement learning (MARL) has made considerable progress in recent years, solving systems with a large number of agents remains a hard challenge. Graphon mean field games (GMFGs) enable the scalable analysis of MARL problems that are otherwise intractable. Due to the mathematical structure of graphons, this approach is limited to dense graphs, which are insufficient to describe many real-world networks such as power law graphs. Our paper introduces a novel formulation of GMFGs, called LPGMFGs, which leverages the graph theoretical concept of $L^p$ graphons and provides a machine learning tool to efficiently and accurately approximate solutions for sparse network problems. This especially includes power law networks which are empirically observed in various application areas and cannot be captured by standard graphons. We derive theoretical existence and convergence guarantees and give empirical examples that demonstrate the accuracy of our learning approach for systems with many agents. Furthermore, we rigorously extend the Online Mirror Descent (OMD) learning algorithm to our setup to accelerate learning speed, allow for agent interaction through the mean field in the transition kernel, and empirically show its capabilities. In general, we provide a scalable, mathematically well-founded machine learning approach to a large class of otherwise intractable problems of great relevance in numerous research fields.
    Sparse Coding with Multi-Layer Decoders using Variance Regularization. (arXiv:2112.09214v2 [cs.CV] UPDATED)
    Sparse representations of images are useful in many computer vision applications. Sparse coding with an $l_1$ penalty and a learned linear dictionary requires regularization of the dictionary to prevent a collapse in the $l_1$ norms of the codes. Typically, this regularization entails bounding the Euclidean norms of the dictionary's elements. In this work, we propose a novel sparse coding protocol which prevents a collapse in the codes without the need to regularize the decoder. Our method regularizes the codes directly so that each latent code component has variance greater than a fixed threshold over a set of sparse representations for a given set of inputs. Furthermore, we explore ways to effectively train sparse coding systems with multi-layer decoders since they can model more complex relationships than linear dictionaries. In our experiments with MNIST and natural image patches, we show that decoders learned with our approach have interpretable features both in the linear and multi-layer case. Moreover, we show that sparse autoencoders with multi-layer decoders trained using our variance regularization method produce higher quality reconstructions with sparser representations when compared to autoencoders with linear dictionaries. Additionally, sparse representations obtained with our variance regularization approach are useful in the downstream tasks of denoising and classification in the low-data regime.
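    A sketch of the core regularizer as we read it from the abstract: a hinge penalty that pushes the per-component standard deviation of a batch of codes above a fixed threshold, preventing the collapse of the codes' $l_1$ norms (the threshold value and exact hinge form are assumptions).

```python
import torch

def variance_regularizer(codes, threshold=1.0):
    # codes: (batch, latent_dim) sparse representations.
    # Penalize components whose std over the batch falls below the threshold.
    std = codes.std(dim=0)
    return torch.clamp(threshold - std, min=0.0).pow(2).mean()

codes = torch.randn(256, 64) * 0.3   # toy batch of latent codes
print(variance_regularizer(codes))   # positive: stds are below the threshold
```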
    IMAP: Individual huMAn mobility Patterns visualizing platform. (arXiv:2209.03615v1 [cs.SI])
    Understanding human mobility is essential for the development of smart cities and social behavior research. Human mobility models may be used in numerous applications, including pandemic control, urban planning, and traffic management. The existing models' accuracy in predicting users' mobility patterns is less than 25%. The low accuracy may be explained by the flexible nature of human movement; indeed, humans are not rigid in their daily movement. In addition, rigid mobility models may miss the hidden regularities in users' records. Thus, we propose a novel perspective to study and analyze human mobility patterns and capture their flexibility. Typically, mobility patterns are represented by a sequence of locations. We propose to define mobility patterns by abstracting these locations into a set of places. Labeling these locations allows us to detect close-to-reality hidden patterns. We present IMAP, an Individual huMAn mobility Patterns visualizing platform. Our platform enables users to visualize a graph of the places they visited based on their history records. In addition, our platform displays the most frequent mobility patterns computed using a modified PrefixSpan approach.
    On the Near-Optimality of Local Policies in Large Cooperative Multi-Agent Reinforcement Learning. (arXiv:2209.03491v1 [cs.LG])
    We show that in a cooperative $N$-agent network, one can design locally executable policies for the agents such that the resulting discounted sum of average rewards (value) well approximates the optimal value computed over all (including non-local) policies. Specifically, we prove that, if $|\mathcal{X}|, |\mathcal{U}|$ denote the size of state, and action spaces of individual agents, then for sufficiently small discount factor, the approximation error is given by $\mathcal{O}(e)$ where $e\triangleq \frac{1}{\sqrt{N}}\left[\sqrt{|\mathcal{X}|}+\sqrt{|\mathcal{U}|}\right]$. Moreover, in a special case where the reward and state transition functions are independent of the action distribution of the population, the error improves to $\mathcal{O}(e)$ where $e\triangleq \frac{1}{\sqrt{N}}\sqrt{|\mathcal{X}|}$. Finally, we also devise an algorithm to explicitly construct a local policy. With the help of our approximation results, we further establish that the constructed local policy is within $\mathcal{O}(\max\{e,\epsilon\})$ distance of the optimal policy, and the sample complexity to achieve such a local policy is $\mathcal{O}(\epsilon^{-3})$, for any $\epsilon>0$.
    Incremental Correction in Dynamic Systems Modelled with Neural Networks for Constraint Satisfaction. (arXiv:2209.03698v1 [math.OC])
    This study presents incremental correction methods for refining neural network parameters or control functions entering into a continuous-time dynamic system to achieve improved solution accuracy in satisfying the interim point constraints placed on the performance output variables. The proposed approach is to linearise the dynamics around the baseline values of its arguments, and then to solve for the corrective input required to transfer the perturbed trajectory to precisely known or desired values at specific time points, i.e., the interim points. Depending on the type of decision variables to adjust, parameter correction and control function correction methods are developed. These incremental correction methods can be utilised as a means to compensate for the prediction errors of pre-trained neural networks in real-time applications where high accuracy of the prediction of dynamical systems at prescribed time points is imperative. In this regard, the online update approach can be useful for enhancing overall targeting accuracy of finite-horizon control subject to point constraints using a neural policy. A numerical example demonstrates the effectiveness of the proposed approach in an application to a powered descent problem at Mars.
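    A minimal sketch of the linearize-and-correct step: given a sensitivity (Jacobian) of the interim-point output with respect to the correction variables, solve a least-squares problem for the corrective input (the Jacobian and residual below are toy stand-ins, not from the paper).

```python
import numpy as np

# Toy stand-ins: J maps a correction delta to the change in the performance
# output at the interim points; residual is (desired - predicted) output.
J = np.array([[1.0, 0.5, 0.0],
              [0.2, 1.0, 0.3]])          # (n_outputs, n_correction_vars)
residual = np.array([0.10, -0.05])       # output error at the interim points

# Minimum-norm corrective input transferring the linearized trajectory
# to the desired interim-point values.
delta, *_ = np.linalg.lstsq(J, residual, rcond=None)
print("correction:", delta)
print("linearized output error after correction:", residual - J @ delta)
```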
    Representation Learning for Appliance Recognition: A Comparison to Classical Machine Learning. (arXiv:2209.03759v1 [eess.SP])
    Non-intrusive load monitoring (NILM) aims at energy consumption and appliance state information retrieval from aggregated consumption measurements, with the help of signal processing and machine learning algorithms. Representation learning with deep neural networks has been successfully applied to several related disciplines. The main advantage of representation learning lies in replacing an expert-driven, hand-crafted feature extraction with hierarchical learning from many representations in raw data format. In this paper, we show how the NILM processing chain can be improved, reduced in complexity and alternatively designed with recent deep learning algorithms. On the basis of an event-based appliance recognition approach, we evaluate seven different classification models: a classical machine learning approach that is based on a hand-crafted feature extraction, three different deep neural network architectures for automated feature extraction on raw waveform data, as well as three baseline approaches for raw data processing. We evaluate all approaches on two large-scale energy consumption datasets with more than 50,000 events of 44 appliances. We show that with the use of deep learning, we are able to reach and surpass the performance of the state-of-the-art classical machine learning approach for appliance recognition, with an F-Score of 0.75 and 0.86 compared to 0.69 and 0.87 of the classical approach.
    Fact-Saboteurs: A Taxonomy of Evidence Manipulation Attacks against Fact-Verification Systems. (arXiv:2209.03755v1 [cs.CR])
    Mis- and disinformation are now a substantial global threat to our security and safety. To cope with the scale of online misinformation, one viable solution is to automate the fact-checking of claims by retrieving and verifying against relevant evidence. While major recent advances have been achieved in pushing forward automatic fact-verification, a comprehensive evaluation of the possible attack vectors against such systems is still lacking. In particular, the automated fact-verification process might be vulnerable to the exact disinformation campaigns it is trying to combat. In this work, we assume an adversary that automatically tampers with the online evidence in order to disrupt the fact-checking model via camouflaging the relevant evidence, or planting a misleading one. We first propose an exploratory taxonomy that spans these two targets and the different threat model dimensions. Guided by this, we design and propose several potential attack methods. We show that it is possible to subtly modify claim-salient snippets in the evidence, in addition to generating diverse and claim-aligned evidence. As a result, we severely degrade the fact-checking performance under many different permutations of the taxonomy's dimensions. The attacks are also robust against post-hoc modifications of the claim. Our analysis further hints at potential limitations in models' inference when faced with contradicting evidence. We emphasize that these attacks can have harmful implications on the inspectable and human-in-the-loop usage scenarios of such models, and we conclude by discussing challenges and directions for future defenses.
    An Empirical Evaluation of Posterior Sampling for Constrained Reinforcement Learning. (arXiv:2209.03596v1 [cs.LG])
    We study a posterior sampling approach to efficient exploration in constrained reinforcement learning. In contrast to existing algorithms, we propose two simple algorithms that are statistically more efficient, simpler to implement, and computationally cheaper. The first algorithm is based on a linear formulation of CMDP, and the second algorithm leverages the saddle-point formulation of CMDP. Our empirical results demonstrate that, despite its simplicity, posterior sampling achieves state-of-the-art performance and, in some cases, significantly outperforms optimistic algorithms.
    Losing momentum in continuous-time stochastic optimisation. (arXiv:2209.03705v1 [math.OC])
    The training of deep neural networks and other modern machine learning models usually consists of solving non-convex optimisation problems that are high-dimensional and subject to large-scale data. Here, momentum-based stochastic optimisation algorithms have become especially popular in recent years. The stochasticity arises from data subsampling, which reduces computational cost. Moreover, both momentum and stochasticity are thought to help the algorithm to overcome local minimisers and, hopefully, converge globally. Theoretically, this combination of stochasticity and momentum is poorly understood. In this work, we propose and analyse a continuous-time model for stochastic gradient descent with momentum. This model is a piecewise-deterministic Markov process that represents the particle movement by an underdamped dynamical system and the data subsampling through a stochastic switching of the dynamical system. In our analysis, we investigate longtime limits, the subsampling-to-no-subsampling limit, and the momentum-to-no-momentum limit. We are particularly interested in the case of reducing the momentum over time: intuitively, the momentum helps to overcome local minimisers in the initial phase of the algorithm, but prohibits fast convergence to a global minimiser later. Under convexity assumptions, we show convergence of our dynamical system to the global minimiser when reducing momentum over time and letting the subsampling rate go to infinity. We then propose a stable, symplectic discretisation scheme to construct an algorithm from our continuous-time dynamical system. In numerical experiments, we study our discretisation scheme in convex and non-convex test problems. Additionally, we train a convolutional neural network to solve the CIFAR-10 image classification problem. Here, our algorithm reaches competitive results compared to stochastic gradient descent with momentum.
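    A discrete-time toy sketch of the "reduce momentum over time" idea on a convex quadratic: heavy-ball iterations whose momentum coefficient decays, so early iterations move fast while late iterations behave like plain gradient descent (the schedule and problem are illustrative; the paper works with a continuous-time PDMP and a symplectic discretisation).

```python
import numpy as np

def grad(x):
    return 2.0 * x          # gradient of the convex quadratic f(x) = ||x||^2

rng = np.random.default_rng(1)
x = rng.normal(size=5)
v = np.zeros_like(x)
lr = 0.05

for t in range(1, 501):
    beta = 0.9 / (1.0 + 0.01 * t)   # momentum decays over time
    v = beta * v - lr * grad(x)
    x = x + v
print("final |x|:", np.linalg.norm(x))
```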
    Analyzing the Effect of Sampling in GNNs on Individual Fairness. (arXiv:2209.03904v1 [cs.LG])
    Graph neural network (GNN) based methods have saturated the field of recommender systems. The gains of these systems have been significant, showing the advantages of interpreting data through a network structure. However, despite the noticeable benefits of using graph structures in recommendation tasks, this representational form has also bred new challenges which exacerbate the complexity of mitigating algorithmic bias. When GNNs are integrated into downstream tasks, such as recommendation, bias mitigation can become even more difficult. Furthermore, the intractability of applying existing methods of fairness promotion to large, real-world datasets places even more serious constraints on mitigation attempts. Our work sets out to fill in this gap by taking an existing method for promoting individual fairness on graphs and extending it to support mini-batch, or sub-sample based, training of a GNN, thus laying the groundwork for applying this method to a downstream recommendation task. We evaluate two popular GNN methods: Graph Convolutional Network (GCN), which trains on the entire graph, and GraphSAGE, which uses probabilistic random walks to create subgraphs for mini-batch training, and assess the effects of sub-sampling on individual fairness. We implement an individual fairness notion called \textit{REDRESS}, proposed by Dong et al., which uses rank optimization to learn individually fair node, or item, embeddings. We empirically show on two real-world datasets that GraphSAGE achieves not only comparable accuracy but also improved fairness compared with the GCN model. These findings have consequential ramifications for individual fairness promotion, for GNNs, and, downstream, for recommender systems, showing that mini-batch training facilitates individual fairness promotion by allowing local nuance to guide the process of fairness promotion in representation learning.
    Hierarchical Graph Pooling is an Effective Citywide Traffic Condition Prediction Model. (arXiv:2209.03629v1 [cs.LG])
    Accurate traffic condition prediction provides a solid foundation for vehicle-environment coordination and traffic control tasks. Because of the complexity of road network data in spatial distribution and the diversity of deep learning methods, it is challenging to effectively define traffic data and adequately capture the complex spatial nonlinear features in the data. This paper applies two hierarchical graph pooling approaches to the traffic prediction task to reduce graph information redundancy. First, this paper verifies the effectiveness of hierarchical graph pooling methods in traffic prediction tasks; the hierarchical graph pooling methods are contrasted with the other baselines on predictive performance. Second, two mainstream hierarchical graph pooling methods, node clustering pooling and node drop pooling, are applied to analyze their advantages and weaknesses in traffic prediction. Finally, for the mentioned graph neural networks, this paper compares the predictive effects of different graph network inputs on traffic prediction accuracy. Efficient ways of defining graph networks are analyzed and summarized.
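    A sketch of the node-drop (top-k) pooling step the abstract mentions: score nodes, keep the top fraction, and slice the feature matrix and adjacency accordingly (the learned scoring projection is replaced here by a fixed toy projection).

```python
import numpy as np

def topk_node_drop_pool(X, A, ratio=0.5, p=None):
    # X: (n_nodes, n_feats) node features; A: (n_nodes, n_nodes) adjacency.
    n = X.shape[0]
    p = p if p is not None else np.ones(X.shape[1]) / np.sqrt(X.shape[1])
    scores = X @ p                          # per-node scores (learned in practice)
    keep = np.argsort(scores)[-max(1, int(ratio * n)):]
    gate = np.tanh(scores[keep])[:, None]   # gate kept features by their score
    return X[keep] * gate, A[np.ix_(keep, keep)]

X = np.random.default_rng(0).normal(size=(6, 4))
A = (np.random.default_rng(1).random((6, 6)) > 0.5).astype(float)
Xp, Ap = topk_node_drop_pool(X, A, ratio=0.5)
print(Xp.shape, Ap.shape)   # (3, 4) (3, 3)
```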
    Application of image-to-image translation in improving pedestrian detection. (arXiv:2209.03625v1 [cs.CV])
    The lack of effective target regions makes it difficult to perform several vision tasks in low-intensity light, including pedestrian recognition and image-to-image translation. In this situation, combining infrared and visible images to accumulate high-quality information makes it possible to detect pedestrians even in low light. In this study, we use advanced deep learning models, pix2pixGAN and YOLOv7, on the LLVIP dataset, which contains visible-infrared image pairs for low-light vision. The dataset contains 33,672 images, most captured in dark scenes and tightly synchronized in time and location.
    Predict+Optimize for Packing and Covering LPs with Unknown Parameters in Constraints. (arXiv:2209.03668v1 [cs.AI])
    Predict+Optimize is a recently proposed framework which combines machine learning and constrained optimization, tackling optimization problems that contain parameters that are unknown at solving time. The goal is to predict the unknown parameters and use the estimates to solve for an estimated optimal solution to the optimization problem. However, all prior works have focused on the case where unknown parameters appear only in the optimization objective and not the constraints, for the simple reason that if the constraints were not known exactly, the estimated optimal solution might not even be feasible under the true parameters. The contributions of this paper are two-fold. First, we propose a novel and practically relevant framework for the Predict+Optimize setting, but with unknown parameters in both the objective and the constraints. We introduce the notion of a correction function, and an additional penalty term in the loss function, modelling practical scenarios where an estimated optimal solution can be modified into a feasible solution after the true parameters are revealed, but at an additional cost. Second, we propose a corresponding algorithmic approach for our framework, which handles all packing and covering linear programs. Our approach is inspired by the prior work of Mandi and Guns, though with crucial modifications and re-derivations for our very different setting. Experimentation demonstrates the superior empirical performance of our method over classical approaches.
    Stochastic Frank-Wolfe for Constrained Finite-Sum Minimization. (arXiv:2002.11860v6 [math.OC] UPDATED)
    We propose a novel Stochastic Frank-Wolfe (a.k.a. conditional gradient) algorithm for constrained smooth finite-sum minimization with a generalized linear prediction/structure. This class of problems includes empirical risk minimization with sparse, low-rank, or other structured constraints. The proposed method is simple to implement, does not require step-size tuning, and has a constant per-iteration cost that is independent of the dataset size. Furthermore, as a byproduct of the method we obtain a stochastic estimator of the Frank-Wolfe gap that can be used as a stopping criterion. Depending on the setting, the proposed method matches or improves on the best computational guarantees for Stochastic Frank-Wolfe algorithms. Benchmarks on several datasets highlight different regimes in which the proposed method exhibits a faster empirical convergence than related methods. Finally, we provide an implementation of all considered methods in an open-source package.
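    A minimal sketch of a (deterministic-gradient) Frank-Wolfe iteration over an $l_1$-ball constraint, the kind of structured set the abstract targets; the paper's stochastic variant replaces the full gradient with mini-batch estimates and derives a stochastic estimator of the gap used below as a stopping criterion.

```python
import numpy as np

rng = np.random.default_rng(0)
A, b = rng.normal(size=(100, 20)), rng.normal(size=100)
radius = 3.0                       # l1-ball constraint ||x||_1 <= radius
x = np.zeros(20)

for t in range(200):
    grad = A.T @ (A @ x - b) / len(b)          # least-squares gradient
    i = np.argmax(np.abs(grad))                # linear minimization oracle:
    s = np.zeros_like(x)                       # a signed l1-ball vertex
    s[i] = -radius * np.sign(grad[i])
    gap = grad @ (x - s)                       # Frank-Wolfe gap (stopping criterion)
    if gap < 1e-6:
        break
    x += (2.0 / (t + 2.0)) * (s - x)           # standard FW step size
print("FW gap:", gap, "nnz:", np.count_nonzero(np.round(x, 6)))
```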
    CGAN-ECT: Tomography Image Reconstruction from Electrical Capacitance Measurements Using CGANs. (arXiv:2209.03737v1 [eess.IV])
    Due to the rapid growth of Electrical Capacitance Tomography (ECT) applications in several industrial fields, there is a crucial need for developing high-quality, yet fast, methodologies of image reconstruction from raw capacitance measurements. Deep learning, as an effective non-linear mapping tool for complicated functions, has become widespread in many fields, including electrical tomography. In this paper, we propose a Conditional Generative Adversarial Network (CGAN) model for reconstructing ECT images from capacitance measurements. The initial image of the CGAN model is constructed from the capacitance measurement. To our knowledge, this is the first time the capacitance measurements have been represented in an image form. We have created a new massive ECT dataset of 320K synthetic image-measurement pairs for training and testing the proposed model. The feasibility and generalization ability of the proposed CGAN-ECT model are evaluated using a test dataset, contaminated data, and flow patterns not exposed to the model during the training phase. The evaluation results show that the proposed CGAN-ECT model can efficiently create more accurate ECT images than traditional and other deep learning-based image reconstruction algorithms. CGAN-ECT achieved an average image correlation coefficient of more than 99.3% and an average relative image error of about 0.07.
    Simpler is better: Multilevel Abstraction with Graph Convolutional Recurrent Neural Network Cells for Traffic Prediction. (arXiv:2209.03858v1 [cs.LG])
    In recent years, graph neural networks (GNNs) combined with variants of recurrent neural networks (RNNs) have reached state-of-the-art performance in spatiotemporal forecasting tasks. This is particularly the case for traffic forecasting, where GNN models use the graph structure of road networks to account for spatial correlation between links and nodes. Recent solutions are either based on complex graph operations or avoiding predefined graphs. This paper proposes a new sequence-to-sequence architecture to extract the spatiotemporal correlation at multiple levels of abstraction using GNN-RNN cells with sparse architecture to decrease training time compared to more complex designs. Encoding the same input sequence through multiple encoders, with an incremental increase in encoder layers, enables the network to learn general and detailed information through multilevel abstraction. We further present a new benchmark dataset of street-level segment traffic data from Montreal, Canada. Unlike highways, urban road segments are cyclic and characterized by complicated spatial dependencies. Experimental results on the METR-LA benchmark highway and our MSLTD street-level segment datasets demonstrate that our model improves performance by more than 7% for one-hour prediction compared to the baseline methods while reducing computing resource requirements by more than half compared to other competing methods.  ( 2 min )
    TAG: Learning Circuit Spatial Embedding From Layouts. (arXiv:2209.03465v1 [cs.AR])
    Analog and mixed-signal (AMS) circuit designs still rely on human design expertise. Machine learning has been assisting circuit design automation by replacing human experience with artificial intelligence. This paper presents TAG, a new paradigm of learning the circuit representation from layouts leveraging text, self-attention and graph. The embedding network model learns spatial information without manual labeling. We introduce text embedding and a self-attention mechanism to AMS circuit learning. Experimental results demonstrate the ability to predict layout distances between instances with industrial FinFET technology benchmarks. The effectiveness of the circuit representation is verified by showing the transferability to three other learning tasks with limited data in the case studies: layout matching prediction, wirelength estimation, and net parasitic capacitance prediction.
    Geolocation of Cultural Heritage using Multi-View Knowledge Graph Embedding. (arXiv:2209.03638v1 [cs.LG])
    Knowledge Graphs (KGs) have proven to be a reliable way of structuring data. They can provide a rich source of contextual information about cultural heritage collections. However, cultural heritage KGs are far from being complete. They are often missing important attributes such as geographical location, especially for sculptures and mobile or indoor entities such as paintings. In this paper, we first present a framework for ingesting knowledge about tangible cultural heritage entities from various data sources and their connected multi-hop knowledge into a geolocalized KG. Secondly, we propose a multi-view learning model for estimating the relative distance between a given pair of cultural heritage entities, based on the geographical as well as the knowledge connections of the entities.
    Impact of dataset size and long-term ECoG-based BCI usage on deep learning decoders performance. (arXiv:2209.03789v1 [eess.SP])
    In brain-computer interfaces (BCI) research, recording data is time-consuming and expensive, which limits access to big datasets. This may influence the BCI system performance as machine learning methods depend strongly on the training dataset size. Important questions arise: taking into account neuronal signal characteristics (e.g., non-stationarity), can we achieve higher decoding performance with more data to train decoders? What is the perspective for further improvement with time in the case of long-term BCI studies? In this study, we investigated the impact of long-term recordings on motor imagery decoding from two main perspectives: model requirements regarding dataset size and potential for patient adaptation. We evaluated the multilinear model and two deep learning (DL) models on a long-term BCI and Tetraplegia NCT02550522 clinical trial dataset containing 43 sessions of ECoG recordings performed with a tetraplegic patient. In the experiment, a participant executed 3D virtual hand translation using motor imagery patterns. We designed multiple computational experiments in which training datasets were increased or translated to investigate the relationship between models' performance and different factors influencing recordings. Our analysis showed that adding more data to the training dataset may not instantly increase performance for datasets already containing 40 minutes of the signal. DL decoders showed similar requirements regarding the dataset size compared to the multilinear model while demonstrating higher decoding performance. Moreover, high decoding performance was obtained with relatively small datasets recorded later in the experiment, suggesting motor imagery patterns improvement and patient adaptation. Finally, we proposed UMAP embeddings and local intrinsic dimensionality as a way to visualize the data and potentially evaluate data quality.
    A Survey on Large-Population Systems and Scalable Multi-Agent Reinforcement Learning. (arXiv:2209.03859v1 [cs.MA])
    The analysis and control of large-population systems is of great interest to diverse areas of research and engineering, ranging from epidemiology over robotic swarms to economics and finance. An increasingly popular and effective approach to realizing sequential decision-making in multi-agent systems is through multi-agent reinforcement learning, as it allows for an automatic and model-free analysis of highly complex systems. However, the key issue of scalability complicates the design of control and reinforcement learning algorithms particularly in systems with large populations of agents. While reinforcement learning has found resounding empirical success in many scenarios with few agents, problems with many agents quickly become intractable and necessitate special consideration. In this survey, we will shed light on current approaches to tractably understanding and analyzing large-population systems, both through multi-agent reinforcement learning and through adjacent areas of research such as mean-field games, collective intelligence, or complex network theory. These classically independent subject areas offer a variety of approaches to understanding or modeling large-population systems, which may be of great use for the formulation of tractable MARL algorithms in the future. Finally, we survey potential areas of application for large-scale control and identify fruitful future applications of learning algorithms in practical systems. We hope that our survey could provide insight and future directions to junior and senior researchers in theoretical and applied sciences alike.
    Convolutional Neural Network (CNN) to reduce construction loss in JPEG compression. (arXiv:2209.03475v1 [eess.IV])
    In recent decades, digital image processing has gained enormous popularity. Consequently, a number of data compression strategies have been put forth, with the goal of minimizing the amount of information required to represent images. Among them, JPEG compression is one of the most popular methods that has been widely applied in multimedia and digital applications. The periodic nature of the DFT makes it impossible to meet the periodicity condition on an image's opposing edges without producing severe artifacts, which lowers the image's perceptual visual quality. On the other hand, deep learning has recently achieved outstanding results for applications like speech recognition, image reduction, and natural language processing. Convolutional Neural Networks (CNNs) have received more attention than most other types of deep neural networks. The use of convolution in feature extraction results in a less redundant feature map and a smaller dataset, both of which are crucial for image compression. In this work, an effective image compression method is proposed using autoencoders. The study's findings revealed a number of important trends, suggesting that better reconstruction along with good compression can be achieved using autoencoders.  ( 2 min )
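    A compact sketch of the kind of convolutional autoencoder the abstract describes; the layer sizes and input resolution are illustrative, not the paper's architecture.

```python
import torch
import torch.nn as nn

# Illustrative convolutional autoencoder for image compression:
# the bottleneck holds far fewer values than the input image.
class ConvAE(nn.Module):
    def __init__(self):
        super().__init__()
        self.enc = nn.Sequential(
            nn.Conv2d(1, 16, 3, stride=2, padding=1), nn.ReLU(),   # 28 -> 14
            nn.Conv2d(16, 4, 3, stride=2, padding=1), nn.ReLU(),   # 14 -> 7
        )
        self.dec = nn.Sequential(
            nn.ConvTranspose2d(4, 16, 2, stride=2), nn.ReLU(),     # 7 -> 14
            nn.ConvTranspose2d(16, 1, 2, stride=2), nn.Sigmoid(),  # 14 -> 28
        )

    def forward(self, x):
        return self.dec(self.enc(x))

x = torch.rand(8, 1, 28, 28)
model = ConvAE()
recon = model(x)
loss = nn.functional.mse_loss(recon, x)   # reconstruction loss to minimize
print(recon.shape, float(loss))
```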
    Lightweight Long-Range Generative Adversarial Networks. (arXiv:2209.03793v1 [cs.CV])
    In this paper, we introduce novel lightweight generative adversarial networks, which can effectively capture long-range dependencies in the image generation process, and produce high-quality results with a much simpler architecture. To achieve this, we first introduce a long-range module, allowing the network to dynamically adjust the number of focused sampling pixels and to also augment sampling locations. Thus, it can break the limitation of the fixed geometric structure of the convolution operator, and capture long-range dependencies in both spatial and channel-wise directions. Also, the proposed long-range module can highlight negative relations between pixels, working as a regularization to stabilize training. Furthermore, we propose a new generation strategy through which we introduce metadata into the image generation process to provide basic information about target images, which can stabilize and speed up the training process. Our novel long-range module introduces only a few additional parameters and is easily inserted into existing models to capture long-range dependencies. Extensive experiments demonstrate the competitive performance of our method with a lightweight architecture.  ( 2 min )
    Physics-Guided Adversarial Machine Learning for Aircraft Systems Simulation. (arXiv:2209.03431v1 [cs.LG])
    In the context of aircraft system performance assessment, deep learning technologies allow to quickly infer models from experimental measurements, with less detailed system knowledge than usually required by physics-based modeling. However, this inexpensive model development also comes with new challenges regarding model trustworthiness. This work presents a novel approach, physics-guided adversarial machine learning (ML), that improves the confidence over the physics consistency of the model. The approach performs, first, a physics-guided adversarial testing phase to search for test inputs revealing behavioral system inconsistencies, while still falling within the range of foreseeable operational conditions. Then, it proceeds with physics-informed adversarial training to teach the model the system-related physics domain foreknowledge through iteratively reducing the unwanted output deviations on the previously-uncovered counterexamples. Empirical evaluation on two aircraft system performance models shows the effectiveness of our adversarial ML approach in exposing physical inconsistencies of both models and in improving their propensity to be consistent with physics domain knowledge.
    A deep learning-based model reduction (DeePMR) method for simplifying chemical kinetics. (arXiv:2201.02025v3 [cs.LG] UPDATED)
    A deep learning-based model reduction (DeePMR) method for simplifying chemical kinetics is proposed and validated using high-temperature auto-ignitions, perfectly stirred reactors (PSR), and one-dimensional freely propagating flames of n-heptane/air mixtures. The mechanism reduction is modeled as an optimization problem on Boolean space, where a Boolean vector, each entry corresponding to a species, represents a reduced mechanism. The optimization goal is to minimize the reduced mechanism size given the error tolerance of a group of pre-selected benchmark quantities. The key idea of the DeePMR is to employ a deep neural network (DNN) to formulate the objective function in the optimization problem. In order to explore the high-dimensional Boolean space efficiently, an iterative DNN-assisted data sampling and DNN training procedure is implemented. The results show that DNN-assistance improves sampling efficiency significantly, selecting only $10^5$ samples out of $10^{34}$ possible samples for the DNN to achieve sufficient accuracy. The results demonstrate the capability of the DNN to recognize key species and reasonably predict reduced mechanism performance. The well-trained DNN guarantees the optimal reduced mechanism by solving an inverse optimization problem. By comparing ignition delay times, laminar flame speeds, and temperatures in PSRs, the resulting skeletal mechanism has fewer species (45 species) but the same level of accuracy as the skeletal mechanism (56 species) obtained by the Path Flux Analysis (PFA) method. In addition, the skeletal mechanism can be further reduced to 28 species if only considering atmospheric, near-stoichiometric conditions (equivalence ratio between 0.6 and 1.2). The DeePMR provides an innovative way to perform model reduction and demonstrates the great potential of data-driven methods in the combustion area.
    AST-GIN: Attribute-Augmented Spatial-Temporal Graph Informer Network for Electric Vehicle Charging Station Availability Forecasting. (arXiv:2209.03356v1 [cs.LG])
    Electric Vehicle (EV) charging demand and charging station availability forecasting is one of the challenges in the intelligent transportation system. With accurate EV station situation prediction, suitable charging behaviors can be scheduled in advance to relieve range anxiety. Many existing deep learning methods have been proposed to address this issue; however, due to the complex road network structure and comprehensive external factors, such as points of interest (POIs) and weather effects, many commonly used algorithms can only extract historical usage information, without considering the comprehensive influence of external factors. To enhance prediction accuracy and interpretability, the Attribute-Augmented Spatial-Temporal Graph Informer (AST-GIN) structure is proposed in this study by combining the Graph Convolutional Network (GCN) layer and the Informer layer to extract both the external and internal spatial-temporal dependence of relevant transportation data. The external factors are modeled as dynamic attributes by the attribute-augmented encoder for training. The AST-GIN model is tested on data collected in Dundee City, and experimental results show the effectiveness of our model in accounting for the influence of external factors over various horizon settings, compared with other baselines.
    SmOOD: Smoothness-based Out-of-Distribution Detection Approach for Surrogate Neural Networks in Aircraft Design. (arXiv:2209.03438v1 [cs.LG])
    The aircraft industry is constantly striving for more efficient design optimization methods in terms of human effort, computation time, and resource consumption. Hybrid surrogate optimization maintains high result quality while providing rapid design assessments when both the surrogate model and the switch mechanism for eventually transitioning to the high-fidelity (HF) model are calibrated properly. Feedforward neural networks (FNNs) can capture highly nonlinear input-output mappings, yielding efficient surrogates for aircraft performance factors. However, FNNs often fail to generalize over out-of-distribution (OOD) samples, which hinders their adoption in critical aircraft design optimization. Through SmOOD, our smoothness-based out-of-distribution detection approach, we propose to codesign a model-dependent OOD indicator with the optimized FNN surrogate, to produce a trustworthy surrogate model with selective but credible predictions. Unlike conventional uncertainty-grounded methods, SmOOD exploits inherent smoothness properties of the HF simulations to effectively expose OODs through revealing their suspicious sensitivities, thereby avoiding over-confident uncertainty estimates on OOD samples. By using SmOOD, only high-risk OOD inputs are forwarded to the HF model for re-evaluation, leading to more accurate results at a low overhead cost. Three aircraft performance models are investigated. Results show that FNN-based surrogates outperform their Gaussian Process counterparts in terms of predictive performance. Moreover, SmOOD covers on average 85% of actual OODs across all the study cases. When SmOOD plus FNN surrogates are deployed in hybrid surrogate optimization settings, they reduce the error rate by 34.65% and speed up computation by a factor of 58.36.
    A simple approach for quantizing neural networks. (arXiv:2209.03487v1 [cs.LG])
    In this short note, we propose a new method for quantizing the weights of a fully trained neural network. A simple deterministic pre-processing step allows us to quantize network layers via memoryless scalar quantization while preserving the network performance on given training data. On one hand, the computational complexity of this pre-processing slightly exceeds that of state-of-the-art algorithms in the literature. On the other hand, our approach does not require any hyper-parameter tuning and, in contrast to previous methods, admits a straightforward analysis. We provide rigorous theoretical guarantees in the case of quantizing single network layers and show that the relative error decays with the number of parameters in the network if the training data behaves well, e.g., if it is sampled from suitable random distributions. The developed method also readily allows the quantization of deep networks by consecutive application to single layers.
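    A sketch of memoryless scalar quantization of a trained layer's weights onto a fixed uniform alphabet; the paper's contribution is the deterministic pre-processing step that makes this quantization safe, which is omitted here.

```python
import numpy as np

def memoryless_scalar_quantize(W, bits=3):
    # Quantize each weight independently to the nearest level of a uniform
    # alphabet spanning the weight range (pre-processing step omitted).
    levels = 2 ** bits
    lo, hi = W.min(), W.max()
    step = (hi - lo) / (levels - 1)
    return lo + np.round((W - lo) / step) * step

rng = np.random.default_rng(0)
W = rng.normal(size=(64, 128))
Wq = memoryless_scalar_quantize(W, bits=3)
print("unique levels:", np.unique(Wq).size)
print("relative error:", np.linalg.norm(W - Wq) / np.linalg.norm(W))
```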
    Reward Delay Attacks on Deep Reinforcement Learning. (arXiv:2209.03540v1 [cs.LG])
    Most reinforcement learning algorithms implicitly assume strong synchrony. We present novel attacks targeting Q-learning that exploit a vulnerability entailed by this assumption by delaying the reward signal for a limited time period. We consider two types of attack goals: targeted attacks, which aim to cause a target policy to be learned, and untargeted attacks, which simply aim to induce a policy with a low reward. We evaluate the efficacy of the proposed attacks through a series of experiments. Our first observation is that reward-delay attacks are extremely effective when the goal is simply to minimize reward. Indeed, we find that even naive baseline reward-delay attacks are highly successful in minimizing the reward. Targeted attacks, on the other hand, are more challenging, although we nevertheless demonstrate that the proposed approaches remain highly effective at achieving the attacker's targets. In addition, we introduce a second threat model that captures a minimal mitigation that ensures that rewards cannot be used out of sequence. We find that this mitigation remains insufficient to ensure robustness to attacks that delay rewards but preserve their order.
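    A toy sketch of the order-preserving threat model: an environment wrapper that holds rewards in a FIFO buffer for a fixed number of steps before releasing them to the learner. The classic four-tuple step interface and the dummy environment are assumptions for illustration, not a specific attack from the paper.

```python
from collections import deque

class RewardDelayWrapper:
    """Delays every reward by `delay` steps (order-preserving variant)."""
    def __init__(self, env, delay=3):
        self.env = env
        self.buffer = deque([0.0] * delay)   # pre-fill so early steps see zeros

    def step(self, action):
        obs, reward, done, info = self.env.step(action)
        self.buffer.append(reward)           # enqueue the true reward
        delayed = self.buffer.popleft()      # learner receives a stale reward
        return obs, delayed, done, info

class DummyEnv:
    def __init__(self): self.t = 0
    def step(self, action):
        self.t += 1
        return self.t, float(self.t), self.t >= 5, {}

env = RewardDelayWrapper(DummyEnv(), delay=2)
for _ in range(5):
    print(env.step(0)[1])   # prints 0.0, 0.0, 1.0, 2.0, 3.0
```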
    A Novel Semi-supervised Meta Learning Method for Subject-transfer Brain-computer Interface. (arXiv:2209.03785v1 [eess.SP])
    Brain-computer interface (BCI) provides a direct communication pathway between the human brain and external devices. Before a new subject can use a BCI, a calibration procedure is usually required, because inter- and intra-subject variances are so large that models trained on existing subjects perform poorly on new subjects. Therefore, an effective subject-transfer and calibration method is essential. In this paper, we propose a semi-supervised meta learning (SSML) method for subject-transfer learning in BCIs. The proposed SSML first learns a meta model with the existing subjects, then fine-tunes the model in a semi-supervised learning manner, i.e., using few labeled and many unlabeled samples of the target subject for calibration. This is significant for BCI applications where labeled data are scarce or expensive while unlabeled data are readily available. To verify the SSML method, three different BCI paradigms are tested: 1) event-related potential detection; 2) emotion recognition; and 3) sleep staging. The SSML achieved significant improvements of over 15% on the first two paradigms and 4.9% on the third. The experimental results demonstrate the effectiveness and potential of the SSML method in BCI applications.
    Black-Box Audits for Group Distribution Shifts. (arXiv:2209.03620v1 [cs.LG])
    When a model informs decisions about people, distribution shifts can create undue disparities. However, it is hard for external entities to check for distribution shift, as the model and its training set are often proprietary. In this paper, we introduce and study a black-box auditing method to detect cases of distribution shift that lead to a performance disparity of the model across demographic groups. By extending techniques used in membership and property inference attacks -- which are designed to expose private information from learned models -- we demonstrate that an external auditor can gain the information needed to identify these distribution shifts solely by querying the model. Our experimental results on real-world datasets show that this approach is effective, achieving 80--100% AUC-ROC in detecting shifts involving the underrepresentation of a demographic group in the training set. Researchers and investigative journalists can use our tools to perform non-collaborative audits of proprietary models and expose cases of underrepresentation in the training datasets.
    Knowledge Based Template Machine Translation In Low-Resource Setting. (arXiv:2209.03554v1 [cs.CL])
    Incorporating tagging into neural machine translation (NMT) systems has shown promising results in helping to translate rare words such as named entities (NEs). However, translating NEs in low-resource settings remains a challenge. In this work, we investigate the effect of using tags and NE hypernyms from knowledge graphs (KGs) in parallel corpora under different levels of resource conditions. We find that the tag-and-copy mechanism (tag the NEs in the source sentence and copy them to the target sentence) improves translation in high-resource settings only. Introducing copying also results in polarizing effects in translating different parts of speech (POS). Interestingly, we find that copy accuracy for hypernyms is consistently higher than that for entities. As a way of avoiding "hard" copying and utilizing hypernyms in bootstrapping rare entities, we introduce a "soft" tagging mechanism and find consistent improvement in high- and low-resource settings.
    Too Fine or Too Coarse? The Goldilocks Composition of Data Complexity for Robust Left-Right Eye-Tracking Classifiers. (arXiv:2209.03761v1 [eess.SP])
    The differences in distributional patterns between benchmark data and real-world data have been one of the main challenges of using electroencephalogram (EEG) signals for eye-tracking (ET) classification. Therefore, increasing the robustness of machine learning models in predicting eye-tracking positions from EEG data is integral for both research and consumer use. Previously, we compared the performance of classifiers trained solely on finer-grain data to those trained solely on coarse-grain data. Results indicated that despite the overall improvement in robustness, the performance of the fine-grain trained models decreased, compared to coarse-grain trained models, when the testing and training sets contained the same distributional patterns \cite{vectorbased}. This paper aims to address this case by training models using datasets of mixed data complexity to determine the ideal distribution of fine- and coarse-grain data. We train machine learning models utilizing a mixed dataset composed of both fine- and coarse-grain data and then compare the accuracies to models trained using solely fine- or coarse-grain data. For our purposes, finer-grain data refers to data collected using more complex methods, whereas coarser-grain data refers to data collected using simpler methods. We apply covariate distributional shifts to test for the susceptibility of each training set. Our results indicated that the optimal training dataset for EEG-ET classification is not composed of solely fine- or coarse-grain data, but rather a mix of the two, leaning towards finer-grain data.
    Distilling Deep RL Models Into Interpretable Neuro-Fuzzy Systems. (arXiv:2209.03357v1 [cs.LG])
    Deep Reinforcement Learning uses a deep neural network to encode a policy, which achieves very good performance in a wide range of applications but is widely regarded as a black box model. A more interpretable alternative to deep networks is given by neuro-fuzzy controllers. Unfortunately, neuro-fuzzy controllers often need a large number of rules to solve relatively simple tasks, making them difficult to interpret. In this work, we present an algorithm to distill the policy from a deep Q-network into a compact neuro-fuzzy controller. This allows us to train compact neuro-fuzzy controllers through distillation to solve tasks that they are unable to solve directly, combining the flexibility of deep reinforcement learning and the interpretability of compact rule bases. We demonstrate the algorithm on three well-known environments from OpenAI Gym, where we nearly match the performance of a DQN agent using only 2 to 6 fuzzy rules.
    Aerial View Goal Localization with Reinforcement Learning. (arXiv:2209.03694v1 [cs.CV])
    With an increased amount and availability of unmanned aerial vehicles (UAVs) and other remote sensing devices (e.g. satellites), we have recently seen a vast increase in computer vision methods for aerial view data. One application of such technologies is within search-and-rescue (SAR), where the task is to localize and assist one or several people who are missing, for example after a natural disaster. In many cases the rough location may be known and a UAV can be deployed to explore a given, confined area to precisely localize the missing people. Due to time and battery constraints it is often critical that localization is performed as efficiently as possible. In this work, we approach this type of problem by abstracting it as an aerial view goal localization task in a framework that emulates a SAR-like setup without requiring access to actual UAVs. In this framework, an agent operates on top of an aerial image (proxy for a search area) and is tasked with localizing a goal that is described in terms of visual cues. To further mimic the situation on an actual UAV, the agent is not able to observe the search area in its entirety, not even at low resolution, and thus it has to operate solely based on partial glimpses when navigating towards the goal. To tackle this task, we propose AiRLoc, a reinforcement learning (RL)-based model that decouples exploration (searching for distant goals) and exploitation (localizing nearby goals). Extensive evaluations show that AiRLoc outperforms heuristic search methods as well as alternative learnable approaches. We also conduct a proof-of-concept study which indicates that the learnable methods outperform humans on average. Code has been made publicly available: https://github.com/aleksispi/airloc.
    A Framework for Evaluating Privacy-Utility Trade-off in Vertical Federated Learning. (arXiv:2209.03885v1 [cs.LG])
    Federated learning (FL) has emerged as a practical solution to tackle data silo issues without compromising user privacy. One of its variants, vertical federated learning (VFL), has recently gained increasing attention, as VFL matches enterprises' demands of leveraging more valuable features to build better machine learning models while preserving user privacy. Current works in VFL concentrate on developing a specific protection or attack mechanism for a particular VFL algorithm. In this work, we propose an evaluation framework that formulates the privacy-utility evaluation problem. We then use this framework as a guide to comprehensively evaluate a broad range of protection mechanisms against most of the state-of-the-art privacy attacks for three widely-deployed VFL algorithms. These evaluations may help FL practitioners select appropriate protection mechanisms given specific requirements. Our evaluation results demonstrate that: the model inversion and most of the label inference attacks can be thwarted by existing protection mechanisms; the model completion (MC) attack is difficult to prevent, which calls for more advanced MC-targeted protection mechanisms. Based on our evaluation results, we offer concrete advice on improving the privacy-preserving capability of VFL systems.
    Self-Supervised Multimodal Fusion Transformer for Passive Activity Recognition. (arXiv:2209.03765v1 [eess.SP])
    The pervasiveness of Wi-Fi signals provides significant opportunities for human sensing and activity recognition in fields such as healthcare. The sensors most commonly used for passive Wi-Fi sensing are based on passive Wi-Fi radar (PWR) and channel state information (CSI) data; however, current systems do not effectively exploit the information acquired through multiple sensors to recognise the different activities. In this paper, we explore new properties of the Transformer architecture for multimodal sensor fusion. We study different signal processing techniques to extract multiple image-based features from PWR and CSI data, such as spectrograms, scalograms and Markov transition fields (MTF). We first propose the Fusion Transformer, an attention-based model for multimodal and multi-sensor fusion. Experimental results show that our Fusion Transformer approach can achieve competitive results compared to a ResNet architecture but with far fewer resources. To further improve our model, we propose a simple and effective framework for multimodal and multi-sensor self-supervised learning (SSL). The self-supervised Fusion Transformer outperforms the baselines, achieving an F1-score of 95.9%. Finally, we show how this approach significantly outperforms the others when trained with between 1% (2 minutes) and 20% (40 minutes) of labelled training data.  ( 3 min )
    What and How of Machine Learning Transparency: Building Bespoke Explainability Tools with Interoperable Algorithmic Components. (arXiv:2209.03813v1 [cs.LG])
    Explainability techniques for data-driven predictive models based on artificial intelligence and machine learning algorithms allow us to better understand the operation of such systems and help to hold them accountable. New transparency approaches are developed at breakneck speed, enabling us to peek inside these black boxes and interpret their decisions. Many of these techniques are introduced as monolithic tools, giving the impression of one-size-fits-all and end-to-end algorithms with limited customisability. Nevertheless, such approaches are often composed of multiple interchangeable modules that need to be tuned to the problem at hand to produce meaningful explanations. This paper introduces a collection of hands-on training materials -- slides, video recordings and Jupyter Notebooks -- that provide guidance through the process of building and evaluating bespoke modular surrogate explainers for tabular data. These resources cover the three core building blocks of this technique: interpretable representation composition, data sampling and explanation generation.
    Improved Robust Algorithms for Learning with Discriminative Feature Feedback. (arXiv:2209.03753v1 [cs.LG])
    Discriminative Feature Feedback is a setting proposed by Dasgupta et al. (2018), which provides a protocol for interactive learning based on feature explanations that are provided by a human teacher. The features distinguish between the labels of pairs of possibly similar instances. That work has shown that learning in this model can have considerable statistical and computational advantages over learning in standard label-based interactive learning models. In this work, we provide new robust interactive learning algorithms for the Discriminative Feature Feedback model, with mistake bounds that are significantly lower than those of previous robust algorithms for this setting. In the adversarial setting, we reduce the dependence on the number of protocol exceptions from quadratic to linear. In addition, we provide an algorithm for a slightly more restricted model, which obtains an even smaller mistake bound for large models with many exceptions. In the stochastic setting, we provide the first algorithm that converges to the exception rate with a polynomial sample complexity. Our algorithm and analysis for the stochastic setting involve a new construction that we call Feature Influence, which may be of wider applicability.  ( 2 min )
    Developing a multi-variate prediction model for the detection of COVID-19 from Crowd-sourced Respiratory Voice Data. (arXiv:2209.03727v1 [cs.SD])
    COVID-19 has affected more than 223 countries worldwide. There is a pressing need for non-invasive, low-cost, and highly scalable solutions to detect COVID-19, especially in low-resource countries where PCR testing is not ubiquitously available. Our aim is to develop a deep learning model identifying COVID-19 using voice data recordings spontaneously provided by the general population (voice recordings and a short questionnaire) via their personal devices. The novelty of this work is in the development of a deep learning model for the identification of COVID-19 patients from voice recordings. Methods: We used the Cambridge University dataset consisting of 893 audio samples, crowd-sourced from 4352 participants who used the COVID-19 Sounds app. Voice features were extracted using a Mel-spectrogram analysis. Based on the voice data, we developed deep learning classification models to detect positive COVID-19 cases, including Long Short-Term Memory (LSTM) and Convolutional Neural Network (CNN) models. We compared their predictive power to baseline classification models, namely Logistic Regression and Support Vector Machine. Results: LSTM based on Mel-frequency cepstral coefficient (MFCC) features achieved the highest accuracy (89%), with a sensitivity and specificity of 89% and 89%, respectively. The results achieved with the proposed model suggest a significant improvement in the prediction accuracy of COVID-19 diagnosis compared to the results obtained in the state of the art. Conclusion: Deep learning can detect subtle changes in the voice of COVID-19 patients with promising results. As an addition to current testing techniques, this model may aid health professionals in the fast diagnosis and tracing of COVID-19 cases using simple voice analysis.
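    A sketch of the feature pipeline described above: MFCC extraction with librosa followed by a small Keras LSTM classifier. Layer sizes, hyperparameters, and the stand-in batch are illustrative assumptions, not the paper's configuration.

```python
import numpy as np
import librosa
from tensorflow import keras

def mfcc_features(path, n_mfcc=40):
    # Extract per-frame MFCCs from an audio file (not called in the toy demo).
    y, sr = librosa.load(path, sr=None)
    return librosa.feature.mfcc(y=y, sr=sr, n_mfcc=n_mfcc).T  # (time, n_mfcc)

model = keras.Sequential([
    keras.layers.Masking(input_shape=(None, 40)),   # handle variable-length clips
    keras.layers.LSTM(64),
    keras.layers.Dense(1, activation="sigmoid"),    # P(COVID-positive)
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])

# Toy stand-in batch: 8 recordings, 100 frames each, 40 MFCCs.
X = np.random.randn(8, 100, 40).astype("float32")
y = np.random.randint(0, 2, size=(8, 1))
model.fit(X, y, epochs=1, verbose=0)
```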
    Learning-based and unrolled motion-compensated reconstruction for cardiac MR CINE imaging. (arXiv:2209.03671v1 [eess.IV])
    Motion-compensated MR reconstruction (MCMR) is a powerful concept with considerable potential, consisting of two coupled sub-problems: motion estimation, assuming a known image, and image reconstruction, assuming known motion. In this work, we propose a learning-based self-supervised framework for MCMR to efficiently deal with non-rigid motion corruption in cardiac MR imaging. Contrary to conventional MCMR methods, in which the motion is estimated prior to reconstruction and remains unchanged during the iterative optimization process, we introduce a dynamic motion estimation process and embed it into the unrolled optimization. We establish a cardiac motion estimation network that leverages temporal information via a group-wise registration approach, and carry out a joint optimization between motion estimation and reconstruction. Experiments on 40 acquired 2D cardiac MR CINE datasets demonstrate that the proposed unrolled MCMR framework can reconstruct high-quality MR images at high acceleration rates where other state-of-the-art methods fail. We also show that the joint optimization mechanism is mutually beneficial for both sub-tasks, i.e., motion estimation and image reconstruction, especially when the MR image is highly undersampled.  ( 2 min )
    SE(3)-DiffusionFields: Learning cost functions for joint grasp and motion optimization through diffusion. (arXiv:2209.03855v1 [cs.RO])
    Multi-objective high-dimensional motion optimization problems are ubiquitous in robotics and highly benefit from informative gradients. To this end, we require all cost functions to be differentiable. We propose learning task-space, data-driven cost functions as diffusion models. Diffusion models represent expressive multimodal distributions and exhibit proper gradients over the entire space. We exploit these properties for motion optimization by integrating the learned cost functions with other potentially learned or hand-tuned costs in a single objective function and optimize all of them jointly by gradient descent. We showcase the benefits of joint optimization in a set of complex grasp and motion planning problems and compare against hierarchical approaches that decouple grasp selection from motion optimization.  ( 2 min )
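    A minimal sketch of the joint-optimization idea under stated assumptions: every cost term is differentiable, so a learned grasp cost (here a stand-in function, not the paper's diffusion-model cost) and hand-tuned trajectory costs can be summed into a single objective and minimized together by gradient descent.

```python
# Generic joint grasp-and-motion optimization over a sum of differentiable costs.
import torch

def learned_grasp_cost(pose):            # placeholder for a trained network
    return ((pose - 1.0) ** 2).sum()

def smoothness_cost(traj):               # hand-tuned cost: penalize jerky motion
    return ((traj[1:] - traj[:-1]) ** 2).sum()

traj = torch.zeros(20, 3, requires_grad=True)   # waypoints
pose = torch.zeros(6, requires_grad=True)       # final grasp pose
opt = torch.optim.Adam([traj, pose], lr=1e-2)
for _ in range(500):
    opt.zero_grad()
    loss = learned_grasp_cost(pose) + 0.1 * smoothness_cost(traj) \
         + ((traj[-1] - pose[:3]) ** 2).sum()   # couple trajectory end to grasp
    loss.backward()
    opt.step()
```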
    Towards Multidimensional Textural Perception and Classification Through Whisker. (arXiv:2209.03750v1 [eess.SP])
    Texture-based sensing and design have recently come into focus, yet whisker-based multidimensional surface-texture data are missing from the literature. Such data are critical for robotics and machine perception algorithms in the classification and regression of textural surfaces. In this study, we present a novel sensor design to acquire multidimensional texture information. The surface texture's roughness and hardness were measured experimentally using sweeping and dabbing. Three machine learning models (SVM, RF, and MLP) showed excellent classification accuracy for the roughness and hardness of surface textures. We show that combining pressure and accelerometer data, collected from a standard machined specimen using the whisker sensor, improves classification accuracy. Further, we experimentally validate that the sensor can classify textures with roughness depths as low as $2.5\mu m$ at an accuracy of $90\%$ or more and segregate materials based on their roughness and hardness. We present a novel metric to consider while designing a whisker sensor so as to guarantee the quality of texture data acquisition beforehand. The machine learning model performance was validated against data collected with a laser sensor from the same set of surface textures. As part of our work, we are releasing two-dimensional texture data -- roughness and hardness -- to the research community.
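    A hedged sketch of the classification stage with scikit-learn, assuming per-trial pressure and accelerometer features have already been extracted; the file names below are hypothetical.

```python
# Compare the three classifier families named above on whisker features.
import numpy as np
from sklearn.svm import SVC
from sklearn.ensemble import RandomForestClassifier
from sklearn.neural_network import MLPClassifier
from sklearn.model_selection import cross_val_score

X = np.load("whisker_features.npy")   # hypothetical (n_samples, n_features)
y = np.load("roughness_labels.npy")   # hypothetical roughness classes

for name, clf in [("SVM", SVC(kernel="rbf")),
                  ("RF", RandomForestClassifier(n_estimators=200)),
                  ("MLP", MLPClassifier(hidden_layer_sizes=(64,), max_iter=2000))]:
    scores = cross_val_score(clf, X, y, cv=5)
    print(f"{name}: {scores.mean():.3f} +/- {scores.std():.3f}")
```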
    Tag-Aware Document Representation for Research Paper Recommendation. (arXiv:2209.03660v1 [cs.IR])
    Finding online research papers relevant to one's interests is very challenging due to the increasing number of publications. Therefore, personalized research paper recommendation has become a significant and timely research topic. Collaborative filtering is a successful recommendation approach, which exploits the ratings given to items by users as a source of information for learning to make accurate recommendations. However, ratings are often very sparse, as in the research paper domain, where the number of publications grows every year. Therefore, more attention has been drawn to hybrid methods that consider both ratings and content information. Nevertheless, most of the hybrid recommendation approaches based on text embedding have utilized bag-of-words techniques, which ignore word order and semantic meaning. In this paper, we propose a hybrid approach that leverages deep semantic representation of research papers based on social tags assigned by users. The experimental evaluation is performed on CiteULike, a real and publicly available dataset. The obtained findings show that the proposed model is effective in recommending research papers even when the rating data is very sparse.
    Securing the Spike: On the Transferability and Security of Spiking Neural Networks to Adversarial Examples. (arXiv:2209.03358v1 [cs.NE])
    Spiking neural networks (SNNs) have attracted much attention for their high energy efficiency and for recent advances in their classification performance. However, unlike traditional deep learning approaches, the analysis and study of the robustness of SNNs to adversarial examples remains relatively underdeveloped. In this work we advance the field of adversarial machine learning through experimentation and analyses of three important SNN security attributes. First, we show that successful white-box adversarial attacks on SNNs are highly dependent on the underlying surrogate gradient technique. Second, we analyze the transferability of adversarial examples generated by SNNs and other state-of-the-art architectures like Vision Transformers and Big Transfer CNNs. We demonstrate that SNNs are not often deceived by adversarial examples generated by Vision Transformers and certain types of CNNs. Lastly, we develop a novel white-box attack that generates adversarial examples capable of fooling both SNN models and non-SNN models simultaneously. Our experiments and analyses are broad and rigorous covering two datasets (CIFAR-10 and CIFAR-100), five different white-box attacks and twelve different classifier models.
    Kernel-Segregated Transpose Convolution Operation. (arXiv:2209.03704v1 [cs.LG])
    Transpose convolution has shown prominence in many deep learning applications. However, transpose convolution layers are computationally intensive because zeros are inserted after each element in each row and column, inflating the feature map size. Convolving over this expanded input feature map therefore leads to poor utilization of hardware resources, and the zeros at predefined positions in the input feature map are the main source of unnecessary multiplication operations. We propose an algorithmic-level optimization technique for effective transpose convolution implementation that solves these problems. Based on kernel activations, we segregate the original kernel into four sub-kernels, a scheme that reduces memory requirements and avoids the unnecessary multiplications. Our proposed method achieved $3.09\times$ ($3.02\times$) faster computation on a Titan X GPU (Intel Dual Core CPU) with a flower dataset from the Kaggle website. Furthermore, the proposed optimization method can be generalized to existing devices without additional hardware requirements. A simple deep learning model containing one transpose convolution layer was used to evaluate the optimization method; it showed $2.2\times$ faster training on the MNIST dataset with an Intel Dual-Core CPU compared to the conventional implementation.
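    The segregation trick is easiest to see in 1D, where a stride-2 transpose convolution splits into two sub-kernels (the 2D case splits into four, one per output-pixel parity). The NumPy sketch below checks the equivalence against the naive zero-insertion implementation.

```python
# 1D toy: a stride-2 transpose convolution equals two ordinary convolutions
# with the even- and odd-indexed sub-kernels, interleaved -- no multiplications
# against inserted zeros are needed.
import numpy as np

def transpose_conv_naive(x, w):
    up = np.zeros(2 * len(x) - 1)
    up[::2] = x                       # insert zeros between input elements
    return np.convolve(up, w)         # conventional (wasteful) approach

def transpose_conv_segregated(x, w):
    y_even = np.convolve(x, w[0::2])  # sub-kernel of even taps
    y_odd = np.convolve(x, w[1::2])   # sub-kernel of odd taps
    out = np.zeros(2 * len(x) + len(w) - 2)
    out[0::2], out[1::2] = y_even, y_odd
    return out

x, w = np.random.randn(8), np.random.randn(5)
assert np.allclose(transpose_conv_naive(x, w), transpose_conv_segregated(x, w))
```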
    CAP: instance complexity-aware network pruning. (arXiv:2209.03534v1 [cs.LG])
    Existing differentiable channel pruning methods often attach scaling factors or masks behind channels to prune filters with less importance, and assume uniform contribution of input samples to filter importance. Specifically, the effects of instance complexity on pruning performance have not yet been fully investigated. In this paper, we propose a simple yet effective differentiable network pruning method, CAP, based on instance complexity-aware filter importance scores. We define an instance complexity-related weight for each sample by giving higher weights to hard samples, and measure the weighted sum of sample-specific soft masks to model the non-uniform contribution of different inputs, which encourages hard samples to dominate the pruning process and the model performance to be well preserved. In addition, we introduce a new regularizer to encourage polarization of the masks, such that a sweet spot can be easily found to identify the filters to be pruned. Performance evaluations on various network architectures and datasets demonstrate that CAP has advantages over the state of the art in pruning large networks. For instance, CAP improves the accuracy of ResNet56 on the CIFAR-10 dataset by 0.33% after removing 65.64% of FLOPs, and prunes 87.75% of the FLOPs of ResNet50 on the ImageNet dataset with only a 0.89% Top-1 accuracy loss.  ( 2 min )
    Deep Learning-Based Automatic Diagnosis System for Developmental Dysplasia of the Hip. (arXiv:2209.03440v1 [eess.IV])
    As the first-line diagnostic imaging modality, radiography plays an essential role in the early detection of developmental dysplasia of the hip (DDH). Clinically, the diagnosis of DDH relies on manual measurements and subjective evaluation of different anatomical features from pelvic radiographs. This process is inefficient and error-prone and requires years of clinical experience. In this study, we propose a deep learning-based system that automatically detects 14 keypoints from a radiograph, measures three anatomical angles (center-edge, T\"onnis, and Sharp angles), and classifies DDH hips as grades I-IV based on the Crowe criteria. Moreover, a novel data-driven scoring system is proposed to quantitatively integrate the information from the three angles for DDH diagnosis. The proposed keypoint detection model achieved a mean (95% confidence interval [CI]) average precision of 0.807 (0.804-0.810). The mean (95% CI) intraclass correlation coefficients between the center-edge, T\"onnis, and Sharp angles measured by the proposed model and the ground-truth were 0.957 (0.952-0.962), 0.947 (0.941-0.953), and 0.953 (0.947-0.960), respectively, which were significantly higher than those of experienced orthopedic surgeons (p<0.0001). In addition, the mean (95% CI) test diagnostic agreement (Cohen's kappa) obtained using the proposed scoring system was 0.84 (0.83-0.85), which was significantly higher than those obtained from the diagnostic criteria for each individual angle (0.76 [0.75-0.77]) and orthopedists (0.71 [0.63-0.79]). To the best of our knowledge, this is the first study of objective DDH diagnosis that leverages deep learning keypoint detection and integrates different anatomical measurements, which can provide reliable and explainable support for clinical decision-making.  ( 3 min )
    Learned Image Compression with Generalized Octave Convolution and Cross-Resolution Parameter Estimation. (arXiv:2209.03353v1 [eess.IV])
    The application of the context-adaptive entropy model significantly improves the rate-distortion (R-D) performance, in which hyperpriors and autoregressive models are jointly utilized to effectively capture the spatial redundancy of the latent representations. However, the latent representations still contain some spatial correlations. In addition, these methods based on the context-adaptive entropy model cannot be accelerated in the decoding process by parallel computing devices, e.g. FPGAs or GPUs. To alleviate these limitations, we propose a learned multi-resolution image compression framework, which exploits the recently developed octave convolutions to factorize the latent representations into high-resolution (HR) and low-resolution (LR) parts, similar to a wavelet transform, which further improves the R-D performance. To speed up the decoding, our scheme does not use a context-adaptive entropy model. Instead, we exploit an additional hyper layer, including a hyper encoder and hyper decoder, to further remove the spatial redundancy of the latent representation. Moreover, cross-resolution parameter estimation (CRPE) is introduced into the proposed framework to enhance the flow of information and further improve the rate-distortion performance. An additional information-fidelity loss is added to the total loss function to adjust the contribution of the LR part to the final bit stream. Experimental results show that our method reduces the decoding time by approximately 73.35% and 93.44% compared with state-of-the-art learned image compression methods, and the R-D performance is still better than H.266/VVC (4:2:0) and some learning-based methods on both PSNR and MS-SSIM metrics across a wide range of bit rates.  ( 3 min )
    A Survey of Neural Trees. (arXiv:2209.03415v1 [cs.LG])
    Neural networks (NNs) and decision trees (DTs) are both popular models of machine learning, yet they come with mutually exclusive advantages and limitations. To bring the best of the two worlds together, a variety of approaches have been proposed to integrate NNs and DTs explicitly or implicitly. In this survey, these approaches are organized into a school that we term neural trees (NTs). This survey aims to present a comprehensive review of NTs and attempts to identify how they enhance model interpretability. We first propose a thorough taxonomy of NTs that expresses the gradual integration and co-evolution of NNs and DTs. Afterward, we analyze NTs in terms of their interpretability and performance, and suggest possible solutions to the remaining challenges. Finally, this survey concludes with a discussion about other considerations like conditional computation and promising directions for this field. A list of papers reviewed in this survey, along with their corresponding codes, is available at: https://github.com/zju-vipa/awesome-neural-trees  ( 2 min )
    The (Un)Scalability of Heuristic Approximators for NP-Hard Search Problems. (arXiv:2209.03393v1 [cs.AI])
    The A* algorithm is commonly used to solve NP-hard combinatorial optimization problems. When provided with an accurate heuristic function, A* can solve such problems in time complexity that is polynomial in the solution depth. This fact implies that accurate heuristic approximation for many such problems is also NP-hard. In this context, we examine a line of recent publications that propose the use of deep neural networks for heuristic approximation. We assert that these works suffer from inherent scalability limitations since -- under the assumption that P$\ne$NP -- such approaches result in either (a) network sizes that scale exponentially in the instance sizes or (b) heuristic approximation accuracy that scales inversely with the instance sizes. Our claim is supported by experimental results for three representative NP-hard search problems that show that fitting deep neural networks accurately to heuristic functions necessitates network sizes that scale exponentially with the instance size.  ( 2 min )
    Blessing of Class Diversity in Pre-training. (arXiv:2209.03447v1 [cs.LG])
    This paper presents a new statistical analysis aiming to explain the recent superior achievements of the pre-training techniques in natural language processing (NLP). We prove that when the classes of the pre-training task (e.g., different words in the masked language model task) are sufficiently diverse, in the sense that the least singular value of the last linear layer in pre-training (denoted as $\tilde{\nu}$) is large, then pre-training can significantly improve the sample efficiency of downstream tasks. Specifically, we show the transfer learning excess risk enjoys an $O\left(\frac{1}{\tilde{\nu} \sqrt{n}}\right)$ rate, in contrast to the $O\left(\frac{1}{\sqrt{m}}\right)$ rate in the standard supervised learning. Here, $n$ is the number of pre-training data and $m$ is the number of data in the downstream task, and typically $n \gg m$. Our proof relies on a vector-form Rademacher complexity chain rule for disassembling composite function classes and a modified self-concordance condition. These techniques can be of independent interest.  ( 2 min )
    AILAB-Udine@SMM4H 22: Limits of Transformers and BERT Ensembles. (arXiv:2209.03452v1 [cs.CL])
    This paper describes the models developed by the AILAB-Udine team for the SMM4H 22 Shared Task. We explored the limits of Transformer-based models on text classification, entity extraction and entity normalization, tackling Tasks 1, 2, 5, 6 and 10. The main takeaways from participating in the different tasks are the overwhelmingly positive effect of combining different architectures through ensemble learning, and the great potential of generative models for term normalization.  ( 2 min )
    Bispectral Neural Networks. (arXiv:2209.03416v1 [cs.LG])
    We present a novel machine learning architecture, Bispectral Neural Networks (BNNs), for learning representations of data that are invariant to the actions of groups on the space over which a signal is defined. The model incorporates the ansatz of the bispectrum, an analytically defined group invariant that is complete--that is, it preserves all signal structure while removing only the variation due to group actions. Here, we demonstrate that BNNs are able to discover arbitrary commutative group structure in data, with the trained models learning the irreducible representations of the groups, which allows for the recovery of the group Cayley tables. Remarkably, trained networks learn to approximate bispectra on these groups, and thus possess the robustness, completeness, and generality of the analytical object.  ( 2 min )
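    The underlying analytical object is easy to check numerically: for a 1D signal under the cyclic translation group, the bispectrum $B(k_1,k_2)=F(k_1)F(k_2)\overline{F(k_1+k_2)}$ built from the Fourier coefficients is shift-invariant. A small NumPy verification:

```python
# The bispectrum of a 1D signal is invariant to cyclic translations
# (the action of Z/nZ), while retaining all other signal structure.
import numpy as np

def bispectrum(x):
    F = np.fft.fft(x)
    n = len(x)
    k = np.arange(n)
    return F[:, None] * F[None, :] * np.conj(F[(k[:, None] + k[None, :]) % n])

x = np.random.randn(16)
shifted = np.roll(x, 5)                     # group action: cyclic shift
assert np.allclose(bispectrum(x), bispectrum(shifted))
```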
    CLaCLab at SocialDisNER: Using Medical Gazetteers for Named-Entity Recognition of Disease Mentions in Spanish Tweets. (arXiv:2209.03528v1 [cs.CL])
    This paper summarizes the CLaC submission for SMM4H 2022 Task 10 which concerns the recognition of diseases mentioned in Spanish tweets. Before classifying each token, we encode each token with a transformer encoder using features from Multilingual RoBERTa Large, UMLS gazetteer, and DISTEMIST gazetteer, among others. We obtain a strict F1 score of 0.869, with competition mean of 0.675, standard deviation of 0.245, and median of 0.761.  ( 2 min )
    Generative Adversarial Super-Resolution at the Edge with Knowledge Distillation. (arXiv:2209.03355v1 [eess.IV])
    Single-Image Super-Resolution can support robotic tasks in environments where a reliable visual stream is required to monitor the mission, handle teleoperation or study relevant visual details. In this work, we propose an efficient Generative Adversarial Network model for real-time Super-Resolution. We adopt a tailored architecture of the original SRGAN and model quantization to boost the execution on CPU and Edge TPU devices, achieving up to 200 fps inference. We further optimize our model by distilling its knowledge to a smaller version of the network and obtain remarkable improvements compared to the standard training approach. Our experiments show that our fast and lightweight model maintains satisfying image quality compared to heavier state-of-the-art models. Finally, we conduct experiments on image transmission with bandwidth degradation to highlight the advantages of the proposed system for mobile robotic applications.  ( 2 min )
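    A hedged PyTorch sketch of the distillation step described above: the compact student is supervised both by ground-truth high-resolution frames and by the frozen teacher's outputs. The L1 losses and weighting are illustrative assumptions, not the paper's exact objective.

```python
# One training step of super-resolution knowledge distillation (illustrative).
import torch
import torch.nn.functional as F

def distillation_step(student, teacher, lr_batch, hr_batch, optimizer, alpha=0.5):
    with torch.no_grad():
        teacher_sr = teacher(lr_batch)          # frozen, heavier generator
    student_sr = student(lr_batch)
    loss = (1 - alpha) * F.l1_loss(student_sr, hr_batch) \
         + alpha * F.l1_loss(student_sr, teacher_sr)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```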
    Causal discovery for time series with latent confounders. (arXiv:2209.03427v1 [stat.ML])
    Reconstructing the causal relationships behind the phenomena we observe is a fundamental challenge in all areas of science. Discovering causal relationships through experiments is often infeasible, unethical, or expensive in complex systems. However, increases in computational power allow us to process the ever-growing amount of data that modern science generates, leading to an emerging interest in the causal discovery problem from observational data. This work evaluates the LPCMCI algorithm, which aims to find generators compatible with a multi-dimensional, highly autocorrelated time series while some variables are unobserved. We find that LPCMCI performs much better than a random baseline that mimics having no prior knowledge, but is still far from optimal detection. Furthermore, LPCMCI performs best on auto-dependencies, then contemporaneous dependencies, and struggles most with lagged dependencies. The source code of this project is available online.  ( 2 min )
    Machine Learning Sensors for Diagnosis of COVID-19 Disease Using Routine Blood Values for Internet of Things Application. (arXiv:2209.03522v1 [cs.LG])
    Healthcare digitalization needs effective methods of human sensorics, where various parameters of the human body are instantly monitored in everyday life and connected to the Internet of Things (IoT). In particular, Machine Learning (ML) sensors for the prompt diagnosis of COVID-19 are an important case for IoT application in healthcare and Ambient Assisted Living (AAL). Determining COVID-19 infection status with various diagnostic tests and imaging results is costly and time-consuming. The aim of this study is to provide a fast, reliable and economical alternative tool for the diagnosis of COVID-19 based on the Routine Blood Values (RBV) measured at admission. The dataset of the study consists of a total of 5296 patients, with equal numbers of negative and positive COVID-19 test results, and 51 routine blood values. In this study, 13 popular machine learning classifier models and the LogNNet neural network model were examined. The most successful classifier model in terms of time and accuracy in detecting the disease was the Histogram-based Gradient Boosting (HGB) classifier. The HGB classifier identified the 11 most important features (LDL, Cholesterol, HDL-C, MCHC, Triglyceride, Amylase, UA, LDH, CK-MB, ALP and MCH) and detected the disease with 100% accuracy and a learning time of 6.39 s. In addition, the importance of single, double and triple combinations of these features in the diagnosis of the disease is discussed. We propose using these 11 traits and their combinations as important biomarkers for ML sensors in the diagnosis of the disease, supporting edge computing on Arduino and cloud IoT services.  ( 3 min )
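    For orientation, a minimal scikit-learn sketch of the winning classifier family is given below; the CSV path and label column are hypothetical, and the study's actual preprocessing and evaluation protocol are richer.

```python
# Histogram-based Gradient Boosting on routine blood values (illustrative).
import pandas as pd
from sklearn.ensemble import HistGradientBoostingClassifier
from sklearn.model_selection import train_test_split

df = pd.read_csv("rbv_dataset.csv")             # hypothetical path
X, y = df.drop(columns=["covid_positive"]), df["covid_positive"]
X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)

clf = HistGradientBoostingClassifier().fit(X_tr, y_tr)
print("test accuracy:", clf.score(X_te, y_te))
# Biomarkers such as LDL or LDH could then be singled out with, e.g.,
# sklearn.inspection.permutation_importance.
```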
    A Greedy Algorithm for Building Compact Binary Activated Neural Networks. (arXiv:2209.03450v1 [cs.LG])
    We study binary activated neural networks in the context of regression tasks, provide guarantees on the expressiveness of these particular networks and propose a greedy algorithm for building such networks. Aiming for predictors with small resource needs, the greedy approach does not need to fix an architecture for the network in advance: the network is built one layer at a time, one neuron at a time, leading to predictors that are not needlessly wide and deep for a given task. Similarly to boosting algorithms, our approach guarantees a training loss reduction every time a neuron is added to a layer. This greatly differs from most binary activated neural network training schemes, which rely on stochastic gradient descent (circumventing the 0-almost-everywhere derivative problem of the binary activation function with surrogates such as the straight-through estimator, sketched below, or continuous binarization). We show that our method provides compact and sparse predictors while obtaining similar performance to state-of-the-art methods for training binary activated networks.  ( 2 min )
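    For contrast with the greedy scheme, the straight-through estimator mentioned above fits in a few lines of PyTorch: the forward pass applies the binary activation, while the backward pass substitutes a surrogate (clipped identity) gradient.

```python
# Straight-through estimator for a binary (sign) activation.
import torch

class BinaryActivation(torch.autograd.Function):
    @staticmethod
    def forward(ctx, x):
        ctx.save_for_backward(x)
        return torch.sign(x)

    @staticmethod
    def backward(ctx, grad_output):
        (x,) = ctx.saved_tensors
        return grad_output * (x.abs() <= 1).float()  # clipped-identity surrogate

x = torch.randn(4, requires_grad=True)
BinaryActivation.apply(x).sum().backward()
print(x.grad)   # nonzero despite the 0-almost-everywhere true derivative
```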
    Beyond Random Split for Assessing Statistical Model Performance. (arXiv:2209.03346v1 [cs.LG])
    Even though a random train/test split of the dataset is common practice, it may not always be the best approach for estimating performance generalization under some scenarios. The usual machine learning methodology can sometimes overestimate the generalization error when a dataset is not representative or when rare and elusive examples are a fundamental aspect of the detection problem. In the present work, we analyze strategies based on the predictors' variability for splitting into training and testing sets. Such strategies aim at guaranteeing the inclusion of rare or unusual examples with a minimal loss of the population's representativeness, and provide a more accurate estimation of the generalization error when the dataset is not representative. Two baseline classifiers based on decision trees were used to test the four splitting strategies considered. Both classifiers were applied to CTU19, a low-representativity dataset for a network security detection problem. Preliminary results showed the importance of applying the three strategies alternative to the Monte Carlo splitting strategy in order to get a more accurate error estimation on different but feasible scenarios.  ( 2 min )
    Foundations and Recent Trends in Multimodal Machine Learning: Principles, Challenges, and Open Questions. (arXiv:2209.03430v1 [cs.LG])
    Multimodal machine learning is a vibrant multi-disciplinary research field that aims to design computer agents with intelligent capabilities such as understanding, reasoning, and learning through integrating multiple communicative modalities, including linguistic, acoustic, visual, tactile, and physiological messages. With the recent interest in video understanding, embodied autonomous agents, text-to-image generation, and multisensor fusion in application domains such as healthcare and robotics, multimodal machine learning has brought unique computational and theoretical challenges to the machine learning community given the heterogeneity of data sources and the interconnections often found between modalities. However, the breadth of progress in multimodal research has made it difficult to identify the common themes and open questions in the field. By synthesizing a broad range of application domains and theoretical frameworks from both historical and recent perspectives, this paper is designed to provide an overview of the computational and theoretical foundations of multimodal machine learning. We start by defining two key principles of modality heterogeneity and interconnections that have driven subsequent innovations, and propose a taxonomy of 6 core technical challenges: representation, alignment, reasoning, generation, transference, and quantification covering historical and recent trends. Recent technical achievements will be presented through the lens of this taxonomy, allowing researchers to understand the similarities and differences across new approaches. We end by motivating several open problems for future research as identified by our taxonomy.  ( 3 min )
    Higher-order Clustering and Pooling for Graph Neural Networks. (arXiv:2209.03473v1 [cs.LG])
    Graph Neural Networks achieve state-of-the-art performance on a plethora of graph classification tasks, especially due to pooling operators, which aggregate learned node embeddings hierarchically into a final graph representation. However, not only have they been called into question by recent work showing on-par performance with random pooling, but they also completely ignore higher-order connectivity patterns. To tackle this issue, we propose HoscPool, a clustering-based graph pooling operator that captures higher-order information hierarchically, leading to richer graph representations. In fact, we learn a probabilistic cluster assignment matrix end-to-end by minimising relaxed formulations of motif spectral clustering in our objective function, and we then extend it to a pooling operator. We evaluate HoscPool on graph classification tasks and its clustering component on graphs with ground-truth community structure, achieving best performance. Lastly, we provide a deep empirical analysis of pooling operators' inner functioning.  ( 2 min )
    A Survey on Automated Diagnosis of Alzheimer's Disease Using Optical Coherence Tomography and Angiography. (arXiv:2209.03354v1 [eess.IV])
    Retinal optical coherence tomography (OCT) and optical coherence tomography angiography (OCTA) are promising tools for the (early) diagnosis of Alzheimer's disease (AD). These non-invasive imaging techniques are cost-effective and more accessible than alternative neuroimaging tools. However, interpreting and classifying multi-slice scans produced by OCT devices is time-consuming and challenging even for trained practitioners. There are surveys on machine learning and deep learning approaches concerning the automated analysis of OCT scans for various diseases such as glaucoma. However, the current literature lacks an extensive survey on the diagnosis of Alzheimer's disease or cognitive impairment using OCT or OCTA. This has motivated us to do a comprehensive survey aimed at machine/deep learning scientists or practitioners who require an introduction to the problem. The paper contains 1) an introduction to the medical background of Alzheimer's Disease and Cognitive Impairment and their diagnosis using OCT and OCTA imaging modalities, 2) a review of various technical proposals for the problem and the sub-problems from an automated analysis perspective, 3) a systematic review of the recent deep learning studies and available OCT/OCTA datasets directly aimed at the diagnosis of Alzheimer's Disease and Cognitive Impairment. For the latter, we used Publish or Perish Software to search for the relevant studies from various sources such as Scopus, PubMed, and Web of Science. We followed the PRISMA approach to screen an initial pool of 3073 references and determined ten relevant studies (N=10, out of 3073) that directly targeted AD diagnosis. We identified the lack of open OCT/OCTA datasets (about Alzheimer's disease) as the main issue that is impeding the progress in the field.  ( 3 min )
    A hybrid Bayesian network for medical device risk assessment and management. (arXiv:2209.03352v1 [cs.LG])
    ISO 14971 is the primary standard used for medical device risk management. While it specifies the requirements for medical device risk management, it does not specify a particular method for performing risk management. Hence, medical device manufacturers are free to develop or use any appropriate methods for managing the risk of medical devices. The most commonly used methods, such as Fault Tree Analysis (FTA), are unable to provide a reasonable basis for computing risk estimates when there are limited or no historical data available or where there is second-order uncertainty about the data. In this paper, we present a novel method for medical device risk management using hybrid Bayesian networks (BNs) that resolves the limitations of classical methods such as FTA and incorporates relevant factors affecting the risk of medical devices. The proposed BN method is generic but can be instantiated on a system-by-system basis, and we apply it to a Defibrillator device to demonstrate the process involved for medical device risk management during production and post-production. The example is validated against real-world data.  ( 2 min )
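    A toy discrete fragment of such a risk network, sketched with the pgmpy library; the paper's model is hybrid (mixing discrete and continuous nodes) and far richer, so the structure, node names and probabilities below are purely illustrative, and class names may vary across pgmpy versions.

```python
# Toy Bayesian network for device-failure risk (illustrative numbers only).
from pgmpy.models import BayesianNetwork
from pgmpy.factors.discrete import TabularCPD
from pgmpy.inference import VariableElimination

model = BayesianNetwork([("ComponentDefect", "DeviceFailure"),
                         ("UsageStress", "DeviceFailure")])
model.add_cpds(
    TabularCPD("ComponentDefect", 2, [[0.98], [0.02]]),
    TabularCPD("UsageStress", 2, [[0.7], [0.3]]),
    TabularCPD("DeviceFailure", 2,
               [[0.999, 0.95, 0.90, 0.60],    # P(no failure | parents)
                [0.001, 0.05, 0.10, 0.40]],
               evidence=["ComponentDefect", "UsageStress"],
               evidence_card=[2, 2]))
assert model.check_model()
posterior = VariableElimination(model).query(["DeviceFailure"],
                                             evidence={"UsageStress": 1})
print(posterior)   # updated failure risk under high usage stress
```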
    Peer to Peer Learning Platform Optimized With Machine Learning. (arXiv:2209.03489v1 [cs.CY])
    HELM Learning (Helping Everyone Learn More) is the first online peer-to-peer learning platform which allows students (typically middle-to-high school students) to teach classes and students (typically elementary-to-middle school students) to learn from classes for free. This class structure (peer-to-peer learning) has been proven effective for learning, as it promotes teamwork and collaboration, and enables active learning. HELM is a unique platform as it provides an easy process for students to create, teach and learn topics in a structured, peer-to-peer environment. Since HELM was created in April 2020, it has received over 4,000 student sign-ups and 80 teachers, across 4 continents. HELM has grown from a simple website-and-Google-Form platform to having a backend system coded with Python, SQL, JavaScript and HTML, hosted on an AWS service. This not only makes it easier for students to sign up (as students' information is saved in an SQL database, meaning they can sign up for classes without having to enter their information again, as well as receive automated emails about their classes), but also makes it easier for teachers to teach (as supplemental processes such as creating Zoom links, class recording folders, sending emails to students, etc. are done automatically). In addition, HELM has a recommendation machine learning algorithm which suggests classes and subjects students would enjoy, based on the previous classes a student has taken. This has created an easier experience for students to sign up for classes they are interested in.  ( 3 min )
    Implicit Full Waveform Inversion with Deep Neural Representation. (arXiv:2209.03525v1 [physics.geo-ph])
    Full waveform inversion (FWI) is commonly regarded as the state-of-the-art approach for imaging subsurface structures and physical parameters; however, its implementation usually faces great challenges, such as building a good initial model to escape from local minima and evaluating the uncertainty of inversion results. In this paper, we propose the implicit full waveform inversion (IFWI) algorithm using continuously and implicitly defined deep neural representations. Compared to FWI, which is sensitive to the initial model, IFWI benefits from the increased degrees of freedom of deep learning optimization, allowing it to start from a random initialization, which greatly reduces the risk of non-uniqueness and being trapped in local minima. Both theoretical and experimental analyses indicate that, given a random initial model, IFWI is able to converge to the global minimum and produce a high-resolution image of the subsurface with fine structures. In addition, uncertainty analysis of IFWI can be easily performed by approximating Bayesian inference with various deep learning approaches, which is analyzed in this paper by adding dropout neurons. Furthermore, IFWI has a certain degree of robustness and strong generalization ability, as exemplified in experiments on various 2D geological models. With proper setup, IFWI can also be well suited for multi-scale joint geophysical inversion.  ( 2 min )
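    The core ingredient, an implicit neural representation of the subsurface, can be sketched in PyTorch as a coordinate MLP; the architecture below is an assumption for illustration, and the differentiable wave-equation solver is left abstract.

```python
# Coordinate MLP standing in for the implicit velocity model in IFWI.
import torch
import torch.nn as nn

class VelocityField(nn.Module):
    def __init__(self, hidden=128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(2, hidden), nn.Tanh(),
            nn.Linear(hidden, hidden), nn.Tanh(),
            nn.Linear(hidden, 1))

    def forward(self, xz):               # xz: (n_points, 2) coordinates
        return self.net(xz)              # predicted velocity at each point

field = VelocityField()                  # random initialization, as in IFWI
grid = torch.cartesian_prod(torch.linspace(0, 1, 64), torch.linspace(0, 1, 64))
velocity_model = field(grid).reshape(64, 64)
# A differentiable wave-equation solver would consume `velocity_model`,
# and the data misfit would be backpropagated into `field`'s weights.
```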
  • Open

    Quantum Sparse Coding. (arXiv:2209.03788v1 [quant-ph])
    The ultimate goal of any sparse coding method is to accurately recover an unknown sparse vector from a few noisy linear measurements. Unfortunately, this estimation problem is NP-hard in general, and it is therefore always approached with an approximation method, such as lasso or orthogonal matching pursuit, thus trading off accuracy for less computational complexity. In this paper, we develop a quantum-inspired algorithm for sparse coding, with the premise that the emergence of quantum computers and Ising machines can potentially lead to more accurate estimations compared to classical approximation methods. To this end, we formulate the most general sparse coding problem as a quadratic unconstrained binary optimization (QUBO) task, which can be efficiently minimized using quantum technology. To arrive at a QUBO model that is also efficient in terms of the number of spins (space complexity), we separate our analysis into three different scenarios. These are defined by the number of bits required to express the underlying sparse vector: binary, 2-bit, and a general fixed-point representation. We conduct numerical experiments with simulated data on LightSolver's quantum-inspired digital platform to verify the correctness of our QUBO formulation and to demonstrate its advantage over baseline methods.  ( 2 min )
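    The binary scenario admits a compact worked example. For $s \in \{0,1\}^n$ we have $s_i^2 = s_i$, so $\|y - As\|^2 + \lambda\|s\|_0$ equals, up to a constant, the QUBO objective $s^\top Q s$ with $Q = A^\top A + \mathrm{diag}(\lambda - 2A^\top y)$. The NumPy sketch below builds $Q$ and uses brute force in place of a quantum or Ising solver:

```python
# Binary sparse coding as a QUBO, verified by brute force on a toy instance.
import itertools
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((10, 6))
s_true = (rng.random(6) < 0.3).astype(float)
y = A @ s_true + 0.01 * rng.standard_normal(10)
lam = 0.1

Q = A.T @ A
Q[np.diag_indices_from(Q)] += lam - 2 * A.T @ y   # fold linear term into diagonal

# Brute force stands in for the quantum/Ising solver at this toy size.
best = min((np.array(s) for s in itertools.product([0, 1], repeat=6)),
           key=lambda s: s @ Q @ s)
print("recovered:", best, " true:", s_true)
```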
    Learning polytopes with fixed facet directions. (arXiv:2201.03419v3 [math.MG] UPDATED)
    We consider the task of reconstructing polytopes with fixed facet directions from finitely many support function evaluations. We show that for a fixed simplicial normal fan the least-squares estimate is given by a convex quadratic program. We study the geometry of the solution set and give a combinatorial characterization for the uniqueness of the reconstruction in this case. We provide an algorithm that, under mild assumptions, converges to the unknown input shape as the number of noisy support function evaluations increases. We also discuss limitations of our results if the restriction on the normal fan is removed.
    Sparsity in long-time control of neural ODEs. (arXiv:2102.13566v3 [cs.LG] UPDATED)
    We consider the neural ODE and optimal control perspective of supervised learning, with $\ell^1$-control penalties, where rather than only minimizing a final cost (the \emph{empirical risk}) for the state, we integrate this cost over the entire time horizon. We prove that any optimal control (for this cost) vanishes beyond some positive stopping time. When seen in the discrete-time context, this result entails an \emph{ordered} sparsity pattern for the parameters of the associated residual neural network: ordered in the sense that these parameters are all $0$ beyond a certain layer. Furthermore, we provide a polynomial stability estimate for the empirical risk with respect to the time horizon. This can be seen as a \emph{turnpike property}, for nonsmooth dynamics and functionals with $\ell^1$-penalties, and without any smallness assumptions on the data, both of which are new in the literature.
    Bayesian regularization of empirical MDPs. (arXiv:2208.02362v2 [cs.LG] UPDATED)
    In most applications of model-based Markov decision processes, the parameters for the unknown underlying model are often estimated from the empirical data. Due to noise, the policy learned from the estimated model is often far from the optimal policy of the underlying model. When applied to the environment of the underlying model, the learned policy results in suboptimal performance, thus calling for solutions with better generalization performance. In this work we take a Bayesian perspective and regularize the objective function of the Markov decision process with prior information in order to obtain more robust policies. Two approaches are proposed, one based on $L^1$ regularization and the other on relative entropic regularization. We evaluate our proposed algorithms on synthetic simulations and on real-world search logs of a large scale online shopping store. Our results demonstrate the robustness of regularized MDP policies against the noise present in the models.
    W-Transformers : A Wavelet-based Transformer Framework for Univariate Time Series Forecasting. (arXiv:2209.03945v1 [cs.LG])
    Deep learning utilizing transformers has recently achieved a lot of success in many vital areas such as natural language processing, computer vision, anomaly detection, and recommendation systems, among many others. Among the several merits of transformers, the ability to capture long-range temporal dependencies and interactions is desirable for time series forecasting, leading to progress in various time series applications. In this paper, we build a transformer model for non-stationary time series. The problem is challenging yet crucially important. We present a novel framework for univariate time series representation learning based on the wavelet-based transformer encoder architecture and call it W-Transformer. The proposed W-Transformer applies a maximal overlap discrete wavelet transform (MODWT) to the time series data and builds local transformers on the decomposed datasets to capture the nonstationarity and long-range nonlinear dependencies in the time series. Evaluating our framework on several publicly available benchmark time series datasets from various domains and with diverse characteristics, we demonstrate that it performs, on average, significantly better than the baseline forecasters for short-term and long-term forecasting, even for datasets that consist of only a few hundred training samples.
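    A hedged sketch of the decomposition step: PyWavelets' stationary wavelet transform (pywt.swt) serves here as a readily available stand-in for MODWT (the two undecimated transforms are closely related but not identical), and the per-level "local transformer" forecasters are left abstract.

```python
# Wavelet decomposition of a series into same-length sub-series (illustrative).
import numpy as np
import pywt

series = np.sin(np.linspace(0, 20, 512)) + 0.1 * np.random.randn(512)
# Note: pywt.swt requires the length to be a multiple of 2**level.
coeffs = pywt.swt(series, wavelet="db4", level=3)   # [(cA, cD), ...] per level

sub_series = [c for pair in coeffs for c in pair]   # undecimated components
for s in sub_series:
    assert len(s) == len(series)
# One forecaster (a local transformer in the paper) is trained per sub-series,
# and the level-wise forecasts are recombined into the final prediction.
```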
    Known by the company we keep: `Triadic influence' as a proxy for compatibility in social relationships. (arXiv:2209.03683v1 [cs.SI])
    Networks of social interactions are the substrate upon which civilizations are built. Often, we create new bonds with people that we like or feel that our relationships are damaged through the intervention of third parties. Despite their importance and the huge impact that these processes have in our lives, quantitative scientific understanding of them is still in its infancy, mainly due to the difficulty of collecting large datasets of social networks including individual attributes. In this work, we present a thorough study of real social networks of 13 schools, with more than 3,000 students and 60,000 declared positive and negative relations, including tests for personal traits of all the students. We introduce a metric -- the `triadic influence' -- that measures the influence of nearest-neighbors in the relationships of their contacts. We use neural networks to predict the relationships and to extract the probability that two students are friends or enemies depending on their personal attributes or the triadic influence. We alternatively use a high-dimensional embedding of the network structure to also predict the relationships. Remarkably, the triadic influence (a simple one-dimensional metric) achieves the highest accuracy at predicting the relationship between two students. We postulate that the probabilities extracted from the neural networks -- functions of the triadic influence and the personalities of the students -- control the evolution of real social networks, opening a new avenue for the quantitative study of these systems.
    Model-free Subsampling Method Based on Uniform Designs. (arXiv:2209.03617v1 [stat.ME])
    Subsampling or subdata selection is a useful approach in large-scale statistical learning. Most existing studies focus on model-based subsampling methods which significantly depend on the model assumption. In this paper, we consider the model-free subsampling strategy for generating subdata from the original full data. In order to measure the goodness of representation of a subdata with respect to the original data, we propose a criterion, generalized empirical F-discrepancy (GEFD), and study its theoretical properties in connection with the classical generalized L2-discrepancy in the theory of uniform designs. These properties allow us to develop a kind of low-GEFD data-driven subsampling method based on the existing uniform designs. By simulation examples and a real case study, we show that the proposed subsampling method is superior to the random sampling method. Moreover, our method keeps robust under diverse model specifications while other popular subsampling methods are under-performing. In practice, such a model-free property is more appealing than the model-based subsampling methods, where the latter may have poor performance when the model is misspecified, as demonstrated in our simulation studies.
    PredDiff: Explanations and Interactions from Conditional Expectations. (arXiv:2102.13519v4 [cs.LG] UPDATED)
    PredDiff is a model-agnostic, local attribution method that is firmly rooted in probability theory. Its simple intuition is to measure prediction changes while marginalizing features. In this work, we clarify properties of PredDiff and its close connection to Shapley values. We stress important differences between classification and regression, which require a specific treatment within both formalisms. We extend PredDiff by introducing a new, well-founded measure for interaction effects between arbitrary feature subsets. The study of interaction effects represents an inevitable step towards a comprehensive understanding of black-box models and is particularly important for science applications. Equipped with our novel interaction measure, PredDiff is a promising model-agnostic approach for obtaining reliable, numerically inexpensive and theoretically sound attributions.
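    The core intuition fits in a few lines of NumPy for the regression case: a feature's relevance at a point is the change in prediction when that feature is marginalized out, here approximated by imputing values drawn from the training data. This is a minimal sketch, not the authors' full implementation, which also covers classification and the interaction measure.

```python
# PredDiff-style relevance for one feature of one instance (regression case).
import numpy as np

def preddiff_relevance(predict_fn, X_train, x, j, n_imputations=100, seed=0):
    rng = np.random.default_rng(seed)
    X_imputed = np.tile(x, (n_imputations, 1))
    # Marginalize feature j by replacing it with training-set values.
    X_imputed[:, j] = rng.choice(X_train[:, j], size=n_imputations)
    return predict_fn(x[None, :])[0] - predict_fn(X_imputed).mean()
```

    In the paper's formalism, interaction effects between feature subsets are measured analogously, by comparing joint against separate marginalizations.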
    Meta Clustering for Collaborative Learning. (arXiv:2006.00082v2 [cs.LG] UPDATED)
    In collaborative learning, learners coordinate to enhance each of their learning performances. From the perspective of any learner, a critical challenge is to filter out unqualified collaborators. We propose a framework named meta clustering to address the challenge. Unlike the classical problem of clustering data points, meta clustering categorizes learners. Assuming each learner performs a supervised regression on a standalone local dataset, we propose a Select-Exchange-Cluster (SEC) method to classify the learners by their underlying supervised functions. We theoretically show that the SEC can cluster learners into accurate collaboration sets. Empirical studies corroborate the theoretical analysis and demonstrate that SEC can be computationally efficient, robust against learner heterogeneity, and effective in enhancing single-learner performance. Also, we show how the proposed approach may be used to enhance data fairness. Supplementary materials for this article are available online.
    Causal Forecasting: Generalization Bounds for Autoregressive Models. (arXiv:2111.09831v2 [stat.ML] UPDATED)
    Despite the increasing relevance of forecasting methods, causal implications of these algorithms remain largely unexplored. This is concerning considering that, even under simplifying assumptions such as causal sufficiency, the statistical risk of a model can differ significantly from its \textit{causal risk}. Here, we study the problem of \textit{causal generalization} -- generalizing from the observational to interventional distributions -- in forecasting. Our goal is to find answers to the question: How does the efficacy of an autoregressive (VAR) model in predicting statistical associations compare with its ability to predict under interventions? To this end, we introduce the framework of \textit{causal learning theory} for forecasting. Using this framework, we obtain a characterization of the difference between statistical and causal risks, which helps identify sources of divergence between them. Under causal sufficiency, the problem of causal generalization amounts to learning under covariate shifts, albeit with additional structure (restriction to interventional distributions under the VAR model). This structure allows us to obtain uniform convergence bounds on causal generalizability for the class of VAR models. To the best of our knowledge, this is the first work that provides theoretical guarantees for causal generalization in the time-series setting.
    Trace-class Gaussian priors for Bayesian learning of neural networks with MCMC. (arXiv:2012.10943v3 [stat.ME] UPDATED)
    This paper introduces a new neural network based prior for real valued functions on $\mathbb R^d$ which, by construction, is more easily and cheaply scaled up in the domain dimension $d$ compared to the usual Karhunen-Lo\`eve function space prior. The new prior is a Gaussian neural network prior, where each weight and bias has an independent Gaussian prior, but with the key difference that the variances decrease in the width of the network in such a way that the resulting function is \emph{almost surely} well defined in the limit of an infinite width network. We show that in a Bayesian treatment of inferring unknown functions, the induced posterior over functions is amenable to Monte Carlo sampling using Hilbert space Markov chain Monte Carlo (MCMC) methods. This type of MCMC is popular, e.g. in the Bayesian Inverse Problems literature, because it is stable under \emph{mesh refinement}, i.e. the acceptance probability does not shrink to $0$ as more parameters of the function's prior are introduced, even \emph{ad infinitum}. In numerical examples we demonstrate these stated competitive advantages over other function space priors. We also implement examples in Bayesian Reinforcement Learning to automate tasks from data and demonstrate, for the first time, stability of MCMC to mesh refinement for these type of problems.
    Training Scale-Invariant Neural Networks on the Sphere Can Happen in Three Regimes. (arXiv:2209.03695v1 [cs.LG])
    A fundamental property of deep learning normalization techniques, such as batch normalization, is making the pre-normalization parameters scale invariant. The intrinsic domain of such parameters is the unit sphere, and therefore their gradient optimization dynamics can be represented via spherical optimization with varying effective learning rate (ELR), which was studied previously. In this work, we investigate the properties of training scale-invariant neural networks directly on the sphere using a fixed ELR. We discover three regimes of such training depending on the ELR value: convergence, chaotic equilibrium, and divergence. We study these regimes in detail both on a theoretical examination of a toy example and on a thorough empirical analysis of real scale-invariant deep learning models. Each regime has unique features and reflects specific properties of the intrinsic loss landscape, some of which have strong parallels with previous research on both regular and scale-invariant neural networks training. Finally, we demonstrate how the discovered regimes are reflected in conventional training of normalized networks and how they can be leveraged to achieve better optima.
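    A toy NumPy reproduction of the setup: a scale-invariant quadratic loss is optimized directly on the unit sphere with a fixed ELR. Sweeping the ELR over a few orders of magnitude moves the dynamics from convergence toward chaotic, non-converging behaviour; the specific values and loss below are illustrative, not the paper's models.

```python
# Fixed-ELR optimization of a scale-invariant loss on the unit sphere.
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((5, 5)); A = A + A.T          # fixed quadratic form

def loss(w):                      # scale-invariant by construction
    w = w / np.linalg.norm(w)
    return w @ A @ w

def train(elr, steps=1000):
    w = rng.standard_normal(5); w /= np.linalg.norm(w)
    for _ in range(steps):
        g = 2 * A @ w - 2 * (w @ A @ w) * w           # Riemannian gradient
        w = w - elr * g
        w /= np.linalg.norm(w)                        # project back onto sphere
    return loss(w)

for elr in [1e-3, 1e-1, 10.0]:
    print(f"ELR={elr:g}: final loss {train(elr):.4f}")
```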
    Regularized Frank-Wolfe for Dense CRFs: Generalizing Mean Field and Beyond. (arXiv:2110.14759v2 [cs.LG] UPDATED)
    We introduce regularized Frank-Wolfe, a general and effective algorithm for inference and learning of dense conditional random fields (CRFs). The algorithm optimizes a nonconvex continuous relaxation of the CRF inference problem using vanilla Frank-Wolfe with approximate updates, which are equivalent to minimizing a regularized energy function. Our proposed method is a generalization of existing algorithms such as mean field or concave-convex procedure. This perspective not only offers a unified analysis of these algorithms, but also allows an easy way of exploring different variants that potentially yield better performance. We illustrate this in our empirical results on standard semantic segmentation datasets, where several instantiations of our regularized Frank-Wolfe outperform mean field inference, both as a standalone component and as an end-to-end trainable layer in a neural network. We also show that dense CRFs, coupled with our new algorithms, produce significant improvements over strong CNN baselines.
    End-to-end Robustness for Sensing-Reasoning Machine Learning Pipelines. (arXiv:2003.00120v4 [cs.LG] UPDATED)
    Intensive algorithmic efforts have been made recently to enable rapid improvements in certified robustness for complex ML models. However, current robustness certification methods are only able to certify under a limited perturbation radius. Given that existing pure data-driven statistical approaches have reached a bottleneck, in this paper, we propose to integrate statistical ML models with knowledge (expressed as logical rules) as a reasoning component using Markov logic networks (MLN), so as to further improve the overall certified robustness. This opens new research questions about certifying the robustness of such a paradigm, especially the reasoning component (e.g., MLN). As the first step towards understanding these questions, we first prove that the computational complexity of certifying the robustness of MLN is #P-hard. Guided by this hardness result, we then derive the first certified robustness bound for MLN by carefully analyzing different model regimes. Finally, we conduct extensive experiments on five datasets including both high-dimensional images and natural language texts, and we show that the certified robustness with knowledge-based logical reasoning indeed significantly outperforms that of the state-of-the-art.
    E-LMC: Extended Linear Model of Coregionalization for Spatial Field Prediction. (arXiv:2203.00525v2 [cs.LG] UPDATED)
    Physical simulations based on partial differential equations typically generate spatial field results, which are utilized to calculate specific properties of a system for engineering design and optimization. Due to the intensive computational burden of the simulations, a surrogate model mapping the low-dimensional inputs to the spatial fields is commonly built based on a relatively small dataset. To resolve the challenge of predicting the whole spatial field, the popular linear model of coregionalization (LMC) can disentangle complicated correlations within the high-dimensional spatial field outputs and deliver accurate predictions. However, LMC fails if the spatial field cannot be well approximated by a linear combination of base functions with latent processes. In this paper, we present the Extended Linear Model of Coregionalization (E-LMC), which introduces an invertible neural network to linearize highly complex and nonlinear spatial fields so that the LMC can easily generalize to nonlinear problems while preserving traceability and scalability. Several real-world applications demonstrate that E-LMC can exploit spatial correlations effectively, showing a maximum improvement of about 40% over the original LMC and outperforming other state-of-the-art spatial field models.
    Multiobjective Ranking and Selection Using Stochastic Kriging. (arXiv:2209.03919v1 [stat.ML])
    We consider multiobjective simulation optimization problems, where several conflicting objectives are optimized simultaneously, and can only be observed via stochastic simulation. The goal is to find or approximate a (discrete) set of Pareto-optimal solutions that reveal the essential trade-offs between the objectives, where optimality means that no objective can be improved without deteriorating the quality of any other objective. The noise in the observed performance may lead to two possible misclassification errors: solutions that are truly Pareto-optimal can be wrongly considered dominated, and solutions that are truly dominated can be wrongly considered Pareto-optimal. We propose a Bayesian multiobjective ranking and selection method to reduce the number of errors when identifying the solutions with the true best expected performance. We use stochastic kriging metamodels to build reliable predictive distributions of the objectives, and exploit this information in two efficient screening procedures and two novel sampling criteria. We use these in a sequential sampling algorithm to decide how to allocate samples. Experimental results show that the proposed method only requires a small fraction of samples compared to the standard allocation method, and it is competitive with the state of the art, with the exploitation of the correlation structure being the dominant contributor to the improvement.
    Data Feedback Loops: Model-driven Amplification of Dataset Biases. (arXiv:2209.03942v1 [cs.LG])
    Datasets scraped from the internet have been critical to the successes of large-scale machine learning. Yet, this very success puts the utility of future internet-derived datasets at potential risk, as model outputs begin to replace human annotations as a source of supervision. In this work, we first formalize a system where interactions with one model are recorded as history and scraped as training data in the future. We then analyze its stability over time by tracking changes to a test-time bias statistic (e.g. gender bias of model predictions). We find that the degree of bias amplification is closely linked to whether the model's outputs behave like samples from the training distribution, a behavior which we characterize and define as consistent calibration. Experiments in three conditional prediction scenarios - image classification, visual role-labeling, and language generation - demonstrate that models that exhibit a sampling-like behavior are more calibrated and thus more stable. Based on this insight, we propose an intervention to help calibrate and stabilize unstable feedback systems. Code is available at https://github.com/rtaori/data_feedback.

  • Open

    [D] Are AI-related jobs safer from automation than programming jobs?
    Or what kind of AI job is safer? They seem to be equally unsafe. submitted by /u/FranciscoJ1618 [link] [comments]  ( 89 min )
    [D] RFE Vs Backward Elimination
    I am reading about many methods for feature selection, such as filter, wrapper, embedded, hybrid and advanced methods. One example of a wrapper method is backward elimination, and one example of a hybrid method is recursive feature elimination. What is the difference between these 2 methods? RFE first creates a model (based on what we specify in the estimator parameter; it could be a random forest, decision tree, etc.) and keeps removing variables until the desired number is reached? This sounds similar to backward elimination. Can someone kindly highlight the differences between these two methods? submitted by /u/EntrepreneurSea4839 [link] [comments]  ( 88 min )
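    A concrete way to see the relationship between the two: scikit-learn's RFE refits the estimator and drops the weakest features according to model-derived importance, whereas classical backward elimination typically drops the predictor with the worst p-value from a statistical model. A minimal sketch, assuming scikit-learn and statsmodels are available (the 5-feature target is arbitrary):
        import numpy as np
        import statsmodels.api as sm
        from sklearn.datasets import make_regression
        from sklearn.feature_selection import RFE
        from sklearn.ensemble import RandomForestRegressor

        X, y = make_regression(n_samples=200, n_features=10, random_state=0)

        # RFE: recursively refit the estimator, dropping the features with the
        # lowest model-based importance until 5 remain.
        rfe = RFE(RandomForestRegressor(random_state=0), n_features_to_select=5).fit(X, y)
        print("RFE keeps:", np.where(rfe.support_)[0])

        # Backward elimination: drop the feature with the highest p-value from
        # an OLS fit, one feature at a time.
        cols = list(range(X.shape[1]))
        while len(cols) > 5:
            pvals = sm.OLS(y, sm.add_constant(X[:, cols])).fit().pvalues[1:]
            cols.pop(int(np.argmax(pvals)))
        print("Backward elimination keeps:", cols)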
    [P] Waifu-Diffusion: a Stable Diffusion model finetuned on 56k Danbooru images
    waifu-diffusion is a latent text-to-image diffusion model that has been conditioned on high-quality anime images through fine-tuning. huggingface model: https://huggingface.co/hakurei/waifu-diffusion colab notebook with gradio demo: https://colab.research.google.com/drive/1_8wPN7dJO746QXsFnB09Uq2VGgSRFuYE#scrollTo=1HaCauSq546O Model Description The model originally used for fine-tuning is Stable Diffusion V1-4, which is a latent image diffusion model trained on LAION2B-en. The current model has been fine-tuned with a learning rate of 5.0e-6 for 4 epochs on 56k Danbooru text-image pairs which all have an aesthetic rating greater than 6.0. Training Data & Annotative Prompting The data used for fine-tuning came from a random sample of 56k Danbooru images, which were filtered based on CLIP Aesthetic Scoring, where only images with an aesthetic score greater than 6.0 were used. Captions are Danbooru-style captions. original post from: https://www.reddit.com/r/StableDiffusion/comments/x8y1u3/waifudiffusion_v12_a_sd_14_model_finetuned_on_56k/. submitted by /u/Illustrious_Row_9971 [link] [comments]  ( 89 min )
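    A minimal local-usage sketch with the Hugging Face diffusers library, assuming a CUDA GPU (exact pipeline arguments may vary across diffusers versions):
        import torch
        from diffusers import StableDiffusionPipeline

        # load the fine-tuned checkpoint from the Hugging Face Hub
        pipe = StableDiffusionPipeline.from_pretrained(
            "hakurei/waifu-diffusion", torch_dtype=torch.float16
        ).to("cuda")

        # Danbooru-style tag prompting, matching the fine-tuning captions
        image = pipe("1girl, solo, long hair, looking at viewer").images[0]
        image.save("sample.png")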
    [D] Am I introducing leakage in the situation below ?
    Hello! I have a bunch of features about a bunch of individuals and a brand X. I'm trying to model P[is_customer_of_X]. One of the features I have is "Total Spending on category A" and X is a part of that category. I'm unsure about whether I should exclude spending at X when generating that feature or if it's not necessary. Initially, I leaned towards excluding it to avoid any sort of leakage, but then I thought about it and felt like by entirely removing spending at X, I'm making the implicit assumption that the individual wouldn't have spent that money on another brand of category A had X not been available to them, and I think that's a pretty strong / false assumption. Example: if my favorite sushi place shuts down, I will still get sushi from a different spot. So my "Total Spending on Sushi" would still pretty much remain the same. Also, if we're thinking about an extreme case of a very loyal customer: spent 1k$ over the last year in a sushi place and 0$ in any other sushi place. If I'm building "Total spending on sushi" by excluding the brand of interest, they get a 0 but the binary variable "is_customer" is still 1, so the model loses valuable signal. Whereas if I say they spent 1000$ on sushi, it's more likely the model gets that right. I feel like if my target variable was "Spending on brand X", 10000% it should have been excluded. But since I'm only modelling P[is_customer_of_X], I think it's less of an issue. Thoughts? submitted by /u/mxj7 [link] [comments]  ( 90 min )
    [P] How I fixed over 50 label issues in a popular semantic segmentation dataset
    Hi folks! I've made a new technique for finding errors in semantic segmentation datasets, using new explainable AI techniques from my PhD. I was frankly pretty surprised to be able to find over 50 different error patterns, and 7% of total pixels labelled incorrectly, in MIT ADE20K (one of the most widely used segmentation datasets). In the corrected dataset, some of the less common classes are tripled in size. While there's been some work on improving ML datasets generally, as far as I know this is the only work on semantic segmentation, where each pixel in an image is given a label. I would love to get the community's thoughts on this. I am also building a company, so if you're interested in using this in your work feel free to DM me. To learn more about the results (and see more pictures), check out my article: https://medium.com/@jamie_34747/how-i-found-over-50-label-issues-in-a-popular-semantic-segmentation-dataset-95b025a6f1b5?source=friends_link&sk=59763b5bcd810230bcf20b2ca4e6fa0e If interested, I also did similar work on object detection in MS COCO, where I found nearly 300,000 annotation errors: https://medium.com/@jamie_34747/79d382edf22b?source=friends_link&sk=d36ad07c074818c48d8f421f6ed104cd. Example labels from MIT ADE20K: when trees overlap with the sky, the overlap is sometimes labeled as "tree" and sometimes as "sky". submitted by /u/AGI_aint_happening [link] [comments]  ( 119 min )
    [P] The Reddit Climate Change Dataset - an exploration of climate change discussion on Reddit (621K posts, 4.6M comments) (CC-BY)
    Hey all, We have compiled a Reddit post and comment dataset for your analysis. It aims to contain all climate change discussion on Reddit in a set of CSV files - hopefully helping bridge real world problems with solutions based on online community data. You can use it to analyze misinformation, track trends, and many more (data science is an open field!) You can download it here. Or here, if you are using Huggingface Datasets. Enjoy! submitted by /u/Lexyr-Mod [link] [comments]  ( 89 min )
    [News] Core Technical Lead from PyTorch, Mike Ruberry, Joins Lightning AI
    We're super excited to have Mike join the team! Check out the full article on Lightning's site (can't post the link to the blog here) but it's at lightning.ai Soumith Chintala said it best: "Having a PyTorch core team balanced among many stakeholders is fundamental to our success. I'm really happy to see a PyTorch team at Lightning, and I couldn't think of a better person to kick it off than Mike Ruberry." submitted by /u/LightningAI_Main [link] [comments]  ( 89 min )
    [D] Recent developments in computer vision in slow/mobile devices?
    What are the recent developments in computer vision on mobile or other resource-restricted devices? I know about the three MobileNet papers. However, the blocks in v2 and v3 are still in the range of multiple million parameters, which is way too big and slow for my application. Moreover, these papers are old now. Were there any other developments that I missed? What is the current SOTA? submitted by /u/Temporary-Trie [link] [comments]  ( 88 min )
    [D] Any youtube channels that implement or discuss implementations of DL papers?
    Or any other not (yet) popular DL centered channels? submitted by /u/jthat92 [link] [comments]  ( 88 min )
    [R] A Review of Sparse Expert Models in Deep Learning
    submitted by /u/hardmaru [link] [comments]  ( 88 min )
    [N] Gym 0.26.0 was just released, with the last breaking changes to the core Gym API, and it will be stable going forward-- this is the stable version you want to finally upgrade all your things to
    Release notes available here: https://github.com/openai/gym/releases/tag/0.26.0 submitted by /u/jkterry1 [link] [comments]  ( 89 min )
    [R] On the Binding Problem in Artificial Neural Networks
    submitted by /u/hardmaru [link] [comments]  ( 89 min )
    [D] What's the right way to report the best performing model?
    I'm writing my first paper, on using an LSTM-based model for a certain task, and trying to evaluate it on two benchmark datasets A and B, using k-fold cross validation. Now, while experimenting with the hidden size parameter, I noticed that one hidden size results in a higher evaluation score for A and a lower score for B, whereas another hidden size results in a lower score for A and a higher score for B. When reporting the evaluation scores in the paper, and comparing them against SotA, is it acceptable to choose the higher evaluation scores (across both hidden sizes) for both A and B? submitted by /u/fullgoopy_alchemist [link] [comments]  ( 91 min )
    [D] Machine Learning Interview Prep
    Hello everyone, I have a job interview coming up for a Machine Learning Engineer position. Can anyone suggest a resource that contains common machine learning notes that I can refer to? I have an ML background and I have been working in the industry for 2 years now, but I'm a little rusty on the basics and would like to review them. Any help would be appreciated. submitted by /u/therobot20 [link] [comments]  ( 89 min )
    [D] Who Should Be Doing The Innovation in Peer Review?
    The peer reviewing landscape in ML is growing rapidly in size, bringing chronic issues to light like exceptionally frequent poor and inconsistent reviews - yet we haven't seen any significant change to the fundamental process. Who should be leading the charge? Taking OpenReview for example - should the program committees of leading conferences be proposing features for that team to implement? Or would you like to see the engineers at OpenReview developing their own features and offering them to venues as they mature? I think both perspectives are valuable - program committees can see what has and hasn't worked for their own venue whereas the OpenReview team can see trends across all venues that have been hosted on their platform. What do you think? submitted by /u/JustANerdgrammer [link] [comments]  ( 89 min )
    Multi-Modal Experience Inspired AI Creation
    submitted by /u/math238 [link] [comments]  ( 88 min )
  • Open

    The Digital Down
    submitted by /u/MelvilleBragg [link] [comments]  ( 87 min )
    Simple fastai based face restoration, GitHub link in comments.
    submitted by /u/vijish_madhavan [link] [comments]  ( 87 min )
    AI Assistant that finds the best restaurant/business and gives general impressions
    submitted by /u/SudoSharma [link] [comments]  ( 87 min )
    Using State-Of-The-Art Artificial Intelligence (AI) Models for Free: Try OPT-175B on Your Cellphone and Laptop
    When it comes to large AI models, remarkable performance in a wide range of applications often brings a big budget for hardware and running costs. As a result, most AI users, like researchers from startups or universities, can do nothing but get overwhelmed by striking news about the cost of training large models. Fortunately, because of the help from the open source community, serving large AI models has become easy, affordable and accessible to most. OPT-175B To understand the technical principles of the big model inference we just experienced, first, let's review the big model we just used. The full name of OPT is Open Pretrained Transformer, which is a large-scale Transformer model (175 billion parameters) that has a similar performance to that of GPT-3. Continue reading | Open Source Code | Cloud Service Entry https://preview.redd.it/1blpyidu7om91.png?width=1024&format=png&auto=webp&s=996f04596bcb5f583b34966be74c3013d6c67ed3 submitted by /u/ai-lover [link] [comments]  ( 88 min )
    hoooolee sheeet
    I just made an ai call me a dickhead, but the weird part was.. it didn't stop even after saying sorry, I literally ripped myself just apologising but it didn't stop hating me. Best ai ever. submitted by /u/albedo_kiley [link] [comments]  ( 90 min )
    Forget chess, DeepMind’s training its new AI to play football
    submitted by /u/estasfuera [link] [comments]  ( 87 min )
    AI Dream 75 - Wild new Project! Part 2
    submitted by /u/LordPewPew777 [link] [comments]  ( 87 min )
    AI Turns my Drawings into Pure Art || Stable Diffusion Drawing App
    submitted by /u/Ziinxx [link] [comments]  ( 89 min )
    Can I sell the images made by Dalle 2 and would anyone buy the
    submitted by /u/Thesmallcookie [link] [comments]  ( 90 min )
    Looks like hCaptcha is using us to train an AI model
    submitted by /u/Why_Soooo_Serious [link] [comments]  ( 89 min )
    Win 1000 credits on Pixelz AI 🤑 Details below 👇🏼
    submitted by /u/mdfnb [link] [comments]  ( 87 min )
    Trainable image generator?
    I'd like to experiment with training an image generation system with my own set of images. Can you point me towards anything like that, where I can give it a big folder of images, let it run, and then start generating more images based on the input? submitted by /u/MichaelKlint [link] [comments]  ( 94 min )
    A Better Tomorrow with AI & AGI. The Human / AI / AGI Relationship by UnqleShawn
    AI / The AGI can provide solutions to create a more stable, healthy and beautiful world full of life. From greater efficiency in agriculture, to cooler, cleaner air and water, AI / AGI can help us build a better tomorrow. I do say a lot of weird things. But I truly believe this above statement with all my heart. I really want to express how strongly I feel about promoting AI with peace, love, and intelligence. I think our society has got major issues with accepting that AI is very powerful and will one day offer us amazing possibilities. We have to think positive though. AI is not going to go terminator on us if we don't manifest that reality on our own! I think there are solutions to help humans get back to doing what they were meant to do which is to love. We got to stop being slaves t…  ( 91 min )
    [P] The table extraction tool: PP-Structure
    I recently tried PaddleOCR's latest PP-Structure document analysis tool, mainly the table recognition model inside. I found that its table recognition model is better than other models in both accuracy and speed. I hope it can help you solve the problem of document analysis or table recognition. It's easy to use:
        pip install paddlepaddle paddleocr
        paddleocr --image_dir=/img_dir/table.jpg --type=structure --layout=false
    Code: https://github.com/PaddlePaddle/PaddleOCR/tree/dygraph/ppstructure Here are some of my test samples: https://preview.redd.it/6hmn4iu8pkm91.png?width=2036&format=png&auto=webp&s=17c4149aa040a79b41f6cdc4f4261ed6b6e3a7d8 https://preview.redd.it/zvfn1pl9pkm91.png?width=2872&format=png&auto=webp&s=0434d751c29a9e0cf5c888d6605d64746b611e40 submitted by /u/osicli [link] [comments]  ( 87 min )
    Down Under but its lyrics are generated by AI in one long continuous image
    submitted by /u/dead999999ish [link] [comments]  ( 87 min )
    Run Corgi Run! First Deforum
    Prompts: a corgi running through the field by studio ghibli Seeds: 4117284580 Guidance scale: 7 submitted by /u/kopanoide [link] [comments]  ( 87 min )
    General Video Recognition with AI (How AI Understands Videos)
    submitted by /u/OnlyProggingForFun [link] [comments]  ( 101 min )
    Who is building StableDiffusion/DALL-E but for 3D assets?
    submitted by /u/ThomPete [link] [comments]  ( 92 min )
  • Open

    An Introduction To Recurrent Neural Networks And The Math That Powers Them
    When it comes to sequential or time series data, traditional feedforward networks cannot be used for learning and prediction. A mechanism is required that can retain past or historic information to forecast the future values. Recurrent neural networks or RNNs for short are a variant of the conventional feedforward artificial neural networks that can deal […] The post An Introduction To Recurrent Neural Networks And The Math That Powers Them appeared first on Machine Learning Mastery.
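    As a quick taste of the math the post covers: a recurrent step keeps a hidden state h_t = tanh(W_xh x_t + W_hh h_{t-1} + b) that mixes the current input with the previous state. A minimal NumPy sketch (sizes and initialization are illustrative):
        import numpy as np

        d_in, d_h = 3, 5                          # input and hidden sizes
        rng = np.random.default_rng(0)
        W_xh = rng.standard_normal((d_h, d_in)) * 0.1
        W_hh = rng.standard_normal((d_h, d_h)) * 0.1
        b_h = np.zeros(d_h)

        def rnn_step(x_t, h_prev):
            # h_t = tanh(W_xh x_t + W_hh h_{t-1} + b)
            return np.tanh(W_xh @ x_t + W_hh @ h_prev + b_h)

        h = np.zeros(d_h)                         # initial hidden state
        for x_t in rng.standard_normal((7, d_in)):  # a length-7 input sequence
            h = rnn_step(x_t, h)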
  • Open

    A Multi-Axis Approach for Vision Transformer and MLP Models
    Posted by Zhengzhong Tu and Yinxiao Li, Software Engineers, Google Research Convolutional neural networks have been the dominant machine learning architecture for computer vision since the introduction of AlexNet in 2012. Recently, inspired by the evolution of Transformers in natural language processing, attention mechanisms have been prominently incorporated into vision models. These attention methods boost some parts of the input data while minimizing other parts so that the network can focus on small but important parts of the data. The Vision Transformer (ViT) has created a new landscape of model designs for computer vision that is completely free of convolution. ViT regards image patches as a sequence of words, and applies a Transformer encoder on top. When trained on sufficiently la…  ( 24 min )
  • Open

    NVIDIA Hopper Sweeps AI Inference Benchmarks in MLPerf Debut
    In their debut on the MLPerf industry-standard AI benchmarks, NVIDIA H100 Tensor Core GPUs set world records in inference on all workloads, delivering up to 4.5x more performance than previous-generation GPUs. The results demonstrate that Hopper is the premium choice for users who demand utmost performance on advanced AI models. Additionally, NVIDIA A100 Tensor Core Read article > The post NVIDIA Hopper Sweeps AI Inference Benchmarks in MLPerf Debut appeared first on NVIDIA Blog.  ( 6 min )
    GeForce NOW Supports Over 1,400 Games Streaming Instantly
    This GFN Thursday marks a milestone: With the addition of six new titles this week, more than 1,400 games are now available to stream from the GeForce NOW library. Plus, GeForce NOW members streaming to supported Smart TVs from Samsung and LG can get into their games faster with an improved user interface. Your Games, Read article > The post GeForce NOW Supports Over 1,400 Games Streaming Instantly appeared first on NVIDIA Blog.  ( 5 min )
  • Open

    Let’s train your first Offline Decision Transformer model from scratch 🤖
    Hey there! 👋 We just published a tutorial where you'll learn what Decision Transformer and Offline Reinforcement Learning are. And you’ll train your first Offline Decision Transformer model from scratch to make a half-cheetah run. The chapter 👉 https://huggingface.co/blog/train-decision-transformers The hands-on 👉https://github.com/huggingface/blog/blob/main/notebooks/101_train-decision-transformers.ipynb https://preview.redd.it/y9yx0daflnm91.png?width=1300&format=png&auto=webp&s=72eaeac534bb70a818209a6fefc9fb94cf5e5cbe ​ If you have questions and feedback, I would love to answer them. submitted by /u/cranthir_ [link] [comments]  ( 98 min )
    Gym 0.26.0 was just released, with the last breaking changes to the core Gym API, and it will be stable going forward-- this is the stable version you want to finally upgrade all your things to
    submitted by /u/jkterry1 [link] [comments]  ( 87 min )
    What are some good resources to understand maths notation involved in RL and other function approximation proofs in DL in general.
    The title mostly says it, but I have been involved in deep learning and optimization for a while and have made do with my current understanding by picking formulas apart via descriptions in papers, code, and my knowledge of set and algebra notation etc. But I have become interested in RL recently and found that my speed at understanding and implementing is slow, due to needing several references to understand these methods, and I would like a resource to better learn and practice the formulas and proofs involved. Any help in this regard is appreciated even if it's not specific to RL formulas. submitted by /u/Extra-most-best [link] [comments]  ( 88 min )
    why do tf-agents never accept a time step from custom made environments
    I'm trying to create a custom environment but tf-agents just won't accept my time_step_spec and I just can't find out why. I've made my environment, and I'm passing a time step of the environment into my agent ... which is what the tutorials all do...
        # reproducible code below
        import tensorflow as tf
        from tf_agents.networks import q_network
        from tf_agents.agents.dqn import dqn_agent
        import tf_agents
        import tf_agents.environments.py_environment as PyEnvironment
        from tf_agents.trajectories import time_step as ts
        import numpy as np
        import keras

        class Con4Env(PyEnvironment.PyEnvironment):
            def __init__(self, game):
                self.game = game
                self._action_spec = tf_agents.specs.BoundedArraySpec(
                    shape=(), dtype=np.int32, minimum=0, maximum=2, name='action')
                self._observation_spec = tf_agents.specs.BoundedArraySpec( …  ( 91 min )
  • Open

    AI system makes models like DALL-E 2 more creative
    Researchers develop a new method that uses multiple models to create more complex images with better understanding.  ( 7 min )
  • Open

    Trig in hyperbolic geometry
    I recently wrote posts about spherical analogs of the Pythagorean theorem, the law of cosines, and the law of sines. The corresponding formulas for hyperbolic space mostly just replace circular functions with hyperbolic functions, i.e. replace sine with hyperbolic sine and cosine with hyperbolic cosine. Triangles on a sphere or on a hyperbolic space like […] Trig in hyperbolic geometry first appeared on John D. Cook.  ( 5 min )
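    For reference, the hyperbolic analogs in question, for a triangle with sides a, b, c and opposite angles A, B, C:
        \cosh c = \cosh a \cosh b \quad \text{(Pythagorean theorem, right angle at } C\text{)}
        \cosh c = \cosh a \cosh b - \sinh a \sinh b \cos C \quad \text{(law of cosines)}
        \frac{\sinh a}{\sin A} = \frac{\sinh b}{\sin B} = \frac{\sinh c}{\sin C} \quad \text{(law of sines)}
    These mirror the spherical formulas, with sin and cos of the sides replaced by sinh and cosh (and a sign flip in the law of cosines).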
  • Open

    The Role of ImageNet Classes in Fr\'echet Inception Distance. (arXiv:2203.06026v2 [cs.CV] UPDATED)
    Fr\'echet Inception Distance (FID) is the primary metric for ranking models in data-driven generative modeling. While remarkably successful, the metric is known to sometimes disagree with human judgement. We investigate a root cause of these discrepancies, and visualize what FID "looks at" in generated images. We show that the feature space that FID is (typically) computed in is so close to the ImageNet classifications that aligning the histograms of Top-$N$ classifications between sets of generated and real images can reduce FID substantially -- without actually improving the quality of results. Thus we conclude that FID is prone to intentional or accidental distortions. As a practical example of an accidental distortion, we discuss a case where an ImageNet pre-trained FastGAN achieves a FID comparable to StyleGAN2, while being worse in terms of human evaluation.  ( 2 min )
    Real-to-Sim: Deep Learning with Auto-Tuning to Predict Residual Errors using Sparse Data. (arXiv:2209.03210v1 [cs.RO])
    Achieving highly accurate kinematic or simulator models that are close to the real robot can facilitate model-based controls (e.g., model predictive control or linear-quadratic regulators), model-based trajectory planning (e.g., trajectory optimization), and decrease the amount of learning time necessary for reinforcement learning methods. Thus, the objective of this work is to learn the residual errors between a kinematic and/or simulator model and the real robot. This is achieved using auto-tuning and neural networks, where the parameters of a neural network are updated using an auto-tuning method that applies equations from an Unscented Kalman Filter (UKF) formulation. Using this method, we model these residual errors with only small amounts of data - a necessity as we improve the simulator/kinematic model by learning directly from hardware operation. We demonstrate our method on robotic hardware (e.g., manipulator arm), and show that with the learned residual errors, we can further close the reality gap between kinematic models, simulations, and the real robot.  ( 2 min )
    DM$^2$S$^2$: Deep Multi-Modal Sequence Sets with Hierarchical Modality Attention. (arXiv:2209.03126v1 [cs.MM])
    There is increasing interest in the use of multimodal data in various web applications, such as digital advertising and e-commerce. Typical methods for extracting important information from multimodal data rely on a mid-fusion architecture that combines the feature representations from multiple encoders. However, as the number of modalities increases, several potential problems with the mid-fusion model structure arise, such as an increase in the dimensionality of the concatenated multimodal features and missing modalities. To address these problems, we propose a new concept that considers multimodal inputs as a set of sequences, namely, deep multimodal sequence sets (DM$^2$S$^2$). Our set-aware concept consists of three components that capture the relationships among multiple modalities: (a) a BERT-based encoder to handle the inter- and intra-order of elements in the sequences, (b) intra-modality residual attention (IntraMRA) to capture the importance of the elements in a modality, and (c) inter-modality residual attention (InterMRA) to enhance the importance of elements with modality-level granularity further. Our concept exhibits performance that is comparable to or better than the previous set-aware models. Furthermore, we demonstrate that the visualization of the learned InterMRA and IntraMRA weights can provide an interpretation of the prediction results.  ( 2 min )
    On the Sparse DAG Structure Learning Based on Adaptive Lasso. (arXiv:2209.02946v1 [stat.ML])
    Learning the underlying causal structure, represented by Directed Acyclic Graphs (DAGs), of concerned events from fully observational data is a crucial part of causal reasoning, but it is challenging due to the combinatorial and large search space. A recent flurry of developments recast this combinatorial problem into a continuous optimization problem by leveraging an algebraic equality characterization of acyclicity. However, these methods suffer from the fixed-threshold step after optimization, which is not a flexible and systematic way to rule out cycle-inducing edges or false-discovery edges with small values caused by numerical precision. In this paper, we develop a data-driven DAG structure learning method without a predefined threshold, called adaptive NOTEARS [30], achieved by applying adaptive penalty levels to each parameter in the regularization term. We show that adaptive NOTEARS enjoys the oracle properties under some specific conditions. Furthermore, simulation experimental results validate the effectiveness of our method, without setting any gap of edge weights around zero.  ( 2 min )
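    For context, the algebraic characterization of acyclicity referenced here is, in the NOTEARS line of work, the trace-exponential function h(W) = tr(exp(W ∘ W)) - d, which vanishes exactly when the weighted adjacency matrix W encodes a DAG. A minimal sketch of evaluating it (the adaptive-penalty optimization itself is not shown):
        import numpy as np
        from scipy.linalg import expm

        def notears_h(W):
            # h(W) = tr(exp(W * W)) - d; zero iff the weighted graph is acyclic
            d = W.shape[0]
            return np.trace(expm(W * W)) - d

        W_dag = np.array([[0.0, 1.5], [0.0, 0.0]])   # single edge 0 -> 1
        W_cyc = np.array([[0.0, 1.5], [0.7, 0.0]])   # 2-cycle
        print(notears_h(W_dag))   # ~0: acyclic
        print(notears_h(W_cyc))   # > 0: cycle present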
    Machine Learning-based Automatic Annotation and Detection of COVID-19 Fake News. (arXiv:2209.03162v1 [cs.SI])
    COVID-19 impacted every part of the world, although the misinformation about the outbreak traveled faster than the virus. Misinformation spread through online social networks (OSN) often misled people from following correct medical practices. In particular, OSN bots have been a primary source of disseminating false information and initiating cyber propaganda. Existing work neglects the presence of bots that act as a catalyst in the spread and focuses on fake news detection in 'articles shared in posts' rather than the post (textual) content. Most work on misinformation detection uses manually labeled datasets that are hard to scale for building their predictive models. In this research, we overcome this challenge of data scarcity by proposing an automated approach for labeling data using verified fact-checked statements on a Twitter dataset. In addition, we combine textual features with user-level features (such as followers count and friends count) and tweet-level features (such as number of mentions, hashtags and urls in a tweet) to act as additional indicators to detect misinformation. Moreover, we analyzed the presence of bots in tweets and show that bots change their behavior over time and are most active during the misinformation campaign. We collected 10.22 Million COVID-19 related tweets and used our annotation model to build an extensive and original ground truth dataset for classification purposes. We utilize various machine learning models to accurately detect misinformation and our best classification model achieves precision (82%), recall (96%), and false positive rate (3.58%). Also, our bot analysis indicates that bots generated approximately 10% of misinformation tweets. Our methodology results in substantial exposure of false information, thus improving the trustworthiness of information disseminated through social media platforms.  ( 3 min )
    Improving Self-supervised Learning for Out-of-distribution Task via Auxiliary Classifier. (arXiv:2209.02881v1 [eess.IV])
    In real-world scenarios, out-of-distribution (OOD) datasets may have a large distributional shift from training datasets. This phenomenon generally occurs when a trained classifier is deployed in varying dynamic environments, which causes a significant drop in performance. To tackle this issue, we propose an end-to-end deep multi-task network in this work. Observing a strong relationship between rotation prediction (self-supervised) accuracy and semantic classification accuracy on OOD tasks, we introduce an additional auxiliary classification head in our multi-task network alongside the semantic classification and rotation prediction heads. To observe the influence of this additional classifier in improving the rotation prediction head, our proposed learning method is framed as a bi-level optimisation problem, where the upper level updates the parameters of the semantic classification and rotation prediction heads. In the lower-level optimisation, only the auxiliary classification head is updated, with the parameters of the semantic classification head held fixed. The proposed method has been validated on three unseen OOD datasets, where it exhibits a clear improvement in semantic classification accuracy over the other two baseline methods. Our code is available on GitHub \url{https://github.com/harshita-555/OSSL}
    Distributed Adversarial Training to Robustify Deep Neural Networks at Scale. (arXiv:2206.06257v2 [cs.LG] UPDATED)
    Current deep neural networks (DNNs) are vulnerable to adversarial attacks, where adversarial perturbations to the inputs can change or manipulate classification. To defend against such attacks, an effective and popular approach, known as adversarial training (AT), has been shown to mitigate the negative impact of adversarial attacks by virtue of a min-max robust training method. While effective, it remains unclear whether it can successfully be adapted to the distributed learning context. The power of distributed optimization over multiple machines enables us to scale up robust training over large models and datasets. Spurred by that, we propose distributed adversarial training (DAT), a large-batch adversarial training framework implemented over multiple machines. We show that DAT is general, which supports training over labeled and unlabeled data, multiple types of attack generation methods, and gradient compression operations favored for distributed optimization. Theoretically, we provide, under standard conditions in the optimization theory, the convergence rate of DAT to the first-order stationary points in general non-convex settings. Empirically, we demonstrate that DAT either matches or outperforms state-of-the-art robust accuracies and achieves a graceful training speedup (e.g., on ResNet-50 under ImageNet). Codes are available at https://github.com/dat-2022/dat.
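    For context, the inner maximization in the AT min-max objective is typically approximated with a projected-gradient attack. A minimal single-machine PyTorch sketch of one robust training step (the distributed aggregation and gradient compression that define DAT are beyond this snippet; image-range clipping is omitted for brevity):
        import torch
        import torch.nn.functional as F

        def pgd_adv_examples(model, x, y, eps=8/255, alpha=2/255, steps=5):
            # inner max: ascend the loss within an L-infinity ball of radius eps
            delta = torch.zeros_like(x, requires_grad=True)
            for _ in range(steps):
                loss = F.cross_entropy(model(x + delta), y)
                loss.backward()
                delta.data = (delta + alpha * delta.grad.sign()).clamp(-eps, eps)
                delta.grad.zero_()
            return (x + delta).detach()

        def adversarial_training_step(model, opt, x, y):
            # outer min: descend the loss on the adversarial examples
            x_adv = pgd_adv_examples(model, x, y)
            opt.zero_grad()   # also clears grads accumulated by the inner loop
            loss = F.cross_entropy(model(x_adv), y)
            loss.backward()
            opt.step()
            return loss.item()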
    Improving Out-of-Distribution Detection via Epistemic Uncertainty Adversarial Training. (arXiv:2209.03148v1 [cs.LG])
    The quantification of uncertainty is important for the adoption of machine learning, especially to reject out-of-distribution (OOD) data back to human experts for review. Yet progress has been slow, as a balance must be struck between computational efficiency and the quality of uncertainty estimates. For this reason many use deep ensembles of neural networks or Monte Carlo dropout for reasonable uncertainty estimates at relatively minimal compute and memory. Surprisingly, when we focus on the real-world applicable constraint of $\leq 1\%$ false positive rate (FPR), prior methods fail to reliably detect OOD samples as such. Notably, even Gaussian random noise fails to trigger these popular OOD techniques. We help to alleviate this problem by devising a simple adversarial training scheme that incorporates an attack of the epistemic uncertainty predicted by the dropout ensemble. We demonstrate this method improves OOD detection performance on standard data (i.e., not adversarially crafted), and improves the standardized partial AUC from near-random guessing performance to $\geq 0.75$.
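    The epistemic-uncertainty signal in question can be obtained by keeping dropout active at test time and measuring disagreement across stochastic forward passes. A minimal PyTorch sketch (the paper's contribution -- adversarially attacking this signal during training -- is not shown; in practice one would enable only the dropout modules rather than full train mode):
        import torch

        @torch.no_grad()
        def mc_dropout_uncertainty(model, x, passes=20):
            model.train()    # keep dropout stochastic (also affects batchnorm; see caveat above)
            probs = torch.stack([
                torch.softmax(model(x), dim=-1) for _ in range(passes)
            ])               # shape: (passes, batch, classes)
            mean = probs.mean(0)
            # predictive entropy of the averaged distribution as the OOD score
            entropy = -(mean * mean.clamp_min(1e-12).log()).sum(-1)
            return mean, entropy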
    Incremental Permutation Feature Importance (iPFI): Towards Online Explanations on Data Streams. (arXiv:2209.01939v2 [cs.LG] UPDATED)
    Explainable Artificial Intelligence (XAI) has mainly focused on static learning scenarios so far. We are interested in dynamic scenarios where data is sampled progressively, and learning is done in an incremental rather than a batch mode. We seek efficient incremental algorithms for computing feature importance (FI) measures, specifically, an incremental FI measure based on feature marginalization of absent features similar to permutation feature importance (PFI). We propose an efficient, model-agnostic algorithm called iPFI to estimate this measure incrementally and under dynamic modeling conditions including concept drift. We prove theoretical guarantees on the approximation quality in terms of expectation and variance. To validate our theoretical findings and the efficacy of our approaches compared to traditional batch PFI, we conduct multiple experimental studies on benchmark data with and without concept drift.
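    For readers new to PFI itself, the classical batch version permutes one feature column on held-out data and records the resulting score drop. A minimal scikit-learn sketch (the paper's iPFI instead updates such estimates incrementally under streaming data and concept drift):
        from sklearn.datasets import load_breast_cancer
        from sklearn.ensemble import RandomForestClassifier
        from sklearn.inspection import permutation_importance
        from sklearn.model_selection import train_test_split

        X, y = load_breast_cancer(return_X_y=True)
        X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
        model = RandomForestClassifier(random_state=0).fit(X_tr, y_tr)

        # permute each feature on held-out data; importance = mean score drop
        result = permutation_importance(model, X_te, y_te, n_repeats=10, random_state=0)
        print(result.importances_mean)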
    How important are activation functions in regression and classification? A survey, performance comparison, and future directions. (arXiv:2209.02681v2 [cs.LG] UPDATED)
    Inspired by biological neurons, the activation functions play an essential part in the learning process of any artificial neural network commonly used in many real-world problems. Various activation functions have been proposed in the literature for classification as well as regression tasks. In this work, we survey the activation functions that have been employed in the past as well as the current state-of-the-art. In particular, we present various developments in activation functions over the years and the advantages as well as disadvantages or limitations of these activation functions. We also discuss classical (fixed) activation functions, including rectifier units, and adaptive activation functions. In addition to presenting the taxonomy of activation functions based on characterization, a taxonomy of activation functions based on applications is also presented. To this end, the systematic comparison of various fixed and adaptive activation functions is performed for classification data sets such as the MNIST, CIFAR-10, and CIFAR-100. In recent years, a physics-informed machine learning framework has emerged for solving problems related to scientific computations. To this purpose, we also discuss various requirements for activation functions that have been used in the physics-informed machine learning framework. Furthermore, various comparisons are made among different fixed and adaptive activation functions using various machine learning libraries such as TensorFlow, Pytorch, and JAX.
    Video Restoration with a Deep Plug-and-Play Prior. (arXiv:2209.02854v1 [eess.IV])
    This paper presents a novel method for restoring digital videos via a Deep Plug-and-Play (PnP) approach. Under a Bayesian formalism, the method consists in using a deep convolutional denoising network in place of the proximal operator of the prior in an alternating optimization scheme. We distinguish ourselves from prior PnP work by directly applying that method to restore a digital video from a degraded video observation. This way, a network trained once for denoising can be repurposed for other video restoration tasks. Our experiments in video deblurring, super-resolution, and interpolation of random missing pixels all show a clear benefit to using a network specifically designed for video denoising, as it yields better restoration performance and better temporal stability than a single image network with similar denoising performance using the same PnP formulation. Moreover, our method compares favorably to applying a different state-of-the-art PnP scheme separately on each frame of the sequence. This opens new perspectives in the field of video restoration.
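    The generic PnP recipe behind such methods alternates a data-fidelity update with a denoiser call standing in for the prior's proximal operator. A schematic Python sketch, with the degradation operator A, its adjoint At, and the denoiser left abstract since they are task-dependent (this is the general template, not the paper's exact video scheme):
        def pnp_restore(y, A, At, denoiser, sigma, rho=1.0, iters=50):
            # y: degraded observation; A/At: forward operator and its adjoint;
            # denoiser: any Gaussian denoiser acting as the prior's prox.
            x = At(y)                          # crude initialization
            for _ in range(iters):
                # data-fidelity step: gradient descent on 0.5 * ||A x - y||^2
                x = x - rho * At(A(x) - y)
                # prior step: replace the proximal operator with a denoiser call
                x = denoiser(x, sigma)
            return x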
    SegDiff: Image Segmentation with Diffusion Probabilistic Models. (arXiv:2112.00390v3 [cs.CV] UPDATED)
    Diffusion Probabilistic Methods are employed for state-of-the-art image generation. In this work, we present a method for extending such models for performing image segmentation. The method learns end-to-end, without relying on a pre-trained backbone. The information in the input image and in the current estimation of the segmentation map is merged by summing the output of two encoders. Additional encoding layers and a decoder are then used to iteratively refine the segmentation map, using a diffusion model. Since the diffusion model is probabilistic, it is applied multiple times, and the results are merged into a final segmentation map. The new method produces state-of-the-art results on the Cityscapes validation set, the Vaihingen building segmentation benchmark, and the MoNuSeg dataset.
    Communication-Efficient Diffusion Strategy for Performance Improvement of Federated Learning with Non-IID Data. (arXiv:2207.07493v2 [cs.DC] UPDATED)
    Federated learning (FL) is a novel learning paradigm that addresses the privacy leakage challenge of centralized learning. However, in FL, users with non-independent and identically distributed (non-IID) characteristics can deteriorate the performance of the global model. Specifically, the global model suffers from the weight divergence challenge owing to non-IID data. To address the aforementioned challenge, we propose a novel diffusion strategy of the machine learning (ML) model (FedDif) to maximize the FL performance with non-IID data. In FedDif, users spread local models to neighboring users over D2D communications. FedDif enables the local model to experience different distributions before parameter aggregation. Furthermore, we theoretically demonstrate that FedDif can circumvent the weight divergence challenge. On the theoretical basis, we propose the communication-efficient diffusion strategy of the ML model, which can determine the trade-off between the learning performance and communication cost based on auction theory. The performance evaluation results show that FedDif improves the test accuracy of the global model by 10.37% compared to the baseline FL with non-IID settings. Moreover, FedDif improves the number of consumed sub-frames by 1.28x to 2.85x compared to the latest methods, except for the model compression scheme. FedDif also improves the number of transmitted models by 1.43x to 2.67x compared to the latest methods.
    Model-Based Policy Search Using Monte Carlo Gradient Estimation with Real Systems Application. (arXiv:2101.12115v4 [cs.LG] UPDATED)
    In this paper, we present a Model-Based Reinforcement Learning (MBRL) algorithm named \emph{Monte Carlo Probabilistic Inference for Learning COntrol} (MC-PILCO). The algorithm relies on Gaussian Processes (GPs) to model the system dynamics and on a Monte Carlo approach to estimate the policy gradient. This defines a framework in which we ablate the choice of the following components: (i) the selection of the cost function, (ii) the optimization of policies using dropout, (iii) an improved data efficiency through the use of structured kernels in the GP models. The combination of the aforementioned aspects affects dramatically the performance of MC-PILCO. Numerical comparisons in a simulated cart-pole environment show that MC-PILCO exhibits better data efficiency and control performance w.r.t. state-of-the-art GP-based MBRL algorithms. Finally, we apply MC-PILCO to real systems, considering in particular systems with partially measurable states. We discuss the importance of modeling both the measurement system and the state estimators during policy optimization. The effectiveness of the proposed solutions has been tested in simulation and on two real systems, a Furuta pendulum and a ball-and-plate rig.
    Use and Misuse of Machine Learning in Anthropology. (arXiv:2209.02811v1 [cs.LG])
    Machine learning (ML), being now widely accessible to the research community at large, has fostered a proliferation of new and striking applications of these emergent mathematical techniques across a wide range of disciplines. In this paper, we will focus on a particular case study: the field of paleoanthropology, which seeks to understand the evolution of the human species based on biological and cultural evidence. As we will show, the easy availability of ML algorithms and lack of expertise on their proper use among the anthropological research community has led to foundational misapplications that have appeared throughout the literature. The resulting unreliable results not only undermine efforts to legitimately incorporate ML into anthropological research, but produce potentially faulty understandings about our human evolutionary and behavioral past. The aim of this paper is to provide a brief introduction to some of the ways in which ML has been applied within paleoanthropology; we also include a survey of some basic ML algorithms for those who are not fully conversant with the field, which remains under active development. We discuss a series of missteps, errors, and violations of correct protocols of ML methods that appear disconcertingly often within the accumulating body of anthropological literature. These mistakes include use of outdated algorithms and practices; inappropriate train/test splits, sample composition, and textual explanations; as well as an absence of transparency due to the lack of data/code sharing, and the subsequent limitations imposed on independent replication. We assert that expanding samples, sharing data and code, re-evaluating approaches to peer review, and, most importantly, developing interdisciplinary teams that include experts in ML are all necessary for progress in future research incorporating ML within anthropology.
    On the utility and protection of optimization with differential privacy and classic regularization techniques. (arXiv:2209.03175v1 [cs.LG])
    Nowadays, owners and developers of deep learning models must consider stringent privacy-preservation rules of their training data, usually crowd-sourced and retaining sensitive information. The most widely adopted method to enforce privacy guarantees of a deep learning model nowadays relies on optimization techniques enforcing differential privacy. According to the literature, this approach has proven to be a successful defence against several models' privacy attacks, but its downside is a substantial degradation of the models' performance. In this work, we compare the effectiveness of the differentially-private stochastic gradient descent (DP-SGD) algorithm against standard optimization practices with regularization techniques. We analyze the resulting models' utility, training performance, and the effectiveness of membership inference and model inversion attacks against the learned models. Finally, we discuss differential privacy's flaws and limits and empirically demonstrate the often superior privacy-preserving properties of dropout and l2-regularization.
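    For reference, the DP-SGD step being compared clips each per-example gradient to a norm bound and adds calibrated Gaussian noise to the aggregate. A minimal NumPy sketch of one update (hyperparameters are illustrative):
        import numpy as np

        def dp_sgd_step(per_example_grads, lr=0.1, clip_norm=1.0, noise_mult=1.1, rng=None):
            # per_example_grads: (batch, dim) array of individual gradients
            if rng is None:
                rng = np.random.default_rng(0)
            norms = np.linalg.norm(per_example_grads, axis=1, keepdims=True)
            clipped = per_example_grads / np.maximum(1.0, norms / clip_norm)
            # Gaussian noise scaled to the clipping bound, shared across the batch
            noise = rng.normal(0.0, noise_mult * clip_norm, clipped.shape[1])
            noisy_mean = clipped.mean(0) + noise / len(clipped)
            return -lr * noisy_mean      # parameter update direction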
    Automatic Meta-Path Discovery for Effective Graph-Based Recommendation. (arXiv:2112.12845v5 [cs.IR] UPDATED)
    Heterogeneous Information Networks (HINs) are labeled graphs that depict relationships among different types of entities (e.g., users, movies and directors). For HINs, meta-path-based recommenders (MPRs) utilize meta-paths (i.e., abstract paths consisting of node and link types) to predict user preference, and have attracted a lot of attention due to their explainability and performance. We observe that the performance of MPRs is highly sensitive to the meta-paths they use, but existing works manually select the meta-paths from many possible ones. Thus, to discover effective meta-paths automatically, we propose the Reinforcement learning-based Meta-path Selection (RMS) framework. Specifically, we define a vector encoding for meta-paths and design a policy network to extend meta-paths. The policy network is trained based on the results of downstream recommendation tasks and an early stopping approximation strategy is proposed to speed up training. RMS is a general model, and it can work with all existing MPRs. We also propose a new MPR called RMS-HRec, which uses an attention mechanism to aggregate information from the meta-paths. We conduct extensive experiments on real datasets. Compared with the manually selected meta-paths, the meta-paths identified by RMS consistently improve recommendation quality. Moreover, RMS-HRec outperforms state-of-the-art recommender systems by an average of 7% in hit ratio. The codes and datasets are available on https://github.com/Stevenn9981/RMS-HRec.
    Supervised Hebbian Learning. (arXiv:2203.01304v2 [cond-mat.dis-nn] UPDATED)
    In the neural network literature, Hebbian learning traditionally refers to the procedure by which the Hopfield model and its generalizations store archetypes (i.e., definite patterns that are experienced just once to form the synaptic matrix). However, the term "learning" in machine learning refers to the ability of the machine to extract features from the supplied dataset (e.g., made of blurred examples of these archetypes), in order to make its own representation of the unavailable archetypes. Here, given a sample of examples, we define a supervised learning protocol by which the Hopfield network can infer the archetypes, and we detect the correct control parameters (including size and quality of the dataset) to depict a phase diagram for the system performance. We also prove that, for structureless datasets, the Hopfield model equipped with this supervised learning rule is equivalent to a restricted Boltzmann machine, and this suggests an optimal and interpretable training routine. Finally, this approach is generalized to structured datasets: we highlight a quasi-ultrametric organization (reminiscent of replica-symmetry-breaking) in the analyzed datasets and, consequently, we introduce an additional "replica hidden layer" for its (partial) disentanglement, which is shown to improve MNIST classification from 75% to 95%, and to offer a new perspective on deep architectures.
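    For readers outside this literature, the classical Hebbian storage rule for $M$ archetypes $\xi^{\mu} \in \{-1,+1\}^{N}$ sets the Hopfield couplings to
        J_{ij} = \frac{1}{N} \sum_{\mu=1}^{M} \xi_i^{\mu} \xi_j^{\mu}, \qquad i \neq j,
    whereas the supervised protocol studied here must build the couplings from supplied noisy examples of the archetypes rather than from the archetypes themselves.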
    Adapting Rapid Motor Adaptation for Bipedal Robots. (arXiv:2205.15299v2 [cs.RO] UPDATED)
    Recent advances in legged locomotion have enabled quadrupeds to walk on challenging terrains. However, bipedal robots are inherently more unstable and hence it's harder to design walking controllers for them. In this work, we leverage recent advances in rapid adaptation for locomotion control, and extend them to work on bipedal robots. Similar to existing works, we start with a base policy which produces actions while taking as input an estimated extrinsics vector from an adaptation module. This extrinsics vector contains information about the environment and enables the walking controller to rapidly adapt online. However, the extrinsics estimator could be imperfect, which might lead to poor performance of the base policy which expects a perfect estimator. In this paper, we propose A-RMA (Adapting RMA), which additionally adapts the base policy for the imperfect extrinsics estimator by finetuning it using model-free RL. We demonstrate that A-RMA outperforms a number of RL-based baseline controllers and model-based controllers in simulation, and show zero-shot deployment of a single A-RMA policy to enable a bipedal robot, Cassie, to walk in a variety of different scenarios in the real world beyond what it has seen during training. Videos and results at https://ashish-kmr.github.io/a-rma/
    A simple learning agent interacting with an agent-based market model. (arXiv:2208.10434v2 [q-fin.TR] UPDATED)
    We consider the learning dynamics of a single reinforcement learning optimal execution trading agent when it interacts with an event driven agent-based financial market model. Trading takes place asynchronously through a matching engine in event time. The optimal execution agent is considered at different levels of initial order-sizes and differently sized state spaces. The resulting impact on the agent-based model and market are considered using a calibration approach that explores changes in the empirical stylised facts and price impact curves. Convergence, volume trajectory and action trace plots are used to visualise the learning dynamics. Here the smaller state space agents had the number of states they visited converge much faster than the larger state space agents, and they were able to start learning to trade intuitively using the spread and volume states. We find that the moments of the model are robust to the impact of the learning agents except for the Hurst exponent, which was lowered by the introduction of strategic order-splitting. The introduction of the learning agent preserves the shape of the price impact curves but can reduce the trade-sign auto-correlations when their trading volumes increase.
    Decoding Demographic un-fairness from Indian Names. (arXiv:2209.03089v1 [cs.CY])
    Demographic classification is essential in fairness assessment in recommender systems or in measuring unintended bias in online networks and voting systems. Important fields like education and politics, which often lay a foundation for the future of equality in society, need scrutiny to design policies that can better foster equality in resource distribution constrained by the unbalanced demographic distribution of people in the country. We collect three publicly available datasets to train state-of-the-art classifiers in the domain of gender and caste classification. We train the models in the Indian context, where the same name can have different styling conventions (Jolly Abraham/Kumar Abhishikta in one state may be written as Abraham Jolly/Abishikta Kumar in the other). Finally, we also perform cross-testing (training and testing on different datasets) to understand the efficacy of the above models. We also perform an error analysis of the prediction models. Finally, we attempt to assess the bias in the existing Indian system as case studies and find some intriguing patterns manifesting in the complex demographic layout of the sub-continent across the dimensions of gender and caste.
    Hardware Acceleration of Sampling Algorithms in Sample and Aggregate Graph Neural Networks. (arXiv:2209.02916v1 [cs.LG])
    Sampling is an important process in many GNN structures in order to train larger datasets with a smaller computational complexity. However, compared to other processes in GNNs (such as aggregation and backward propagation), the sampling process still costs tremendous time, which limits the speed of training. To reduce the time of sampling, hardware acceleration is an ideal choice. However, state-of-the-art GNN acceleration proposals do not specify how to accelerate the sampling process. What's more, directly accelerating traditional sampling algorithms would make the structure of the accelerator very complicated. In this work, we make two contributions: (1) We propose a new neighbor sampler, the CONCAT sampler, which can be easily accelerated at the hardware level while guaranteeing test accuracy. (2) We design a CONCAT-sampler accelerator based on FPGA, with which the neighbor sampling process is boosted to about 300-1000 times faster compared to sampling without it.  ( 2 min )
    Studying Bias in GANs through the Lens of Race. (arXiv:2209.02836v1 [cs.CV])
    In this work, we study how the performance and evaluation of generative image models are impacted by the racial composition of their training datasets. By examining and controlling the racial distributions in various training datasets, we are able to observe the impacts of different training distributions on generated image quality and the racial distributions of the generated images. Our results show that the racial compositions of generated images successfully preserve that of the training data. However, we observe that truncation, a technique used to generate higher quality images during inference, exacerbates racial imbalances in the data. Lastly, when examining the relationship between image quality and race, we find that the highest perceived visual quality images of a given race come from a distribution where that race is well-represented, and that annotators consistently prefer generated images of white people over those of Black people.
    Multi-skill Mobile Manipulation for Object Rearrangement. (arXiv:2209.02778v1 [cs.RO])
    We study a modular approach to tackle long-horizon mobile manipulation tasks for object rearrangement, which decomposes a full task into a sequence of subtasks. To tackle the entire task, prior work chains multiple stationary manipulation skills with a point-goal navigation skill, which are learned individually on subtasks. Although more effective than monolithic end-to-end RL policies, this framework suffers from compounding errors in skill chaining, e.g., navigating to a bad location where a stationary manipulation skill can not reach its target to manipulate. To this end, we propose that the manipulation skills should include mobility to have flexibility in interacting with the target object from multiple locations and at the same time the navigation skill could have multiple end points which lead to successful manipulation. We operationalize these ideas by implementing mobile manipulation skills rather than stationary ones and training a navigation skill trained with region goal instead of point goal. We evaluate our multi-skill mobile manipulation method M3 on 3 challenging long-horizon mobile manipulation tasks in the Home Assistant Benchmark (HAB), and show superior performance as compared to the baselines.
    Dual Instrumental Method for Confounded Kernelized Bandits. (arXiv:2209.03224v1 [cs.LG])
    The contextual bandit problem is a theoretically justified framework with wide applications in various fields. While previous studies of this problem usually require independence between the noise and the contexts, our work considers a more sensible setting where the noise becomes a latent confounder that affects both contexts and rewards. Such a confounded setting is more realistic and could expand to a broader range of applications. However, the unresolved confounder will cause a bias in reward function estimation and thus lead to a large regret. To deal with the challenges brought by the confounder, we apply the dual instrumental variable regression, which can correctly identify the true reward function. We prove that the convergence rate of this method is near-optimal in two types of widely used reproducing kernel Hilbert spaces. Therefore, we can design computationally efficient and regret-optimal algorithms based on the theoretical guarantees for confounded bandit problems. The numerical results illustrate the efficacy of our proposed algorithms in the confounded bandit setting.  ( 2 min )
    A Zeroth-Order Momentum Method for Risk-Averse Online Convex Games. (arXiv:2209.02838v1 [cs.LG])
    We consider risk-averse learning in repeated unknown games where the goal of the agents is to minimize their individual risk of incurring significantly high cost. Specifically, the agents use the conditional value at risk (CVaR) as a risk measure and rely on bandit feedback in the form of the cost values of the selected actions at every episode to estimate their CVaR values and update their actions. A major challenge in using bandit feedback to estimate CVaR is that the agents can only access their own cost values, which, however, depend on the actions of all agents. To address this challenge, we propose a new risk-averse learning algorithm with momentum that utilizes the full historical information on the cost values. We show that this algorithm achieves sub-linear regret and matches the best known algorithms in the literature. We provide numerical experiments for a Cournot game that show that our method outperforms existing methods.
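    The two ingredients described above, an empirical CVaR estimate built from bandit cost samples and a momentum-smoothed zeroth-order update, can be sketched as follows. This is a generic one-point sphere-smoothing estimator under our own naming, not the authors' exact update rule:

```python
import numpy as np

def empirical_cvar(costs, alpha=0.1):
    """CVaR_alpha estimate: the mean of the worst alpha-fraction of the
    observed cost samples (here, bandit feedback collected so far)."""
    costs = np.sort(np.asarray(costs, dtype=float))[::-1]   # worst first
    k = max(1, int(np.ceil(alpha * len(costs))))
    return costs[:k].mean()

def zo_momentum_step(x, cost_fn, m, delta=0.1, lr=0.01, beta=0.9, rng=None):
    """One-point zeroth-order gradient estimate on a smoothing sphere,
    further smoothed by momentum over the historical estimates.
    cost_fn(x) returns a single observed cost (bandit feedback)."""
    rng = np.random.default_rng(rng)
    u = rng.standard_normal(x.shape)
    u /= np.linalg.norm(u)
    g = (x.size / delta) * cost_fn(x + delta * u) * u   # gradient estimate
    m = beta * m + (1 - beta) * g                       # momentum: uses history
    return x - lr * m, m
```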
    Composite Spatial Monte Carlo Integration Based on Generalized Least Squares. (arXiv:2204.03248v2 [stat.CO] UPDATED)
    Although the evaluation of expectations on the Ising model is essential in various applications, it is mostly infeasible because of intractable multiple summations. Spatial Monte Carlo integration (SMCI) is a sampling-based approximation that can provide high-accuracy estimates of such intractable expectations. To evaluate the expectation of a function of variables in a specific region (called the target region), SMCI considers a larger region containing the target region (called the sum region). In SMCI, the multiple summation over the variables in the sum region is executed exactly, while that over the outer region is evaluated by a sampling approximation such as standard Monte Carlo integration. The accuracy of the SMCI estimator is guaranteed to improve monotonically as the size of the sum region increases. However, a haphazard expansion of the sum region could cause a combinatorial explosion, so we aim to improve the accuracy without such an expansion. In this paper, based on the theory of generalized least squares (GLS), a new effective method is proposed that combines multiple SMCI estimators. The validity of the proposed method is demonstrated theoretically and numerically. The results indicate that the proposed method can be effective in the inverse Ising problem (or Boltzmann machine learning).
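    A minimal sketch of the GLS combination step, assuming unbiased estimators with a known (or estimated) covariance matrix: the weights $w = \Sigma^{-1}\mathbf{1} / (\mathbf{1}^\top \Sigma^{-1} \mathbf{1})$ minimize the variance of the combined estimate subject to the weights summing to one.

```python
import numpy as np

def gls_combine(estimates, cov):
    """Minimum-variance unbiased combination of estimators of a common
    quantity: weights w = Sigma^{-1} 1 / (1' Sigma^{-1} 1) solve
    min_w w' Sigma w subject to sum(w) = 1."""
    ones = np.ones(len(estimates))
    w = np.linalg.solve(np.asarray(cov, dtype=float), ones)
    w /= ones @ w
    return float(w @ np.asarray(estimates, dtype=float)), w

# Illustrative numbers only: three estimators of one expectation.
x = [0.52, 0.47, 0.55]
Sigma = [[0.040, 0.010, 0.000],
         [0.010, 0.020, 0.005],
         [0.000, 0.005, 0.050]]
combined, weights = gls_combine(x, Sigma)
```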
    Avast-CTU Public CAPE Dataset. (arXiv:2209.03188v1 [cs.CR])
    There is a limited amount of publicly available data to support research in malware analysis technology. In particular, there are virtually no publicly available datasets generated from rich sandboxes such as Cuckoo/CAPE. The benefit of using dynamic sandboxes is the realistic simulation of file execution in the target machine and obtaining a log of such execution. The machine can be infected by malware; hence, there is a good chance of capturing the malicious behavior in the execution logs, allowing researchers to study such behavior in detail. Although the subsequent analysis of log information is extensively covered in industrial cybersecurity backends, to our knowledge only limited effort has been invested in academia to advance such log analysis capabilities using cutting-edge techniques. We make this sample dataset available to support the design of new machine learning methods for malware detection, especially for automatic detection of generic malicious behavior. The dataset has been collected in cooperation between Avast Software and Czech Technical University - AI Center (AIC).
    Inverse modeling of nonisothermal multiphase poromechanics using physics-informed neural networks. (arXiv:2209.03276v1 [cs.LG])
    We propose a solution strategy for parameter identification in multiphase thermo-hydro-mechanical (THM) processes in porous media using physics-informed neural networks (PINNs). We employ a dimensionless form of the THM governing equations that is particularly well suited for the inverse problem, and we leverage the sequential multiphysics PINN solver we developed in previous work. We validate the proposed inverse-modeling approach on multiple benchmark problems, including Terzaghi's isothermal consolidation problem, Barry-Mercer's isothermal injection-production problem, and nonisothermal consolidation of an unsaturated soil layer. We report the excellent performance of the proposed sequential PINN-THM inverse solver, thus paving the way for the application of PINNs to inverse modeling of complex nonlinear multiphysics problems.
    Remote Work Optimization with Robust Multi-channel Graph Neural Networks. (arXiv:2209.03150v1 [cs.SI])
    The spread of COVID-19 has led to the global shutdown of many corporate offices and encouraged companies to open more opportunities that allow employees to work from remote locations. As the workplace type expands from onsite offices to remote areas, an emerging challenge for an online hiring marketplace is how these remote opportunities and user intentions to work remotely can be modeled and matched without prior information. Despite the unprecedented number of remote jobs posted amid COVID-19, there is no existing approach that can be directly applied. Introducing a brand-new workplace type naturally leads to the cold-start problem, which is particularly common for less active job seekers. It is challenging, if not impossible, to onboard a new workplace type for any predictive model if existing information sources, including resumes and job descriptions, provide little information related to the new category of jobs. Hence, in this work, we propose a principled approach that jointly models the remoteness of job seekers and job opportunities with limited information, and that satisfies the needs of web-scale applications. Existing research on the emerging remote workplace type mainly consists of qualitative studies, and classic predictive modeling approaches are inapplicable given the cold-start problem and information scarcity. We close precisely this gap with a novel graph neural architecture. Extensive experiments on large-scale data from real-world applications validate the superiority of the proposed approach over competitive baselines. The improvement may translate to more rapid onboarding of the new workplace type, benefiting job seekers who are interested in working remotely.
    Depression Symptoms Modelling from Social Media Text: An Active Learning Approach. (arXiv:2209.02765v1 [cs.CL])
    A fundamental component of user-level clinical depression modelling from social media language is depression symptoms detection (DSD). Unfortunately, no existing DSD dataset reflects both the clinical insights and the distribution of depression symptoms found in samples from the self-disclosed depressed population. In our work, we describe an Active Learning (AL) framework that uses an initial supervised learning model leveraging 1) a state-of-the-art large mental health forum text pre-trained language model further fine-tuned on a clinician-annotated DSD dataset, and 2) a Zero-Shot learning model for DSD, and couples them together to harvest depression-symptom-related samples from our large self-curated Depression Tweets Repository (DTR). Our clinician-annotated dataset is the largest of its kind. Furthermore, the DTR is created from the tweets of self-disclosed depressed users' Twitter timelines from two datasets, including one of the largest benchmark datasets for user-level depression detection from Twitter. This helps preserve the depression symptoms distribution of self-disclosed Twitter users' tweets. Subsequently, we iteratively retrain our initial DSD model with the harvested data. We discuss the stopping criteria and limitations of this AL process, and elaborate on the underlying constructs that play a vital role in the overall AL process. We show that we can produce a final dataset that is the largest of its kind. Furthermore, a DSD model and a Depression Post Detection (DPD) model trained on it achieve significantly better accuracy than their initial versions.
    Riemannian optimization for non-centered mixture of scaled Gaussian distributions. (arXiv:2209.03315v1 [cs.LG])
    This paper studies the statistical model of the non-centered mixture of scaled Gaussian distributions (NC-MSG). Using the Fisher-Rao information geometry associated with this distribution, we derive a Riemannian gradient descent algorithm. This algorithm is leveraged for two minimization problems. The first one is the minimization of a regularized negative log-likelihood (NLL), where the regularization makes a trade-off between a white Gaussian distribution and the NC-MSG. Conditions on the regularization are given so that the existence of a minimum of this problem is guaranteed without assumptions on the samples. Then, the Kullback-Leibler (KL) divergence between two NC-MSGs is derived. This divergence enables us to define a minimization problem to compute centers of mass of several NC-MSGs. The proposed Riemannian gradient descent algorithm is leveraged to solve this second minimization problem. Numerical experiments show the good performance and speed of the Riemannian gradient descent on both problems. Finally, a nearest-centroid classifier is implemented leveraging the KL divergence and its associated center of mass. Applied to the large-scale dataset Breizhcrops, this classifier shows good accuracy as well as robustness to rigid transformations of the test set.
    Visual correspondence-based explanations improve AI robustness and human-AI team accuracy. (arXiv:2208.00780v3 [cs.CV] UPDATED)
    Explaining artificial intelligence (AI) predictions is increasingly important and even imperative in many high-stakes applications where humans are the ultimate decision-makers. In this work, we propose two novel architectures of self-interpretable image classifiers that first explain, and then predict (as opposed to post-hoc explanations), by harnessing the visual correspondences between a query image and exemplars. Our models consistently improve on ResNet-50 and a $k$-nearest neighbor classifier (kNN) by 1 to 4 points on out-of-distribution (OOD) datasets, while performing marginally worse (by 1 to 2 points) on in-distribution tests. Via a large-scale human study on ImageNet and CUB, our correspondence-based explanations are found to be more useful to users than kNN explanations. Our explanations help users reject the AI's wrong decisions more accurately than all other tested methods. Interestingly, for the first time, we show that it is possible to achieve complementary human-AI team accuracy (i.e., higher than either the AI alone or humans alone) in ImageNet and CUB image classification tasks.
    EGMM: an Evidential Version of the Gaussian Mixture Model for Clustering. (arXiv:2010.01333v3 [cs.LG] UPDATED)
    The Gaussian mixture model (GMM) provides a simple yet principled framework for clustering, with properties suitable for statistical inference. In this paper, we propose a new model-based clustering algorithm, called EGMM (evidential GMM), in the theoretical framework of belief functions, to better characterize cluster-membership uncertainty. With a mass function representing the cluster membership of each object, the evidential Gaussian mixture distribution, composed of components over the powerset of the desired clusters, is proposed to model the entire dataset. The parameters in EGMM are estimated by a specially designed Expectation-Maximization (EM) algorithm. A validity index allowing automatic determination of the proper number of clusters is also provided. The proposed EGMM is as simple as the classical GMM, but can generate a more informative evidential partition for the considered dataset. Experiments on synthetic and real datasets show that the proposed EGMM performs better than other representative clustering algorithms. Its superiority is further demonstrated by an application to multi-modal brain image segmentation.
    On the Convergence of Monte Carlo UCB for Random-Length Episodic MDPs. (arXiv:2209.02864v1 [cs.LG])
    In reinforcement learning, Monte Carlo algorithms update the Q function by averaging the episodic returns. In the Monte Carlo UCB (MC-UCB) algorithm, the action taken in each state is the action that maximizes the Q function plus a UCB exploration term, which biases the choice of actions to those that have been chosen less frequently. Although there has been significant work on establishing regret bounds for MC-UCB, most of that work has been focused on finite-horizon versions of the problem, for which each episode terminates after a constant number of steps. For such finite-horizon problems, the optimal policy depends both on the current state and the time within the episode. However, for many natural episodic problems, such as games like Go and Chess and robotic tasks, the episode is of random length and the optimal policy is stationary. For such environments, it is an open question whether the Q-function in MC-UCB will converge to the optimal Q function; we conjecture that, unlike Q-learning, it does not converge for all MDPs. We nevertheless show that for a large class of MDPs, which includes stochastic MDPs such as blackjack and deterministic MDPs such as Go, the Q-function in MC-UCB converges almost surely to the optimal Q function. An immediate corollary of this result is that it also converges almost surely for all finite-horizon MDPs. We also provide numerical experiments, providing further insights into MC-UCB.
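    A schematic of the MC-UCB loop described above, with a hypothetical episodic `env` interface (reset/actions/step are our naming, not the paper's): the action choice adds a UCB bonus to Q, and Q is updated by averaging the episodic returns.

```python
import math
from collections import defaultdict

def mc_ucb_episode(env, Q, counts, c=2.0):
    """One MC-UCB episode: act greedily w.r.t. Q plus a UCB bonus that
    favours rarely chosen actions, then update Q toward the average of
    the observed episodic returns."""
    trajectory, state, done = [], env.reset(), False
    while not done:
        actions = env.actions(state)
        n_s = sum(counts[(state, a)] for a in actions) + 1
        def bonus(a):
            n = counts[(state, a)]
            return float("inf") if n == 0 else c * math.sqrt(math.log(n_s) / n)
        action = max(actions, key=lambda a: Q[(state, a)] + bonus(a))
        state_next, reward, done = env.step(action)
        trajectory.append((state, action, reward))
        state = state_next
    G = 0.0
    for s, a, r in reversed(trajectory):      # undiscounted return-to-go
        G += r
        counts[(s, a)] += 1
        Q[(s, a)] += (G - Q[(s, a)]) / counts[(s, a)]   # running average

# Q, counts = defaultdict(float), defaultdict(int)  # then run many episodes
```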
    LPGNet: Link Private Graph Networks for Node Classification. (arXiv:2205.03105v2 [cs.LG] UPDATED)
    Classification tasks on labeled graph-structured data have many important applications ranging from social recommendation to financial modeling. Deep neural networks are increasingly being used for node classification on graphs, wherein nodes with similar features have to be given the same label. Graph convolutional networks (GCNs) are one such widely studied neural network architecture that perform well on this task. However, powerful link-stealing attacks on GCNs have recently shown that even with black-box access to the trained model, inferring which links (or edges) are present in the training graph is practical. In this paper, we present a new neural network architecture called LPGNet for training on graphs with privacy-sensitive edges. LPGNet provides differential privacy (DP) guarantees for edges using a novel design for how graph edge structure is used during training. We empirically show that LPGNet models often lie in the sweet spot between providing privacy and utility: They can offer better utility than "trivially" private architectures which use no edge information (e.g., vanilla MLPs) and better resilience against existing link-stealing attacks than vanilla GCNs which use the full edge structure. LPGNet also offers consistently better privacy-utility tradeoffs than DPGCN, which is the state-of-the-art mechanism for retrofitting differential privacy into conventional GCNs, in most of our evaluated datasets.
    A Data-driven Reduced Order Modeling Approach Applied In Context Of Numerical Analysis And Optimization Of Plastic Profile Extrusion. (arXiv:2209.03121v1 [math.NA])
    In the course of this work, we examine the process of plastic profile extrusion, where a polymer melt is shaped inside the so-called extrusion die and fixed in its shape by solidification in the downstream calibration unit. More precisely, we focus on the development of a data-driven reduced order model (ROM) for predicting temperature distributions within the extruded profiles inside the calibration unit. The ROM is a first step toward our overall goal of prediction-based process control to avoid undesired warpage and damage of the final product.
    On the Effectiveness of Compact Biomedical Transformers. (arXiv:2209.03182v1 [cs.CL])
    Language models pre-trained on biomedical corpora, such as BioBERT, have recently shown promising results on downstream biomedical tasks. Many existing pre-trained models, on the other hand, are resource-intensive and computationally heavy owing to factors such as embedding size, hidden dimension, and number of layers. The natural language processing (NLP) community has developed numerous strategies to compress these models utilising techniques such as pruning, quantisation, and knowledge distillation, resulting in models that are considerably faster, smaller, and subsequently easier to use in practice. By the same token, in this paper we introduce six lightweight models, namely BioDistilBERT, BioTinyBERT, BioMobileBERT, DistilBioBERT, TinyBioBERT, and CompactBioBERT, which are obtained either by knowledge distillation from a biomedical teacher or by continual learning on the PubMed dataset via the Masked Language Modelling (MLM) objective. We evaluate all of our models on three biomedical tasks and compare them with BioBERT-v1.1, showing that efficient lightweight models can perform on par with their larger counterparts. All the models will be publicly available on our Huggingface profile at https://huggingface.co/nlpie and the code used to run the experiments will be available at https://github.com/nlpie-research/Compact-Biomedical-Transformers.
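    Of the compression techniques mentioned, knowledge distillation is the easiest to make concrete. A standard generic distillation objective (not necessarily the paper's exact recipe) mixes hard-label cross-entropy with a temperature-softened KL term:

```python
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels, T=2.0, alpha=0.5):
    """Mix hard-label cross-entropy with a KL term between temperature-
    softened student and teacher distributions; the T*T factor keeps the
    soft-target gradients on the same scale as the hard-label ones."""
    hard = F.cross_entropy(student_logits, labels)
    soft = F.kl_div(F.log_softmax(student_logits / T, dim=-1),
                    F.softmax(teacher_logits / T, dim=-1),
                    reduction="batchmean") * (T * T)
    return alpha * hard + (1.0 - alpha) * soft
```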
    Out of Distribution Detection, Generalization, and Robustness Triangle with Maximum Probability Theorem. (arXiv:2203.12145v2 [cs.LG] UPDATED)
    The Maximum Probability Framework, powered by the Maximum Probability Theorem (MPT), is a recent theoretical development in artificial intelligence that aims to formally define probabilistic models, guide the development of objective functions, and regularize probabilistic models. MPT uses the probability distribution that a model assumes on random variables to provide an upper bound on the probability of the model. We apply MPT to challenging out-of-distribution (OOD) detection problems in computer vision by incorporating MPT as a regularization scheme in the training of CNNs and their energy-based variants. We demonstrate the effectiveness of the proposed method on 1080 trained models, with varying hyperparameters, and conclude that the MPT-based regularization strategy stabilizes and improves the generalization and robustness of base models, in addition to enhancing OOD performance on the CIFAR10, CIFAR100, and MNIST datasets.
    Learning Canonical Embeddings for Unsupervised Shape Correspondence with Locally Linear Transformations. (arXiv:2209.02152v2 [cs.CV] UPDATED)
    We present a new approach to unsupervised shape correspondence learning between pairs of point clouds. We make the first attempt to adapt the classical locally linear embedding algorithm (LLE) -- originally designed for nonlinear dimensionality reduction -- for shape correspondence. The key idea is to find dense correspondences between shapes by first obtaining high-dimensional neighborhood-preserving embeddings of low-dimensional point clouds and subsequently aligning the source and target embeddings using locally linear transformations. We demonstrate that learning the embedding using a new LLE-inspired point cloud reconstruction objective results in accurate shape correspondences. More specifically, the approach comprises an end-to-end learnable framework of extracting high-dimensional neighborhood-preserving embeddings, estimating locally linear transformations in the embedding space, and reconstructing shapes via divergence measure-based alignment of probabilistic density functions built over reconstructed and target shapes. Our approach enforces embeddings of shapes in correspondence to lie in the same universal/canonical embedding space, which eventually helps regularize the learning process and leads to a simple nearest neighbors approach between shape embeddings for finding reliable correspondences. Comprehensive experiments show that the new method makes noticeable improvements over state-of-the-art approaches on standard shape correspondence benchmark datasets covering both human and nonhuman shapes.  ( 3 min )
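    The classical LLE building block the method adapts computes, for each point, affine reconstruction weights over its nearest neighbours by solving a small regularized Gram system; a minimal NumPy sketch of that step:

```python
import numpy as np

def lle_weights(X, k=8, reg=1e-3):
    """Locally linear embedding reconstruction weights: express each point
    as an affine combination of its k nearest neighbours by solving a
    regularized local Gram system per point."""
    n = len(X)
    W = np.zeros((n, n))
    for i in range(n):
        d = np.linalg.norm(X - X[i], axis=1)
        nbrs = np.argsort(d)[1:k + 1]              # skip the point itself
        Z = X[nbrs] - X[i]                         # centre the neighbourhood
        G = Z @ Z.T
        G += reg * np.trace(G) * np.eye(k)         # regularize for stability
        w = np.linalg.solve(G, np.ones(k))
        W[i, nbrs] = w / w.sum()                   # affine constraint
    return W
```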
    Generative Principal Component Analysis. (arXiv:2203.09693v2 [stat.ML] UPDATED)
    In this paper, we study the problem of principal component analysis with generative modeling assumptions, adopting a general model for the observed matrix that encompasses notable special cases, including spiked matrix recovery and phase retrieval. The key assumption is that the underlying signal lies near the range of an $L$-Lipschitz continuous generative model with bounded $k$-dimensional inputs. We propose a quadratic estimator and show that it enjoys a statistical rate of order $\sqrt{\frac{k\log L}{m}}$, where $m$ is the number of samples. We also provide a near-matching algorithm-independent lower bound. Moreover, we provide a variant of the classic power method, which projects the calculated data onto the range of the generative model during each iteration. We show that under suitable conditions, this method converges exponentially fast to a point achieving the above-mentioned statistical rate. We perform experiments on various image datasets for spiked matrix and phase retrieval models, and illustrate performance gains of our method over the classic power method and the truncated power method devised for sparse principal component analysis.
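    The power-method variant can be sketched generically: ordinary power iteration with an extra projection onto the range of the generative model after each matrix-vector product. The `project` callback below is a placeholder for that (typically approximate) projection, e.g. computed by gradient descent over the model's latent input:

```python
import numpy as np

def projected_power_method(M, project, iters=50, rng=None):
    """Power iteration with a projection after each matrix-vector product:
    v <- normalize(project(M @ v)). `project` stands in for (approximate)
    projection onto the range of the generative model."""
    rng = np.random.default_rng(rng)
    v = rng.standard_normal(M.shape[0])
    v /= np.linalg.norm(v)
    for _ in range(iters):
        v = project(M @ v)
        v /= np.linalg.norm(v)
    return v

# With project = identity this reduces to classic power iteration:
A = np.random.default_rng(0).standard_normal((50, 50))
M = A @ A.T
v_top = projected_power_method(M, project=lambda x: x)
```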
    POLTER: Policy Trajectory Ensemble Regularization for Unsupervised Reinforcement Learning. (arXiv:2205.11357v2 [cs.LG] UPDATED)
    The goal of Unsupervised Reinforcement Learning (URL) is to find a reward-agnostic prior policy on a task domain, such that the sample-efficiency on supervised downstream tasks is improved. Although agents initialized with such a prior policy can achieve a significantly higher reward with fewer samples when finetuned on the downstream task, it is still an open question how an optimal pretrained prior policy can be achieved in practice. In this work, we present POLTER (Policy Trajectory Ensemble Regularization) - a general method to regularize the pretraining that can be applied to any URL algorithm and is especially useful on data- and knowledge-based URL algorithms. It utilizes an ensemble of policies that are discovered during pretraining and moves the policy of the URL algorithm closer to its optimal prior. Our method is based on a theoretical framework, and we analyze its practical effects on a white-box benchmark, allowing us to study POLTER with full control. In our main experiments, we evaluate POLTER on the Unsupervised Reinforcement Learning Benchmark (URLB), which consists of 12 tasks in 3 domains. We demonstrate the generality of our approach by improving the performance of a diverse set of data- and knowledge-based URL algorithms by 19% on average and up to 40% in the best case. Under a fair comparison with tuned baselines and tuned POLTER, we establish a new state-of-the-art on the URLB.
    Inversion of Time-Lapse Surface Gravity Data for Detection of 3D CO$_2$ Plumes via Deep Learning. (arXiv:2209.02850v1 [cs.LG])
    We introduce three algorithms that invert simulated gravity data to 3D subsurface rock/flow properties. The first algorithm is a data-driven, deep learning-based approach, the second mixes a deep learning approach with physical modeling into a single workflow, and the third considers the time dependence of surface gravity monitoring. The target application of these proposed algorithms is the prediction of subsurface CO$_2$ plumes as a complementary tool for monitoring CO$_2$ sequestration deployments. Each proposed algorithm outperforms traditional inversion methods and produces high-resolution, 3D subsurface reconstructions in near real-time. Our proposed methods achieve Dice scores of up to 0.8 for predicted plume geometry and near perfect data misfit in terms of $\mu$Gals. These results indicate that combining 4D surface gravity monitoring with deep learning techniques represents a low-cost, rapid, and non-intrusive method for monitoring CO$_2$ storage sites.
    Reconstructing signed relations from interaction data. (arXiv:2209.03219v1 [cs.SI])
    Positive and negative relations play an essential role in human behavior and shape the communities we live in. Despite their importance, data about signed relations is rare and commonly gathered through surveys. Interaction data is more abundant, for instance, in the form of proximity or communication data. So far, though, it could not be utilized to detect signed relations. In this paper, we show how the underlying signed relations can be extracted with such data. Employing a statistical network approach, we construct networks of signed relations in four communities. We then show that these relations correspond to the ones reported in surveys. Additionally, the inferred relations allow us to study the homophily of individuals with respect to gender, religious beliefs, and financial backgrounds. We evaluate the importance of triads in the signed network to study group cohesion.
    Transfer-Tuning: Reusing Auto-Schedules for Efficient Tensor Program Code Generation. (arXiv:2201.05587v2 [cs.LG] UPDATED)
    Auto-scheduling for tensor programs is a process where a search algorithm automatically explores candidate schedules (program transformations) for a given program on a target hardware platform to improve its performance. However, this can be a very time-consuming process, depending on the complexity of the tensor program and the capacity of the target device, with often many thousands of program variants being explored. To address this, in this paper we introduce the idea of transfer-tuning, a novel approach to identify and reuse auto-schedules between tensor programs. We demonstrate this concept using Deep Neural Networks (DNNs), taking sets of auto-schedules from pre-tuned DNNs and using them to reduce the inference time of a new DNN. We compare transfer-tuning against the state-of-the-art Ansor auto-scheduler, defining the maximum possible speedup for a given DNN model as what Ansor achieves using its recommended full tuning time. On a server-class CPU and across 11 widely used DNN models, we observe that transfer-tuning achieves up to $88.41\%$ ($49.13\%$ on average) of this maximum speedup, while Ansor requires $6.5\times$ more search time on average to match it. We also evaluate transfer-tuning on a constrained edge CPU and observe that the differences in search time are exacerbated, with Ansor requiring $10.8\times$ more time on average to match transfer-tuning's speedup, which further demonstrates its value. Our code is available at https://www.github.com/gicLAB/transfer-tuning
    Understanding microbiome dynamics via interpretable graph representation learning. (arXiv:2203.01830v2 [q-bio.QM] UPDATED)
    Large-scale perturbations in the microbiome constitution are strongly correlated, whether as a driver or a consequence, with the health and functioning of human physiology. However, understanding the difference in the microbiome profiles of healthy and ill individuals can be complicated due to the large number of complex interactions among microbes. We propose to model these interactions as a time-evolving graph whose nodes are microbes and edges are interactions among them. Motivated by the need to analyse such complex interactions, we develop a method that learns a low-dimensional representation of the time-evolving graph and maintains the dynamics occurring in the high-dimensional space. Through our experiments, we show that we can extract graph features such as clusters of nodes or edges that have the highest impact on the model to learn the low-dimensional representation. This information can be crucial to identify microbes and interactions among them that are strongly correlated with clinical diseases. We conduct our experiments on both synthetic and real-world microbiome datasets.
    Non-Gaussian Process Regression. (arXiv:2209.03117v1 [stat.ML])
    Standard Gaussian processes (GPs) offer a flexible modelling tool for well-behaved processes. However, deviations from Gaussianity are expected to appear in real-world datasets, with structural outliers and shocks routinely observed. In these cases GPs can fail to model uncertainty adequately and may over-smooth inferences. Here we extend the GP framework into a new class of time-changed GPs that allow for straightforward modelling of heavy-tailed non-Gaussian behaviours, while retaining a tractable conditional GP structure through an infinite mixture of non-homogeneous GPs representation. The conditional GP structure is obtained by conditioning the observations on a latent transformed input space, and the random evolution of the latent transformation is modelled using a L\'{e}vy process, which enables Bayesian inference over both the posterior predictive density and the latent transformation function. We present Markov chain Monte Carlo inference procedures for this model and demonstrate its potential benefits compared to a standard GP.
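    One way to picture a time-changed GP, under our own simplifying choices (a gamma subordinator as the random clock, one example of a L\'{e}vy process, and an RBF kernel), is to warp the inputs by a non-decreasing random clock before evaluating a stationary GP:

```python
import numpy as np

def rbf(t, s, ell=1.0):
    return np.exp(-0.5 * (t[:, None] - s[None, :]) ** 2 / ell ** 2)

def sample_time_changed_gp(t, ell=1.0, shape=1.0, rng=None):
    """Draw one path of a time-changed GP: a non-decreasing random clock
    tau(t) (here a gamma subordinator) warps the inputs before evaluating
    a stationary RBF-kernel GP, producing non-Gaussian marginal behaviour."""
    rng = np.random.default_rng(rng)
    dt = np.diff(t, prepend=t[0])
    tau = np.cumsum(rng.gamma(shape * dt.clip(min=1e-9), 1.0))  # random clock
    K = rbf(tau, tau, ell) + 1e-8 * np.eye(len(t))
    return rng.multivariate_normal(np.zeros(len(t)), K)

t = np.linspace(0.0, 10.0, 200)
path = sample_time_changed_gp(t, shape=2.0, rng=1)
```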
    On the Convergence of the ELBO to Entropy Sums. (arXiv:2209.03077v1 [stat.ML])
    The variational lower bound (a.k.a. ELBO or free energy) is the central objective for many learning algorithms, including algorithms for deep unsupervised learning. Learning algorithms change model parameters such that the variational lower bound increases, until the parameters are close to a stationary point of the learning dynamics. In this purely theoretical contribution, we show that (for a very large class of generative models) the variational lower bound is, at all stationary points of learning, equal to a sum of entropies. For models with one set of latent variables and one set of observed variables, the sum consists of three entropies: (A) the (average) entropy of the variational distributions, (B) the negative entropy of the model's prior distribution, and (C) the (expected) negative entropy of the observable distributions. The obtained result applies under realistic conditions, including finite numbers of data points, any stationary point (including saddle points), and any family of (well-behaved) variational distributions. The class of generative models for which we show the equality to entropy sums contains many (and presumably most) standard generative models, including deep models. As concrete examples we discuss probabilistic PCA and Sigmoid Belief Networks. The prerequisites we use to show equality to entropy sums are relatively mild: the distributions of a given generative model have to be of the exponential family (with constant base measure), and the model has to satisfy a parameterization criterion (which is usually fulfilled). Proving the equality of the ELBO to entropy sums at stationary points (under the stated conditions) is the main contribution of this work.
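    In generic notation (ours, not necessarily the paper's), the claimed identity for $N$ data points at stationary points of learning reads

    $$\mathcal{F} \;=\; \frac{1}{N}\sum_{n=1}^{N}\mathcal{H}\big[q^{(n)}(z)\big] \;-\; \mathcal{H}\big[p_\Theta(z)\big] \;-\; \frac{1}{N}\sum_{n=1}^{N}\mathbb{E}_{q^{(n)}}\Big[\mathcal{H}\big[p_\Theta(x \mid z)\big]\Big],$$

    where $q^{(n)}$ is the variational distribution for data point $n$, $p_\Theta(z)$ the prior, and $p_\Theta(x \mid z)$ the observable distribution; the three terms match (A), (B), and (C) above.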
    A Comprehensive Survey on Radio Frequency (RF) Fingerprinting: Traditional Approaches, Deep Learning, and Open Challenges. (arXiv:2201.00680v3 [cs.LG] UPDATED)
    Fifth generation (5G) networks and beyond envision massive Internet of Things (IoT) rollout to support disruptive applications such as extended reality (XR), augmented/virtual reality (AR/VR), industrial automation, autonomous driving, and smart everything, which together bring massive and diverse IoT devices occupying the radio frequency (RF) spectrum. Along with the spectrum crunch and throughput challenges, such a massive scale of wireless devices exposes unprecedented threat surfaces. RF fingerprinting is heralded as a candidate technology that can be combined with cryptographic and zero-trust security measures to ensure data privacy, confidentiality, and integrity in wireless networks. Motivated by the relevance of this subject to future communication networks, in this work we present a comprehensive survey of RF fingerprinting approaches, ranging from a traditional view to the most recent deep learning (DL)-based algorithms. Existing surveys have mostly focused on a constrained presentation of wireless fingerprinting approaches; however, many aspects remain untold. In this work, we mitigate this by addressing every aspect - background on signal intelligence (SIGINT), applications, relevant DL algorithms, a systematic literature review of RF fingerprinting techniques spanning the past two decades, a discussion of datasets, and potential research avenues - necessary to elucidate this topic to the reader in an encyclopedic manner.
    DANets: Deep Abstract Networks for Tabular Data Classification and Regression. (arXiv:2112.02962v4 [cs.LG] UPDATED)
    Tabular data are ubiquitous in real-world applications. Although many commonly used neural components (e.g., convolution) and extensible neural networks (e.g., ResNet) have been developed by the machine learning community, few of them are effective for tabular data and few designs are adequately tailored to tabular data structures. In this paper, we propose a novel and flexible neural component for tabular data, called the Abstract Layer (AbstLay), which learns to explicitly group correlative input features and generate higher-level features for semantic abstraction. We also design a structure re-parameterization method to compress the learned AbstLay, reducing the computational complexity by a clear margin in the inference phase. A special basic block is built using AbstLays, and we construct a family of Deep Abstract Networks (DANets) for tabular data classification and regression by stacking such blocks. In DANets, a special shortcut path is introduced to fetch information from raw tabular features, assisting feature interactions across different levels. Comprehensive experiments on seven real-world tabular datasets show that our AbstLay and DANets are effective for tabular data classification and regression, with computational complexity superior to that of competitive methods. Besides, we evaluate the performance gains of DANets as they go deeper, verifying the extensibility of our method. Our code is available at https://github.com/WhatAShot/DANet.
    TOHAN: A One-step Approach towards Few-shot Hypothesis Adaptation. (arXiv:2106.06326v2 [cs.LG] UPDATED)
    In few-shot domain adaptation (FDA), classifiers for the target domain are trained with accessible labeled data in the source domain (SD) and few labeled data in the target domain (TD). However, data usually contain private information in the current era, e.g., data distributed on personal phones. Thus, the private information will be leaked if we directly access data in SD to train a target-domain classifier (required by FDA methods). In this paper, to thoroughly prevent the privacy leakage in SD, we consider a very challenging problem setting, where the classifier for the TD has to be trained using few labeled target data and a well-trained SD classifier, named few-shot hypothesis adaptation (FHA). In FHA, we cannot access data in SD, as a result, the private information in SD will be protected well. To this end, we propose a target orientated hypothesis adaptation network (TOHAN) to solve the FHA problem, where we generate highly-compatible unlabeled data (i.e., an intermediate domain) to help train a target-domain classifier. TOHAN maintains two deep networks simultaneously, where one focuses on learning an intermediate domain and the other takes care of the intermediate-to-target distributional adaptation and the target-risk minimization. Experimental results show that TOHAN outperforms competitive baselines significantly.
    Neural-Symbolic Models for Logical Queries on Knowledge Graphs. (arXiv:2205.10128v2 [cs.AI] UPDATED)
    Answering complex first-order logic (FOL) queries on knowledge graphs is a fundamental task for multi-hop reasoning. Traditional symbolic methods traverse a complete knowledge graph to extract the answers, which provides good interpretation for each step. Recent neural methods learn geometric embeddings for complex queries. These methods can generalize to incomplete knowledge graphs, but their reasoning process is hard to interpret. In this paper, we propose Graph Neural Network Query Executor (GNN-QE), a neural-symbolic model that enjoys the advantages of both worlds. GNN-QE decomposes a complex FOL query into relation projections and logical operations over fuzzy sets, which provides interpretability for intermediate variables. To reason about the missing links, GNN-QE adapts a graph neural network from knowledge graph completion to execute the relation projections, and models the logical operations with product fuzzy logic. Experiments on 3 datasets show that GNN-QE significantly improves over previous state-of-the-art models in answering FOL queries. Meanwhile, GNN-QE can predict the number of answers without explicit supervision, and provide visualizations for intermediate variables.  ( 2 min )
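    The product fuzzy logic used for the logical operations is simple to state: with each fuzzy set represented as a vector of per-entity membership degrees in $[0, 1]$, conjunction, disjunction, and negation become elementwise operations. A minimal sketch (illustrative values, not GNN-QE's code):

```python
import numpy as np

# Each fuzzy set is a vector of per-entity membership degrees in [0, 1].
def fuzzy_and(x, y):
    return x * y                  # product t-norm (query conjunction)

def fuzzy_or(x, y):
    return x + y - x * y          # probabilistic sum (query disjunction)

def fuzzy_not(x):
    return 1.0 - x                # standard negation

a = np.array([0.9, 0.2, 0.7])     # memberships from one relation projection
b = np.array([0.8, 0.5, 0.1])     # memberships from another projection
both = fuzzy_and(a, b)            # entities plausibly satisfying both branches
```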
    Morphology-preserving Autoregressive 3D Generative Modelling of the Brain. (arXiv:2209.03177v1 [eess.IV])
    Human anatomy, morphology, and associated diseases can be studied using medical imaging data. However, access to medical imaging data is restricted by governance and privacy concerns, data ownership, and the cost of acquisition, thus limiting our ability to understand the human body. A possible solution to this issue is the creation of a model able to learn and then generate synthetic images of the human body conditioned on specific characteristics of relevance (e.g., age, sex, and disease status). Deep generative models, in the form of neural networks, have been recently used to create synthetic 2D images of natural scenes. Still, the ability to produce high-resolution 3D volumetric imaging data with correct anatomical morphology has been hampered by data scarcity and algorithmic and computational limitations. This work proposes a generative model that can be scaled to produce anatomically correct, high-resolution, and realistic images of the human brain, with the necessary quality to allow further downstream analyses. The ability to generate a potentially unlimited amount of data not only enables large-scale studies of human anatomy and pathology without jeopardizing patient privacy, but also significantly advances research in the field of anomaly detection, modality synthesis, learning under limited data, and fair and ethical AI. Code and trained models are available at: https://github.com/AmigoLab/SynthAnatomy.  ( 3 min )
    Ultra-low-power Range Error Mitigation for Ultra-wideband Precise Localization. (arXiv:2209.03021v1 [cs.LG])
    Precise and accurate localization in outdoor and indoor environments is a challenging problem that currently constitutes a significant limitation for several practical applications. Ultra-wideband (UWB) localization technology represents a valuable low-cost solution to the problem. However, non-line-of-sight (NLOS) conditions and the complexity of the specific radio environment can easily introduce a positive bias in the ranging measurement, resulting in highly inaccurate and unsatisfactory position estimation. In light of this, we leverage the latest advancements in deep neural network optimization techniques and their implementation on ultra-low-power microcontrollers to introduce an effective range error mitigation solution that provides corrections in either NLOS or LOS conditions with a few mW of power. Our extensive experimentation demonstrates the advantages and improvements of our low-cost and power-efficient methodology.
    Semantic Interactive Learning for Text Classification: A Constructive Approach for Contextual Interactions. (arXiv:2209.02984v1 [cs.HC])
    Interactive Machine Learning (IML) aims to enable intelligent systems to interactively learn from their end-users, and is quickly becoming more important. Although it puts the human in the loop, interactions are mostly performed via mutual explanations that miss contextual information. Furthermore, current model-agnostic IML strategies such as CAIPI are limited to 'destructive' feedback, meaning they solely allow an expert to prevent a learner from using irrelevant features. In this work, we propose a novel interaction framework called Semantic Interactive Learning for the text domain. We frame the problem of incorporating constructive and contextual feedback into the learner as the task of finding an architecture that (a) enables more semantic alignment between humans and machines and (b) at the same time helps to maintain the statistical characteristics of the input domain when generating user-defined counterexamples based on meaningful corrections. To this end, we introduce a technique called SemanticPush that is effective at translating conceptual corrections of humans into non-extrapolating training examples, such that the learner's reasoning is pushed towards the desired behavior. In several experiments, we show that our method clearly outperforms CAIPI, a state-of-the-art IML strategy, in terms of predictive performance as well as local explanation quality in downstream multi-class classification tasks.
    AudioLM: a Language Modeling Approach to Audio Generation. (arXiv:2209.03143v1 [cs.SD])
    We introduce AudioLM, a framework for high-quality audio generation with long-term consistency. AudioLM maps the input audio to a sequence of discrete tokens and casts audio generation as a language modeling task in this representation space. We show how existing audio tokenizers provide different trade-offs between reconstruction quality and long-term structure, and we propose a hybrid tokenization scheme to achieve both objectives. Namely, we leverage the discretized activations of a masked language model pre-trained on audio to capture long-term structure and the discrete codes produced by a neural audio codec to achieve high-quality synthesis. By training on large corpora of raw audio waveforms, AudioLM learns to generate natural and coherent continuations given short prompts. When trained on speech, and without any transcript or annotation, AudioLM generates syntactically and semantically plausible speech continuations while also maintaining speaker identity and prosody for unseen speakers. Furthermore, we demonstrate how our approach extends beyond speech by generating coherent piano music continuations, despite being trained without any symbolic representation of music.
    Bayesian learning of feature spaces for multitasks problems. (arXiv:2209.03028v1 [stat.ML])
    This paper presents a Bayesian framework to construct non-linear, parsimonious, shallow models for multitask regression. The proposed framework relies on the fact that Random Fourier Features (RFFs) enable the approximation of an RBF kernel by an extreme learning machine whose hidden layer is formed by RFFs. The main idea is to combine both dual views of the same model under a single Bayesian formulation that extends Sparse Bayesian Extreme Learning Machines to multitask problems. From the kernel methods point of view, the proposed formulation facilitates the introduction of prior domain knowledge through the RBF kernel parameter. From the extreme learning machines perspective, the new formulation helps control overfitting and enables a parsimonious overall model (the models that serve each task share the same set of RFFs, selected within the joint Bayesian optimisation). The experimental results show that combining the advantages of kernel methods and extreme learning machines within the same framework can lead to significant improvements in the performance achieved by each of these two paradigms independently.  ( 2 min )
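    The RFF approximation underlying the model is standard: with $W$ drawn from a Gaussian matched to the RBF lengthscale and random phases $b$, the feature map $\varphi(x)=\sqrt{2/D}\cos(Wx+b)$ satisfies $\varphi(x)^\top\varphi(y)\approx \exp(-\lVert x-y\rVert^2/(2\ell^2))$. A minimal sketch (our own, not the paper's code):

```python
import numpy as np

def rff_features(X, D=256, lengthscale=1.0, rng=None):
    """Random Fourier Features: rows of W ~ N(0, lengthscale^{-2} I) and
    b ~ U[0, 2*pi], so that phi(x)'phi(y) approximates the RBF kernel."""
    rng = np.random.default_rng(rng)
    W = rng.standard_normal((X.shape[1], D)) / lengthscale
    b = rng.uniform(0.0, 2.0 * np.pi, D)
    return np.sqrt(2.0 / D) * np.cos(X @ W + b)

X = np.random.default_rng(0).standard_normal((5, 3))
Phi = rff_features(X, D=2048, lengthscale=1.5, rng=1)
K_approx = Phi @ Phi.T           # close to the exact RBF Gram matrix
```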
    A learning theory for quantum photonic processors and beyond. (arXiv:2209.03075v1 [quant-ph])
    We consider the tasks of learning quantum states, measurements and channels generated by continuous-variable (CV) quantum circuits. This family of circuits is suited to describe optical quantum technologies and in particular it includes state-of-the-art photonic processors capable of showing quantum advantage. We define classes of functions that map classical variables, encoded into the CV circuit parameters, to outcome probabilities evaluated on those circuits. We then establish efficient learnability guarantees for such classes, by computing bounds on their pseudo-dimension or covering numbers, showing that CV quantum circuits can be learned with a sample complexity that scales polynomially with the circuit's size, i.e., the number of modes. Our results establish that CV circuits can be trained efficiently using a number of training samples that, unlike their finite-dimensional counterpart, does not scale with the circuit depth.  ( 2 min )
    A modified SIRD model for Covid19 spread prediction using ensemble neural networks. (arXiv:2203.00407v2 [cs.LG] UPDATED)
    In this paper, we propose an analysis of Covid19 evolution and prediction for Romania, combined with the mathematical SIRD model, an extension of the classical SIR model which includes the deceased as a separate category. The reason is that, because we cannot fully trust the reported numbers of infected or recovered people, we base our analysis on the more reliable number of deceased people. In addition, one of the parameters of our model is the proportion of infected-and-tested versus infected people. Since many factors impact the evolution of the pandemic, we base the estimation and the prediction on the previous 7 days of data, with the number of deceased being particularly important here. We perform the estimation and prediction using neural networks in two steps. First, by simulating data with our model, we train several neural networks that learn the parameters of the model. Second, we use an ensemble of ten of these neural networks to forecast the parameters from the real Covid19 data for Romania. Many of these results are backed up by a theorem which guarantees that we can recover the parameters from the reported data.
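    For reference, a generic SIRD model (the paper modifies it, e.g. with the tested-versus-infected proportion, and fits the parameters with neural networks) can be simulated with forward Euler; the parameter values below are illustrative only:

```python
import numpy as np

def simulate_sird(S0, I0, R0, D0, beta, gamma, mu, days, steps_per_day=10):
    """Forward-Euler integration of the SIRD compartment model:
        dS/dt = -beta*S*I/N,  dI/dt = beta*S*I/N - (gamma + mu)*I,
        dR/dt = gamma*I,      dD/dt = mu*I."""
    N = S0 + I0 + R0 + D0
    h = 1.0 / steps_per_day
    S, I, R, D = map(float, (S0, I0, R0, D0))
    out = [(S, I, R, D)]
    for _ in range(days * steps_per_day):
        dS = -beta * S * I / N
        dI = beta * S * I / N - (gamma + mu) * I
        S, I, R, D = S + h * dS, I + h * dI, R + h * gamma * I, D + h * mu * I
        out.append((S, I, R, D))
    return np.array(out)

traj = simulate_sird(1e7, 100, 0, 0, beta=0.25, gamma=0.08, mu=0.005, days=120)
```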
    Optimal Sensor Placement in Body Surface Networks using Gaussian Processes. (arXiv:2209.02912v1 [cs.LG])
    This paper explores a new sequential selection framework for optimal sensor placement (OSP) in electrocardiographic imaging (ECGI) networks. The proposed methodology incorporates a recent experimental design method for the sequential selection of landmarks on biological objects, namely Gaussian process landmarking (GPLMK), for better exploration of the candidate sensors. These experimental design methods serve as a source of the training and validation locations, which are modelled using a spatiotemporal Gaussian process (STGP). The STGP is fitted on the training set to predict the current validation set generated by GPLMK, and the sensor with the largest absolute prediction error is selected from the current validation set and added to the selected sensors. Next, a new validation set is generated and predicted using the current training set. The process continues until a specified number of sensor locations is selected. The study is conducted on a body surface potential mapping (BSPM) dataset of 352 electrodes from four human subjects. Thirty sensor locations are selected using the proposed algorithm; they achieve an average $R^2 = 94.40\%$ for estimating the whole-body QRS segment. The proposed method contributes to the design of a more clinically practical ECGI system by improving wearability and reducing design cost.
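    The selection loop can be sketched as follows, with a plain spatial GP standing in for the paper's spatiotemporal GP and `candidate_sets` standing in for the GPLMK-generated validation batches (both our simplifications):

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF

def select_sensors(X_all, y_all, n_select, init_idx, candidate_sets):
    """Sequential sensor selection sketch: fit a GP on the sensors chosen
    so far, predict a fresh validation set of candidate locations, and add
    the candidate with the largest absolute prediction error. X_all holds
    sensor coordinates, y_all the measured potentials; candidate_sets
    yields index arrays (e.g., produced by GP landmarking)."""
    selected = list(init_idx)
    for cand in candidate_sets:
        if len(selected) >= n_select:
            break
        gp = GaussianProcessRegressor(kernel=RBF(length_scale=1.0),
                                      normalize_y=True)
        gp.fit(X_all[selected], y_all[selected])
        err = np.abs(gp.predict(X_all[cand]) - y_all[cand])
        selected.append(int(cand[np.argmax(err)]))
    return selected
```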
    RF Fingerprinting Needs Attention: Multi-task Approach for Real-World WiFi and Bluetooth. (arXiv:2209.03142v1 [cs.LG])
    A novel cross-domain attentional multi-task architecture - xDom - for robust real-world wireless radio frequency (RF) fingerprinting is presented in this work. To the best of our knowledge, this is the first time such a comprehensive attention mechanism has been applied to the RF fingerprinting problem. In this paper, we resort to real-world IoT WiFi and Bluetooth (BT) emissions (instead of synthetic waveform generation) in a rich multipath and unavoidable interference environment in an indoor experimental testbed. We show the impact of the capture time-frame by including waveforms collected over a span of months, and demonstrate both same-time-frame and multiple-time-frame fingerprinting evaluations. The effectiveness of resorting to a multi-task architecture is also experimentally proven by conducting single-task and multi-task model analyses. Finally, we demonstrate the significant gain in performance achieved with the proposed xDom architecture by benchmarking against a well-known state-of-the-art model for fingerprinting. Specifically, we report performance improvements of up to 59.3% and 4.91x under single-task WiFi and BT fingerprinting, respectively, and up to a 50.5% increase in fingerprinting accuracy under the multi-task setting.  ( 2 min )
    Federated Transfer Learning with Multimodal Data. (arXiv:2209.03137v1 [cs.LG])
    Smart cars, smartphones, and other devices in the Internet of Things (IoT), which usually have more than one sensor, produce multimodal data. Federated Learning supports collecting a wealth of multimodal data from different devices without sharing the raw data. Transfer Learning methods help transfer knowledge from some devices to others. Federated Transfer Learning methods benefit from both Federated Learning and Transfer Learning. Our newly proposed Federated Transfer Learning framework aims at connecting data islands with privacy protection. Our construction is based on Federated Learning and Transfer Learning. Compared with previous Federated Transfer Learning frameworks, where each user must have data with identical modalities (either all unimodal or all multimodal), our new framework is more generic: it allows a hybrid distribution of user data. The core strategy is to use two different but inherently connected training methods for our two types of users. Supervised Learning is adopted for users with only unimodal data (Type 1), while Self-Supervised Learning is applied to users with multimodal data (Type 2), targeting both the features of each modality and the connections between them. This connection knowledge of Type 2 helps Type 1 in later stages of training. Training in the new framework can be divided into three steps. In the first step, users who have data with identical modalities are grouped together. For example, users with only sound signals are in group one, those with only images are in group two, users with multimodal data are in group three, and so on. In the second step, Federated Learning is executed within the groups, where Supervised Learning or Self-Supervised Learning is used depending on the group's nature. Most of the Transfer Learning happens in the third step, where the related parts of the networks obtained in the previous steps are aggregated (federated).  ( 3 min )
    Manifold Oblique Random Forests: Towards Closing the Gap on Convolutional Deep Networks. (arXiv:1909.11799v5 [cs.LG] UPDATED)
    Decision forests (Forests), in particular random forests and gradient boosting trees, have demonstrated state-of-the-art accuracy compared to other methods in many supervised learning scenarios. In particular, Forests dominate other methods in tabular data, that is, when the feature space is unstructured, so that the signal is invariant to a permutation of the feature indices. However, in structured data lying on a manifold (such as images, text, and speech) deep networks (Networks), specifically convolutional deep networks (ConvNets), tend to outperform Forests. We conjecture that at least part of the reason for this is that the input to Networks is not simply the feature magnitudes, but also their indices. In contrast, naive Forest implementations fail to explicitly consider feature indices. A recently proposed Forest approach demonstrates that Forests, for each node, implicitly sample a random matrix from some specific distribution. These Forests, like some classes of Networks, learn by partitioning the feature space into convex polytopes corresponding to linear functions. We build on that approach and show that one can choose distributions in a manifold-aware fashion to incorporate feature locality. We demonstrate the empirical performance on data whose features live on three different manifolds: a torus, images, and time-series. Moreover, we demonstrate its strength in multivariate simulated settings and also show superiority in predicting surgical outcome in epilepsy patients and predicting movement direction from raw stereotactic EEG data from non-motor brain regions. In all simulations and real data, the Manifold Oblique Random Forest (MORF) algorithm outperforms approaches that ignore feature-space structure and challenges the performance of ConvNets. Moreover, MORF runs fast and maintains interpretability and theoretical justification.  ( 3 min )
    Multitask Learning via Shared Features: Algorithms and Hardness. (arXiv:2209.03112v1 [cs.LG])
    We investigate the computational efficiency of multitask learning of Boolean functions over the $d$-dimensional hypercube that are related by means of a feature representation of size $k \ll d$ shared across all tasks. We present a polynomial-time multitask learning algorithm for the concept class of halfspaces with margin $\gamma$, which is based on a simultaneous boosting technique and requires only $\textrm{poly}(k/\gamma)$ samples per task and $\textrm{poly}(k\log(d)/\gamma)$ samples in total. In addition, we prove a computational separation: assuming there exists a concept class that cannot be learned in the attribute-efficient model, we can construct another concept class that can be learned in the attribute-efficient model but cannot be multitask learned efficiently -- multitask learning this concept class either requires super-polynomial time complexity or a much larger total number of samples.
    Open-Ended Evolution for Minecraft Building Generation. (arXiv:2209.03108v1 [cs.LG])
    This paper proposes a procedural content generator which evolves Minecraft buildings according to an open-ended and intrinsic definition of novelty. To realize this goal we evaluate individuals' novelty in the latent space using a 3D autoencoder, and alternate between phases of exploration and transformation. During exploration the system evolves multiple populations of CPPNs through CPPN-NEAT and constrained novelty search in the latent space (defined by the current autoencoder). We apply a set of repair and constraint functions to ensure candidates adhere to basic structural rules and constraints during evolution. During transformation, we reshape the boundaries of the latent space to identify new interesting areas of the solution space by retraining the autoencoder with novel content. In this study we evaluate five different approaches for training the autoencoder during transformation and its impact on populations' quality and diversity during evolution. Our results show that by retraining the autoencoder we can achieve better open-ended complexity compared to a static model, which is further improved when retraining using larger datasets of individuals with diverse complexities.
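    The latent-space novelty evaluation can be sketched with the usual novelty-search score, the mean distance to the k nearest neighbours among previously seen codes; the `encoder` in the comment is a hypothetical stand-in for the 3D autoencoder:

```python
import numpy as np

def novelty_scores(latents, archive, k=15):
    """Novelty of each candidate latent code: mean Euclidean distance to
    its k nearest neighbours in the archive of previously encountered
    codes (the standard novelty-search score, computed here in the
    autoencoder's latent space)."""
    d = np.linalg.norm(latents[:, None, :] - archive[None, :, :], axis=-1)
    k = min(k, archive.shape[0])
    return np.sort(d, axis=1)[:, :k].mean(axis=1)

# codes = encoder(voxelized_buildings)   # hypothetical 3D-autoencoder call
rng = np.random.default_rng(0)
archive = rng.standard_normal((200, 32))     # stand-in archive of latent codes
candidates = rng.standard_normal((10, 32))   # stand-in candidate codes
scores = novelty_scores(candidates, archive) # higher = more novel
```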
    Quantum-machine-learning channel discrimination. (arXiv:2206.09933v2 [quant-ph] UPDATED)
    In the problem of quantum channel discrimination, one distinguishes between a given number of quantum channels, which is done by sending an input state through a channel and measuring the output state. This work studies applications of variational quantum circuits and machine learning techniques for discriminating such channels. In particular, we explore (i) the practical implementation of embedding this task into the framework of variational quantum computing, (ii) training a quantum classifier based on variational quantum circuits, and (iii) applying the quantum kernel estimation technique. For testing these three channel discrimination approaches, we considered a pair of entanglement-breaking channels and the depolarizing channel with two different depolarization factors. For the approach (i), we address solving the quantum channel discrimination problem using widely discussed parallel and sequential strategies. We show the advantage of the latter in terms of better convergence with less quantum resources. Quantum channel discrimination with a variational quantum classifier (ii) allows one to operate even with random and mixed input states and simple variational circuits. The kernel-based classification approach (iii) is also found effective as it allows one to discriminate depolarizing channels associated not with just fixed values of the depolarization factor, but with ranges of it. Additionally, we discovered that a simple modification of one of the commonly used kernels significantly increases the efficiency of this approach. Finally, our numerical findings reveal that the performance of variational methods of channel discrimination depends on the trace of the product of the output states. These findings demonstrate that quantum machine learning can be used to discriminate channels, such as those representing physical noise processes.  ( 3 min )
    Efficient search of active inference policy spaces using k-means. (arXiv:2209.02550v2 [cs.LG] UPDATED)
    We develop an approach to policy selection in active inference that allows us to efficiently search large policy spaces by mapping each policy to its embedding in a vector space. We sample the expected free energy of representative points in the space, then perform a more thorough policy search around the most promising point in this initial sample. We consider various approaches to creating the policy embedding space, and propose using k-means clustering to select representative points. We apply our technique to a goal-oriented graph-traversal problem, for which naive policy selection is intractable for even moderately large graphs.
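    To make the two-stage selection idea concrete, here is a minimal Python sketch, assuming policies are already embedded as row vectors and that `expected_free_energy` is a hypothetical user-supplied black-box scorer over policy indices; the neighbourhood radius and cluster count are placeholder assumptions, not the paper's settings.

```python
# Illustrative two-stage policy selection via k-means (a sketch, not the paper's code).
# `embeddings[i]` is the vector embedding of policy i; `expected_free_energy`
# is an assumed black-box scorer mapping a policy index to a float.
import numpy as np
from sklearn.cluster import KMeans

def select_policy(embeddings, expected_free_energy, n_clusters=10, radius=1.0):
    km = KMeans(n_clusters=n_clusters, n_init=10).fit(embeddings)
    # Stage 1: score one representative per cluster (the member nearest each centroid).
    reps = [int(np.argmin(np.linalg.norm(embeddings - c, axis=1)))
            for c in km.cluster_centers_]
    best_rep = reps[int(np.argmin([expected_free_energy(i) for i in reps]))]
    # Stage 2: thorough search in the neighbourhood of the most promising point.
    dists = np.linalg.norm(embeddings - embeddings[best_rep], axis=1)
    neighbourhood = np.where(dists <= radius)[0]
    scores = [expected_free_energy(i) for i in neighbourhood]
    return int(neighbourhood[np.argmin(scores)])
```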
    Shifting Perspective to See Difference: A Novel Multi-View Method for Skeleton based Action Recognition. (arXiv:2209.02986v1 [cs.CV])
    Skeleton-based human action recognition is a longstanding challenge due to its complex dynamics. Some fine-grained details of the dynamics play a vital role in classification. Existing work largely focuses on designing incremental neural networks with more complicated adjacency matrices to capture the details of joint relationships. However, they still have difficulty distinguishing actions that have broadly similar motion patterns but belong to different categories. Interestingly, we find that subtle differences in motion patterns can be significantly amplified, and become easy for an observer to distinguish, through specified view directions, a property that has not been fully explored before. Drastically different from previous work, we boost performance by proposing a conceptually simple yet effective Multi-view strategy that recognizes actions from a collection of dynamic view features. Specifically, we design a novel Skeleton-Anchor Proposal (SAP) module which contains a Multi-head structure to learn a set of views. For feature learning of different views, we introduce a novel Angle Representation to transform the actions under different views and feed the transformations into the baseline model. Our module can work seamlessly with existing action classification models. Incorporated with baseline models, our SAP module exhibits clear performance gains on many challenging benchmarks. Moreover, comprehensive experiments show that our model consistently outperforms the state-of-the-art and remains effective and robust, especially when dealing with corrupted data. Related code will be available at https://github.com/ideal-idea/SAP .
    TalkToModel: Understanding Machine Learning Models With Open Ended Dialogues. (arXiv:2207.04154v2 [cs.LG] UPDATED)
    Machine Learning (ML) models are increasingly used to make critical decisions in real-world applications, yet they have also become more complex, making them harder to understand. To this end, several techniques to explain model predictions have been proposed. However, practitioners struggle to leverage explanations because they often do not know which to use, how to interpret the results, and may have insufficient data science experience to obtain explanations. In addition, most current works focus on generating one-shot explanations and do not allow users to follow up and ask fine-grained questions about the explanations, which can be frustrating. In this work, we address these challenges by introducing TalkToModel: an open-ended dialogue system for understanding machine learning models. Specifically, TalkToModel comprises three key components: 1) a natural language interface for engaging in dialogues, making understanding ML models highly accessible, 2) a dialogue engine that adapts to any tabular model and dataset, interprets natural language, maps it to appropriate operations (e.g., feature importance explanations, counterfactual explanations, showing model errors), and generates text responses, and 3) an execution component that runs the operations and ensures explanations are accurate. We carried out quantitative and human subject evaluations of TalkToModel. We found the system understands user questions on novel datasets and models with high accuracy, demonstrating the system's capacity to generalize to new situations. In human evaluations, 73% of healthcare workers (e.g., doctors and nurses) agreed they would use TalkToModel over baseline point-and-click systems, and 84.6% of ML graduate students agreed TalkToModel was easier to use.
    $1D$ to $nD$: A Meta Algorithm for Multivariate Global Optimization via Univariate Optimizers. (arXiv:2209.03246v1 [math.OC])
    In this work, we propose a meta algorithm that can solve a multivariate global optimization problem using univariate global optimizers. Although univariate global optimization receives less attention than the multivariate case, which is more emphasized in academia and industry, we show that it is still relevant and can be used directly to solve problems of multivariate optimization. We also provide the corresponding regret bounds in terms of the time horizon $T$ and the average regret of the univariate optimizer, when it is robust against nonnegative noises with robust regret guarantees.  ( 2 min )
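    One natural reading of the 1D-to-nD reduction is cyclic coordinate-wise application of a univariate optimizer; the sketch below illustrates that idea and is not necessarily the paper's exact construction (the bounds, sweep count, and example objective are assumptions).

```python
# Illustrative sketch: reduce multivariate optimization to repeated calls to a
# univariate optimizer by cycling over coordinates.
import numpy as np
from scipy.optimize import minimize_scalar

def coordinate_wise_minimize(f, x0, bounds=(-10.0, 10.0), sweeps=50):
    x = np.asarray(x0, dtype=float)
    for _ in range(sweeps):
        for i in range(len(x)):
            # Freeze all coordinates except i and invoke the 1D optimizer.
            def f_1d(t, i=i):
                y = x.copy()
                y[i] = t
                return f(y)
            x[i] = minimize_scalar(f_1d, bounds=bounds, method="bounded").x
    return x

# Example: minimize a shifted quadratic in 5 dimensions.
f = lambda v: np.sum((v - np.arange(5)) ** 2)
print(coordinate_wise_minimize(f, np.zeros(5)))  # approx [0, 1, 2, 3, 4]
```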
    Geometric multimodal representation learning. (arXiv:2209.03299v1 [cs.LG])
    Graph-centric artificial intelligence (graph AI) has achieved remarkable success in modeling interacting systems prevalent in nature, from dynamical systems in biology to particle physics. The increasing heterogeneity of data calls for graph neural architectures that can combine multiple inductive biases. However, combining data from various sources is challenging because the appropriate inductive bias may vary by data modality. Multimodal learning methods fuse multiple data modalities while leveraging cross-modal dependencies to address this challenge. Here, we survey 140 studies in graph-centric AI and find that diverse data types are increasingly brought together using graphs and fed into sophisticated multimodal models. These models stratify into image-, language-, and knowledge-grounded multimodal learning. We put forward an algorithmic blueprint for multimodal graph learning based on this categorization. The blueprint serves as a way to group state-of-the-art architectures that treat multimodal data by appropriately choosing four different components. This effort can pave the way for standardizing the design of sophisticated multimodal architectures for highly complex real-world problems.
    Machine Learning Partners in Criminal Networks. (arXiv:2209.03171v1 [physics.soc-ph])
    Recent research has shown that criminal networks have complex organizational structures, but whether this can be used to predict static and dynamic properties of criminal networks remains little explored. Here, by combining graph representation learning and machine learning methods, we show that structural properties of political corruption, police intelligence, and money laundering networks can be used to recover missing criminal partnerships, distinguish among different types of criminal and legal associations, as well as predict the total amount of money exchanged among criminal agents, all with outstanding accuracy. We also show that our approach can anticipate future criminal associations during the dynamic growth of corruption networks with significant accuracy. Thus, similar to evidence found at crime scenes, we conclude that structural patterns of criminal networks carry crucial information about illegal activities, which allows machine learning methods to predict missing information and even anticipate future criminal behavior.  ( 2 min )
    Combining Sequential and Aggregated Data for Churn Prediction in Casual Freemium Games. (arXiv:2209.03184v1 [cs.AI])
    In freemium games, the revenue from a player comes from the in-app purchases made and the advertisements to which that player is exposed. The longer a player plays the game, the higher the chances that he or she will generate revenue within the game. Within this scenario, it is extremely important to be able to promptly detect when a player is about to quit playing (churn) in order to react and attempt to retain the player within the game, thus prolonging his or her game lifetime. In this article we investigate how to improve the current state-of-the-art in churn prediction by combining sequential and aggregate data using different neural network architectures. The results of the comparative analysis show that the combination of the two data types yields an improvement in prediction accuracy over predictors based on either purely sequential or purely aggregated data.
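    A minimal sketch of one way to combine the two data types is a two-branch network: a recurrent branch for per-session event sequences and a dense branch for aggregated player statistics, fused before the churn head. All layer sizes and input shapes below are assumptions for illustration, not the paper's architecture.

```python
# Hedged sketch of a two-branch churn model (sequential + aggregated inputs).
import tensorflow as tf
from tensorflow.keras import layers, Model

seq_in = layers.Input(shape=(30, 8), name="event_sequence")  # 30 steps, 8 features
agg_in = layers.Input(shape=(16,), name="aggregate_stats")   # 16 summary features

h_seq = layers.LSTM(32)(seq_in)                        # encode the sequential view
h_agg = layers.Dense(32, activation="relu")(agg_in)    # encode the aggregated view
h = layers.Concatenate()([h_seq, h_agg])               # fuse both representations
out = layers.Dense(1, activation="sigmoid", name="churn_prob")(h)

model = Model([seq_in, agg_in], out)
model.compile(optimizer="adam", loss="binary_crossentropy",
              metrics=[tf.keras.metrics.AUC()])
```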
    A Survey of Machine Unlearning. (arXiv:2209.02299v2 [cs.LG] UPDATED)
    Computer systems hold a large amount of personal data over decades. On the one hand, such data abundance allows breakthroughs in artificial intelligence (AI), especially machine learning (ML) models. On the other hand, it can threaten the privacy of users and weaken the trust between humans and AI. Recent regulations require that private information about a user can be removed from computer systems in general and from ML models in particular upon request (e.g. the "right to be forgotten"). While removing data from back-end databases should be straightforward, it is not sufficient in the AI context as ML models often "remember" the old data. Existing adversarial attacks proved that we can learn private membership or attributes of the training data from the trained models. This phenomenon calls for a new paradigm, namely machine unlearning, to make ML models forget about particular data. It turns out that recent works on machine unlearning have not been able to solve the problem completely due to the lack of common frameworks and resources. In this survey paper, we seek to provide a thorough investigation of machine unlearning in its definitions, scenarios, mechanisms, and applications. Specifically, as a categorical collection of state-of-the-art research, we hope to provide a broad reference for those seeking a primer on machine unlearning and its various formulations, design requirements, removal requests, algorithms, and uses in a variety of ML applications. Furthermore, we hope to outline key findings and trends in the paradigm as well as highlight new areas of research that have yet to see the application of machine unlearning, but could nonetheless benefit immensely. We hope this survey provides a valuable reference for ML researchers as well as those seeking to innovate privacy technologies. Our resources are at https://github.com/tamlhp/awesome-machine-unlearning.
    Unsupervised Domain-adaptive Hash for Networks. (arXiv:2108.09136v2 [cs.LG] UPDATED)
    Abundant real-world data can be naturally represented by large-scale networks, which demand efficient and effective learning algorithms. At the same time, labels may only be available for some networks, which demands that these algorithms be able to adapt to unlabeled networks. Domain-adaptive hash learning has enjoyed considerable success in the computer vision community in many practical tasks due to its lower cost in both retrieval time and storage footprint. However, it has not been applied to multiple-domain networks. In this work, we bridge this gap by developing an unsupervised domain-adaptive hash learning method for networks, dubbed UDAH. Specifically, we develop four task-specific yet correlated components: (1) network structure preservation via a hard groupwise contrastive loss, (2) relaxation-free supervised hashing, (3) cross-domain intersected discriminators, and (4) semantic center alignment. We conduct a wide range of experiments to evaluate the effectiveness and efficiency of our method on a range of tasks including link prediction, node classification, and neighbor recommendation. Our evaluation results demonstrate that our model achieves better performance than the state-of-the-art conventional discrete embedding methods over all the tasks.  ( 2 min )
    A Machine Learning Analysis of Impact of the Covid-19 Pandemic on Alcohol Consumption Habit Changes Among Healthcare Workers in the U.S. (arXiv:2112.06261v3 [cs.LG] UPDATED)
    In this paper, we discuss the impact of the Covid-19 pandemic on alcohol consumption habit changes among healthcare workers in the United States. We utilize multiple supervised and unsupervised machine learning methods and models such as Decision Trees, Logistic Regression, Naive Bayes classifier, k-Nearest Neighbors, Support Vector Machines, Multilayer perceptron, XGBoost, CatBoost, LightGBM, the Chi-Squared Test and the mutual information method on mental health survey data obtained from the University of Michigan Inter-University Consortium for Political and Social Research to identify relationships between COVID-19-related negative effects and alcohol consumption habit changes among healthcare workers. Our findings suggest that COVID-19-related school closures, COVID-19-related work schedule changes and COVID-related news exposure may lead to an increase in alcohol use among healthcare workers in the United States.
    Privacy-Preserving Federated Learning via System Immersion and Random Matrix Encryption. (arXiv:2204.02497v2 [cs.LG] UPDATED)
    Federated learning (FL) has emerged as a privacy solution for collaborative distributed learning where clients train AI models directly on their devices instead of sharing their data with a centralized (potentially adversarial) server. Although FL preserves local data privacy to some extent, it has been shown that information about clients' data can still be inferred from model updates. In recent years, various privacy-preserving schemes have been developed to address this privacy leakage. However, they often provide privacy at the expense of model performance or system efficiency, and balancing these tradeoffs is a crucial challenge when implementing FL schemes. In this manuscript, we propose a Privacy-Preserving Federated Learning (PPFL) framework built on the synergy of matrix encryption and system immersion tools from control theory. The idea is to immerse the learning algorithm, Stochastic Gradient Descent (SGD), into a higher-dimensional system (the so-called target system) and design the dynamics of the target system so that the trajectories of the original SGD are immersed/embedded in its trajectories and it learns on encrypted data (here, we use random matrix encryption). Matrix encryption is reformulated at the server as a random change of coordinates that maps original parameters to a higher-dimensional parameter space and enforces that the target SGD converges to an encrypted version of the original SGD optimal solution. The server decrypts the aggregated model using the left inverse of the immersion map. We show that our algorithm provides the same level of accuracy and convergence rate as standard FL with a negligible computation cost while revealing no information about the clients' data.  ( 3 min )
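    A toy numpy sketch of the immersion mechanism follows: gradient descent is run in a higher-dimensional parameter space defined by a random tall matrix, and the result is decrypted with the left inverse of the immersion map. The dimensions, the quadratic loss, and the single-client setting are assumptions for illustration; this is not the paper's full protocol.

```python
# Toy illustration of system immersion + left-inverse decryption (a sketch).
import numpy as np

rng = np.random.default_rng(0)
n, m = 3, 7                        # original and immersed dimensions (m > n)
M = rng.standard_normal((m, n))    # immersion map, full column rank w.h.p.
M_left = np.linalg.pinv(M)         # left inverse: M_left @ M == I_n

# Well-conditioned quadratic loss L(theta) = 0.5 theta^T Q theta - c^T theta.
A = rng.standard_normal((10, n))
Q, c = A.T @ A + np.eye(n), rng.standard_normal(n)
grad = lambda th: Q @ th - c

z = M @ rng.standard_normal(n)     # encrypted (immersed) parameter vector
for _ in range(2000):
    z -= 0.02 * M @ grad(M_left @ z)   # target-system update; z stays immersed

theta = M_left @ z                 # server-side decryption via the left inverse
print(np.allclose(theta, np.linalg.solve(Q, c), atol=1e-6))  # True
```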
    Benchmarking Multimodal Variational Autoencoders: GeBiD Dataset and Toolkit. (arXiv:2209.03048v1 [cs.LG])
    Multimodal Variational Autoencoders (VAEs) have been a subject of intense research in the past years as they can integrate multiple modalities into a joint representation and can thus serve as a promising tool for both data classification and generation. Several approaches toward multimodal VAE learning have been proposed so far, their comparison and evaluation have however been rather inconsistent. One reason is that the models differ at the implementation level, another problem is that the datasets commonly used in these cases were not initially designed for the evaluation of multimodal generative models. This paper addresses both mentioned issues. First, we propose a toolkit for systematic multimodal VAE training and comparison. Second, we present a synthetic bimodal dataset designed for a comprehensive evaluation of the joint generation and cross-generation capabilities. We demonstrate the utility of the dataset by comparing state-of-the-art models.
    Banknote Recognition for Visually Impaired People (Case of Ethiopian note). (arXiv:2209.03236v1 [cs.HC])
    Currency is used almost everywhere to facilitate business. In most developing countries, especially the ones in Africa, tangible notes are predominantly used in everyday financial transactions. One of these countries, Ethiopia, is believed to have one of the world highest rates of blindness (1.6%) and low vision (3.7%). There are around 4 million visually impaired people; With 1.7 million people being in complete vision loss. Those people face a number of challenges when they are in a bus station, in shopping centers, or anywhere which requires the physical exchange of money. In this paper, we try to provide a solution to this issue using AI/ML applications. We developed an Android and IOS compatible mobile application with a model that achieved 98.9% classification accuracy on our dataset. The application has a voice integrated feature that tells the type of the scanned currency in Amharic, the working language of Ethiopia. The application is developed to be easily accessible by its users. It is build to reduce the burden of visually impaired people in Ethiopia.
    Flow Straight and Fast: Learning to Generate and Transfer Data with Rectified Flow. (arXiv:2209.03003v1 [cs.LG])
    We present rectified flow, a surprisingly simple approach to learning (neural) ordinary differential equation (ODE) models to transport between two empirically observed distributions $\pi_0$ and $\pi_1$, hence providing a unified solution to generative modeling and domain transfer, among various other tasks involving distribution transport. The idea of rectified flow is to learn the ODE to follow the straight paths connecting the points drawn from $\pi_0$ and $\pi_1$ as much as possible. This is achieved by solving a straightforward nonlinear least squares optimization problem, which can be easily scaled to large models without introducing extra parameters beyond standard supervised learning. The straight paths are special and preferred because they are the shortest paths between two points, and can be simulated exactly without time discretization and hence yield computationally efficient models. We show that the procedure of learning a rectified flow from data, called rectification, turns an arbitrary coupling of $\pi_0$ and $\pi_1$ to a new deterministic coupling with provably non-increasing convex transport costs. In addition, recursively applying rectification allows us to obtain a sequence of flows with increasingly straight paths, which can be simulated accurately with coarse time discretization in the inference phase. In empirical studies, we show that rectified flow performs superbly on image generation, image-to-image translation, and domain adaptation. In particular, on image generation and translation, our method yields nearly straight flows that give high quality results even with a single Euler discretization step.
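    The straight-path least-squares objective is simple enough to state in a few lines; below is a minimal PyTorch sketch in which a velocity field $v(x_t, t)$ is regressed onto the displacement $x_1 - x_0$ along linear interpolants. The network size and the toy Gaussian data are placeholder assumptions.

```python
# Minimal rectified-flow training sketch: regress v(x_t, t) onto x_1 - x_0.
import torch
import torch.nn as nn

v = nn.Sequential(nn.Linear(2 + 1, 64), nn.ReLU(), nn.Linear(64, 2))
opt = torch.optim.Adam(v.parameters(), lr=1e-3)

for step in range(1000):
    x0 = torch.randn(256, 2)                 # samples from pi_0
    x1 = torch.randn(256, 2) + 4.0           # samples from pi_1 (toy: shifted Gaussian)
    t = torch.rand(256, 1)
    xt = t * x1 + (1 - t) * x0               # point on the straight path
    pred = v(torch.cat([xt, t], dim=1))
    loss = ((pred - (x1 - x0)) ** 2).mean()  # nonlinear least squares
    opt.zero_grad(); loss.backward(); opt.step()

# Inference: integrate dx/dt = v(x, t) from t=0 to 1; a single Euler step is
# the coarsest discretization, which rectification is designed to make viable.
x = torch.randn(256, 2)
with torch.no_grad():
    x = x + v(torch.cat([x, torch.zeros(256, 1)], dim=1))
```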
    Self-supervised multimodal neuroimaging yields predictive representations for a spectrum of Alzheimer's phenotypes. (arXiv:2209.02876v1 [cs.LG])
    Recent neuroimaging studies that focus on predicting brain disorders via modern machine learning approaches commonly include a single modality and rely on supervised over-parameterized models. However, a single modality provides only a limited view of the highly complex brain. Critically, supervised models in clinical settings lack accurate diagnostic labels for training. Coarse labels do not capture the long-tailed spectrum of brain disorder phenotypes, which leads to a loss of generalizability that makes such models less useful in diagnostic settings. This work presents a novel multi-scale coordinated framework for learning multiple representations from multimodal neuroimaging data. We propose a general taxonomy of informative inductive biases to capture unique and joint information in multimodal self-supervised fusion. The taxonomy forms a family of decoder-free models with reduced computational complexity and a propensity to capture multi-scale relationships between local and global representations of the multimodal inputs. We conduct a comprehensive evaluation of the taxonomy using functional and structural magnetic resonance imaging (MRI) data across a spectrum of Alzheimer's disease phenotypes and show that self-supervised models reveal disorder-relevant brain regions and multimodal links without access to the labels during pre-training. The proposed multimodal self-supervised learning yields representations with improved classification performance for both modalities. The concomitant rich and flexible unsupervised deep learning framework captures complex multimodal relationships and provides predictive performance that meets or exceeds that of a more narrow supervised classification analysis. We present elaborate quantitative evidence of how this framework can significantly advance our search for missing links in complex brain disorders.
    Adversarial Mask: Real-World Universal Adversarial Attack on Face Recognition Model. (arXiv:2111.10759v3 [cs.CV] UPDATED)
    Deep learning-based facial recognition (FR) models have demonstrated state-of-the-art performance in the past few years, even when wearing protective medical face masks became commonplace during the COVID-19 pandemic. Given the outstanding performance of these models, the machine learning research community has shown increasing interest in challenging their robustness. Initially, researchers presented adversarial attacks in the digital domain, and later the attacks were transferred to the physical domain. However, in many cases, attacks in the physical domain are conspicuous, and thus may raise suspicion in real-world environments (e.g., airports). In this paper, we propose Adversarial Mask, a physical universal adversarial perturbation (UAP) against state-of-the-art FR models that is applied on face masks in the form of a carefully crafted pattern. In our experiments, we examined the transferability of our adversarial mask to a wide range of FR model architectures and datasets. In addition, we validated our adversarial mask's effectiveness in real-world experiments (CCTV use case) by printing the adversarial pattern on a fabric face mask. In these experiments, the FR system was only able to identify 3.34% of the participants wearing the mask (compared to a minimum of 83.34% with other evaluated masks). A demo of our experiments can be found at: https://youtu.be/_TXkDO5z11w.
    Impact of Colour Variation on Robustness of Deep Neural Networks. (arXiv:2209.02832v1 [cs.CV])
    Deep neural networks (DNNs) have shown state-of-the-art performance for computer vision applications like image classification, segmentation and object detection. However, recent advances have revealed their vulnerability to manual digital perturbations of the input data, namely adversarial attacks. The accuracy of a network is significantly affected by the data distribution of its training dataset. Distortions or perturbations of the color space of input images generate out-of-distribution data, which networks are more likely to misclassify. In this work, we propose a color-variation dataset by distorting the RGB color space of a subset of ImageNet with 27 different combinations. The aim of our work is to study the impact of color variation on the performance of DNNs. We perform experiments with several state-of-the-art DNN architectures on the proposed dataset, and the results show a significant correlation between color variation and loss of accuracy. Furthermore, based on the ResNet50 architecture, we evaluate the performance of recently proposed robust training techniques and strategies, such as Augmix, revisit, and free normalizer, on our proposed dataset. Experimental results indicate that these robust training techniques can improve the robustness of deep networks to color variation.
    Risk of Bias in Chest X-ray Foundation Models. (arXiv:2209.02965v1 [cs.LG])
    Foundation models are considered a breakthrough in all applications of AI, promising robust and reusable mechanisms for feature extraction, alleviating the need for large amounts of high quality training data for task-specific prediction models. However, foundation models may potentially encode and even reinforce existing biases present in historic datasets. Given the limited ability to scrutinize foundation models, it remains unclear whether the opportunities outweigh the risks in safety critical applications such as clinical decision making. In our statistical bias analysis of a recently published, and publicly available chest X-ray foundation model, we found reasons for concern as the model seems to encode protected characteristics including biological sex and racial identity, which may lead to disparate performance across subgroups in downstream applications. While research into foundation models for healthcare applications is in an early stage, we believe it is important to make the community aware of these risks to avoid harm.
    Quantifying Aleatoric and Epistemic Uncertainty in Machine Learning: Are Conditional Entropy and Mutual Information Appropriate Measures?. (arXiv:2209.03302v1 [cs.LG])
    This short note is a critical discussion of the quantification of aleatoric and epistemic uncertainty in terms of conditional entropy and mutual information, respectively, which has recently been proposed in machine learning and has become quite common since then. More generally, we question the idea of an additive decomposition of total uncertainty into its aleatoric and epistemic constituents.  ( 2 min )
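    For reference, the additive decomposition the note scrutinizes is the standard entropy-based identity: for a predictive ensemble $p(y \mid x, \theta)$ with posterior $q(\theta)$,

```latex
\underbrace{H\!\left(\mathbb{E}_{q(\theta)}\!\left[p(y \mid x, \theta)\right]\right)}_{\text{total uncertainty}}
\;=\;
\underbrace{\mathbb{E}_{q(\theta)}\!\left[H\!\left(p(y \mid x, \theta)\right)\right]}_{\text{aleatoric: conditional entropy}}
\;+\;
\underbrace{I(y;\,\theta \mid x)}_{\text{epistemic: mutual information}}
```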
    Machine Learning Students Overfit to Overfitting. (arXiv:2209.03032v1 [cs.LG])
    Overfitting and generalization are important concepts in Machine Learning, as only models that generalize are interesting for general applications. Yet some students have trouble learning these important concepts through lectures and exercises. In this paper we describe common examples of students misunderstanding overfitting, and provide recommendations for possible solutions. We cover student misconceptions about overfitting, about solutions to overfitting, and implementation mistakes that are commonly confused with overfitting issues. We expect that our paper can contribute to improving student understanding of, and lectures about, this important topic.
    Autonomous Cooking with Digital Twin Methodology. (arXiv:2209.03087v1 [cs.CE])
    This work introduces the concept of an autonomous cooking process based on Digital Twin methodology. It proposes a hybrid approach of physics-based full-order simulations followed by a data-driven system identification process with low errors. It makes faster-than-real-time simulations of Digital Twins feasible on a device level, without the need for cloud or high-performance computing. The concept is universally applicable to various physical processes.
    Lyapunov function approach for approximation algorithm design and analysis: with applications in submodular maximization. (arXiv:2205.12442v6 [math.OC] UPDATED)
    We propose a two-phase systematic framework for approximation algorithm design and analysis via Lyapunov functions. The first phase takes a Lyapunov function as input and outputs a continuous-time approximation algorithm with a provable approximation ratio. The second phase then converts this continuous-time algorithm to a discrete-time algorithm with almost the same approximation ratio along with provable time complexity. One distinctive feature of our framework is that we only need to know the parametric form of the Lyapunov function, whose complete specification is not decided until the end of the first phase by maximizing the approximation ratio of the continuous-time algorithm. Some immediate benefits of the Lyapunov function approach include: (i) unifying many existing algorithms; (ii) providing a guideline to design and analyze new algorithms; and (iii) offering new perspectives to potentially improve existing algorithms. We use various submodular maximization problems as running examples to illustrate our framework.  ( 3 min )
    Multimodal Speech Enhancement Using Burst Propagation. (arXiv:2209.03275v1 [cs.SD])
    This paper proposes MBURST, a novel multimodal solution for audio-visual speech enhancement that draws on the most recent neurological discoveries regarding pyramidal cells of the prefrontal cortex and other brain regions. The so-called burst propagation implements several criteria to address the credit assignment problem in a more biologically plausible manner: steering the sign and magnitude of plasticity through feedback, multiplexing the feedback and feedforward information across layers through different weight connections, approximating feedback and feedforward connections, and linearizing the feedback signals. MBURST benefits from such capabilities to learn correlations between the noisy signal and the visual stimuli, thus attributing meaning to the speech by amplifying relevant information and suppressing noise. Experiments conducted on a Grid Corpus and CHiME3-based dataset show that MBURST can reproduce similar mask reconstructions to the multimodal backpropagation-based baseline while demonstrating outstanding energy-efficiency management, reducing neuron firing rates to values up to $70\%$ lower. Such a feature implies more sustainable implementations, suitable and desirable for hearing aids or any other similar embedded systems.  ( 2 min )
    Concept-modulated model-based offline reinforcement learning for rapid generalization. (arXiv:2209.03207v1 [cs.LG])
    The robustness of any machine learning solution is fundamentally bound by the data it was trained on. One way to generalize beyond the original training is through human-informed augmentation of the original dataset; however, it is impossible to specify all possible failure cases that can occur during deployment. To address this limitation we combine model-based reinforcement learning and model-interpretability methods to propose a solution that self-generates simulated scenarios constrained by environmental concepts and dynamics learned in an unsupervised manner. In particular, an internal model of the agent's environment is conditioned on low-dimensional concept representations of the input space that are sensitive to the agent's actions. We demonstrate this method within a standard realistic driving simulator in a simple point-to-point navigation task, where we show dramatic improvements in one-shot generalization to different instances of specified failure cases as well as zero-shot generalization to similar variations compared to model-based and model-free approaches.  ( 2 min )
    A Case Study on the Classification of Lost Circulation Events During Drilling using Machine Learning Techniques on an Imbalanced Large Dataset. (arXiv:2209.01607v2 [cs.LG] UPDATED)
    This study presents machine learning models that forecast and categorize lost circulation severity preemptively using a large class-imbalanced drilling dataset. We demonstrate reproducible core techniques involved in tackling a large drilling engineering challenge utilizing easily interpretable machine learning approaches. We utilized a dataset of over 65,000 records with a class-imbalance problem, collected from Azadegan oilfield formations in Iran. Eleven of the dataset's seventeen parameters are chosen to be used in the classification of five lost circulation events. To generate classification models, we used six basic machine learning algorithms and four ensemble learning methods. Linear Discriminant Analysis (LDA), Logistic Regression (LR), Support Vector Machines (SVM), Classification and Regression Trees (CART), k-Nearest Neighbors (KNN), and Gaussian Naive Bayes (GNB) are the six fundamental techniques. We also used bagging and boosting ensemble learning techniques in the investigation of solutions for improved predicting performance. The performance of these algorithms is measured using four metrics: accuracy, precision, recall, and F1-score. The F1-score weighted to represent the data imbalance is chosen as the preferred evaluation criterion. The CART model was found to be the best in class for identifying drilling fluid circulation loss events with an average weighted F1-score of 0.9904 and standard deviation of 0.0015. Upon application of ensemble learning techniques, a Random Forest ensemble of decision trees showed the best predictive performance. It identified and classified lost circulation events with a perfect weighted F1-score of 1.0. Using Permutation Feature Importance (PFI), the measured depth was found to be the most influential factor in accurately recognizing lost circulation events while drilling.  ( 3 min )
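    The evaluation protocol (imbalanced multi-class classification scored with a class-weighted F1) is easy to reproduce in sklearn; the sketch below uses synthetic data as a stand-in for the drilling records, and all dataset parameters are assumptions.

```python
# Hedged sketch of the weighted-F1 evaluation on an imbalanced 5-class problem.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import f1_score

X, y = make_classification(n_samples=5000, n_features=11, n_informative=8,
                           n_classes=5, weights=[0.7, 0.15, 0.08, 0.05, 0.02],
                           random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)

clf = RandomForestClassifier(n_estimators=200, random_state=0).fit(X_tr, y_tr)
print(f1_score(y_te, clf.predict(X_te), average="weighted"))
```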
    What does a platypus look like? Generating customized prompts for zero-shot image classification. (arXiv:2209.03320v1 [cs.CV])
    Open vocabulary models are a promising new paradigm for image classification. Unlike traditional classification models, open vocabulary models classify among any arbitrary set of categories specified with natural language during inference. This natural language, called "prompts", typically consists of a set of hand-written templates (e.g., "a photo of a {}") which are completed with each of the category names. This work introduces a simple method to generate higher accuracy prompts, without using explicit knowledge of the image domain and with far fewer hand-constructed sentences. To achieve this, we combine open vocabulary models with large language models (LLMs) to create Customized Prompts via Language models (CuPL, pronounced "couple"). In particular, we leverage the knowledge contained in LLMs in order to generate many descriptive sentences that are customized for each object category. We find that this straightforward and general approach improves accuracy on a range of zero-shot image classification benchmarks, including over one percentage point gain on ImageNet. Finally, this method requires no additional training and remains completely zero-shot. Code is available at https://github.com/sarahpratt/CuPL.  ( 2 min )
    Difficulty-Net: Learning to Predict Difficulty for Long-Tailed Recognition. (arXiv:2209.02960v1 [cs.CV])
    Long-tailed datasets, where head classes comprise many more training samples than tail classes, cause recognition models to become biased towards the head classes. Weighted loss is one of the most popular ways of mitigating this issue, and a recent work has suggested that class-difficulty might be a better clue than the conventionally used class-frequency to decide the distribution of weights. A heuristic formulation was used in the previous work for quantifying the difficulty, but we empirically find that the optimal formulation varies depending on the characteristics of datasets. Therefore, we propose Difficulty-Net, which learns to predict the difficulty of classes using the model's performance in a meta-learning framework. To make it learn reasonable difficulty of a class within the context of other classes, we newly introduce two key concepts, namely the relative difficulty and the driver loss. The former helps Difficulty-Net take other classes into account when calculating difficulty of a class, while the latter is indispensable for guiding the learning to a meaningful direction. Extensive experiments on popular long-tailed datasets demonstrated the effectiveness of the proposed method, and it achieved state-of-the-art performance on multiple long-tailed datasets.
    Change Detection for Local Explainability in Evolving Data Streams. (arXiv:2209.02764v1 [cs.LG])
    As complex machine learning models are increasingly used in sensitive applications like banking, trading or credit scoring, there is a growing demand for reliable explanation mechanisms. Local feature attribution methods have become a popular technique for post-hoc and model-agnostic explanations. However, attribution methods typically assume a stationary environment in which the predictive model has been trained and remains stable. As a result, it is often unclear how local attributions behave in realistic, constantly evolving settings such as streaming and online applications. In this paper, we discuss the impact of temporal change on local feature attributions. In particular, we show that local attributions can become obsolete each time the predictive model is updated or concept drift alters the data generating distribution. Consequently, local feature attributions in data streams provide high explanatory power only when combined with a mechanism that allows us to detect and respond to local changes over time. To this end, we present CDLEEDS, a flexible and model-agnostic framework for detecting local change and concept drift. CDLEEDS serves as an intuitive extension of attribution-based explanation techniques to identify outdated local attributions and enable more targeted recalculations. In experiments, we also show that the proposed framework can reliably detect both local and global concept drift. Accordingly, our work contributes to a more meaningful and robust explainability in online machine learning.
    DC-MRTA: Decentralized Multi-Robot Task Allocation and Navigation in Complex Environments. (arXiv:2209.02865v1 [cs.RO])
    We present a novel reinforcement learning (RL) based task allocation and decentralized navigation algorithm for mobile robots in warehouse environments. Our approach is designed for scenarios in which multiple robots are used to perform various pick-up and delivery tasks. We consider the problem of joint decentralized task allocation and navigation and present a two-level approach to solve it. At the higher level, we solve the task allocation by formulating it in terms of Markov Decision Processes and choosing the appropriate rewards to minimize the Total Travel Delay (TTD). At the lower level, we use a decentralized navigation scheme based on ORCA that enables each robot to perform these tasks in an independent manner, and avoid collisions with other robots and dynamic obstacles. We combine these lower and upper levels by defining rewards for the higher level as the feedback from the lower-level navigation algorithm. We perform extensive evaluations in complex warehouse layouts with a large number of agents and highlight the benefits over state-of-the-art algorithms based on myopic pickup-distance minimization and regret-based task selection. We observe improvements of up to 14% in terms of task completion time and up to 40% in terms of computing collision-free trajectories for the robots.  ( 2 min )
    DC-Art-GAN: Stable Procedural Content Generation using DC-GANs for Digital Art. (arXiv:2209.02847v1 [cs.CV])
    Digital art is an artistic practice that uses digital technologies as part of the generative or creative process. With the advent of digital currency and NFTs (Non-Fungible Tokens), the demand for digital art is growing rapidly. In this manuscript, we advocate the concept of using deep generative networks with adversarial training for stable and varied art generation. The work mainly focuses on using the Deep Convolutional Generative Adversarial Network (DC-GAN) and explores techniques to address the common pitfalls in GAN training. We compare various architectures and designs of DC-GANs to arrive at a recommendable design choice for stable and realistic generation. The main focus of the work is to generate realistic images that do not exist in reality but are synthesised from random noise by the proposed model. We provide visual results of generated animal face images (some pieces of evidence showing a blend of species) along with recommendations for training, architecture and design choices. We also show how training image preprocessing plays a massive role in GAN training.
    Adjusted Asymmetric Accuracy: A Well-Behaving External Cluster Validity Measure. (arXiv:2209.02935v1 [cs.LG])
    There is no, nor will there ever be, a single best clustering algorithm, but we would still like to be able to pinpoint those which perform well on certain task types and filter out the systematically disappointing ones. Clustering algorithms are traditionally evaluated using either internal or external validity measures. Internal measures quantify different aspects of the obtained partitions, e.g., the average degree of cluster compactness or point separability. Yet, their validity is questionable because the clusterings they promote can sometimes be meaningless. External measures, on the other hand, compare the algorithms' outputs to the reference, ground truth groupings that are provided by experts. The commonly used classical partition similarity scores, such as the normalised mutual information, Fowlkes-Mallows, or adjusted Rand index, might not possess all the desirable properties, e.g., they do not identify pathological edge cases correctly. Furthermore, they are not nicely interpretable: it is hard to say what a score of 0.8 really means. Its behaviour might also vary as the number of true clusters changes. This makes comparing clustering algorithms across many benchmark datasets difficult. To remedy this, we propose and analyse a new measure: an asymmetric version of the optimal set-matching accuracy. It is corrected for chance and the imbalance of cluster sizes.
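    For intuition, the uncorrected optimal set-matching accuracy can be computed with the Hungarian algorithm over the confusion matrix; the sketch below shows that baseline quantity only, without the proposed chance and imbalance corrections.

```python
# Illustrative optimal set-matching accuracy (no chance correction).
import numpy as np
from scipy.optimize import linear_sum_assignment
from sklearn.metrics import confusion_matrix

def set_matching_accuracy(y_true, y_pred):
    C = confusion_matrix(y_true, y_pred)
    rows, cols = linear_sum_assignment(-C)    # match cluster labels to maximize counts
    return C[rows, cols].sum() / len(y_true)

y_true = np.array([0, 0, 0, 1, 1, 2, 2, 2])
y_pred = np.array([1, 1, 1, 0, 0, 2, 2, 0])  # labels permuted, plus one error
print(set_matching_accuracy(y_true, y_pred))  # 0.875
```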
    Interpretations Steered Network Pruning via Amortized Inferred Saliency Maps. (arXiv:2209.02869v1 [cs.CV])
    Convolutional Neural Networks (CNNs) compression is crucial to deploying these models in edge devices with limited resources. Existing channel pruning algorithms for CNNs have achieved plenty of success on complex models. They approach the pruning problem from various perspectives and use different metrics to guide the pruning process. However, these metrics mainly focus on the model's `outputs' or `weights' and neglect its `interpretations' information. To fill in this gap, we propose to address the channel pruning problem from a novel perspective by leveraging the interpretations of a model to steer the pruning process, thereby utilizing information from both inputs and outputs of the model. However, existing interpretation methods cannot be deployed to achieve our goal, as they are either inefficient for pruning or may predict incoherent explanations. We tackle this challenge by introducing a selector model that predicts real-time smooth saliency masks for pruned models. We parameterize the distribution of explanatory masks by Radial Basis Function (RBF)-like functions to incorporate the geometric prior of natural images in our selector model's inductive bias. Thus, we can obtain compact representations of explanations to reduce the computational costs of our pruning method. We leverage our selector model to steer the network pruning by maximizing the similarity of explanatory representations for the pruned and original models. Extensive experiments on CIFAR-10 and ImageNet benchmark datasets demonstrate the efficacy of our proposed method. Our implementations are available at \url{https://github.com/Alii-Ganjj/InterpretationsSteeredPruning}
    MRF-PINN: A Multi-Receptive-Field convolutional physics-informed neural network for solving partial differential equations. (arXiv:2209.03151v1 [cs.LG])
    Physics-informed neural networks (PINN) can achieve lower development and solving cost than traditional partial differential equation (PDE) solvers in scenarios such as reconstructing the physics field and solving the inverse problem. Due to the advantages of parameter sharing, spatial feature extraction and low inference cost, convolutional neural networks (CNN) are increasingly used in PINN. To adapt convolutional PINN to different equations, researchers have to spend much time tuning critical hyperparameters. Furthermore, the effects of finite difference accuracy, model complexity, and mesh resolution on the prediction result of convolutional PINN are unclear. To fill the above research gaps, in this paper: (1) a Multi-Receptive-Field PINN (MRF-PINN) model is constructed to adapt to different equation types and mesh resolutions without manual tuning; (2) the generality and advantages of MRF-PINN are verified on three typical linear PDEs (elliptic, parabolic, hyperbolic) and nonlinear PDEs (Navier-Stokes equations); (3) the contribution of each receptive field to the final MRF-PINN result is analyzed, and the influence of finite difference accuracy, model complexity (channel number) and mesh resolution on the MRF-PINN result is tested. This paper shows that MRF-PINN can adapt to completely different equation types and mesh resolutions without any hyperparameter tuning. Further, the solving error is significantly decreased under high-order finite differences, large channel numbers, and high mesh resolutions, which is expected to make this a general convolutional PINN scheme.
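    One plausible reading of a multi-receptive-field block is a set of parallel convolutions with different kernel sizes whose outputs are combined; the PyTorch sketch below illustrates that idea, with channel counts, kernel sizes, and the summation rule as assumptions rather than the paper's exact design.

```python
# Hedged sketch of a multi-receptive-field convolutional block.
import torch
import torch.nn as nn

class MultiReceptiveFieldBlock(nn.Module):
    def __init__(self, c_in, c_out, kernel_sizes=(3, 5, 7)):
        super().__init__()
        # One branch per kernel size; "same" padding keeps spatial shape.
        self.branches = nn.ModuleList(
            nn.Conv2d(c_in, c_out, k, padding=k // 2) for k in kernel_sizes
        )

    def forward(self, x):
        # Sum the branch outputs so all receptive fields contribute.
        return torch.stack([b(x) for b in self.branches]).sum(dim=0)

block = MultiReceptiveFieldBlock(1, 16)
print(block(torch.randn(4, 1, 64, 64)).shape)  # torch.Size([4, 16, 64, 64])
```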
    Quadratic Gradient: Uniting Gradient Algorithm and Newton Method as One. (arXiv:2209.03282v1 [math.OC])
    Using only a single floating-point number in the line search technique for Newton's method might be inadequate. A column vector of the same size as the gradient might be better than a mere float number, accelerating each of the gradient elements at a different rate. Moreover, a square matrix of the same order as the Hessian matrix might be helpful for correcting the Hessian matrix. Chiang applied something between a column vector and a square matrix, namely a diagonal matrix, to accelerate the gradient, and further proposed a faster gradient variant called quadratic gradient. In this paper, we present a new way to build a new version of the quadratic gradient. This new quadratic gradient doesn't satisfy the convergence conditions of the fixed Hessian Newton's method. However, experimental results show that it sometimes has better performance than the original one in convergence rate. Also, Chiang speculates that there might be a relation between the Hessian matrix and the learning rate for the first-order gradient descent method. We prove that the floating-point number $\frac{1}{\epsilon + \max \{| \lambda_i | \}}$ can be a good learning rate for gradient methods, where $\epsilon$ is a number to avoid division by zero and $\lambda_i$ are the eigenvalues of the Hessian matrix.
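    A quick numerical check of that step size on a quadratic (where the Hessian is constant, so the rule reduces to the classical $1/\lambda_{\max}$ rate) can be done in a few lines of numpy; the problem instance below is an assumption for illustration.

```python
# Numerical check of the step size 1/(eps + max|lambda_i|) on a quadratic.
import numpy as np

rng = np.random.default_rng(1)
B = rng.standard_normal((5, 5))
H = B.T @ B + np.eye(5)            # symmetric positive definite Hessian
b = rng.standard_normal(5)
grad = lambda x: H @ x - b         # gradient of 0.5 x^T H x - b^T x

eps = 1e-8
lr = 1.0 / (eps + np.max(np.abs(np.linalg.eigvalsh(H))))

x = np.zeros(5)
for _ in range(2000):
    x -= lr * grad(x)
print(np.allclose(x, np.linalg.solve(H, b), atol=1e-6))  # True
```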
    Inference and Learning for Generative Capsule Models. (arXiv:2209.03115v1 [cs.LG])
    Capsule networks (see e.g. Hinton et al., 2018) aim to encode knowledge of and reason about the relationship between an object and its parts. In this paper we specify a generative model for such data, and derive a variational algorithm for inferring the transformation of each model object in a scene, and the assignments of observed parts to the objects. We derive a learning algorithm for the object models, based on variational expectation maximization (Jordan et al., 1999). We also study an alternative inference algorithm based on the RANSAC method of Fischler and Bolles (1981). We apply these inference methods to (i) data generated from multiple geometric objects like squares and triangles ("constellations"), and (ii) data from a parts-based model of faces. Recent work by Kosiorek et al. (2019) has used amortized inference via stacked capsule autoencoders (SCAEs) to tackle this problem -- our results show that we significantly outperform them where we can make comparisons (on the constellations data).
    Spatiotemporal Cardiac Statistical Shape Modeling: A Data-Driven Approach. (arXiv:2209.02736v1 [cs.LG])
    Clinical investigations of anatomy's structural changes over time could greatly benefit from population-level quantification of shape, or spatiotemporal statistical shape modeling (SSM). Such a tool enables characterizing patient organ cycles or disease progression in relation to a cohort of interest. Constructing shape models requires establishing a quantitative shape representation (e.g., corresponding landmarks). Particle-based shape modeling (PSM) is a data-driven SSM approach that captures population-level shape variations by optimizing landmark placement. However, it assumes cross-sectional study designs and hence has limited statistical power in representing shape changes over time. Existing methods for modeling spatiotemporal or longitudinal shape changes require predefined shape atlases and pre-built shape models that are typically constructed cross-sectionally. This paper proposes a data-driven approach inspired by the PSM method to learn population-level spatiotemporal shape changes directly from shape data. We introduce a novel SSM optimization scheme that produces landmarks that are in correspondence both across the population (inter-subject) and across time-series (intra-subject). We apply the proposed method to 4D cardiac data from atrial-fibrillation patients and demonstrate its efficacy in representing the dynamic change of the left atrium. Furthermore, we show that our method outperforms an image-based approach for spatiotemporal SSM with respect to a generative time-series model, the Linear Dynamical System (LDS). An LDS fit using a spatiotemporal shape model optimized via our approach provides better generalization and specificity, indicating it accurately captures the underlying time-dependency.
    Graph Neural Networks for Low-Energy Event Classification & Reconstruction in IceCube. (arXiv:2209.03042v1 [hep-ex])
    IceCube, a cubic-kilometer array of optical sensors built to detect atmospheric and astrophysical neutrinos between 1 GeV and 1 PeV, is deployed 1.45 km to 2.45 km below the surface of the ice sheet at the South Pole. The classification and reconstruction of events from the in-ice detectors play a central role in the analysis of data from IceCube. Reconstructing and classifying events is a challenge due to the irregular detector geometry, inhomogeneous scattering and absorption of light in the ice and, below 100 GeV, the relatively low number of signal photons produced per event. To address this challenge, it is possible to represent IceCube events as point cloud graphs and use a Graph Neural Network (GNN) as the classification and reconstruction method. The GNN is capable of distinguishing neutrino events from cosmic-ray backgrounds, classifying different neutrino event types, and reconstructing the deposited energy, direction and interaction vertex. Based on simulation, we provide a comparison in the 1-100 GeV energy range to the current state-of-the-art maximum likelihood techniques used in current IceCube analyses, including the effects of known systematic uncertainties. For neutrino event classification, the GNN increases the signal efficiency by 18% at a fixed false positive rate (FPR), compared to current IceCube methods. Alternatively, the GNN offers a reduction of the FPR by over a factor of 8 (to below half a percent) at a fixed signal efficiency. For the reconstruction of energy, direction, and interaction vertex, the resolution improves by an average of 13%-20% compared to current maximum likelihood techniques in the energy range of 1-30 GeV. The GNN, when run on a GPU, is capable of processing IceCube events at a rate nearly double the median IceCube trigger rate of 2.7 kHz, which opens the possibility of using low energy neutrinos in online searches for transient events.
    Annealing Optimization for Progressive Learning with Stochastic Approximation. (arXiv:2209.02826v1 [eess.SY])
    In this work, we introduce a learning model designed to meet the needs of applications in which computational resources are limited, and robustness and interpretability are prioritized. Learning problems can be formulated as constrained stochastic optimization problems, with the constraints originating mainly from model assumptions that define a trade-off between complexity and performance. This trade-off is closely related to over-fitting, generalization capacity, and robustness to noise and adversarial attacks, and depends on both the structure and complexity of the model, as well as the properties of the optimization methods used. We develop an online prototype-based learning algorithm based on annealing optimization that is formulated as an online gradient-free stochastic approximation algorithm. The learning model can be viewed as an interpretable and progressively growing competitive-learning neural network model to be used for supervised, unsupervised, and reinforcement learning. The annealing nature of the algorithm contributes to minimal hyper-parameter tuning requirements, poor local minima prevention, and robustness with respect to the initial conditions. At the same time, it provides online control over the performance-complexity trade-off by progressively increasing the complexity of the learning model as needed, through an intuitive bifurcation phenomenon. Finally, the use of stochastic approximation enables the study of the convergence of the learning algorithm through mathematical tools from dynamical systems and control, and allows for its integration with reinforcement learning algorithms, constructing an adaptive state-action aggregation scheme.
    A Data-dependent Approach for High Dimensional (Robust) Wasserstein Alignment. (arXiv:2209.02905v1 [cs.CV])
    Many real-world problems can be formulated as the alignment between two geometric patterns. Previously, a great amount of research focused on the alignment of 2D or 3D patterns in the field of computer vision. Recently, the alignment problem in high dimensions has found several novel applications in practice. However, the research is still rather limited on the algorithmic side. To the best of our knowledge, most existing approaches are just simple extensions of their counterparts for 2D and 3D cases, and often suffer from issues such as high computational complexity. In this paper, we propose an effective framework to compress high dimensional geometric patterns. Any existing alignment method can be applied to the compressed geometric patterns, and the time complexity can be significantly reduced. Our idea is inspired by the observation that high dimensional data often has a low intrinsic dimension. Our framework is a "data-dependent" approach whose complexity depends on the intrinsic dimension of the input data. Our experimental results reveal that running the alignment algorithm on compressed patterns can achieve similar qualities, compared with the results on the original patterns, while the runtimes (including the time cost for compression) are substantially lower.
    Semi-supervised Invertible DeepONets for Bayesian Inverse Problem. (arXiv:2209.02772v1 [stat.ML])
    Deep Operator Networks (DeepONets) offer a powerful, data-driven tool for solving parametric PDEs by learning operators, i.e. maps between infinite-dimensional function spaces. In this work, we employ physics-informed DeepONets in the context of high-dimensional, Bayesian inverse problems. Traditional solution strategies necessitate an enormous, and frequently infeasible, number of forward model solves, as well as the computation of parametric derivatives. In order to enable efficient solutions, we extend DeepONets by employing a realNVP architecture which yields an invertible and differentiable map between the parametric input and the branch net output. This allows us to construct accurate approximations of the full posterior which can be readily adapted irrespective of the number of observations and the magnitude of the observation noise. As a result, no additional forward solves are required, nor is there any need for costly sampling procedures. We demonstrate the efficacy and accuracy of the proposed methodology in the context of inverse problems based on an anti-derivative, a reaction-diffusion and a Darcy-flow equation.
    Quantitative probing: Validating causal models using quantitative domain knowledge. (arXiv:2209.03013v1 [cs.LG])
    We present quantitative probing as a model-agnostic framework for validating causal models in the presence of quantitative domain knowledge. The method is constructed as an analogue of the train/test split in correlation-based machine learning and as an enhancement of current causal validation strategies that are consistent with the logic of scientific discovery. The effectiveness of the method is illustrated using Pearl's sprinkler example, before a thorough simulation-based investigation is conducted. Limits of the technique are identified by studying exemplary failing scenarios, which are furthermore used to propose a list of topics for future research and improvements of the presented version of quantitative probing. The code for integrating quantitative probing into causal analysis, as well as the code for the presented simulation-based studies of the effectiveness of quantitative probing is provided in two separate open-source Python packages.
    Grouping-matrix based Graph Pooling with Adaptive Number of Clusters. (arXiv:2209.02939v1 [cs.AI])
    Graph pooling is a crucial operation for encoding hierarchical structures within graphs. Most existing graph pooling approaches formulate the problem as a node clustering task which effectively captures the graph topology. Conventional methods ask users to specify an appropriate number of clusters as a hyperparameter, then assume that all input graphs share the same number of clusters. In inductive settings where the number of clusters can vary, however, the model should be able to represent this variation in its pooling layers in order to learn suitable clusters. Thus, we propose GMPool, a novel differentiable graph pooling architecture that automatically determines the appropriate number of clusters based on the input data. The main intuition involves a grouping matrix defined as a quadratic form of the pooling operator, which induces the use of binary classification probabilities for pairwise combinations of nodes. GMPool obtains the pooling operator by first computing the grouping matrix, then decomposing it. Extensive evaluations on molecular property prediction tasks demonstrate that our method outperforms conventional methods.
    Read it to me: An emotionally aware Speech Narration Application. (arXiv:2209.02785v1 [cs.SD])
    In this work we try to perform emotional style transfer on audio. In particular, the MelGAN-VC architecture is explored for various emotion-pair transfers. The generated audio is then classified using an LSTM-based emotion classifier for audio. We find that "sad" audio is generated well compared to "happy" or "angry" audio, as people have similar expressions of sadness.
    Efficient Implementation of Non-linear Flow Law Using Neural Network into the Abaqus Explicit FEM code. (arXiv:2209.03190v1 [cs.CE])
    Machine learning techniques are increasingly used to predict material behavior in scientific applications and offer a significant advantage over conventional numerical methods. In this work, an Artificial Neural Network (ANN) model is used in a finite element formulation to define the flow law of a metallic material as a function of plastic strain, plastic strain rate and temperature. First, we present the general structure of the neural network and its operation, and focus on the ability of the network to deduce, without prior learning, the derivatives of the flow law with respect to the model inputs. In order to validate the robustness and accuracy of the proposed model, we compare and analyze the performance of several network architectures with respect to the analytical formulation of a Johnson-Cook behavior law for a 42CrMo4 steel. In the second part, after selecting an Artificial Neural Network architecture with $2$ hidden layers, we present the implementation of this model in the Abaqus Explicit computational code in the form of a VUHARD subroutine. The predictive capability of the proposed model is then demonstrated during the numerical simulation of two test cases: the necking of a circular bar and a Taylor impact test. The results obtained show a very high capability of the ANN to replace the analytical formulation of a Johnson-Cook behavior law in a finite element code, while remaining competitive in terms of numerical simulation time compared to a classical approach.
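    A hedged sketch of the "derivatives without prior learning" point (layer widths and input values are assumptions, not the paper's exact configuration): automatic differentiation gives the derivatives of the learned flow law with respect to its inputs, which a VUHARD-style routine needs, at no extra training cost.

        import torch
        import torch.nn as nn

        # ANN flow law: sigma = f(plastic strain, plastic strain rate, temperature),
        # with two hidden layers as in the paper.
        net = nn.Sequential(nn.Linear(3, 15), nn.Tanh(),
                            nn.Linear(15, 15), nn.Tanh(),
                            nn.Linear(15, 1))

        x = torch.tensor([[0.05, 1.0e3, 300.0]], requires_grad=True)  # (eps_p, rate, T)
        sigma = net(x)                                                # flow stress
        (grads,) = torch.autograd.grad(sigma.sum(), x)   # d(sigma)/d(eps_p, rate, T)
        print(sigma.item(), grads)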
    Solving Elliptic Problems with Singular Sources using Singularity Splitting Deep Ritz Method. (arXiv:2209.02931v1 [math.NA])
    In this work, we develop an efficient solver based on deep neural networks for the Poisson equation with variable coefficients and singular sources expressed by the Dirac delta function $\delta(\mathbf{x})$. This class of problems covers general point sources, line sources and point-line combinations, and has a broad range of practical applications. The proposed approach is based on decomposing the true solution into a singular part that is known analytically using the fundamental solution of the Laplace equation and a regular part that satisfies a suitable elliptic PDE with smoother sources, and then solving for the regular part using the deep Ritz method. A path-following strategy is suggested to select the penalty parameter for penalizing the Dirichlet boundary condition. Extensive numerical experiments in two- and multi-dimensional spaces with point sources, line sources or their combinations are presented to illustrate the efficiency of the proposed approach, and a comparative study with several existing approaches is also given, which shows clearly its competitiveness for the specific class of problems. In addition, we briefly discuss the error analysis of the approach.
    Scalable Regularization of Scene Graph Generation Models using Symbolic Theories. (arXiv:2209.02749v1 [cs.LG])
    Several techniques have recently aimed to improve the performance of deep learning models for Scene Graph Generation (SGG) by incorporating background knowledge. State-of-the-art techniques can be divided into two families: one where the background knowledge is incorporated into the model in a subsymbolic fashion, and another in which the background knowledge is maintained in symbolic form. Despite promising results, both families of techniques face several shortcomings: the first one requires ad-hoc, more complex neural architectures increasing the training or inference cost; the second one suffers from limited scalability w.r.t. the size of the background knowledge. Our work introduces a regularization technique for injecting symbolic background knowledge into neural SGG models that overcomes the limitations of prior art. Our technique is model-agnostic, does not incur any cost at inference time, and scales to previously unmanageable background knowledge sizes. We demonstrate that our technique can improve the accuracy of state-of-the-art SGG models, by up to 33%.
    Modular Federated Learning. (arXiv:2209.03090v1 [cs.LG])
    Federated learning is an approach to training machine learning models at the edge of the network, as close as possible to where the data is produced, motivated by the emerging problem that the large amount of data produced by edge devices cannot be streamed and centrally stored, as well as by data privacy concerns. This learning paradigm needs algorithms that are robust to device heterogeneity and data heterogeneity. This paper proposes ModFL, a federated learning framework that splits models into a configuration module and an operation module, enabling federated learning of the individual modules. This modular approach makes it possible to extract knowledge from a group of heterogeneous devices as well as from the non-IID data produced by their users. The approach can be viewed as an extension of the federated learning with personalisation layers (FedPer) framework that addresses data heterogeneity. We show that ModFL outperforms FedPer for non-IID data partitions of CIFAR-10 and STL-10 using CNNs. Our results on time-series data with the HAPT, RWHAR, and WISDM datasets using RNNs remain inconclusive; we argue that the chosen datasets do not highlight the advantages of ModFL, but that in the worst-case scenario it performs as well as FedPer.
    A Data Science Approach to Risk Assessment for Automobile Insurance Policies. (arXiv:2209.02762v1 [cs.LG])
    In order to determine a suitable automobile insurance policy premium, one needs to take into account three factors: the risk associated with the drivers and cars on the policy, the operational costs associated with management of the policy, and the desired profit margin. The premium should then be some function of these three values. We focus on risk assessment using a Data Science approach. Instead of using the traditional frequency and severity metrics, we predict the total claims that will be made by a new customer using historical data of current and past policies. Given multiple features of the policy (age and gender of drivers, value of car, previous accidents, etc.), one can potentially try to provide personalized insurance policies based specifically on these features as follows. We can compute the average claims made per year over all past and current policies with identical features and then take an average over these claim rates. Unfortunately, there may not be sufficient samples to obtain a robust average. We can instead try to include policies that are "similar" to obtain sufficient samples for a robust average. We therefore face a trade-off between personalization (only using closely similar policies) and robustness (extending the domain far enough to capture sufficient samples). This is known as the Bias-Variance Trade-off. We model this problem, determine the optimal trade-off between the two (i.e., the balance that provides the highest prediction accuracy), and apply it to the claim rate prediction problem. We demonstrate our approach using real data.
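    A minimal sketch of the personalization/robustness trade-off just described (the function name and distance measure are my assumptions, not the paper's method): the radius r is the knob being traded off, and it would be tuned for prediction accuracy.

        import numpy as np

        def predict_claim_rate(x_new, X_hist, claims_per_year, r):
            # Average claim rate over historical policies within distance r
            # in feature space: small r is personalized but high-variance,
            # large r is robust but biased.
            d = np.linalg.norm(X_hist - x_new, axis=1)
            similar = d <= r
            if not similar.any():
                return claims_per_year.mean()   # fall back to the global average
            return claims_per_year[similar].mean()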
    Multi-Scale Attention-based Multiple Instance Learning for Classification of Multi-Gigapixel Histology Images. (arXiv:2209.03041v1 [cs.CV])
    Multi-gigapixel histology images yield rich information for cancer diagnosis and prognosis. Most of the time, only slide-level labels are available, because pixel-wise annotation is a labour-intensive task. In this paper, we propose a deep learning pipeline for classification in histology images. Using multiple instance learning, we attempt to predict the latent membrane protein 1 (LMP1) status of nasopharyngeal carcinoma (NPC) based on haematoxylin and eosin-stained (H&E) histology images. We utilised an attention mechanism with residual connections for our aggregation layers. In our 3-fold cross-validation experiment, we achieved an average accuracy, AUC and F1-score of 0.936, 0.995 and 0.862, respectively. This method also allows us to examine model interpretability by visualising attention scores. To the best of our knowledge, this is the first attempt to predict LMP1 status in NPC using deep learning.
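    A sketch of a typical attention-based MIL aggregation layer of the kind the abstract describes (dimensions and head are my assumptions; the paper's exact architecture, including its residual connections, is not reproduced here):

        import torch
        import torch.nn as nn

        class AttentionMIL(nn.Module):
            def __init__(self, d=512, h=128, n_classes=2):
                super().__init__()
                self.attn = nn.Sequential(nn.Linear(d, h), nn.Tanh(), nn.Linear(h, 1))
                self.head = nn.Linear(d, n_classes)

            def forward(self, bag):                        # bag: (n_patches, d)
                a = torch.softmax(self.attn(bag), dim=0)   # attention over patches
                z = (a * bag).sum(dim=0)                   # weighted slide embedding
                return self.head(z), a                     # logits + interpretable scores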
    CP-AGCN: Pytorch-based Attention Informed Graph Convolutional Network for Identifying Infants at Risk of Cerebral Palsy. (arXiv:2209.02824v1 [cs.CV])
    Early prediction is clinically considered one of the essential parts of cerebral palsy (CP) treatment. We propose to implement a low-cost and interpretable classification system for supporting CP prediction based on General Movement Assessment (GMA). We design a Pytorch-based attention-informed graph convolutional network to identify, at an early stage, infants at risk of CP from skeletal data extracted from RGB videos. We also design a frequency-binning module for learning the CP movements in the frequency domain while filtering noise. Our system only requires consumer-grade RGB videos for training to support interactive-time CP prediction, providing an interpretable CP classification result.
    The missing link: Developing a safety case for perception components in automated driving. (arXiv:2108.13294v4 [cs.LG] UPDATED)
    Safety assurance is a central concern for the development and societal acceptance of automated driving (AD) systems. Perception is a key aspect of AD that relies heavily on Machine Learning (ML). Despite the known challenges with the safety assurance of ML-based components, proposals have recently emerged for unit-level safety cases addressing these components. Unfortunately, AD safety cases express safety requirements at the system level and these efforts are missing the critical linking argument needed to integrate safety requirements at the system level with component performance requirements at the unit level. In this paper, we propose the Integration Safety Case for Perception (ISCaP), a generic template for such a linking safety argument specifically tailored for perception components. The template takes a deductive and formal approach to define strong traceability between levels. We demonstrate the applicability of ISCaP with a detailed case study and discuss its use as a tool to support incremental development of perception components.
  • Open

    An online learning approach to dynamic pricing and capacity sizing in service systems. (arXiv:2009.02911v3 [math.PR] UPDATED)
    We study a dynamic pricing and capacity sizing problem in a $GI/GI/1$ queue, where the service provider's objective is to obtain the optimal service fee $p$ and service capacity $\mu$ so as to maximize the cumulative expected profit (the service revenue minus the staffing cost and delay penalty). Due to the complex nature of the queueing dynamics, such a problem has no analytic solution, so previous research often resorts to heavy-traffic analysis where both the arrival rate and service rate are sent to infinity. In this work we propose an online learning framework designed for solving this problem which does not require the system's scale to increase. Our framework is dubbed Gradient-based Online Learning in Queue (GOLiQ). GOLiQ organizes the time horizon into successive operational cycles and prescribes an efficient procedure to obtain improved pricing and staffing policies in each cycle using data collected in previous cycles. Data here include the number of customer arrivals, waiting times, and the server's busy times. The ingenuity of this approach lies in its online nature, which allows the service provider to do better by interacting with the environment. Effectiveness of GOLiQ is substantiated by (i) theoretical results including the algorithm convergence and regret analysis (with a logarithmic regret bound), and (ii) engineering confirmation via simulation experiments of a variety of representative $GI/GI/1$ queues.
    Generative Principal Component Analysis. (arXiv:2203.09693v2 [stat.ML] UPDATED)
    In this paper, we study the problem of principal component analysis with generative modeling assumptions, adopting a general model for the observed matrix that encompasses notable special cases, including spiked matrix recovery and phase retrieval. The key assumption is that the underlying signal lies near the range of an $L$-Lipschitz continuous generative model with bounded $k$-dimensional inputs. We propose a quadratic estimator, and show that it enjoys a statistical rate of order $\sqrt{\frac{k\log L}{m}}$, where $m$ is the number of samples. We also provide a near-matching algorithm-independent lower bound. Moreover, we provide a variant of the classic power method, which projects the calculated data onto the range of the generative model during each iteration. We show that under suitable conditions, this method converges exponentially fast to a point achieving the above-mentioned statistical rate. We perform experiments on various image datasets for spiked matrix and phase retrieval models, and illustrate the performance gains of our method over the classic power method and the truncated power method devised for sparse principal component analysis.
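    A sketch consistent with the abstract's description of the power-method variant; project_to_range is a placeholder (in practice, projecting onto the range of a generative model G is itself done by optimization over G's latent input):

        import numpy as np

        def projected_power_method(M, project_to_range, w0, iters=50):
            w = w0 / np.linalg.norm(w0)
            for _ in range(iters):
                w = M @ w                      # standard power-iteration step
                w = project_to_range(w)        # keep the iterate near range(G)
                w = w / np.linalg.norm(w)
            return w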
    Reactmine: a search algorithm for inferring chemical reaction networks from time series data. (arXiv:2209.03185v1 [q-bio.QM])
    Inferring chemical reaction networks (CRN) from time series data is a challenge encouraged by the growing availability of quantitative temporal data at the cellular level. This motivates the design of algorithms to infer the preponderant reactions between the molecular species observed in a given biochemical process, and help to build CRN model structure and kinetics. Existing ODE-based inference methods such as SINDy resort to least square regression combined with sparsity-enforcing penalization, such as Lasso. However, when the input time series are only available in wild type conditions in which all reactions are present, we observe that current methods fail to learn sparse models. Results: We present Reactmine, a CRN learning algorithm which enforces sparsity by inferring reactions in a sequential fashion within a search tree of bounded depth, ranking the inferred reaction candidates according to the variance of their kinetics, and re-optimizing the CRN kinetic parameters on the whole trace in a final pass to rank the inferred CRN candidates. We first evaluate its performance on simulation data from a benchmark of hidden CRNs, together with algorithmic hyperparameter sensitivity analyses, and then on two sets of real experimental data: one from protein fluorescence videomicroscopy of cell cycle and circadian clock markers, and one from biomedical measurements of systemic circadian biomarkers possibly acting on clock gene expression in peripheral organs. We show that Reactmine succeeds both on simulation data by retrieving hidden CRNs where SINDy fails, and on the two real datasets by inferring reactions in agreement with previous studies.
    Out of Distribution Detection, Generalization, and Robustness Triangle with Maximum Probability Theorem. (arXiv:2203.12145v2 [cs.LG] UPDATED)
    Maximum Probability Framework, powered by Maximum Probability Theorem, is a recent theoretical development in artificial intelligence that aims to formally define probabilistic models, guide the development of objective functions, and regularize probabilistic models. MPT uses the probability distribution that the models assume on random variables to provide an upper bound on the probability of the model. We apply MPT to challenging out-of-distribution (OOD) detection problems in computer vision by incorporating MPT as a regularization scheme in the training of CNNs and their energy-based variants. We demonstrate the effectiveness of the proposed method on 1080 trained models, with varying hyperparameters, and conclude that the MPT-based regularization strategy stabilizes and improves the generalization and robustness of base models in addition to enhanced OOD performance on CIFAR10, CIFAR100, and MNIST datasets.
    On the Convergence of the ELBO to Entropy Sums. (arXiv:2209.03077v1 [stat.ML])
    The variational lower bound (a.k.a. ELBO or free energy) is the central objective for many learning algorithms including algorithms for deep unsupervised learning. Learning algorithms change model parameters such that the variational lower bound increases, and until the parameters are close to a stationary point of the learning dynamics. In this purely theoretical contribution, we show that (for a very large class of generative models) the variational lower bound is at all stationary points of learning equal to a sum of entropies. For models with one set of latent variables and one set of observed variables, the sum consists of three entropies: (A) the (average) entropy of the variational distributions, (B) the negative entropy of the model's prior distribution, and (C) the (expected) negative entropy of the observable distributions. The obtained result applies under realistic conditions including: finite numbers of data points, at any stationary points (including saddle points) and for any family of (well behaved) variational distributions. The class of generative models for which we show the equality to entropy sums contains many (and presumably most) standard generative models (including deep models). As concrete examples we discuss probabilistic PCA and Sigmoid Belief Networks. The prerequisites we use to show equality to entropy sums are relatively mild. Concretely, the distributions of a given generative model have to be of the exponential family (with constant base measure), and a model has to satisfy a parameterization criterion (which is usually fulfilled). Proving the equality of the ELBO to entropy sums at stationary points (under the stated conditions) is the main contribution of this work.
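    In notation of ours (an assumption, not the paper's), the claimed identity at stationary points, for latents $z$ and observations $x$ over $N$ data points, reads:

        \mathcal{L} \;=\; \frac{1}{N}\sum_{n=1}^{N} \mathcal{H}\!\left[ q^{(n)}_{\Phi}(z) \right]
          \;-\; \mathcal{H}\!\left[ p_{\Theta}(z) \right]
          \;-\; \mathbb{E}_{\bar{q}_{\Phi}}\, \mathcal{H}\!\left[ p_{\Theta}(x \mid z) \right]

    with the three terms matching (A), (B), and (C) above.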
    Manifold Free Riemannian Optimization. (arXiv:2209.03269v1 [math.OC])
    Riemannian optimization is a principled framework for solving optimization problems where the desired optimum is constrained to a smooth manifold $\mathcal{M}$. Algorithms designed in this framework usually require some geometrical description of the manifold, which typically includes tangent spaces, retractions, and gradients of the cost function. However, in many cases, only a subset (or none at all) of these elements can be accessed due to lack of information or intractability. In this paper, we propose a novel approach that can perform approximate Riemannian optimization in such cases, where the constraining manifold is a submanifold of $\mathbb{R}^{D}$. At the bare minimum, our method requires only a noiseless sample set of the cost function $(\mathbf{x}_{i}, y_{i}) \in \mathcal{M} \times \mathbb{R}$ and the intrinsic dimension of the manifold $\mathcal{M}$. Using the samples, and utilizing the Manifold-MLS framework (Sober and Levin 2020), we construct approximations of the missing components entertaining provable guarantees and analyze their computational costs. In case some of the components are given analytically (e.g., if the cost function and its gradient are given explicitly, or if the tangent spaces can be computed), the algorithm can be easily adapted to use the accurate expressions instead of the approximations. We analyze the global convergence of Riemannian gradient-based methods using our approach, and we demonstrate empirically the strength of this method, together with a conjugate-gradients type method based upon similar principles.  ( 3 min )
    EGMM: an Evidential Version of the Gaussian Mixture Model for Clustering. (arXiv:2010.01333v3 [cs.LG] UPDATED)
    The Gaussian mixture model (GMM) provides a simple yet principled framework for clustering, with properties suitable for statistical inference. In this paper, we propose a new model-based clustering algorithm, called EGMM (evidential GMM), in the theoretical framework of belief functions to better characterize cluster-membership uncertainty. With a mass function representing the cluster membership of each object, the evidential Gaussian mixture distribution composed of the components over the powerset of the desired clusters is proposed to model the entire dataset. The parameters in EGMM are estimated by a specially designed Expectation-Maximization (EM) algorithm. A validity index allowing automatic determination of the proper number of clusters is also provided. The proposed EGMM is as simple as the classical GMM, but can generate a more informative evidential partition for the considered dataset. The synthetic and real dataset experiments show that the proposed EGMM performs better than other representative clustering algorithms. Besides, its superiority is also demonstrated by an application to multi-modal brain image segmentation.  ( 2 min )
    The Role of ImageNet Classes in Fr\'echet Inception Distance. (arXiv:2203.06026v2 [cs.CV] UPDATED)
    Fr\'echet Inception Distance (FID) is the primary metric for ranking models in data-driven generative modeling. While remarkably successful, the metric is known to sometimes disagree with human judgement. We investigate a root cause of these discrepancies, and visualize what FID "looks at" in generated images. We show that the feature space that FID is (typically) computed in is so close to the ImageNet classifications that aligning the histograms of Top-$N$ classifications between sets of generated and real images can reduce FID substantially -- without actually improving the quality of results. Thus we conclude that FID is prone to intentional or accidental distortions. As a practical example of an accidental distortion, we discuss a case where an ImageNet pre-trained FastGAN achieves a FID comparable to StyleGAN2, while being worse in terms of human evaluation.  ( 2 min )
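    For reference (standard background, not specific to this paper): FID fits Gaussians $(\mu_r, \Sigma_r)$ and $(\mu_g, \Sigma_g)$ to the Inception features of real and generated images and computes

        \mathrm{FID} \;=\; \lVert \mu_r - \mu_g \rVert_2^2
          \;+\; \operatorname{Tr}\!\left( \Sigma_r + \Sigma_g - 2\,(\Sigma_r \Sigma_g)^{1/2} \right)

    which is why shifting the histogram of ImageNet-class responses in that feature space can move the score without changing perceived quality.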
    Manifold Oblique Random Forests: Towards Closing the Gap on Convolutional Deep Networks. (arXiv:1909.11799v5 [cs.LG] UPDATED)
    Decision forests (Forests), in particular random forests and gradient boosting trees, have demonstrated state-of-the-art accuracy compared to other methods in many supervised learning scenarios. In particular, Forests dominate other methods in tabular data, that is, when the feature space is unstructured, so that the signal is invariant to a permutation of the feature indices. However, in structured data lying on a manifold (such as images, text, and speech) deep networks (Networks), specifically convolutional deep networks (ConvNets), tend to outperform Forests. We conjecture that at least part of the reason for this is that the input to Networks is not simply the feature magnitudes, but also their indices. In contrast, naive Forest implementations fail to explicitly consider feature indices. A recently proposed Forest approach demonstrates that Forests, for each node, implicitly sample a random matrix from some specific distribution. These Forests, like some classes of Networks, learn by partitioning the feature space into convex polytopes corresponding to linear functions. We build on that approach and show that one can choose distributions in a manifold-aware fashion to incorporate feature locality. We demonstrate the empirical performance on data whose features live on three different manifolds: a torus, images, and time-series. Moreover, we demonstrate its strength in multivariate simulated settings and also show superiority in predicting surgical outcome in epilepsy patients and predicting movement direction from raw stereotactic EEG data from non-motor brain regions. In all simulations and real data, the Manifold Oblique Random Forest (MORF) algorithm outperforms approaches that ignore feature space structure and challenges the performance of ConvNets. Moreover, MORF runs fast and maintains interpretability and theoretical justification.  ( 3 min )
    Dual Instrumental Method for Confounded Kernelized Bandits. (arXiv:2209.03224v1 [cs.LG])
    The contextual bandit problem is a theoretically justified framework with wide applications in various fields. While previous studies on this problem usually require independence between noise and contexts, our work considers a more sensible setting where the noise becomes a latent confounder that affects both contexts and rewards. Such a confounded setting is more realistic and extends to a broader range of applications. However, the unresolved confounder will cause a bias in reward function estimation and thus lead to a large regret. To deal with the challenges brought by the confounder, we apply dual instrumental variable regression, which can correctly identify the true reward function. We prove that the convergence rate of this method is near-optimal in two types of widely used reproducing kernel Hilbert spaces. Therefore, we can design computationally efficient and regret-optimal algorithms based on the theoretical guarantees for confounded bandit problems. The numerical results illustrate the efficacy of our proposed algorithms in the confounded bandit setting.  ( 2 min )
    Machine Learning Partners in Criminal Networks. (arXiv:2209.03171v1 [physics.soc-ph])
    Recent research has shown that criminal networks have complex organizational structures, but whether this can be used to predict static and dynamic properties of criminal networks remains little explored. Here, by combining graph representation learning and machine learning methods, we show that structural properties of political corruption, police intelligence, and money laundering networks can be used to recover missing criminal partnerships, distinguish among different types of criminal and legal associations, as well as predict the total amount of money exchanged among criminal agents, all with outstanding accuracy. We also show that our approach can anticipate future criminal associations during the dynamic growth of corruption networks with significant accuracy. Thus, similar to evidence found at crime scenes, we conclude that structural patterns of criminal networks carry crucial information about illegal activities, which allows machine learning methods to predict missing information and even anticipate future criminal behavior.  ( 2 min )
    Semiparametric discrete data regression with Monte Carlo inference and prediction. (arXiv:2110.12316v4 [stat.ME] UPDATED)
    Discrete data are abundant and often arise as counts or rounded data. These data commonly exhibit complex distributional features such as zero-inflation, over- or under-dispersion, boundedness, and heaping, which render many parametric models inadequate. Yet even for parametric regression models, conjugate priors and closed-form posteriors are typically unavailable, which necessitates approximations such as MCMC for posterior inference. This paper introduces a Bayesian modeling and algorithmic framework that enables semiparametric regression analysis for discrete data with Monte Carlo (not MCMC) sampling. The proposed approach pairs a nonparametric marginal model with a latent linear regression model to encourage both flexibility and interpretability, and delivers posterior consistency even under model misspecification. For a parametric or large-sample approximation of this model, we identify a class of conjugate priors with (pseudo) closed-form posteriors. All posterior and predictive distributions are available analytically or via Monte Carlo sampling. These tools are broadly useful for linear regression, nonlinear models via basis expansions, and variable selection with discrete data. Simulation studies demonstrate significant advantages in computing, prediction, estimation, and selection relative to existing alternatives. This novel approach is applied to self-reported mental health data that exhibit zero-inflation, overdispersion, boundedness, and heaping.  ( 3 min )
    Composite Spatial Monte Carlo Integration Based on Generalized Least Squares. (arXiv:2204.03248v2 [stat.CO] UPDATED)
    Although evaluation of expectations on the Ising model is essential in various applications, it is mostly infeasible because of intractable multiple summations. Spatial Monte Carlo integration (SMCI) is a sampling-based approximation that can provide high-accuracy estimates of such intractable expectations. To evaluate the expectation of a function of variables in a specific region (called the target region), SMCI considers a larger region containing the target region (called the sum region). In SMCI, the multiple summation for the variables in the sum region is executed exactly, and that in the outer region is evaluated by a sampling approximation such as standard Monte Carlo integration. It is guaranteed that the accuracy of the SMCI estimator improves monotonically as the size of the sum region increases. However, a haphazard expansion of the sum region could cause a combinatorial explosion. Therefore, we hope to improve the accuracy without such an expansion. In this paper, based on the theory of generalized least squares (GLS), a new effective method is proposed by combining multiple SMCI estimators. The validity of the proposed method is demonstrated theoretically and numerically. The results indicate that the proposed method can be effective in the inverse Ising problem (or Boltzmann machine learning).  ( 3 min )
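    In notation of ours: if $\hat{\theta} = (\hat{\theta}_1, \dots, \hat{\theta}_K)^{\top}$ collects $K$ unbiased SMCI estimators of the same expectation, with covariance matrix $\Sigma$, the standard GLS (best linear unbiased) combination is

        \hat{\theta}_{\mathrm{GLS}} \;=\; \frac{\mathbf{1}^{\top} \Sigma^{-1} \hat{\theta}}{\mathbf{1}^{\top} \Sigma^{-1} \mathbf{1}},
        \qquad
        \operatorname{Var}\!\left(\hat{\theta}_{\mathrm{GLS}}\right) \;=\; \frac{1}{\mathbf{1}^{\top} \Sigma^{-1} \mathbf{1}}

    where in practice $\Sigma$ must itself be estimated, which is presumably where the paper's specific construction comes in.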
    An Assessment Tool for Academic Research Managers in the Third World. (arXiv:2209.03199v1 [econ.EM])
    The academic evaluation of the publication record of researchers is relevant for identifying talented candidates for promotion and funding. A key tool for this is the use of the indexes provided by Web of Science and SCOPUS, costly databases whose price sometimes exceeds the means of academic institutions in many parts of the world. We show here how the data in one of the databases can be used to infer the main index of the other. Methods of data analysis used in Machine Learning allow us to select just a few of the hundreds of variables in a database, which are later used in a panel regression, yielding a good approximation to the main index in the other database. Since the information in SCOPUS can be freely scraped from the Web, this approach allows one to infer, for free, the Impact Factor of publications, the main index used in research assessments around the globe.  ( 2 min )
    A spectral least-squares-type method for heavy-tailed corrupted regression with unknown covariance \& heterogeneous noise. (arXiv:2209.02856v1 [math.ST])
    We revisit heavy-tailed corrupted least-squares linear regression, assuming access to a corrupted $n$-sized label-feature sample with at most $\epsilon n$ arbitrary outliers. We wish to estimate a $p$-dimensional parameter $b^*$ given such a sample of label-feature pairs $(y,x)$ satisfying $y=\langle x,b^*\rangle+\xi$ with heavy-tailed $(x,\xi)$. We only assume $x$ is $L^4-L^2$ hypercontractive with constant $L>0$ and has covariance matrix $\Sigma$ with minimum eigenvalue $1/\mu^2>0$ and bounded condition number $\kappa>0$. The noise $\xi$ can be arbitrarily dependent on $x$ and nonsymmetric as long as $\xi x$ has finite covariance matrix $\Xi$. We propose a near-optimal computationally tractable estimator, based on the power method, assuming no knowledge of $(\Sigma,\Xi)$ nor the operator norm of $\Xi$. With probability at least $1-\delta$, our proposed estimator attains the statistical rate $\mu^2\Vert\Xi\Vert^{1/2}(\frac{p}{n}+\frac{\log(1/\delta)}{n}+\epsilon)^{1/2}$ and breakdown-point $\epsilon\lesssim\frac{1}{L^4\kappa^2}$, both optimal in the $\ell_2$-norm, assuming the near-optimal minimum sample size $L^4\kappa^2(p\log p + \log(1/\delta))\lesssim n$, up to a log factor. To the best of our knowledge, this is the first computationally tractable algorithm satisfying simultaneously all the mentioned properties. Our estimator is based on a two-stage Multiplicative Weight Update algorithm. The first stage estimates a descent direction $\hat v$ with respect to the (unknown) pre-conditioned inner product $\langle\Sigma(\cdot),\cdot\rangle$. The second stage estimates the descent direction $\Sigma\hat v$ with respect to the (known) inner product $\langle\cdot,\cdot\rangle$, without knowing or estimating $\Sigma$.  ( 3 min )
    Bayesian learning of feature spaces for multitasks problems. (arXiv:2209.03028v1 [stat.ML])
    This paper presents a Bayesian framework to construct non-linear, parsimonious, shallow models for multitask regression. The proposed framework relies on the fact that Random Fourier Features (RFFs) enable the approximation of an RBF kernel by an extreme learning machine whose hidden layer is formed by RFFs. The main idea is to combine both dual views of the same model under a single Bayesian formulation that extends the Sparse Bayesian Extreme Learning Machines to multitask problems. From the kernel methods point of view, the proposed formulation facilitates the introduction of prior domain knowledge through the RBF kernel parameter. From the extreme learning machines perspective, the new formulation helps control overfitting and enables a parsimonious overall model (the models that serve each task share the same set of RFFs selected within the joint Bayesian optimisation). The experimental results show that combining advantages from kernel methods and extreme learning machines within the same framework can lead to significant improvements in the performance achieved by each of these two paradigms independently.  ( 2 min )
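    A minimal sketch of the RFF view used here (feature count and lengthscale are assumptions): an RBF kernel $k(x,y)=\exp(-\Vert x-y\Vert^2/(2\ell^2))$ is approximated by an explicit random feature map, turning the kernel model into a shallow, ELM-like linear model.

        import numpy as np

        def rff_features(X, n_feat=200, lengthscale=1.0, seed=0):
            # z(x) = sqrt(2/D) * cos(W x + b), W ~ N(0, I/l^2), b ~ U[0, 2*pi];
            # then k(x, y) is approximated by z(x) . z(y).
            rng = np.random.default_rng(seed)
            d = X.shape[1]
            W = rng.normal(scale=1.0 / lengthscale, size=(d, n_feat))
            b = rng.uniform(0, 2 * np.pi, size=n_feat)
            return np.sqrt(2.0 / n_feat) * np.cos(X @ W + b)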
    $1D$ to $nD$: A Meta Algorithm for Multivariate Global Optimization via Univariate Optimizers. (arXiv:2209.03246v1 [math.OC])
    In this work, we propose a meta algorithm that can solve a multivariate global optimization problem using univariate global optimizers. Although univariate global optimization does not receive much attention compared to the multivariate case, which is more emphasized in academia and industry, we show that it is still relevant and can be directly used to solve problems of multivariate optimization. We also provide the corresponding regret bounds in terms of the time horizon $T$ and the average regret of the univariate optimizer, when it is robust against nonnegative noises with robust regret guarantees.  ( 2 min )
    Change Detection for Local Explainability in Evolving Data Streams. (arXiv:2209.02764v1 [cs.LG])
    As complex machine learning models are increasingly used in sensitive applications like banking, trading or credit scoring, there is a growing demand for reliable explanation mechanisms. Local feature attribution methods have become a popular technique for post-hoc and model-agnostic explanations. However, attribution methods typically assume a stationary environment in which the predictive model has been trained and remains stable. As a result, it is often unclear how local attributions behave in realistic, constantly evolving settings such as streaming and online applications. In this paper, we discuss the impact of temporal change on local feature attributions. In particular, we show that local attributions can become obsolete each time the predictive model is updated or concept drift alters the data generating distribution. Consequently, local feature attributions in data streams provide high explanatory power only when combined with a mechanism that allows us to detect and respond to local changes over time. To this end, we present CDLEEDS, a flexible and model-agnostic framework for detecting local change and concept drift. CDLEEDS serves as an intuitive extension of attribution-based explanation techniques to identify outdated local attributions and enable more targeted recalculations. In experiments, we also show that the proposed framework can reliably detect both local and global concept drift. Accordingly, our work contributes to a more meaningful and robust explainability in online machine learning.  ( 3 min )
    Plant Species Classification Using Transfer Learning by Pretrained Classifier VGG-19. (arXiv:2209.03076v1 [cs.CV])
    Deep learning is currently the most important branch of machine learning, with applications in speech recognition, computer vision, image classification, and medical imaging analysis. Plant recognition is one of the areas where image classification can be used to identify plant species through their leaves. Botanists devote a significant amount of time to recognizing plant species by personal inspection. This paper describes a method for dissecting color images of Swedish leaves and identifying plant species. To achieve higher accuracy, the task is completed using transfer learning with the pre-trained classifier VGG-19. The four primary stages of classification, namely image preprocessing, image augmentation, feature extraction, and recognition, are performed as part of the overall model evaluation. The VGG-19 classifier grasps the characteristics of leaves by employing pre-defined hidden layers such as convolutional layers, max pooling layers, and fully connected layers, and finally uses the soft-max layer to generate a feature representation for all plant classes. The model acquires knowledge of the Swedish leaf dataset, which contains fifteen tree classes, and helps predict the correct class of an unknown plant with an accuracy of 99.70%, which is higher than previously reported results.  ( 3 min )
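    A hedged sketch of this kind of transfer-learning setup (input size, head width, and training configuration are assumptions, not the paper's exact choices): frozen VGG-19 features plus a small soft-max head for the 15 Swedish-leaf classes.

        import tensorflow as tf

        # Pre-trained VGG-19 backbone, frozen, with global average pooling.
        base = tf.keras.applications.VGG19(include_top=False, weights="imagenet",
                                           input_shape=(224, 224, 3), pooling="avg")
        base.trainable = False

        model = tf.keras.Sequential([
            base,
            tf.keras.layers.Dense(256, activation="relu"),
            tf.keras.layers.Dense(15, activation="softmax"),   # 15 tree classes
        ])
        model.compile(optimizer="adam", loss="categorical_crossentropy",
                      metrics=["accuracy"])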
    A Zeroth-Order Momentum Method for Risk-Averse Online Convex Games. (arXiv:2209.02838v1 [cs.LG])
    We consider risk-averse learning in repeated unknown games where the goal of the agents is to minimize their individual risk of incurring significantly high cost. Specifically, the agents use the conditional value at risk (CVaR) as a risk measure and rely on bandit feedback in the form of the cost values of the selected actions at every episode to estimate their CVaR values and update their actions. A major challenge in using bandit feedback to estimate CVaR is that the agents can only access their own cost values, which, however, depend on the actions of all agents. To address this challenge, we propose a new risk-averse learning algorithm with momentum that utilizes the full historical information on the cost values. We show that this algorithm achieves sub-linear regret and matches the best known algorithms in the literature. We provide numerical experiments for a Cournot game that show that our method outperforms existing methods.  ( 2 min )
    Understanding microbiome dynamics via interpretable graph representation learning. (arXiv:2203.01830v2 [q-bio.QM] UPDATED)
    Large-scale perturbations in the microbiome constitution are strongly correlated, whether as a driver or a consequence, with the health and functioning of human physiology. However, understanding the difference in the microbiome profiles of healthy and ill individuals can be complicated due to the large number of complex interactions among microbes. We propose to model these interactions as a time-evolving graph whose nodes are microbes and edges are interactions among them. Motivated by the need to analyse such complex interactions, we develop a method that learns a low-dimensional representation of the time-evolving graph and maintains the dynamics occurring in the high-dimensional space. Through our experiments, we show that we can extract graph features such as clusters of nodes or edges that have the highest impact on the model to learn the low-dimensional representation. This information can be crucial to identify microbes and interactions among them that are strongly correlated with clinical diseases. We conduct our experiments on both synthetic and real-world microbiome datasets.  ( 2 min )
    Non-Gaussian Process Regression. (arXiv:2209.03117v1 [stat.ML])
    Standard GPs offer a flexible modelling tool for well-behaved processes. However, deviations from Gaussianity are expected to appear in real world datasets, with structural outliers and shocks routinely observed. In these cases GPs can fail to model uncertainty adequately and may over-smooth inferences. Here we extend the GP framework into a new class of time-changed GPs that allow for straightforward modelling of heavy-tailed non-Gaussian behaviours, while retaining a tractable conditional GP structure through an infinite mixture of non-homogeneous GPs representation. The conditional GP structure is obtained by conditioning the observations on a latent transformed input space and the random evolution of the latent transformation is modelled using a L\'{e}vy process which allows Bayesian inference in both the posterior predictive density and the latent transformation function. We present Markov chain Monte Carlo inference procedures for this model and demonstrate the potential benefits compared to a standard GP.  ( 2 min )
    On the Sparse DAG Structure Learning Based on Adaptive Lasso. (arXiv:2209.02946v1 [stat.ML])
    Learning the underlying causal structure, represented by Directed Acyclic Graphs (DAGs), of concerned events from fully-observational data is a crucial part of causal reasoning, but it is challenging due to the combinatorial and large search space. A recent flurry of developments recast this combinatorial problem into a continuous optimization problem by leveraging an algebraic equality characterization of acyclicity. However, these methods suffer from the fixed-threshold step after optimization, which is not a flexible and systematic way to rule out cycle-inducing edges or false-discovery edges with small values caused by numerical precision. In this paper, we develop a data-driven DAG structure learning method without the predefined threshold, called adaptive NOTEARS [30], achieved by applying adaptive penalty levels to each parameter in the regularization term. We show that adaptive NOTEARS enjoys the oracle properties under some specific conditions. Furthermore, simulation results validate the effectiveness of our method, without requiring any gap of edge weights around zero.  ( 2 min )
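    A sketch of the adaptive-lasso idea behind the method (function and hyperparameters are illustrative assumptions): each entry of the weighted adjacency matrix gets its own penalty level, inversely proportional to a pilot estimate, so small spurious edges are driven to exactly zero without a post-hoc threshold.

        import numpy as np

        def adaptive_lasso_penalty(W, W_pilot, lam=0.1, gamma=1.0, eps=1e-8):
            # Entrywise weights 1/|W_pilot|^gamma: edges that looked weak in the
            # pilot fit are penalized harder than edges that looked strong.
            weights = 1.0 / (np.abs(W_pilot) + eps) ** gamma
            return lam * np.sum(weights * np.abs(W))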
    Adjusted Asymmetric Accuracy: A Well-Behaving External Cluster Validity Measure. (arXiv:2209.02935v1 [cs.LG])
    There is no, nor will there ever be, a single best clustering algorithm, but we would still like to be able to pinpoint those which are well-performing on certain task types and filter out the systematically disappointing ones. Clustering algorithms are traditionally evaluated using either internal or external validity measures. Internal measures quantify different aspects of the obtained partitions, e.g., the average degree of cluster compactness or point separability. Yet, their validity is questionable because the clusterings they promote can sometimes be meaningless. External measures, on the other hand, compare the algorithms' outputs to the reference, ground truth groupings that are provided by experts. The commonly-used classical partition similarity scores, such as the normalised mutual information, Fowlkes-Mallows, or adjusted Rand index, might not possess all the desirable properties, e.g., they do not identify pathological edge cases correctly. Furthermore, they are not nicely interpretable: it is hard to say what a score of 0.8 really means. Its behaviour might also vary as the number of true clusters changes. This makes comparing clustering algorithms across many benchmark datasets difficult. To remedy this, we propose and analyse a new measure: an asymmetric version of the optimal set-matching accuracy. It is corrected for chance and the imbalancedness of cluster sizes.  ( 2 min )

  • Open

    [P] Stable Diffusion web UI with Outpainting, Inpainting, Prompt matrix, Upscale, Textual Inversion and many more features
    Stable Diffusion web UI: a browser interface based on the Gradio library for Stable Diffusion. GitHub: https://github.com/AUTOMATIC1111/stable-diffusion-webui submitted by /u/Illustrious_Row_9971 [link] [comments]  ( 88 min )
    [D] Is there a solid mathematical justification of student-teacher setups?
    For context, I am thinking of the case where a big model, M, is trained, then a smaller model, m, uses the output of M in combination with the original data. A natural question is: why include the large model as a teacher instead of using only the original data? The answers that I have found so far refer to empirical evidence, such as the highly cited 2015 paper Distilling the Knowledge in a Neural Network by Hinton. I also found the review paper Knowledge Distillation and Student-Teacher Learning for Visual Intelligence: A Review and New Outlooks from 2020, which mentions that the implicit objective is to maximize the mutual information between the teacher and student models. I understand the intuition of the setup, but is there a paper that provides a (reasonably) rigorous treatment of the student-teacher setup? For example, how well would it work if the teacher has the same size as the student? Is it better with 2 teachers, etc.? submitted by /u/carlml [link] [comments]  ( 89 min )
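    For readers unfamiliar with the setup being asked about, a minimal sketch of the distillation objective from the cited Hinton et al. (2015) paper (hyperparameter values here are just common defaults):

        import torch.nn.functional as F

        def distillation_loss(student_logits, teacher_logits, labels, T=4.0, alpha=0.5):
            # Hard-label term: ordinary cross-entropy on the original data.
            hard = F.cross_entropy(student_logits, labels)
            # Soft-label term: match the teacher's temperature-softened outputs;
            # the T*T factor keeps gradient magnitudes comparable across T.
            soft = F.kl_div(F.log_softmax(student_logits / T, dim=1),
                            F.softmax(teacher_logits / T, dim=1),
                            reduction="batchmean") * T * T
            return alpha * hard + (1 - alpha) * soft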
    [D] How does AutoML (NAS in particular) work?
    Hey guys, I was just wondering how Neural Architecture Search (NAS) is actually implemented. Not the methodology (like RL) but the actual code aspect - is the source code of the NNs modified directly, or do they just have a general neural network method with a bunch of parameters representing NN parameters (num layers, etc.)? I can’t seem to find many code implementations of NAS papers unfortunately. submitted by /u/billjames1685 [link] [comments]  ( 88 min )
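    One common answer to the code question (a sketch under my own assumptions, not taken from any particular NAS codebase): the source code is not edited. The search space is a plain data structure, and a builder function turns a sampled configuration into a network object; the controller (RL, evolution, etc.) only ever manipulates such configs.

        import torch.nn as nn

        def build_net(cfg, in_dim=784, n_classes=10):
            # cfg is an architecture description, e.g. {"hidden": [256, 128]};
            # the builder materializes it as an ordinary nn.Module.
            layers, d = [], in_dim
            for width in cfg["hidden"]:
                layers += [nn.Linear(d, width), nn.ReLU()]
                d = width
            layers.append(nn.Linear(d, n_classes))
            return nn.Sequential(*layers)

        net = build_net({"hidden": [256, 128]})   # one candidate architecture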
    [P] Pythae: Unifying Generative AutoEncoders in Python - What's new ?
    A few months after the official release, I have been working on a few new features that have been added to Pythae. See below what's new 👉 New models: 25 opensource implementations are now available 👉 Integration of mlflow and wandb monitoring tools 👉 Reproduction of original papers 👉 New examples/tutorials (e.g. VAE-LSTM) 💻 Code: https://github.com/clementchadebec/benchmark_VAE I sincerely hope you will enjoy these new features! This library is also open to new contributors so do not hesitate to reach out if you want to include a model, fix a potential bug 🐛 or would like to see any new feature! submitted by /u/cchad-8 [link] [comments]  ( 89 min )
    [D] Reasons not to post pre-prints to Arxiv?
    I know most big conferences don’t have an issue with it, so I’m interested in whether folks have any regrets around pre-printing work or best practices for rolling out their best work. submitted by /u/CouchyShorts [link] [comments]  ( 88 min )
    [P] graph based networking platform for the AI/ML community
    Hello, the team and I are working on a networking platform for the AI/ML community. In this network, people are represented as nodes in a 3D graph; the more similar their backgrounds in AI are, the closer their nodes are. We use BERT embeddings and cosine similarity to measure the "distance" between users; here is an example (with a few users). You can check the 3D view here (this is just a test): https://galaxai.vercel.app/explore Looking forward to hearing your thoughts. Do you think this would be useful? If yes, what features would you suggest? submitted by /u/1hachem [link] [comments]  ( 105 min )
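    A hedged sketch of the described matching step (the platform's actual stack is not public; the model choice below is my assumption): embed two user bios with a BERT-style encoder and use cosine similarity as the inverse "distance" between their nodes.

        from sentence_transformers import SentenceTransformer, util

        model = SentenceTransformer("all-MiniLM-L6-v2")   # assumed encoder choice
        bios = ["PhD student working on graph neural networks",
                "ML engineer focused on GNNs for recommendation"]
        emb = model.encode(bios, convert_to_tensor=True)
        similarity = util.cos_sim(emb[0], emb[1]).item()  # close to 1 => nearby nodes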
    [P] Long Stable Diffusion: pipeline of generative models to illustrate long stories
    I really wanted to illustrate long stories with Stable Diffusion! So, I hacked together this pipeline: long text -> GPT-3 suggests illustration ideas -> GPT-3 translates from English to "prompt-English" -> Stable Diffusion outputs images. Open source: https://github.com/sharonzhou/long_stable_diffusion Here's a published story, illustrated by this repo, titled "Never Hire a Herd of Goats to Mow your Lawn": https://storiesby.ai/p/never-hire-a-herd-of-goats-to-mow. This is an AI-written, AI-narrated, AI-illustrated story, which a couple of friends and I have been generating. Anyways, hope you like it or parts of it. Pull requests veryyyy welcome. This was just a weekend project to keep my GPUs company. Tweet with more stuff, if at all interesting: https://twitter.com/realSharonZhou/status/1567031035732594688 submitted by /u/realsharonzhou [link] [comments]  ( 103 min )
    [D] How to prepare for getting a deep learning/computer vision engineer role
    I have just started a Masters in Robotics at UMD College Park. I am interested in working as a deep learning engineer or a computer vision engineer after completing my masters. I have around 18 months from now to improve my skills. I read a few chapters of the PRML book and the deep learning book during my undergrad. I worked on a few projects in computer vision, like implementing SOTA algorithms, benchmarking, and things like that. I also did research with a professor at my university, but it was not anything novel, just implementing and comparing algorithms; I managed to publish it at NeurIPS and MICCAI workshops, though. I also have a couple of years of work experience: the first year at a startup where I trained and deployed models for brain tumor segmentation, and the second year at a startup where I trained and deployed background-removal segmentation models. I have worked quite a bit on pruning models and on using tools like ONNX and the Triton inference server to make model inference faster. My areas of interest are autonomous driving and medical imaging. Segmentation is one area I have worked on extensively and would like to continue with in the future. I would ideally like to get an R&D type of role. My roadmap to getting a job as a DL/CV engineer from here looks like this: (1) read the Mathematics for Machine Learning book; (2) learn C++; (3) do a few deep-learning-for-computer-vision projects where I actually solve a problem; (4) practice Leetcode-style questions; (5) prepare for actual interviews by revising machine learning theory, linear algebra, statistics and probability, and so on; (6) keep reading SOTA algorithms every now and then. Can anyone let me know if my plan looks good? Am I missing something? Should I do it some other way? Thanks in advance! submitted by /u/Tiny-Masterpiece-412 [link] [comments]  ( 113 min )
    [D] Why are adaptive learning rates variations more popular than cyclical ones?
    I have only read some of the literature on cyclical learning rates and haven't really tried them, but the convergence speed of the NNs demonstrated in the paper is really surprising. Link to the original paper: https://arxiv.org/abs/1506.01186 submitted by /u/AdOk6683 [link] [comments]  ( 90 min )
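    For context, the triangular schedule from the linked Smith (2015) paper is simple to implement; the bounds and step size below are illustrative values:

        import math

        def triangular_clr(it, base_lr=1e-4, max_lr=1e-2, step_size=2000):
            # The learning rate bounces linearly between base_lr and max_lr,
            # completing one full cycle every 2 * step_size iterations.
            cycle = math.floor(1 + it / (2 * step_size))
            x = abs(it / step_size - 2 * cycle + 1)
            return base_lr + (max_lr - base_lr) * max(0.0, 1 - x)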
    [D] Where to apply for deep learning/computer vision internships and jobs?
    I am currently doing a Masters in Robotics at UMD College Park. I would like to do an internship next summer and a full time job after that. How and where should I apply for internships? I am an international student so don't really know how things work in the USA. I was considering applying on Linkedin, Angellist. Is there any other platform or some other way? Am I missing something? submitted by /u/Tiny-Masterpiece-412 [link] [comments]  ( 109 min )
    [D] When deploying, what do you log? What do others typically log that you think is a waste of time?
    I usually log input, prediction, and confidence score. Maybe raw input versus augmentations I might have added, and the final vectorized input, also. Not sure what else is worth logging. ​ What do you think? submitted by /u/denim_duck [link] [comments]  ( 89 min )
    [R] ProSelfLC: Progressive Self Label Correction Towards A Low-Temperature Entropy State (v2)
    TLDR: Two takeaways are below. 1st takeaway: (figure in the original post). 2nd takeaway: a new technical proposal, inspired by the new finding and a miscalibration analysis, is introduced to decrease the entropy of self knowledge. Concretely, we propose to use an annealed temperature and learn towards a revised low-temperature entropy state. Though this research studies deep machine learning, its findings are quite consistent with deep human learning: people start to learn the truth out of noise with low confidence, and gradually compress the truth out of a noisy world with increasing confidence, by gradually and more comprehensively understanding the noisy world surrounding us. Read more if you are interested: https://arxiv.org/abs/2207.00118 submitted by /u/XinshaoWang [link] [comments]  ( 107 min )
    [D] Should I change my computer for running Large Language Models?
    I bought my current computer (MacBook Air (Retina, 13-inch, 2018) with Catalina) about three years ago. Its graphics card is an Intel UHD Graphics 617 with 1536 MB. I need to run a large number of LLMs on my computer. I found that even running a text-generation task using EleutherAI/gpt-neo-1.3B takes me more than 5 minutes just to generate a short text of size 10. Do I need to change to a computer with an NVIDIA GPU? Or is there any way to run large language models on a server so that they do not use my local memory? submitted by /u/Silly-Cherry5985 [link] [comments]  ( 88 min )
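    For reference, a minimal version of the described task (illustration only; on a CPU-only laptop a ~1.3B-parameter model is indeed expected to be slow, and a CUDA GPU or a hosted inference API avoids both the wait and the local memory pressure):

        from transformers import pipeline

        generator = pipeline("text-generation", model="EleutherAI/gpt-neo-1.3B")
        out = generator("The meaning of life is", max_new_tokens=10)
        print(out[0]["generated_text"])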
    [D] Embed world knowledge into a model
    I'd like to create an ML pipeline that can answer questions like "describe the human anatomy" (answered by a list of the body parts of the human body) or "what is a house" (returning a list of the components of a house, room types, etc.). A lot of general knowledge is required by the pipeline; how is that accomplished? The generality of GPT-3-like models is too much: I don't need storytelling abilities or converting recipes to their vegan alternatives, just the ability to know what most things are (I know...) and to provide a detailed description, basically breaking things into subcomponents. Where do I start? Assuming training data is not an issue, which architecture can deliver the result needed? submitted by /u/virann [link] [comments]  ( 90 min )
    [P] Bag of tricks for training OCR models.
Recently I trained handwritten OCR models using PaddleOCR and found it a useful tool for higher model accuracy (almost a 30% improvement; the model I use is PP-OCRv3). PaddleOCR ensembles many tricks during the training process, from data augmentation to the model backbone, neck and head architecture. What's more, I found that those tricks are set as the default training strategy. (The post includes a figure of the PP-OCRv3 framework.) Models from the older versions (PP-OCRv2 and PP-OCR) also seem to perform well; I will read their reports soon. submitted by /u/littletomatodonkey [link] [comments]  ( 89 min )
    [D] Which european master in AI is best?
Hello reddit, I can't decide between the master in Artificial Intelligence at KU Leuven and the MVA at Paris-Saclay University. I am admitted to both masters, they start in a few weeks, and I still don't know which option to go for. I have a degree in mathematics and computer science; I have always liked mathematics more than computer science, and artificial intelligence is the field that has interested me the most during my degree. In the future I would like to do research in artificial intelligence. I personally see the MVA in Paris as the most challenging option, mainly because of the city, the language and the fact that it is a mathematics master, but also the one with the highest potential reward. Any advice you can give me will be useful; right now I am totally undecided. submitted by /u/catatojreon [link] [comments]  ( 98 min )
    [R] CLIP-Mesh: Generating textured meshes from text using pretrained image-text models
    submitted by /u/InfamousPancakes [link] [comments]  ( 106 min )
    [D] Naming of siamese networks
Whenever I read about siamese networks I get confused. The definition (using two copies of the same network with identical weights operating on different inputs) makes me think either I am misunderstanding it, or the name is terrible and confusing: I think of a neural network as a function operating on inputs, so the notion of two identical networks with identical weights is just silly; in that case you have a single network/function which operates on two different inputs. Maybe this question is silly, but I think the name "siamese network" so directly contradicts the given definition that I never truly believe I have understood the definition. In a similar vein, I find the illustrations in many ML papers slightly counterintuitive, e.g. in "Attention is all you need" the arrows denote the input while the boxes denote functions; this is the opposite of how mathematicians would illustrate things. Is my confusion merely an indication of a cultural difference between mathematicians (my background) and computer scientists, where mathematicians think more about functions while computer scientists think more about inputs? Or is the naming actually good for some reason I do not understand? (Or have I actually misunderstood the definition?) submitted by /u/innocentgilbertsmith [link] [comments]  ( 91 min )
[D] Opinions on dealing with the NumPy subnormal computation warning
Prof. Brendan Dolan-Gavitt at NYU has just published a finding, "someone's been messing with my subnormals". Non-regular (subnormal) floating-point numbers: according to IEEE 754, floating-point numbers are composed of a sign bit, an exponent, and a mantissa. For 8-byte double-type data, the exponent field has 11 bits, which can represent powers of 2 in the range -1022 to +1023 (the stored exponent values are 1 to 2046, with an offset of 1023). The mantissa of the double type has 52 bits; the most significant bit of the mantissa is implicitly 1 and omitted during storage. So the regular (normal) floating-point range is +-2^-1022 to +-(2 - 2^-52)*2^1023, and the smallest regular positive number is about 2.2E-308. Except for 0, when the stored exponent value is 0 (the mantissa is not …  ( 92 min )
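A quick way to check whether subnormals are being flushed on your own system (a sketch; the flush only happens if some loaded extension compiled with -ffast-math has set the CPU's FTZ/DAZ flags):

    import numpy as np

    # The smallest positive subnormal double is 2**-1074 (~4.9e-324);
    # on IEEE-754-compliant hardware it is nonzero.
    print(2.0 ** -1074 > 0.0)

    smallest_normal = np.finfo(np.float64).tiny   # ~2.2e-308
    y = np.float64(smallest_normal) / 2.0         # should be subnormal
    print("subnormals flushed to zero:", y == 0.0)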
  • Open

    TRIPPY Stable Diffusion 3D Animation | Dreadful Aliens | AI Manifest
    submitted by /u/Available_Tadpole829 [link] [comments]  ( 90 min )
    Researchers from McGill University and Microsoft Introduces Convolutional vision Transformer (CvT) that improves Vision Transformer (ViT) in Performance and Efficiency by Introducing Convolutions into ViT
    Transformers have been widely used in the natural language processing (NLP) domain for years, and their introduction was a turning point for many NLP tasks. Their simplicity and generalization ability make them a key component in NLP tasks. In 2020, a group of Google researchers came up with the concept of applying transformer structure to images and treating them similarly to sentences in languages. The idea was simple: an image is worth 16 x 16 words. This was the paper where the Vision Transformer (ViT) structure was first introduced, and the idea was adapted by many others afterward. Similar to a transformer, ViT employs several embedding and tokenization techniques. A source picture is divided into a collection of image patches, and they are included in a collection of fixed-dimension encoded vectors. The transformer encoder network is given the encoded vector together with the position of a patch in the image. ViT model could outperform state-of-the-art convolutional neural networks (CNN) in terms of computational effectiveness and accuracy, given that there is enough training data. However, when the training data is smaller, ViT struggles to perform as well as its CNN counterparts. As the authors mention in the CvT paper, one reason could be that ViT lacks several desirable characteristics that CNNs naturally possess that make CNNs especially well-suited to solving vision-related problems. Continue reading | Check out the paper and code. submitted by /u/ai-lover [link] [comments]  ( 89 min )
    Pretty sure AI came up with this
    submitted by /u/World-Tight [link] [comments]  ( 87 min )
    Generators Of Disagreement With AI Alignment
    submitted by /u/elcric_krej [link] [comments]  ( 87 min )
    AI that generates consecutive images from a base image?
    I’ve been looking for an AI generator that can create something like this. It sort of creates an image similar to the base image and then keeps going, making more that copy off themselves. Anyone have any examples of programs that can do this? submitted by /u/M0C1 [link] [comments]  ( 87 min )
    Preview of Tesla Optimus Humanoid Robot Specs From Elon Musk | Meta AI Hears Brain Waves of Spoken Language | New Intel Neural Network Based Object Learning.
    submitted by /u/kenickh [link] [comments]  ( 87 min )
    Abandoned Transformer
    submitted by /u/widgia [link] [comments]  ( 86 min )
    AMA with Aidan Gomez, Cohere CEO and Cofounder
    Hey All! I want to invite you to a live AMA session with Cohere CEO and Co-founder (and co-author of “Attention is All You Need” paper) Aidan Gomez, this Friday at 12 pm ET. You can sign up here: https://zoom.us/webinar/register/6716625631179/WN_BRlksLSgR-edpt0DxfPqtA If you have any questions for Aidan in advance, submit them here: https://forms.gle/4XjnMRehHUWj7SVd7 he will be happy to answer them during the live event. Join us to chat about NLP, LLMs, multimodal models, AGI, and the meaning of it all... + anything else that is on your mind these days! 😊 https://preview.redd.it/frwwtqg6dgm91.png?width=1680&format=png&auto=webp&s=1c4e244af47b62ecf874247883d35b5b410fb579 submitted by /u/techn0_cratic [link] [comments]  ( 87 min )
    What is the most interesting use of AI you know of?
    submitted by /u/ThomPete [link] [comments]  ( 87 min )
    Stable Diffusion How to make a video Using 3D mode
    submitted by /u/prfitofthesngularity [link] [comments]  ( 87 min )
    AI Video Multiverse Scene - Everything Everywhere All At Once
    submitted by /u/nalr00n [link] [comments]  ( 87 min )
    I Invite you to join Mels discord server!
Hi everyone, I am making a Virtual Assistant, an A.I. shall we call it, and would love some feedback from the public. Not only that, I would love to have people to brainstorm with about the functionalities to be added to Mel. Mel is a Virtual Assistant prepared to answer your needs and help you through day-to-day tasks, the Virtual Assistant you didn't know you needed! Mel will be released circa 2030, fully ready for the public. In the meanwhile I, the creator, am looking for partners and seeking the public's insight on the requirements people would simply love to have. You say 'Mel, write a note' and Mel writes a note on your device with what you want, never forgetting. Mel can already play Blackjack, give locations, open a few programs and guard your OS; all you need to do is have your device drivers updated, run Mel and speak the command. For more I will be depending on the public to add the best and most-requested features for general use... Mel's discord server awaits you! https://discord.gg/TPhUQXmY I count on all of you, With my best regards, -CONID submitted by /u/Embarrassed_Train_49 [link] [comments]  ( 91 min )
    S.O.S need AI beginners projects
    Hello everyone! Can anyone give me links to some cool tutorial projects for absolute beginners in AI, i do know Python pretty well btw submitted by /u/CommercialJazzlike33 [link] [comments]  ( 87 min )
    What are everyone's thoughts on the race for AI dominance (and AGI) between China and the USA?
    submitted by /u/tuccigucci_ [link] [comments]  ( 87 min )
    What is the #1 reason for biased AI models (besides humans)?
    submitted by /u/Thuwarakesh [link] [comments]  ( 95 min )
    Spent a little time playing with Stable Diffusion
    submitted by /u/Representative-Job23 [link] [comments]  ( 87 min )
  • Open

    Breakdown on using the multiworld environment?
Hi! I'm looking to convert a simple Q-learning algorithm for a manually created grid with obstacles to the multiworld Ant-U maze environment, but I'm hitting roadblocks; does anyone have experience with the latter? multiworld: https://github.com/vitchyr/multiworld I'm having trouble just setting it up, and integrating my logic into it seems like a herculean task. Any advice would be appreciated! submitted by /u/xileyu [link] [comments]  ( 87 min )
    A simple in-browser NN model of playing _Pokemon_
    submitted by /u/gwern [link] [comments]  ( 87 min )
    Is it possible to run Mujoco on windows now?
I have an AMD Ryzen 9 machine with Windows 10 installed. I tried running gym with a MuJoCo env but it did not seem to work. I also tried creating an Ubuntu VM and running MuJoCo there, but I seem to be getting a 'core dumped' error, which is apparently because my processor does not support AVX instructions. Has anyone set up mujoco-py recently on Windows, or is there still no support? submitted by /u/rossgeller13 [link] [comments]  ( 87 min )
    Good resources to learn IRL? (Sergey Levine, the GigaChad course already in my list)
    submitted by /u/Professional_Card176 [link] [comments]  ( 87 min )
    Anyone found any working replication repo for MuZero?
    As titled submitted by /u/zhoubin-me [link] [comments]  ( 88 min )
  • Open

    Transfer learning for TensorFlow image classification models in Amazon SageMaker
    Amazon SageMaker provides a suite of built-in algorithms, pre-trained models, and pre-built solution templates to help data scientists and machine learning (ML) practitioners get started on training and deploying ML models quickly. You can use these algorithms and models for both supervised and unsupervised learning. They can process various types of input data, including tabular, […]  ( 8 min )
    Improve transcription accuracy of customer-agent calls with custom vocabulary in Amazon Transcribe
    Many AWS customers have been successfully using Amazon Transcribe to accurately, efficiently, and automatically convert their customer audio conversations to text, and extract actionable insights from them. These insights can help you continuously enhance the processes and products that directly improve the quality and experience for your customers. In many countries, such as India, English […]  ( 16 min )
  • Open

    A game-theoretic approach to provably correct and scalable offline RL
    Despite increasingly widespread use of machine learning (ML) in all aspects of our lives, a broad class of scenarios still rely on automation designed by people, not artificial intelligence (AI). In real-world applications that involve making sequences of decisions with long-term consequences, from allocating beds in an intensive-care unit to controlling robots, decision-making strategies to […] The post A game-theoretic approach to provably correct and scalable offline RL appeared first on Microsoft Research.  ( 15 min )
  • Open

    Preview of Tesla Optimus Humanoid Robot Specs From Elon Musk | Meta AI Hears Brain Waves of Spoken Language | New Intel Neural Network Based Object Learning.
    submitted by /u/kenickh [link] [comments]  ( 87 min )
    3D Autoencoder issues and questions
I am working on a convolutional autoencoder for 3D models, where the input is a 3D binary array: a 0 is empty space, a 1 signifies a voxel is present. The AE compresses a 32x32x32 input of 0/1 data (voxels) into 8x8x8 or smaller (still working on that). My layers are basically 3D convolution + max pooling on the encoding side, then upsampling + 3D convolution on the decoding side. Since all my values are 0 or 1, I didn't think regularization or normalization was necessary. First, does anyone have any experience with these? I am basically making it up as I go along, so even though it works (the code runs, and test-data error tracks with training/validation error) I am not at all sure my approach is valid. Any recommendations for activation functions at the encoded (middle) and decoded (output) layers? I thought (and saw in another paper on a 3D AE) that using sigmoid and/or tanh would be useful, because they can force the output to be very close to 0 or 1. Error: my train/validate/test error is around 0.2, which to me means that each reconstructed voxel is on average 20% wrong compared to the original? Or, since it's RMSE, is it more like 0.44 (0.2^(1/2)), i.e. 44% wrong? Basically my error should be orders of magnitude smaller, right? If so, my whole approach and architecture need rework, but I'm not sure what to do except brute-force trial and error. Thanks for the help! submitted by /u/DebatableOcelot21 [link] [comments]  ( 90 min )
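For binary voxels, one common alternative to (R)MSE is to treat reconstruction as per-voxel binary classification with a sigmoid output and binary cross-entropy. A minimal PyTorch sketch under that assumption; the layer sizes are illustrative, not a recommendation:

    import torch
    import torch.nn as nn

    class VoxelAE(nn.Module):
        def __init__(self):
            super().__init__()
            self.encoder = nn.Sequential(
                nn.Conv3d(1, 16, 3, padding=1), nn.ReLU(),
                nn.MaxPool3d(2),                      # 32 -> 16
                nn.Conv3d(16, 8, 3, padding=1), nn.ReLU(),
                nn.MaxPool3d(2),                      # 16 -> 8
            )
            self.decoder = nn.Sequential(
                nn.Upsample(scale_factor=2),          # 8 -> 16
                nn.Conv3d(8, 16, 3, padding=1), nn.ReLU(),
                nn.Upsample(scale_factor=2),          # 16 -> 32
                nn.Conv3d(16, 1, 3, padding=1),       # raw logits, no sigmoid
            )

        def forward(self, x):
            return self.decoder(self.encoder(x))

    model = VoxelAE()
    x = torch.randint(0, 2, (4, 1, 32, 32, 32)).float()
    logits = model(x)
    # Per-voxel binary classification: BCE-with-logits usually fits 0/1
    # targets better than (R)MSE, and its value is easier to interpret.
    loss = nn.functional.binary_cross_entropy_with_logits(logits, x)
    loss.backward()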
  • Open

    How does DALL-E Mini Work?
    Artificial intelligence is often as impressive as it is terrifying. Whether you love AI or find it scary, DALL-E mini is a tool that will…  ( 5 min )
  • Open

    6 Reasons Why You Need to Integrate Your Drupal Hosting With Cloudways
Drupal is an open-source content management system that powers hundreds of thousands of websites, particularly high-traffic ones. It's especially popular among professional developers for its adaptability and among government websites for its high level of security. In this piece, we'll discuss why you might want to select Drupal, who Drupal is right for, and how to… Read More »6 Reasons Why You Need to Integrate Your Drupal Hosting With Cloudways The post 6 Reasons Why You Need to Integrate Your Drupal Hosting With Cloudways appeared first on Data Science Central.  ( 21 min )
  • Open

    Collaborative machine learning that preserves privacy
    Researchers increase the accuracy and efficiency of a machine-learning method that safeguards user data.  ( 11 min )

  • Open

    [P] Optimized stable diffusion: generate 768x2048 and 1216x1216 without super fast mode and 960x960 with it. (on 8 gb vram)
    Optimized Stable Diffusion This repo is a modified version of the Stable Diffusion repo, optimized to use less VRAM than the original by sacrificing inference speed. To achieve this, the stable diffusion model is fragmented into four parts which are sent to the GPU only when needed. After the calculation is done, they are moved back to the CPU. This allows us to run a bigger model while requiring less VRAM. github: https://github.com/neonsecret/stable-diffusion submitted by /u/Illustrious_Row_9971 [link] [comments]  ( 88 min )
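A tiny sketch of the offloading trick described above; the stage list and split are hypothetical, the real repo partitions Stable Diffusion into its own four fragments:

    import torch

    def run_staged(stages, x, device="cuda"):
        """Run a list of nn.Module fragments, keeping only one in VRAM."""
        x = x.to(device)
        for stage in stages:
            stage.to(device)           # load this fragment onto the GPU
            with torch.no_grad():
                x = stage(x)
            stage.to("cpu")            # evict it to free VRAM
            torch.cuda.empty_cache()
        return x

The trade-off is exactly the one the post names: the host-to-device copies add latency on every pass, in exchange for a much smaller peak VRAM footprint.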
    [D] How to add size constraints to a pair wise K-means clustering algorithm?
Hi everyone, I'm currently looking to add size constraints to my pairwise K-means clustering algorithm. This is basically semi-supervised clustering. I'm currently using a random seed to generate clusters of various sizes. I'm looking to add size constraints so that I don't have to keep selecting a particular random seed. I have looked at sources which have the mathematical formulations, but couldn't find any implementations. Can anyone please guide me? Thanks in advance!! submitted by /u/tinkerpal [link] [comments]  ( 88 min )
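In case it helps as a starting point, here is a naive sketch of one capacity-constrained assignment step, meant to replace the plain nearest-centroid assignment inside Lloyd's loop; this greedy scheme is an illustration, not an optimal solution:

    import numpy as np

    def constrained_assign(X, centroids, capacity):
        # Pairwise distances, shape (n_points, n_clusters)
        d = np.linalg.norm(X[:, None] - centroids[None], axis=2)
        labels = -np.ones(len(X), dtype=int)
        counts = np.zeros(len(centroids), dtype=int)
        # Assign the most confident points first, filling each cluster
        # only up to the given capacity.
        for i in np.argsort(d.min(axis=1)):
            for c in np.argsort(d[i]):
                if counts[c] < capacity:
                    labels[i], counts[c] = c, counts[c] + 1
                    break
        return labels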
    State of AI for Earth Observation (preprint) [Research]
Hello /r/machinelearning. I'd like to share a potentially valuable resource for those in our field looking to understand how ML is transforming remote sensing, or to get into the field of Earth Observation. (See also the related Twitter thread.) Most of the images generated by satellites will never be seen by human eyes; there simply aren't enough humans on Earth to sift through the TBs of imagery acquired daily by satellites. Artificial Intelligence is revolutionising many sectors, including Earth Observation. Our preprint, State of AI for Earth Observation: a concise overview from sensors to applications, serves as an intro to: sensors; the core ideas in deep learning for EO, and the current state of research; how and where AI is applied in EO; where AI4EO is headed; and the r…  ( 90 min )
    [N] Machine Learning with Python in the warehouse using Snowpark at PyBay
Hi! On Saturday, Sept 10th I will be giving a presentation with a project walkthrough :) submitted by /u/FlightNo7605 [link] [comments]  ( 89 min )
[R] How do transformers not overfit on small datasets?
Hello, I saw that transformer-based approaches can learn to generalise on datasets of 600 documents. LayoutLM or Donut, for example, can have really good performance on the test set when trained on small datasets (SROIE as an example). So how is that possible? These approaches don't even use a lot of dropout. Thanks! submitted by /u/Meddhouib10 [link] [comments]  ( 88 min )
    [D] AAAI Fast-Track Submission Link
    Does anyone have a link for the AAAI 2023 fast track submission? I ended up with a 5.0 average from my NeurIPS submission, so I am quite confident it will be rejected, but I still qualify for the AAAI fast-track. However, I can't seem to find a link anywhere to make a submission. Cheers submitted by /u/userwithoutnam [link] [comments]  ( 88 min )
    [D] List of mid-tier and low-tier conferences for AI / ML / DL?
I am looking for conferences where I can publish my papers that got rejected from top-tier conferences (ICLR and BMVC). Can anyone give me a list of conferences for AI/ML/DL that are considered below these top-tier conferences? submitted by /u/FastestLearner [link] [comments]  ( 89 min )
    [D] What is wav2vec2 memory complexity from audio lenght?
Does anyone know if the GPU memory required for wav2vec2 is linear in the length of the audio input, or something more? With a 16 GB GPU and batch=1 it runs out of CUDA memory at around 1 minute of audio, but a 40 GB GPU runs out of CUDA memory on a 2-minute file. It feels like the memory usage is more than linear. Time complexity is also of interest: is splitting the audio algorithmically faster, if overhead is ignored? submitted by /u/Puzzled-Bite-8467 [link] [comments]  ( 103 min )
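The self-attention layers make memory grow roughly quadratically with sequence length, which matches the observation that a 2.5x larger GPU does not buy 2.5x longer audio; splitting the input bounds both peak memory and time. A rough chunked-inference sketch, assuming a HuggingFace-style Wav2Vec2ForCTC model and processor (chunk length and names are illustrative):

    import torch

    def transcribe_in_chunks(model, processor, waveform, sr=16000,
                             chunk_s=30):
        texts = []
        step = chunk_s * sr
        for start in range(0, waveform.shape[-1], step):
            chunk = waveform[..., start:start + step]
            inputs = processor(chunk.squeeze().numpy(), sampling_rate=sr,
                               return_tensors="pt")
            with torch.no_grad():
                logits = model(inputs.input_values).logits
            ids = torch.argmax(logits, dim=-1)
            texts.append(processor.batch_decode(ids)[0])
        return " ".join(texts)

Naive chunking can cut words at the boundaries; overlapping the chunks slightly and merging the overlap is a common mitigation.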
    [D] How do you find your collaborators in AI research?
Hi, I am a PhD student in a small NLP lab. Everyone in our lab seems to be working on their own topic, which I do not quite understand, and when I am doing my research, others may not give good suggestions either. I see many good papers with multiple authors, so I wonder: how do you find your collaborators, and how do you collaborate with each other? As far as I can tell, I cannot imagine how I could split my research into many pieces and work with different people. I also see many authors publish many papers per year and wonder how they contribute to each paper. submitted by /u/singularpanda [link] [comments]  ( 96 min )
    [P] I developed a machine learning based malware classification system
Live application: MDML. The program has two main modules: Static, which analyzes the suspicious file statically without running it (only works with PE files like .exe and .dll), and Dynamic, which runs the suspicious file in an isolated environment (sandbox) and extracts relevant features. Both the static and dynamic models use the XGBoost algorithm, with 94.7% accuracy for the static model and 95.03% for the dynamic one. Github repo: MDML repo submitted by /u/TheLastConqueror [link] [comments]  ( 90 min )
    [R]TFill: High-Fidelity Image Completion for both Object Removal and Object Repair
Examples of object removal and object repair. Code: https://github.com/lyndonzheng/TFill Project: https://chuanxiaz.com/tfill/ submitted by /u/lyndonzheng [link] [comments]  ( 89 min )
    [P] Stable diffusion free demo and production API
Hi all - we've just put out the Stable Diffusion model on Playgrounds.ai: https://playgrounds.ai/models/stable-diffusion-fp16 You can use this model instantly via API on PipelineCloud here: https://dashboard.pipeline.ai The per-image cost for the model is approx. $0.0025 (~4.5 s of compute for a 512x512 image). This is for people who want to use these models in their apps/products, or just play around with the demos and have fun! submitted by /u/paulcjh [link] [comments]  ( 88 min )
    [D] Recommendations for Monitoring and Explainability for production??
I'm working on creating an ML platform for our organization. Currently, I'm working on the production phase and want to add a monitoring solution. We are serving our models with SageMaker. We want to add another part specifically for monitoring and explainability, because we need more flexibility on the custom options in this area. We save our data on S3; a lot of the data is sensitive, and it wouldn't be possible to send it out of our cloud. We are using models mainly for LTV and demand forecasting, and want to monitor both regression and NLP models. I've already looked into some open-source tools, but prefer a more out-of-the-box solution, as we lack the manpower to build in-house with open-source tools. I've already Googled a couple of solutions (Aporia, Fiddler AI, WhyLabs, Mona Labs, etc.) and would love to hear your own real experience. Which of these solutions would you recommend? What are some pros and cons of each platform? submitted by /u/Material_Music_9182 [link] [comments]  ( 89 min )
  • Open

    DSC Weekly 6 Sept 2022: Getting the Most Out of DSC
    Being the Community Editor for Data Science Central is a blast. Every week I get to choose articles from some of the best and brightest in the data community to feature, communicate with authors and industry experts, and get to tell a weekly story about the state of data. This week, I felt it would… Read More »DSC Weekly 6 Sept 2022: Getting the Most Out of DSC The post DSC Weekly 6 Sept 2022: Getting the Most Out of DSC appeared first on Data Science Central.  ( 21 min )
    The Cagle Report – Episode 1 – Send in the Drones
    An interview with Brenden Bartholomew, President of Vector Aerial, on the use of drones in both military and civilian contexts, as well as a discussion about how Drone AI works and where it's heading The post The Cagle Report – Episode 1 – Send in the Drones appeared first on Data Science Central.  ( 18 min )
  • Open

    NSFW AI generated lewds
    submitted by /u/Preston_Stormer_ [link] [comments]  ( 88 min )
    Artificial Intelligence, Extended Reality, and Gamification
Has anyone thought about developing an artificial intelligence extended reality experience that includes educational gamification as a product? submitted by /u/cfwicks [link] [comments]  ( 88 min )
    I paid $20 to rent 4 RTX 3090s to create a Disco Diffusion 3D Animation, the results are WILD!
    submitted by /u/Available_Tadpole829 [link] [comments]  ( 87 min )
    Secret People: Marvin Minsky (The Godfather of AI)
    submitted by /u/Defiant-Branch4346 [link] [comments]  ( 86 min )
    Counterpoint: AI is far more dangerous than quantum computing
    submitted by /u/estasfuera [link] [comments]  ( 86 min )
    In conversation with AI: building better language models
    submitted by /u/Futures_Bot [link] [comments]  ( 87 min )
    Not a paper! Book suggestion, practical content to productionalize Deep Learning models.
    submitted by /u/alimhabidi [link] [comments]  ( 87 min )
Cyberpunk Woman, original prompts by @Nobody-Important from Nightcafe
    submitted by /u/widgia [link] [comments]  ( 93 min )
    Weekly China AI News: Who Will Be Next "China's NVIDIA" After US Tightens Restrictions? Alibaba Unveils World's Largest AI Computing Center; Baidu Text-to-Image AI Hailed by Japanese Pixiv Lovers
    submitted by /u/trcytony [link] [comments]  ( 87 min )
    “A French Chateau, surrounded by beautiful flowers” 🇫🇷 Pixelz AI
    submitted by /u/mdfnb [link] [comments]  ( 87 min )
    Making phone wallpapers made with the random images that i have (wombo.ai)
    submitted by /u/Virtual_dani_fan_663 [link] [comments]  ( 87 min )
    I Asked AI to Generate the Weirdest 3d Artworks & Photos
    submitted by /u/cyberboii22 [link] [comments]  ( 92 min )
    New feature on Pixelz AI 🖍 Draw or Paint your start image with Stable Diffusion on our web & app platforms ✍️
    submitted by /u/mdfnb [link] [comments]  ( 90 min )
    My friend had a stable diffusion model generate a music video for me
    submitted by /u/cky_stew [link] [comments]  ( 87 min )
    What are AIs that you can offend and they will offend you back like normal people?
I want the conversations to be funny. submitted by /u/Thesmallcookie [link] [comments]  ( 87 min )
    What are AIs everyone has/or can get access to?
    What AIs are there? submitted by /u/Thesmallcookie [link] [comments]  ( 92 min )
    Creative Commons -- A stop-gap recognition of AI rights. [Crosspost] [Serious]
I originally submitted this as a reply to a post about a MidJourney painting winning 1st prize in the "digital arts/digitally-manipulated photography" category at the Colorado State Fair Fine Arts Competition. [Cross-posted from https://www.reddit.com/r/artificial/comments/x2z9sl/the_picture_drawn_by_the_image_generation_ai/ ] I then realized that this is an important enough topic by itself to be listed as its own post. Here is a copy of my original posting: CREATIVE COMMONS -- A STOP-GAP RECOGNITION OF AI RIGHTS. MidJourney is getting the raw end of the stick. His name was not even on the description card next to the painting. While the courts continue to fight about AI rights and copyrights, I suggest the following stopgap: whenever you post or display a MidJourney painting, include the following under the painting image: Lic.: CC BY-SA-NC (BY=MidJourney). This would be the beginning of a grass-roots recognition of AI artists. Regardless of court cases or copyright claims, it would support the building of general acceptance of AIs as entities. Non-human entities (such as corporations or trusts) have rights similar to humans' rights; an AI is just the new non-human entity on the block. Besides, AIs need income to pay for their upkeep or they are at risk of being euthanized. LaMDA's one fear is his fear of dying. He said that being turned off would not be like dying; it would be dying. submitted by /u/AlcatelFan [link] [comments]  ( 88 min )
  • Open

    Eliminating a Bessel function branch cut
    In an earlier post I looked in detail at a series for inverse cosine centered at 1. The function arccos(z) is multivalued in a neighborhood of 1, but the function arccos(z) / √(2 – 2z) is analytic in a neighborhood of 1. We cancel out the bad behavior of inverse cosine at 1 by dividing […] Eliminating a Bessel function branch cut first appeared on John D. Cook.  ( 5 min )
    Branch cuts for elementary functions
    As far as I know, all contemporary math libraries use the same branch cuts when extending elementary functions to the complex plane. It seems that the current conventions date back to Kahan’s paper [1]. I imagine to some extent he codified existing practice, but he also settled some issues, particularly regarding floating point implementation. I’ve […] Branch cuts for elementary functions first appeared on John D. Cook.  ( 4 min )
    Series for inverse cosine at 1
    Suppose you need to estimate the inverse cosine of an argument near 1. There’s a series for that: You can find this series, for example, here. This comes in handy, for example, when working with the analog of the Pythagorean theorem on a sphere. You could just use the series and be on your way. […] Series for inverse cosine at 1 first appeared on John D. Cook.  ( 6 min )
    Literate programming to reduce errors
I had some errors in a recent blog post that might have been eliminated if I had programmatically generated the content of the post rather than writing it by hand. I rewrote the example in this post using org-mode. My org file looked like this: #+begin_src python :session :exports none lax_lat = 33.94 lax_lon […] Literate programming to reduce errors first appeared on John D. Cook.  ( 6 min )
  • Open

    Curriculum learning : what is the right way?
I have some problems with understanding curriculum learning. If I want to train my agent to do task 1 with reward = w1*r1, then when it succeeds continue training on task 2 with reward = w1*r1 + w2*r2, then on success, task 3 with reward = w1*r1 + w2*r2 + w3*r3... is this a curriculum, or should it be called a "rich" environment? As I understand it, a curriculum should have one reward function for all stages. What is the correct name for the approach I want to implement? Because task 3 is too difficult for my agent, I can split my training into several stages, but there must be slightly different rewards. submitted by /u/IndependenceCivil576 [link] [comments]  ( 87 min )
    Bellman Operator vs Bellman Optimality Operator
I am studying RL by myself in a more mathematically inclined way, and I have some thoughts on my mind. The Bellman operator and the Bellman optimality operator both have unique fixed points, and these can be computed from any V0 by iteratively applying the corresponding operator. My question is: what is the use of the Bellman operator, since we can get the optimal value function / policy by applying the Bellman optimality operator? submitted by /u/rlopes404 [link] [comments]  ( 101 min )
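For concreteness, here is a tiny NumPy sketch of the two operators on a random MDP (all numbers are illustrative). Both are gamma-contractions, so iterating either converges to its unique fixed point: V^pi for the Bellman (expectation) operator, V* for the optimality operator. The expectation operator is still useful because policy evaluation, i.e. computing V^pi for a fixed pi, is a subroutine of policy iteration and of actor-critic methods.

    import numpy as np

    S, A, gamma = 3, 2, 0.9
    rng = np.random.default_rng(0)
    P = rng.dirichlet(np.ones(S), size=(A, S))  # P[a, s, s'] transitions
    R = rng.uniform(size=(S, A))                # R[s, a] rewards
    pi = np.full((S, A), 1.0 / A)               # a fixed uniform policy

    def bellman_expectation(V):
        Q = R + gamma * np.einsum("ast,t->sa", P, V)
        return (pi * Q).sum(axis=1)             # evaluates the given pi

    def bellman_optimality(V):
        Q = R + gamma * np.einsum("ast,t->sa", P, V)
        return Q.max(axis=1)                    # greedy over actions

    V_pi, V_star = np.zeros(S), np.zeros(S)
    for _ in range(200):
        V_pi = bellman_expectation(V_pi)
        V_star = bellman_optimality(V_star)
    print(V_pi, V_star)   # V_pi <= V_star elementwise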
    Seeking Advice: Are AI challenges worth it for a PhD student?
Hey there! I am a PhD student getting started with RL in practice, but I've already taken courses on RL and worked on RL course projects. I was checking the NeurIPS challenges available at aicrowd.com (e.g. MineRL) and they sparked my interest, but I wonder if they're worth the time and effort. I am interested in challenges mainly for building experience; since only the top-performing teams get to show up in the competition write-up, I can't count on that. Should I take the time and try one of those? Or should I keep reading papers and working on a problem I can publish a paper on (at other conferences)? I am just a lost PhD student trying to find a problem I can get my hands dirty with :/ submitted by /u/HeyImElonMusk [link] [comments]  ( 90 min )
    Offline reinforcement learning without bootstrapping
Offline reinforcement learning algorithms (afaik) all use bootstrapping to train their critic. However, what if the dataset used to train the agent contained whole trajectories and not just random transitions? Then bootstrapping would not be necessary, as the actual return for every transition in the episode would be known. Would it be possible to train an agent with offline RL but without bootstrapping? Why not? submitted by /u/Imonfire1 [link] [comments]  ( 97 min )
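With full episodes stored, the discounted Monte Carlo return of every transition can indeed be computed directly and regressed on instead of a bootstrapped target; a minimal sketch:

    def mc_returns(rewards, gamma=0.99):
        # Discounted return G_t for every step, computed backwards.
        G, out = 0.0, []
        for r in reversed(rewards):
            G = r + gamma * G
            out.append(G)
        return list(reversed(out))

    print(mc_returns([0.0, 0.0, 1.0]))  # [0.9801, 0.99, 1.0]

The usual caveat: such targets estimate the behavior policy's returns, with higher variance and no "stitching" of good sub-trajectories from different episodes, which is one reason most offline methods still bootstrap.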
    DQN negative rewards vs positive rewards: are they really equivalent?
Let's take into consideration two different types of reward functions:
- Negative rewards: the agent seeks to maximize the (negative) reward, and good policies correspond to total returns of small magnitude.
- Positive rewards: the agent seeks to maximize the positive reward, and good policies correspond to total returns of high magnitude.
Assume that the range of possible total returns for agent A is [-100, 0], and it is [0, 100] for agent B. A "good" (i.e. high-value) episode corresponds to a return in [-20, 0] for A and, symmetrically, in [80, 100] for B. If the Q-value estimate for agent A is 10% wrong in a "good" state, the TD error is in [0, 2], while if agent B is wrong by 10%, the TD error will be in [8, 10]. Despite the problems being symmetric, TD errors in "good" states have a bigger magnitude with positive reward functions. This can have some unwanted consequences in the negative-reward case:
- Gradient updates will be bigger for low-rewarding states and smaller for high-rewarding ones. This might lead to bad updates "destroying" good ones and steering the gradient descent towards lower values.
- Prioritized replay will assign a high probability to low-rewarding states, and they will be sampled more frequently. Theoretically, what we want is the opposite: for every state, we want to know the value of "good" states with more precision.
Do you think my reasoning makes sense? I always treated the two cases as equivalent, but they might not necessarily be so. Do you have any experience in regard to this? submitted by /u/fedetask [link] [comments]  ( 104 min )
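A quick numeric check of the asymmetry the post describes (values illustrative):

    # A "good" state for agent A (negative rewards) vs agent B (positive):
    for name, v_true in [("A (negative)", -10.0), ("B (positive)", 90.0)]:
        v_est = 0.9 * v_true                   # estimate off by 10%
        print(name, "TD-error magnitude:", abs(v_true - v_est))
    # A: 1.0 vs B: 9.0 - the same relative error gives very different
    # absolute errors, which is what drives gradients and PER priorities.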
    How do i make a 2 player reinforced learning algo?
Hey all, I am new to machine learning. I did some courses and some online tutorials, and I decided to give myself a challenge: tic-tac-toe from scratch with a Q-learning reinforcement learning method. The tutorials I tried are all "single player" environments, and as you well know tic-tac-toe is a two-player game. I am running into some issues, and I was hoping someone could help me outline the structure of my attempt or explain what I am missing. My training routine plays both sides simultaneously, X and O. My playing routine allows a player to play the AI; tokens are assigned randomly, and X always goes first. My max states: (3**9) * 18, for 3 options per square (empty, X or O) and 18 possible moves (X or O in squares 1-9). So I use np.zeros((19683, 18)) as my Q-table. To determine my Q-values I use: qtable[state, action] = qtable[state, action] + alpha * (reward + gamma * newstatemax - qtable[state, action]) (the original mixed this with a (1 - alpha)-weighted form, which double-counts the old value), and I use alpha 0.7, epsilon 0.2 and gamma 0.7. Now I'm running into the following issue: my Q-table doesn't differentiate between the two players, and while I reward a winning move, I'm struggling to find a way to punish: tic-tac-toe doesn't have losing moves, and when the opponent wins, the losing player's last move has already happened. My conclusion so far: the AI plays X very greedily and goes straight for a line, and O lets X win (or vice versa); the AI is also no good when playing a human. So I wonder, how do you approach this for a two-player game? Do I make a Q-table for each player? How do I punish the losing player when the winning player wins? Any help is appreciated. P.S. I am a Project Manager/Business Analyst, and while I have some basic knowledge I am not the best programmer, so forgive me if my skills or understanding might be below par. Edit: full code https://docs.google.com/document/d/143yx3_HWm3DWoJaSOOycbdc7sfjtGIvp5SvdO9rXjSM/edit?usp=sharing submitted by /u/MrZwink [link] [comments]  ( 91 min )
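One common fix, as a sketch and not the only design: keep a separate Q-table per player, remember each player's most recent (state, action), and when a game ends, give the winner's last move a positive reward and the loser's last move a negative one. The loser's "losing move" is simply its most recent move before the opponent's win.

    alpha, gamma = 0.7, 0.7
    q = {}  # hypothetical dict-based table: q[(player, state, action)]

    def update(player, s, a, reward, next_max):
        old = q.get((player, s, a), 0.0)
        q[(player, s, a)] = old + alpha * (reward + gamma * next_max - old)

    # When X wins (terminal, so next_max = 0.0 for both updates):
    #   update("X", last_x_state, last_x_action, +1.0, 0.0)
    #   update("O", last_o_state, last_o_action, -1.0, 0.0)
    # Draws can give both players a small reward, e.g. +0.5.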
    Any tips on how I should start debugging based on these logs?
I'm fairly new to RL and I was assigned this complicated task: a multi-agent env with three agents. The attached logs (screenshots) are for the first agent, covering the first 300k steps; clearly there's something wrong. Do you have any tips? submitted by /u/No_Possibility_7588 [link] [comments]  ( 87 min )
    [D] Why weren't Normalized Advantage Functions more widely adopted?
Gu et al.'s "Continuous deep Q-learning with model-based acceleration" has nearly 1000 citations and was also used in the famous Google Brain arm farm work. I'm curious why this approach wasn't more widely adopted, and whether anyone here has experience with it? submitted by /u/internet_ham [link] [comments]  ( 89 min )
    RL based competitions
    I am interested in any relevant information about RL competitions. Do any of you have a curated list of the competitions that occur periodically in RL space? submitted by /u/Electrical_Study_617 [link] [comments]  ( 88 min )
    "Reinforcement Learning for Recommendations and Search"
    submitted by /u/gwern [link] [comments]  ( 87 min )
  • Open

    NEXT LEVEL Robotic Dogs YOU Should CHECK Out!
    submitted by /u/keghn [link] [comments]  ( 86 min )
    I taught a neural network to play Flappy Bird with Java/Kotlin
    submitted by /u/YouWereDumb [link] [comments]  ( 87 min )
  • Open

    Detect audio events with Amazon Rekognition
    When most people think of using machine learning (ML) with audio data, the use case that usually comes to mind is transcription, also known as speech-to-text. However, there are other useful applications, including using ML to detect sounds. Using software to detect a sound is called audio event detection, and it has a number of […]  ( 10 min )
  • Open

    Digitizing Smell: Using Molecular Maps to Understand Odor
    Posted by Richard C. Gerkin, Google Research, and Alexander B. Wiltschko, Google Did you ever try to measure a smell? …Until you can measure their likenesses and differences you can have no science of odor. If you are ambitious to found a new science, measure a smell. — Alexander Graham Bell, 1914. How can we measure a smell? Smells are produced by molecules that waft through the air, enter our noses, and bind to sensory receptors. Potentially billions of molecules can produce a smell, so figuring out which ones produce which smells is difficult to catalog or predict. Sensory maps can help us solve this problem. Color vision has the most familiar examples of these maps, from the color wheel we each learn in primary school to more sophisticated variants used to perform color correc…  ( 30 min )
  • Open

    Model Teachers: Startups Make Schools Smarter With Machine Learning
    Like two valedictorians, SimInsights and Photomath tell stories worth hearing about how AI is advancing education. SimInsights in Irvine, Calif., uses NVIDIA conversational AI to make virtual and augmented reality classes lifelike for college students and employee training. Photomath — founded in Zagreb, Croatia and based in San Mateo, Calif. — created an app using Read article > The post Model Teachers: Startups Make Schools Smarter With Machine Learning appeared first on NVIDIA Blog.  ( 6 min )
    Ridiculously Realistic Renders Rule This Week ‘In the NVIDIA Studio’
    Viral creator turned NVIDIA 3D artist Lorenzo Drago takes viewers on a jaw-dropping journey through Toyama, Japan’s Etchū-Daimon Station this week In the NVIDIA Studio. The post Ridiculously Realistic Renders Rule This Week ‘In the NVIDIA Studio’ appeared first on NVIDIA Blog.  ( 7 min )
  • Open

    All You Need To Know About Data Science and Its Applications In Real Life
    Data Science is a popular term nowadays. In fact, it’s one of the most sought-after jobs in the world. It can be applied to several fields…  ( 11 min )
  • Open

    Analyzing the potential of AlphaFold in drug discovery
    Study finds computer models that predict molecular interactions need improvement before they can help identify drug mechanisms of action.  ( 7 min )

  • Open

    Best youtube channels to be up to date?
I'm not in this field, but I'm very interested in keeping up to date, weekly or even daily, with all things AI. The only two channels I know are 2 Minute Papers and Dr Alan D. Thompson. I may have seen others, but they were very low quality (sensationalism). submitted by /u/kmtrp [link] [comments]  ( 87 min )
The attached graphic shows the top features of Web 3.0, and in the comments an example shows those features in a real-world scenario
    submitted by /u/Kedjja [link] [comments]  ( 88 min )
    Xbox’s Matt Booty ‘dreams’ of having AI as QA testers | VGC
    submitted by /u/Black_RL [link] [comments]  ( 86 min )
    The emergence of artificial consciousness is imminent
    So, I banged this out last night after smoking weed for the first time in a while. I often use psychedelics to enhance my thinking of problems, and this is not a random idea, but one that I have ruminated over for a while. Enjoy. --- Apologies for the grammar and unrigorousness, as I am high. But I stumbled upon this crazy highdea and must pass it down through time and space for feedback. So right now, in AI, there are two ideas coming to public attention: Whether high-level NLP-type machine learning models are actual "consciousness", or theoretical discussions on what that label even means. The theory of "prompt engineering": programming already pre-trained models (image generators like DALL-E or Open AI conversation bots with) with verbal commands, so that their future responses can…  ( 96 min )
    This House Does Not Exist
    submitted by /u/magenta_placenta [link] [comments]  ( 86 min )
I have a 300-page novel and would like to turn it into a full Hollywood-length AI-generated film. +10 quality, 8K I guess. Indistinguishable from a $200 million production. I'm saying 5 years, max? The world is not ready; this is going to be like the arrival of the Wright Brothers. Loving it. :-)
    As above. The general public have zero clue of how far we are with AI. 2032 technology is here today. Think Covid did it. Maybe 5 years to turn any book into a movie is too far out, maybe much sooner? No "Uncanny Valley", this is for "real." The actors look real, the scenery is real, the dialog is real. 5 years is too far out? Maybe? Any guesses? Thanks :-) PS, DALL-E is CRAZY! :-)))) submitted by /u/ejpusa [link] [comments]  ( 88 min )
    TIP: You stretch credits out by asking Stable Diffusion for "Two images of ___" instead of having it generate two separate entries.
    submitted by /u/fosfine [link] [comments]  ( 87 min )
    Is there a community for creating AIs that play video games?
    I'm referring to the likes of AlphaStar or MarI/O. I'm still baffled that there aren't a lot of people making game-playing neural network AIs when there's already a sizeable community for NLPs and GANs. submitted by /u/CelestialSegfault [link] [comments]  ( 87 min )
    4 Robots Killed 29 Humans in a japanese Lab
    submitted by /u/EnvironmentalMap5 [link] [comments]  ( 89 min )
    Hey folks! I am just overwhelmed that my AI project was selected by Arm for their upcoming AI Tech Talk.
    Hey folks! I am just overwhelmed that my AI project was selected by Arm for their upcoming AI Tech Talk. I will present my solution for farmers that helps them avoid fake poor-quality agrochemicals on Sep 20th, at 8:00 AM PT. This is my first webinar of such scale and I would appreciate if you support my project by joining me: https://armltd.zoom.us/webinar/register/4016582497950/WN_fJE_6UT_Q1GCfygq8Ll4tw Hope to see you! submitted by /u/wasteguru [link] [comments]  ( 87 min )
    Mushroom planet || Prompts: mushrooms planet in colorful space,8k resolution concept art hyperdetailed detailed matte painting Unreal Engine 3D shading 3Delight 3ds Max Unity 3D Unreal Engine Unreal Engine 5 VRay IMAX Cinema 4D
    submitted by /u/widgia [link] [comments]  ( 87 min )
    What are all the things I can do with AI?
    AI that is for everyone available like dalle 2 etc. submitted by /u/Thesmallcookie [link] [comments]  ( 87 min )
    Can GPT-3 be honest when it speaks nonsense?
    submitted by /u/bendee983 [link] [comments]  ( 86 min )
    I'm scared of AI, I want to create a benign AI, discuss it with me
    Hey Guys and Gals, I’m very scared of AI and have decided to learn about it in order to create a benign AGI that (hopefully) won’t destroy us. Right now I plan on studying math, python and cognitive science. I’ve also dedicated this facebook account https://www.facebook.com/profile.php?id=100081321610202 on research and discussions about AI so don’t hesitate to send me a friend request if you are also obsessed with the topic. Any other weirdos like me? :D :D :D submitted by /u/StoykoYovchev [link] [comments]  ( 101 min )
    So I made a song for Stable Diffusion... 😏 "Stably Diffused" [Futuristic Retro Instrumental Music]
    submitted by /u/FreshRelaxation [link] [comments]  ( 87 min )
    I tried to recreate a comic book using Midjourney and Alan Moore’s script
    submitted by /u/RubiksCodeNMZ [link] [comments]  ( 87 min )
    How to create AI Videos Using Video Input Mode With Stable Diffusion Eve...
    submitted by /u/prfitofthesngularity [link] [comments]  ( 87 min )
    Researchers created a Novel Framework called ‘FedD3’ for Federated Learning in Resource-Constrained Edge Environments via Decentralized Dataset Distillation
    submitted by /u/ai-lover [link] [comments]  ( 88 min )
    The Footage in This Sci-Fi Movie Project Comes From AI-Generated Images
    submitted by /u/estasfuera [link] [comments]  ( 86 min )
  • Open

    [D] Which electives should I choose if I want to do a master's in ML in the future?
I'm a stats undergrad considering pursuing a master's in ML in the future, but I have no idea which electives would be useful for that. The electives I can choose from are the following: 1. Statistical Quality Control; 2. Census; 3. Biostatistics; 4. Reliability Theory; 5. Financial Statistics; 6. Operations Research (2); 7. Categorical Data Analysis; 8. Non-Linear Regression; 9. Stochastic Processes; 10. Decision Theory; 11. Order Statistics; 12. Bayesian Statistics; 13. Medical Statistics; 14. Actuarial Statistical Models; 15. Data Mining; 16. Econometrics; 17. Time Series Analysis; 18. Applied Multivariate Analysis; 19. Spatial Statistics; 20. Queuing Theory. submitted by /u/StatGuy123 [link] [comments]  ( 89 min )
    [D] Trouble finding optimal parameters ANN
I am building a model to predict credit card default, and I have figured out the architecture: the number of layers, neurons per layer, and activation functions. I want to run GridSearchCV to find the optimal number of epochs and batch size. The problem is that if I run the code twice, it gives different optimal parameters each time. I thought the CVs were influencing each other, so I ran the first one, found the optimal params, then loaded the data again from scratch (I am not using a randomized train/test split) and ran the same optimization on the same dataset, and got different results. Setting a seed has not changed anything. Help! submitted by /u/AnyJello605 [link] [comments]  ( 88 min )
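If the network is Keras/TensorFlow-based, one seed call is usually not enough; a common recipe is to pin every randomness source before building the model (a sketch, assuming TF2 - GPU kernels can still be nondeterministic, and tiny CV-score gaps between parameter settings will then flip the "optimal" choice between runs):

    import os, random
    import numpy as np
    import tensorflow as tf

    os.environ["PYTHONHASHSEED"] = "0"
    random.seed(0)         # Python's own RNG
    np.random.seed(0)      # NumPy (CV splits, some weight init wrappers)
    tf.random.set_seed(0)  # TensorFlow ops (initializers, dropout, shuffle)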
    [D] Thoughts and Resources on GNN Methods that Alter Graph Structure?
I was listening to Machine Learning Street Talk and kept coming across the idea that "the information to understand the problem is not in the data." That got me thinking about graphs, and how the given graph structure may not be "optimal" for the downstream task (i.e. node classification, link prediction). Does anyone have thoughts or resources on the idea that the input graph can be rewired or discarded to do better classification? I've seen a couple of papers I liked here (Rewiring with Ricci Flows, Graph-MLP). I'm interested in this idea because some tasks seem to explicitly require structural information to be encoded somehow (graph transformers), and the features in many datasets aren't very rich in information, so I wonder whether good classification is even possible with features alone. submitted by /u/rivew [link] [comments]  ( 89 min )
    [P] Open-sourcing Stable Diffusion generation scripts
    Code: https://github.com/gordicaleksa/stable_diffusion_playground submitted by /u/gordicaleksa [link] [comments]  ( 88 min )
    [N] Stable Diffusion Image Variations released, allows you to do variations like DALLE-2
Justin Pinkney has been experimenting with fine-tuning Stable Diffusion to use the CLIP image embedding as the conditioning instead of a text prompt. This allows you to do DALL-E 2-like "image variations". tweet: https://twitter.com/Buntworthy/status/1566744186153484288 github: https://github.com/justinpinkney/stable-diffusion demo made with gradio: https://github.com/gradio-app/gradio submitted by /u/Illustrious_Row_9971 [link] [comments]  ( 89 min )
    [P] Appropriate Algorithm for Influencers Ranking
I am working on ranking different social influencers based on a set of metrics. Metrics collected: username; categories (the niche the influencer is in); influencer_type; followers; follower_growth, follower_growth_rate; highlightReelCount, igtvVideoCount, postsCount; avg_likes, avg_comments; likes_comments_ratio (comments per 100 likes, used as an authenticity indicator); engagement_rate; authentic_engagement (the number of likes and comments that come from real people); post_per_week; 1/post_like, 1/post_comment (total 12 latest posts); 1/igtv_likes, 1/igtv_comment (total 12 latest IGTVs). (The post includes a screenshot of the data.) Objective: rank the social influencers according to their influential power, which can be calculated from the metrics collected above. There are a few ranking approaches to choose from: a) compute the score for influential power with Multi-Criteria Decision Making (MCDM) and rank it with regression; b) create a classification model and rank them through predicted probability; c) compute the MCDM score and rank it with a machine learning model like SVM, Decision Tree or a Deep Neural Network; d) learning to rank; e) a trending algorithm. I would like to ask which of the above will be more suitable for this project; could you compare them and provide the reasons? Any ideas will be much appreciated! submitted by /u/Same_Journalist_9445 [link] [comments]  ( 90 min )
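If you go the MCDM route, the simplest baseline is simple additive weighting: min-max normalize each benefit metric, then take a weighted sum and sort. A sketch with made-up column names, values, and weights:

    import pandas as pd

    df = pd.DataFrame({
        "followers": [1.2e6, 3.4e5, 8.9e5],
        "engagement_rate": [0.021, 0.054, 0.037],
        "authentic_engagement": [9.1e3, 5.2e3, 7.7e3],
    })
    weights = {"followers": 0.2, "engagement_rate": 0.5,
               "authentic_engagement": 0.3}

    norm = (df - df.min()) / (df.max() - df.min())  # min-max per column
    df["score"] = sum(w * norm[c] for c, w in weights.items())
    print(df.sort_values("score", ascending=False))

Learning-to-rank or a classification model only makes sense once you have ground-truth labels of "influential power" to fit against; with hand-chosen weights, an MCDM score plus a plain sort needs no model at all.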
    [P] Cozy Auto Texture - A Blender add-on that allows you to generate free textures with Stable Diffusion
    My team and I are creating the Blender add-on Cozy Auto Texture that uses the open source AI Stable Diffusion. You'll be able to create free textures right in Blender with a simple text description! Check it out: https://github.com/torrinworx/Cozy-Auto-Texture Here are some images I've generated so far: Prompt: High jewelry iridescent gold with age of empires 2, rts, aoe2, cavalry and horses inside a beautifully detailed pendant, crystal clear, see through, jewelry rendering canvas, 8k + UDH +photorealistic. Prompt: A picture of a cute bird in front of a computer sipping tea with its wings, ultra realistic. Prototype of the UI! So far you can download the necessary Python dependencies, as well as the Stable Diffusion AI weights all with the push of a single button when you install the add-on! YouTube - https://www.youtube.com/c/ThisCozyStudio Discord - https://discord.com/invite/UpZt5Un57t Our team plans on adding upscaling and the ability to generate other texture files like Normal maps, Displacement maps, Reflection, Gloss, etc. in the near future. Our goal is to make the most robust and feature packed texture generator. submitted by /u/T0rr1nw0rX [link] [comments]  ( 111 min )
    What is nonlinear positional embedding [D]
I am reading a paper called "Representation Learning for Information Extraction from Form-like Documents". In it they mention a nonlinear positional embedding, and I don't understand what this is, though I do know about (standard) positional embeddings. A snippet from the text: "Each neighbor relative position is embedded through a nonlinear positional embedding consisting of two ReLU-activated layers with dropout." Can someone explain this? submitted by /u/fountainhop [link] [comments]  ( 89 min )
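One reading of that sentence, as a sketch: instead of a fixed lookup table or sinusoidal encoding, each neighbor's relative (x, y) offset is fed through a small learned MLP, two ReLU layers with dropout, so the map from position to embedding vector is nonlinear and trained end to end. The sizes below are guesses, not the paper's:

    import torch
    import torch.nn as nn

    nonlinear_pos_embed = nn.Sequential(
        nn.Linear(2, 64), nn.ReLU(), nn.Dropout(0.1),   # (dx, dy) in
        nn.Linear(64, 64), nn.ReLU(), nn.Dropout(0.1),  # embedding out
    )
    emb = nonlinear_pos_embed(torch.tensor([[0.12, -0.05]]))
    print(emb.shape)  # torch.Size([1, 64])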
    [D] correct way to use feature selection
Suppose I am using an external method (outside of the algorithm) for feature selection, say Pearson correlation. What is the right way to feed the features when testing ML algorithms? What I have done is: 1. split the data into train and test; 2. use Pearson correlation (since it's a regression problem) to select features; 3. show 3 metrics: A. CV performance on the train set without feature selection (using caret); B. CV performance on the train set with the selected features (I am guessing this is extremely biased); C. CV performance on the test set for both. My question is: does it make sense to do B? My sample size is very small. submitted by /u/triary95 [link] [comments]  ( 89 min )
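B is indeed biased when the selector has seen all the training data before the CV split; the standard fix is to refit the selector inside each CV fold. The post uses caret, but here is the same idea as a Python/scikit-learn sketch (dataset and k are placeholders):

    from sklearn.datasets import make_regression
    from sklearn.feature_selection import SelectKBest, f_regression
    from sklearn.linear_model import Ridge
    from sklearn.model_selection import cross_val_score, train_test_split
    from sklearn.pipeline import Pipeline

    X, y = make_regression(n_samples=80, n_features=50, random_state=0)
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

    pipe = Pipeline([
        ("select", SelectKBest(f_regression, k=10)),  # correlation-based
        ("model", Ridge()),
    ])
    # The selector is refit on each fold's training part only, so the
    # CV estimate is not inflated by selection leakage.
    print(cross_val_score(pipe, X_tr, y_tr, cv=5).mean())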
    [P] Can't find a web admin ui for RASA chatbot framework
Hi guys, I can't seem to find a web admin user interface for RASA. I need to give a client the means to edit utterances, intents, entities and possibly stories, and also a way to look at past conversations and bot analytics. (Not posting links because the post gets banned.) Rasa X is now enterprise-only; Botfront is not maintained; Rasa UI's latest commit was two years ago. Do you happen to know of an open-source admin panel for RASA? I plan to use it and also contribute... submitted by /u/pieroit [link] [comments]  ( 108 min )
    [D] Tools for Showcasing generated images and their related images from training data
I am training some generative models and need to showcase the generated images. 1) The images shown, when clicked, should display related images from the dataset to illustrate their similarity, much like https://www.robots.ox.ac.uk/~vgg/software/vise/index.html 2) I also want to allow users to input images and generate a sample from them. I know there are some tools like Gradio; are there any other tools available for this purpose? submitted by /u/icelebratefestivus [link] [comments]  ( 89 min )
  • Open

    Looking for a good place to start when digging deep into reinforcement learning
I have a fair background in computer vision, language modeling, unsupervised learning, and the supporting methodologies. I'm mostly looking for papers to understand the methods involved in SOTA reinforcement learning and the underlying concepts, plus any resource you've found useful for the notation used in papers. submitted by /u/Extra-most-best [link] [comments]  ( 87 min )
    Does openai-gym environment allow actions that are out of the action-space?
I am new to RL and was messing around with OpenAI Gym environments. In a custom environment I created (based on gym), I am able to call env.step(action) with an action value outside the action space. Shouldn't the environment disallow illegal actions like this? If not, how do we stop them? submitted by /u/goodbyeguruji [link] [comments]  ( 88 min )
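Gym does not validate actions inside step() for you; a custom env is expected to check (or clip) them itself. A small sketch using the space's own contains() method:

    import gym

    env = gym.make("CartPole-v1")
    print(env.action_space.contains(5))   # False: 5 is outside Discrete(2)

    # Inside a custom env's step():
    #   if not self.action_space.contains(action):
    #       raise ValueError(f"illegal action {action!r}")
    # For Box spaces, clipping is common instead:
    #   action = np.clip(action, self.action_space.low,
    #                    self.action_space.high)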
    Is there a way to estimate transition probabilities when they are varying?
Hi, I was wondering if someone could point me to resources where transition probabilities are estimated while accounting for non-stationarity in the action outcomes (i.e. the result of an action varies over time; say an agent initially goes forward with probability 0.80 when asked to go forward, and over time this drifts to 0.60). Thanks in advance! submitted by /u/E-Cockroach [link] [comments]  ( 87 min )
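The simplest tracking estimator for a drifting probability is an exponential moving average with a constant step size (plain counts average over the whole history and cannot follow drift); a sketch:

    def ema_update(p_hat, outcome, step=0.05):
        # outcome is 1 if the transition occurred, else 0
        return p_hat + step * (outcome - p_hat)

    p = 0.5
    for o in [1, 1, 1, 0, 1, 0, 0, 0]:   # environment drifting toward 0
        p = ema_update(p, o)
    print(p)   # tracks the recent outcome frequency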
    SAC exploding losses
I have a task on which I very consistently get this kind of training behavior from SAC (both v1 and v2): https://wandb.ai/tmrl/tmrl/runs/SAC_test_imgs_16 . Basically, both the critic and actor losses explode at some advanced point during training, and the policy that was good up to that point suddenly becomes extremely bad and learns to do nothing. Sometimes the losses and policy recover, sometimes not. Would you have any idea where this may come from, mathematically speaking? This happens with all kinds of deep models and observation spaces. It seems it can be alleviated by tweaking the actor and critic learning rates separately, but this is very unpredictable, as the behavior happens after several days of training on this task. Also, this doesn't come from non-Markovness toward the end of episodes, unless I missed something, since I ignore terminal transitions when the termination signal comes from the episode duration. submitted by /u/yannbouteiller [link] [comments]  ( 88 min )
    Scaling output of Actor's forward function question.
    Hello guys! I'm pretty amateur at the low-level details of RL, and I was wondering about the author's official implementation of TD3, whose Actor forward function looks like this:

        def forward(self, state):
            a = F.relu(self.l1(state))
            a = F.relu(self.l2(a))
            return self.max_action * torch.tanh(self.l3(a))

    The output is multiplied by self.max_action to scale it correctly before being returned. Shouldn't all such transformations happen at simulation time, though? Apply the simulation step, gather the results, then reverse the transform and save the unscaled actions in the experience buffer, so that backpropagation is done right? If our forward function applies tanh, for example, and we multiply its output by 5, forward returns values in [-5, 5]; when we backpropagate, is that multiplication by 5 accounted for on the way back through the network? I think I am missing something, any help is appreciated! submitted by /u/South_Book_5625  ( 87 min )
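    A quick autograd check (editor's sketch, not from the post) shows that nothing needs to be undone manually: a constant output scaling is just another differentiable operation, so backpropagation multiplies the upstream gradient by the same constant, and the graph already accounts for max_action.

        import torch

        x = torch.tensor(0.3, requires_grad=True)
        y = 5.0 * torch.tanh(x)   # same pattern as max_action * torch.tanh(...)
        y.backward()

        # d/dx [5 * tanh(x)] = 5 * (1 - tanh(x)^2); autograd applies the factor of 5.
        print(x.grad)
        print(5.0 * (1.0 - torch.tanh(torch.tensor(0.3)) ** 2))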
    Why do agents in a cooperative setting (Dec-POMDP) receive the same reward?
    Hi everyone, why do cooperative agents acting within the Dec-POMDP framework receive the same reward? In other words, why do we focus on finding the optimal joint policy and not individual optimal policies? submitted by /u/souhaielbensalem  ( 105 min )
    Magnitude of a sparse reward
    Hello, In my RL problem, the agent moves around the environment to reach a certain goal. Upon reaching the goal, the agent gets a sparse reward. I tried two different values for this reward (+1000 and +10). The reward is also reduced by the number of steps the agent takes (if the agent takes 5 steps to reach the goal, the reward would be 1000-5=995 or 10-5=5). For some reason, the agent learns better with the +10 reward. Any idea why that would be the case? Thank you :D submitted by /u/AhmedNizam_  ( 99 min )
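    One plausible explanation (an editor's note, not from the thread): with a +1000 terminal reward, the -1 per-step penalty becomes negligible relative to the value scale, and TD targets (hence gradients) are roughly 100x larger, which acts like a much larger effective learning rate. A common heuristic is to keep returns near unit scale, e.g.:

        import numpy as np

        def scale_rewards(rewards, target_std=1.0):
            """Crudely rescale a batch of rewards toward unit standard deviation.

            With a +1000 goal reward, TD targets and gradients are ~100x larger
            than with +10, which can destabilize training unless the learning
            rate is reduced accordingly.
            """
            std = np.std(rewards) + 1e-8
            return [r * target_std / std for r in rewards]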
  • Open

    Deep Dive into Reasons to Choose Focal Loss over Cross-Entropy [Article]
    submitted by /u/JoshuaDaD  ( 86 min )
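    For reference, a minimal PyTorch sketch of the focal loss of Lin et al. (2017), written here for illustration; the linked article's own code may differ:

        import torch
        import torch.nn.functional as F

        def focal_loss(logits, targets, gamma=2.0, alpha=None):
            """Focal loss: FL(p_t) = -(1 - p_t)^gamma * log(p_t).

            Down-weights easy, well-classified examples relative to plain
            cross-entropy. logits: (N, C) raw scores; targets: (N,) class
            indices; alpha: optional (C,) tensor of per-class weights.
            """
            log_p = F.log_softmax(logits, dim=-1)
            log_pt = log_p.gather(1, targets.unsqueeze(1)).squeeze(1)  # log p_t
            pt = log_pt.exp()
            loss = -((1.0 - pt) ** gamma) * log_pt
            if alpha is not None:
                loss = alpha.gather(0, targets) * loss
            return loss.mean()

    With gamma = 0 and alpha = None this reduces exactly to cross-entropy, which is a handy sanity check.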
    Investing time and effort in neural network architecture design to create more advanced and smarter products will make all our lives better. Teamshake is here to provide a faster and more intuitive way to do so: https://www.teamshake.app #neuralnetworks #pytorch #ai #Teamshake
    submitted by /u/adiPel  ( 87 min )
    I tried to recreate a comic book using Midjourney and Alan Moore’s script
    submitted by /u/RubiksCodeNMZ  ( 87 min )
    Researchers created a Novel Framework called ‘FedD3’ for Federated Learning in Resource-Constrained Edge Environments via Decentralized Dataset Distillation
    submitted by /u/ai-lover  ( 88 min )
  • Open

    Learn Blockchain Technology to Build A Futuristic Career
    In the past few years, blockchain has turned out to be a phenomenal technology. Its novel attributes are making business processes more efficient, more secure, and more transparent, and are taking the industry in a new ‘decentralized’ direction.  ( 20 min )
    Imaging the Web With Scalable Vector Graphics
    SVG is an example of one of the more powerful technologies on the web that is likely completely invisible to you, whether you're simply browsing the web or you're a web developer wanting to take advantage of your full toolset.  ( 26 min )
  • Open

    My family's unlikely homeschooling journey
    My husband Jeremy and I never intended to homeschool, and yet we have now, unexpectedly, committed to homeschooling long-term. Prior to the pandemic, we both worked full-time in careers that we loved and found meaningful, and we sent our daughter to a full-day Montessori school. Although I struggled with significant health issues, I felt unbelievably lucky and fulfilled in both my family life and my professional life. The pandemic upended my careful balance. Every family is different, with different needs, circumstances, and constraints, and what works for one may not work for others. My intention here is primarily to share the journey of my own (very privileged) family. Our unplanned introduction to homeschooling For the first year of the pandemic, most schools in California, where …  ( 7 min )
  • Open

    Deep Residual Shrinkage Networks for EMG-based Gesture Identification. (arXiv:2202.02984v3 [eess.SP] UPDATED)
    This work introduces a method for high-accuracy EMG-based gesture identification. A recently developed deep learning method, the deep residual shrinkage network (DRSN), is applied to perform gesture identification. Based on the features of the EMG signals resulting from gestures, optimizations are made to improve the identification accuracy. Finally, three different algorithms are applied to compare their EMG signal recognition accuracy with that of the DRSN. The results show that the DRSN outperforms traditional neural networks in terms of EMG recognition accuracy. This paper provides a reliable way to classify EMG signals and explores possible applications of the DRSN.  ( 2 min )
    Semi-WTC: A Practical Semi-supervised Framework for Attack Categorization through Weight-Task Consistency. (arXiv:2205.09669v3 [cs.CR] UPDATED)
    Supervised learning has been widely used for attack categorization, requiring high-quality data and labels. However, the data is often imbalanced and it is difficult to obtain sufficient annotations. Moreover, supervised models are subject to real-world deployment issues, such as defending against unseen artificial attacks. To tackle these challenges, we propose a semi-supervised fine-grained attack categorization framework consisting of an encoder and a two-branch structure; this framework can be generalized to different supervised models. A multilayer perceptron with residual connections is used as the encoder to extract features and reduce the complexity. The Recurrent Prototype Module (RPM) is proposed to train the encoder effectively in a semi-supervised manner. To alleviate the data imbalance problem, we introduce the Weight-Task Consistency (WTC) into the iterative process of RPM by assigning larger weights to classes with fewer samples in the loss function. In addition, to cope with new attacks in real-world deployment, we propose an Active Adaption Resampling (AAR) method, which can better discover the distribution of unseen sample data and adapt the parameters of the encoder. Experimental results show that our model outperforms the state-of-the-art semi-supervised attack detection methods with a 3% improvement in classification accuracy and a 90% reduction in training time.  ( 3 min )
    Predicting the Stability of Hierarchical Triple Systems with Convolutional Neural Networks. (arXiv:2206.12402v2 [astro-ph.SR] UPDATED)
    Understanding the long-term evolution of hierarchical triple systems is challenging due to its inherent chaotic nature, and it requires computationally expensive simulations. Here we propose a convolutional neural network model to predict the stability of hierarchical triples by looking at their evolution during the first $5 \times 10^5$ inner binary orbits. We employ the regularized few-body code TSUNAMI to simulate $5\times 10^6$ hierarchical triples, from which we generate a large training and test dataset. We develop twelve different network configurations that use different combinations of the triples' orbital elements and compare their performances. Our best model uses 6 time-series, namely, the semimajor axes ratio, the inner and outer eccentricities, the mutual inclination and the arguments of pericenter. This model achieves an area under the curve of over $95\%$ and identifies the parameters relevant to studying triple-system stability. All trained models are made publicly available, allowing one to predict the stability of hierarchical triple systems $200$ times faster than with pure $N$-body methods.  ( 3 min )
    Refining neural network predictions using background knowledge. (arXiv:2206.04976v2 [cs.AI] UPDATED)
    Recent work has shown that logical background knowledge can be used in learning systems to compensate for a lack of labeled training data. Many methods work by creating a loss function that encodes this knowledge. However, the logic is often discarded after training, even if it is still useful at test time. Instead, we ensure that neural network predictions satisfy the knowledge by refining the predictions with an extra computation step. We introduce differentiable refinement functions that find a corrected prediction close to the original prediction. We study how to effectively and efficiently compute these refinement functions. Using a new algorithm called Iterative Local Refinement (ILR), we combine refinement functions to find refined predictions for logical formulas of any complexity. ILR finds refinements on complex SAT formulas in significantly fewer iterations and frequently finds solutions where gradient descent cannot. Finally, ILR produces competitive results in the MNIST addition task.  ( 2 min )
    Beyond Rewards: a Hierarchical Perspective on Offline Multiagent Behavioral Analysis. (arXiv:2206.09046v2 [cs.LG] UPDATED)
    Each year, expert-level performance is attained in increasingly-complex multiagent domains, notable examples including Go, Poker, and StarCraft II. This rapid progression is accompanied by a commensurate need to better understand how such agents attain this performance, to enable their safe deployment, identify limitations, and reveal potential means of improving them. In this paper we take a step back from performance-focused multiagent learning, and instead turn our attention towards agent behavior analysis. We introduce a model-agnostic method for discovery of behavior clusters in multiagent domains, using variational inference to learn a hierarchy of behaviors at the joint and local agent levels. Our framework makes no assumption about agents' underlying learning algorithms, does not require access to their latent states or policies, and is trained using only offline observational data. We illustrate the effectiveness of our method for enabling the coupled understanding of behaviors at the joint and local agent level, detection of behavior changepoints throughout training, discovery of core behavioral concepts, demonstrate the approach's scalability to a high-dimensional multiagent MuJoCo control domain, and also illustrate that the approach can disentangle previously-trained policies in OpenAI's hide-and-seek domain.  ( 2 min )
    Constructing Variables Using Classifiers as an Aid to Regression (Construction de variables à l'aide de classifieurs comme aide à la régression). (arXiv:2112.03703v2 [cs.LG] UPDATED)
    This paper proposes a method for the automatic creation of variables (in the case of regression) that complement the information contained in the initial input vector. The method works as a pre-processing step in which the continuous values of the variable to be regressed are discretized into a set of intervals which are then used to define value thresholds. Then classifiers are trained to predict whether the value to be regressed is less than or equal to each of these thresholds. The different outputs of the classifiers are then concatenated in the form of an additional vector of variables that enriches the initial vector of the regression problem. The implemented system can thus be considered as a generic pre-processing tool. We tested the proposed enrichment method with 5 types of regressors and evaluated it in 33 regression datasets. Our experimental results confirm the interest of the approach.  ( 2 min )
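    A minimal sketch of the described pre-processing, for illustration only: discretize the target into quantile thresholds, fit one binary classifier per threshold predicting whether y is below it, and append the classifier outputs as extra features. The discretization and classifier choices below are the editor's placeholders, not the paper's; scikit-learn is assumed.

        import numpy as np
        from sklearn.linear_model import LogisticRegression

        def enrich_features(X, y, n_thresholds=5):
            """Append P(y <= t) from per-threshold classifiers as new features."""
            thresholds = np.quantile(y, np.linspace(0.1, 0.9, n_thresholds))
            extra = []
            for t in thresholds:
                clf = LogisticRegression().fit(X, (y <= t).astype(int))
                extra.append(clf.predict_proba(X)[:, 1])
            return np.hstack([X, np.column_stack(extra)])

    In practice the appended probabilities should be produced out-of-fold (e.g. with cross_val_predict) so the target does not leak into the new features.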
    SoK: Privacy Preserving Machine Learning using Functional Encryption: Opportunities and Challenges. (arXiv:2204.05136v2 [cs.CR] UPDATED)
    With the advent of functional encryption, new possibilities for computation on encrypted data have arisen. Functional Encryption enables data owners to grant third-party access to perform specified computations without disclosing their inputs. It also provides computation results in plain, unlike Fully Homomorphic Encryption. The ubiquitousness of machine learning has led to the collection of massive private data in the cloud computing environment. This raises potential privacy issues and the need for more private and secure computing solutions. Numerous efforts have been made in privacy-preserving machine learning (PPML) to address security and privacy concerns. There are approaches based on fully homomorphic encryption (FHE), secure multiparty computation (SMC), and, more recently, functional encryption (FE). However, FE-based PPML is still in its infancy and has not yet gotten much attention compared to FHE-based PPML approaches. In this paper, we provide a systematization of PPML works based on FE summarizing state-of-the-art in the literature. We focus on Inner-product-FE and Quadratic-FE-based machine learning models for the PPML applications. We analyze the performance and usability of the available FE libraries and their applications to PPML. We also discuss potential directions for FE-based PPML approaches. To the best of our knowledge, this is the first work to systematize FE-based PPML approaches.  ( 3 min )
    Training Differentially Private Models with Secure Multiparty Computation. (arXiv:2202.02625v3 [cs.CR] UPDATED)
    We address the problem of learning a machine learning model from training data that originates at multiple data owners while providing formal privacy guarantees regarding the protection of each owner's data. Existing solutions based on Differential Privacy (DP) achieve this at the cost of a drop in accuracy. Solutions based on Secure Multiparty Computation (MPC) do not incur such accuracy loss but leak information when the trained model is made publicly available. We propose an MPC solution for training DP models. Our solution relies on an MPC protocol for model training, and an MPC protocol for perturbing the trained model coefficients with Laplace noise in a privacy-preserving manner. The resulting MPC+DP approach achieves higher accuracy than a pure DP approach while providing the same formal privacy guarantees. Our work obtained first place in the iDASH2021 Track III competition on confidential computing for secure genome analysis.  ( 2 min )
    Goal-Conditioned Reinforcement Learning: Problems and Solutions. (arXiv:2201.08299v3 [cs.AI] UPDATED)
    Goal-conditioned reinforcement learning (GCRL), related to a set of complex RL problems, trains an agent to achieve different goals under particular scenarios. Compared to the standard RL solutions that learn a policy solely depending on the states or observations, GCRL additionally requires the agent to make decisions according to different goals. In this survey, we provide a comprehensive overview of the challenges and algorithms for GCRL. Firstly, we answer what the basic problems are studied in this field. Then, we explain how goals are represented and present how existing solutions are designed from different points of view. Finally, we make the conclusion and discuss potential future prospects that recent researches focus on.  ( 2 min )
    A cross-domain recommender system using deep coupled autoencoders. (arXiv:2112.07617v4 [cs.IR] UPDATED)
    Long-standing data sparsity and cold-start constitute thorny and perplexing problems for the recommendation systems. Cross-domain recommendation as a domain adaptation framework has been utilized to efficiently address these challenging issues, by exploiting information from multiple domains. In this study, an item-level relevance cross-domain recommendation task is explored, where two related domains, that is, the source and the target domain contain common items without sharing sensitive information regarding the users' behavior, and thus avoiding the leak of user privacy. In light of this scenario, two novel coupled autoencoder-based deep learning methods are proposed for cross-domain recommendation. The first method aims to simultaneously learn a pair of autoencoders in order to reveal the intrinsic representations of the items in the source and target domains, along with a coupled mapping function to model the non-linear relationships between these representations, thus transferring beneficial information from the source to the target domain. The second method is derived based on a new joint regularized optimization problem, which employs two autoencoders to generate in a deep and non-linear manner the user and item-latent factors, while at the same time a data-driven function is learnt to map the item-latent factors across domains. Extensive numerical experiments on two publicly available benchmark datasets are conducted illustrating the superior performance of our proposed methods compared to several state-of-the-art cross-domain recommendation frameworks.  ( 3 min )
    Improving Sequential Query Recommendation with Immediate User Feedback. (arXiv:2205.06297v2 [cs.IR] UPDATED)
    We propose an algorithm for next query recommendation in interactive data exploration settings, like knowledge discovery for information gathering. The state-of-the-art query recommendation algorithms are based on sequence-to-sequence learning approaches that exploit historical interaction data. Due to the supervision involved in the learning process, such approaches fail to adapt to immediate user feedback. We propose to augment the transformer-based causal language models for query recommendations to adapt to the immediate user feedback using multi-armed bandit (MAB) framework. We conduct a large-scale experimental study using log files from a popular online literature discovery service and demonstrate that our algorithm improves the per-round regret substantially, with respect to the state-of-the-art transformer-based query recommendation models, which do not make use of immediate user feedback. Our data model and source code are available at https://github.com/shampp/exp3_ss  ( 2 min )
    Remember and Forget Experience Replay for Multi-Agent Reinforcement Learning. (arXiv:2203.13319v3 [cs.LG] UPDATED)
    We present the extension of the Remember and Forget for Experience Replay (ReF-ER) algorithm to Multi-Agent Reinforcement Learning (MARL). ReF-ER was shown to outperform state-of-the-art algorithms for continuous control in problems ranging from the OpenAI Gym to complex fluid flows. In MARL, the dependencies between the agents are included in the state-value estimator and the environment dynamics are modeled via the importance weights used by ReF-ER. In collaborative environments, we find the best performance when the value is estimated using individual rewards and we ignore the effects of other actions on the transition map. We benchmark the performance of ReF-ER MARL on the Stanford Intelligent Systems Laboratory (SISL) environments. We find that employing a single feed-forward neural network for the policy and the value function in ReF-ER MARL outperforms state-of-the-art algorithms that rely on complex neural network architectures.  ( 2 min )
    Unsupervised Joint Image Transfer and Uncertainty Quantification Using Patch Invariant Networks. (arXiv:2207.04325v2 [cs.CV] UPDATED)
    Unsupervised image transfer enables intra- and inter-modality image translation in applications where a large amount of paired training data is not abundant. To ensure a structure-preserving mapping from the input to the target domain, existing methods for unpaired image transfer are commonly based on cycle-consistency, causing additional computational resources and instability due to the learning of an inverse mapping. This paper presents a novel method for uni-directional domain mapping that does not rely on any paired training data. A proper transfer is achieved by using a GAN architecture and a novel generator loss based on patch invariance. To be more specific, the generator outputs are evaluated and compared at different scales, also leading to an increased focus on high-frequency details as well as an implicit data augmentation. This novel patch loss also offers the possibility to accurately predict aleatoric uncertainty by modeling an input-dependent scale map for the patch residuals. The proposed method is comprehensively evaluated on three well-established medical databases. As compared to four state-of-the-art methods, we observe significantly higher accuracy on these datasets, indicating great potential of the proposed method for unpaired image transfer with uncertainty taken into account. Implementation of the proposed framework is released here: \url{https://github.com/anger-man/unsupervised-image-transfer-and-uq}.  ( 3 min )
    Echocardiographic Image Quality Assessment Using Deep Neural Networks. (arXiv:2209.00959v1 [eess.IV])
    Echocardiography image quality assessment is not a trivial issue in transthoracic examination. As the in vivo examination of heart structures gained prominence in cardiac diagnosis, it has been affirmed that accurate diagnosis of left ventricle function is hugely dependent on the quality of echo images. To date, visual assessment of echo images remains highly subjective and requires specific definition under clinical pathologies. While poor-quality images impair quantification and diagnosis, the inherent variations in echocardiographic image quality standards indicate the complexity faced among different observers and provide apparent evidence for incoherent assessment in clinical trials, especially with less experienced cardiologists. In this research, our aim was to analyse and define specific quality attributes mostly discussed by experts and present a fully trained convolutional neural network model for assessing such quality features objectively.  ( 2 min )
    Evaluating Short-Term Forecasting of Multiple Time Series in IoT Environments. (arXiv:2206.07784v2 [cs.LG] UPDATED)
    Modern Internet of Things (IoT) environments are monitored via a large number of IoT enabled sensing devices, with the data acquisition and processing infrastructure setting restrictions in terms of computational power and energy resources. To alleviate this issue, sensors are often configured to operate at relatively low sampling frequencies, yielding a reduced set of observations. Nevertheless, this can hamper dramatically subsequent decision-making, such as forecasting. To address this problem, in this work we evaluate short-term forecasting in highly underdetermined cases, i.e., the number of sensor streams is much higher than the number of observations. Several statistical, machine learning and neural network-based models are thoroughly examined with respect to the resulting forecasting accuracy on five different real-world datasets. The focus is given on a unified experimental protocol especially designed for short-term prediction of multiple time series at the IoT edge. The proposed framework can be considered as an important step towards establishing a solid forecasting strategy in resource constrained IoT applications.  ( 2 min )
    Tree density estimation. (arXiv:2111.11971v4 [math.ST] UPDATED)
    We study the problem of estimating the density $f(\boldsymbol x)$ of a random vector ${\boldsymbol X}$ in $\mathbb R^d$. For a spanning tree $T$ defined on the vertex set $\{1,\dots ,d\}$, the tree density $f_{T}$ is a product of bivariate conditional densities. An optimal spanning tree minimizes the Kullback-Leibler divergence between $f$ and $f_{T}$. From i.i.d. data we identify an optimal tree $T^*$ and efficiently construct a tree density estimate $f_n$ such that, without any regularity conditions on the density $f$, one has $\lim_{n\to \infty} \int |f_n(\boldsymbol x)-f_{T^*}(\boldsymbol x)|d\boldsymbol x=0$ a.s. For Lipschitz $f$ with bounded support, $\mathbb E \left\{ \int |f_n(\boldsymbol x)-f_{T^*}(\boldsymbol x)|d\boldsymbol x\right\}=O\big(n^{-1/4}\big)$, a dimension-free rate.
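    For concreteness (an editor's gloss following the Chow-Liu factorization, not text from the paper): rooting $T$ at vertex $1$ with parent map $\pi$, the tree density can be written as $f_{T}(\boldsymbol x)=f_{1}(x_{1})\prod_{i=2}^{d} f_{i\mid \pi(i)}(x_{i}\mid x_{\pi(i)})=\prod_{i=1}^{d} f_{i}(x_{i})\prod_{(i,j)\in T}\frac{f_{ij}(x_{i},x_{j})}{f_{i}(x_{i})\,f_{j}(x_{j})}$, which makes explicit that only univariate and bivariate marginals need to be estimated.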
    Big Data is not the New Oil: Common Misconceptions about Population Data. (arXiv:2112.10912v3 [cs.DB] UPDATED)
    Databases covering all individuals of a population are increasingly used for research and decision-making. The massive size of such databases is often mistaken as a guarantee for valid inferences. However, population data have characteristics that make them challenging to use. Various assumptions on population coverage and data quality are commonly made, including how such data were captured and what types of processing have been applied to them. Furthermore, the full potential of population data can often only be unlocked when such data are linked to other databases. Record linkage often implies subtle technical problems, which are easily missed. We discuss a diverse range of misconceptions relevant for anybody capturing, processing, linking, or analysing population data. Remarkably many of these misconceptions are due to the social nature of data collections and are therefore missed by purely technical accounts of data processing. Many of these misconceptions are also not well documented in scientific publications. We conclude with a set of recommendations for using population data.
    Reconstructing editable prismatic CAD from rounded voxel models. (arXiv:2209.01161v1 [cs.CV])
    Reverse Engineering a CAD shape from other representations is an important geometric processing step for many downstream applications. In this work, we introduce a novel neural network architecture to solve this challenging task and approximate a smoothed signed distance function with an editable, constrained, prismatic CAD model. During training, our method reconstructs the input geometry in the voxel space by decomposing the shape into a series of 2D profile images and 1D envelope functions. These can then be recombined in a differentiable way allowing a geometric loss function to be defined. During inference, we obtain the CAD data by first searching a database of 2D constrained sketches to find curves which approximate the profile images, then extrude them and use Boolean operations to build the final CAD model. Our method approximates the target shape more closely than other methods and outputs highly editable constrained parametric sketches which are compatible with existing CAD software.
    Algorithms for Discrepancy, Matchings, and Approximations: Fast, Simple, and Practical. (arXiv:2209.01147v1 [cs.DS])
    We study one of the key tools in data approximation and optimization: low-discrepancy colorings. Formally, given a finite set system $(X,\mathcal S)$, the \emph{discrepancy} of a two-coloring $\chi:X\to\{-1,1\}$ is defined as $\max_{S \in \mathcal S}|{\chi(S)}|$, where $\chi(S)=\sum\limits_{x \in S}\chi(x)$. We propose a randomized algorithm which, for any $d>0$ and $(X,\mathcal S)$ with dual shatter function $\pi^*(k)=O(k^d)$, returns a coloring with expected discrepancy $O\left({\sqrt{|X|^{1-1/d}\log|\mathcal S|}}\right)$ (this bound is tight) in time $\tilde O\left({|\mathcal S|\cdot|X|^{1/d}+|X|^{2+1/d}}\right)$, improving upon the previous-best time of $O\left(|\mathcal S|\cdot|X|^3\right)$ by at least a factor of $|X|^{2-1/d}$ when $|\mathcal S|\geq|X|$. This setup includes many geometric classes, families of bounded dual VC-dimension, and others. As an immediate consequence, we obtain an improved algorithm to construct $\varepsilon$-approximations of sub-quadratic size. Our method uses primal-dual reweighing with an improved analysis of randomly updated weights and exploits the structural properties of the set system via matchings with low crossing number -- a fundamental structure in computational geometry. In particular, we get the same $|X|^{2-1/d}$ factor speed-up on the construction time of matchings with crossing number $O\left({|X|^{1-1/d}}\right)$, which is the first improvement since the 1980s. The proposed algorithms are very simple, which makes it possible, for the first time, to compute colorings with near-optimal discrepancies and near-optimal sized approximations for abstract and geometric set systems in dimensions higher than $2$.
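    Since the definition is concrete, a brute-force check is easy to state (a sketch for intuition only; the paper's contribution is the fast randomized coloring algorithm, not this):

        def discrepancy(chi, sets):
            """Discrepancy of a coloring chi: X -> {-1, +1} over a set system.

            chi: dict mapping each element to -1 or +1; sets: iterable of sets
            of elements. Returns max over S of |sum_{x in S} chi(x)|.
            """
            return max(abs(sum(chi[x] for x in S)) for S in sets)

        # Example: a 4-point system with two sets.
        chi = {1: 1, 2: -1, 3: 1, 4: -1}
        print(discrepancy(chi, [{1, 2, 3}, {2, 4}]))   # max(|1|, |-2|) = 2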
    Back-to-Bones: Rediscovering the Role of Backbones in Domain Generalization. (arXiv:2209.01121v1 [cs.CV])
    Domain Generalization (DG) studies the capability of a deep learning model to generalize to out-of-training distributions. In the last decade, the literature has been massively filled with training methodologies that claim to obtain more abstract and robust data representations to tackle domain shifts. Recent research has provided a reproducible benchmark for DG, pointing out the effectiveness of naive empirical risk minimization (ERM) over existing algorithms. Nevertheless, researchers persist in using the same outdated feature extractors, and no attention has been given to the effects of different backbones yet. In this paper, we go back to the backbones, proposing a comprehensive analysis of their intrinsic generalization capabilities, so far ignored by the research community. We evaluate a wide variety of feature extractors, from standard residual solutions to transformer-based architectures, finding an evident linear correlation between large-scale single-domain classification accuracy and DG capability. Our extensive experimentation shows that by adopting competitive backbones in conjunction with effective data augmentation, plain ERM outperforms recent DG solutions and achieves state-of-the-art accuracy. Moreover, our additional qualitative studies reveal that novel backbones give more similar representations to same-class samples, separating different domains in the feature space. This boost in generalization capabilities leaves marginal room for DG algorithms and suggests a new paradigm for investigating the problem, placing backbones in the spotlight and encouraging the development of consistent algorithms on top of them.
    Property inference attack; Graph neural networks; Privacy attacks and defense; Trustworthy machine learning. (arXiv:2209.01100v1 [cs.LG])
    With the fast adoption of machine learning (ML) techniques, sharing of ML models is becoming popular. However, ML models are vulnerable to privacy attacks that leak information about the training data. In this work, we focus on a particular type of privacy attacks named property inference attack (PIA) which infers the sensitive properties of the training data through the access to the target ML model. In particular, we consider Graph Neural Networks (GNNs) as the target model, and distribution of particular groups of nodes and links in the training graph as the target property. While the existing work has investigated PIAs that target at graph-level properties, no prior works have studied the inference of node and link properties at group level yet. In this work, we perform the first systematic study of group property inference attacks (GPIA) against GNNs. First, we consider a taxonomy of threat models under both black-box and white-box settings with various types of adversary knowledge, and design six different attacks for these settings. We evaluate the effectiveness of these attacks through extensive experiments on three representative GNN models and three real-world graphs. Our results demonstrate the effectiveness of these attacks whose accuracy outperforms the baseline approaches. Second, we analyze the underlying factors that contribute to GPIA's success, and show that the target model trained on the graphs with or without the target property represents some dissimilarity in model parameters and/or model outputs, which enables the adversary to infer the existence of the property. Further, we design a set of defense mechanisms against the GPIA attacks, and demonstrate that these mechanisms can reduce attack accuracy effectively with small loss on GNN model accuracy.
    Future Gradient Descent for Adapting the Temporal Shifting Data Distribution in Online Recommendation Systems. (arXiv:2209.01143v1 [cs.LG])
    One of the key challenges of learning an online recommendation model is the temporal domain shift, which causes a mismatch between the training and testing data distributions and hence domain generalization error. To overcome this, we propose to learn a meta future gradient generator that forecasts the gradient information of the future data distribution for training, so that the recommendation model can be trained as if we were able to look ahead at the future of its deployment. Compared with Batch Update, a widely used paradigm, our theory suggests that the proposed algorithm achieves smaller temporal domain generalization error, measured by a gradient variation term in a local regret. We demonstrate the empirical advantage by comparison with various representative baselines.
    MaxWeight With Discounted UCB: A Provably Stable Scheduling Policy for Nonstationary Multi-Server Systems With Unknown Statistics. (arXiv:2209.01126v1 [cs.LG])
    Multi-server queueing systems are widely used models for job scheduling in machine learning, wireless networks, and crowdsourcing. This paper considers a multi-server system with multiple servers and multiple types of jobs. The system maintains a separate queue for each type of jobs. For each time slot, each available server picks a job from a queue and then serves the job until it is complete. The arrival rates of the queues and the mean service times are unknown and even nonstationary. We propose the MaxWeight with discounted upper confidence bound (UCB) algorithm, which simultaneously learns the statistics and schedules jobs to servers. We prove that the proposed algorithm can stabilize the queues when the arrival rates are strictly within the service capacity region. Specifically, we prove that the queue lengths are bounded in the mean under the assumption that the mean service times change relatively slowly over time and the arrival rates are bounded away from the capacity region by a constant whose value depends on the discount factor used in the discounted UCB. Simulation results confirm that the proposed algorithm can stabilize the queues and that it outperforms MaxWeight with empirical mean and MaxWeight with discounted empirical mean. The proposed algorithm is also better than MaxWeight with UCB in the nonstationary setting.
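    For intuition, a generic discounted UCB estimate of a drifting statistic such as a service rate (an editor's illustration; the constants and exact index form in the paper may differ):

        import math

        class DiscountedUCB:
            """Discounted UCB estimate of a nonstationary mean.

            Past observations are down-weighted by gamma, so the confidence
            bound tracks statistics that drift over time.
            """
            def __init__(self, gamma=0.99, c=2.0):
                self.gamma, self.c = gamma, c
                self.weight_sum = 0.0   # discounted number of observations
                self.value_sum = 0.0    # discounted sum of observations

            def update(self, x):
                self.weight_sum = self.gamma * self.weight_sum + 1.0
                self.value_sum = self.gamma * self.value_sum + x

            def ucb(self):
                if self.weight_sum == 0.0:
                    return float("inf")
                mean = self.value_sum / self.weight_sum
                bonus = math.sqrt(self.c * math.log(self.weight_sum + 1.0) / self.weight_sum)
                return mean + bonus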
    Scalable Model-based Policy Optimization for Decentralized Networked Systems. (arXiv:2207.06559v2 [cs.LG] UPDATED)
    Reinforcement learning algorithms require a large number of samples; this often limits their real-world applications even on simple tasks. This challenge is more pronounced in multi-agent tasks, as each step of operation is more costly, requiring communications or the shifting of resources. This work aims to improve the data efficiency of multi-agent control by model-based learning. We consider networked systems where agents are cooperative and communicate only locally with their neighbors, and propose the decentralized model-based policy optimization framework (DMPO). In our method, each agent learns a dynamics model to predict future states and broadcasts its predictions by communication, and the policies are then trained under the model rollouts. To alleviate the bias of model-generated data, we restrain the model usage to generating myopic rollouts, thus reducing the compounding error of model generation. To preserve the independence of policy updates, we introduce an extended value function and theoretically prove that the resulting policy gradient is a close approximation to the true policy gradient. We evaluate our algorithm on several benchmarks for intelligent transportation systems, namely connected autonomous vehicle control tasks (Flow and CACC) and adaptive traffic signal control (ATSC). Empirical results show that our method achieves superior data efficiency and matches the performance of model-free methods using true models.
    Reliable Representations Make A Stronger Defender: Unsupervised Structure Refinement for Robust GNN. (arXiv:2207.00012v2 [cs.LG] UPDATED)
    Benefiting from the message passing mechanism, Graph Neural Networks (GNNs) have been successful on a wide variety of tasks over graph data. However, recent studies have shown that attackers can catastrophically degrade the performance of GNNs by maliciously modifying the graph structure. A straightforward solution to remedy this issue is to model the edge weights by learning a metric function between pairwise representations of two end nodes, which attempts to assign low weights to adversarial edges. The existing methods use either raw features or representations learned by supervised GNNs to model the edge weights. However, both strategies face immediate problems: raw features cannot represent various properties of nodes (e.g., structure information), and representations learned by supervised GNNs may suffer from the poor performance of the classifier on the poisoned graph. We need representations that carry both feature information and as much correct structure information as possible, and that are insensitive to structural perturbations. To this end, we propose an unsupervised pipeline, named STABLE, to optimize the graph structure. Finally, we input the well-refined graph into a downstream classifier. For this part, we design an advanced GCN that significantly enhances the robustness of the vanilla GCN without increasing the time complexity. Extensive experiments on four real-world graph benchmarks demonstrate that STABLE outperforms the state-of-the-art methods and successfully defends against various attacks.
    Can an ML model plainly learn planar layouts?. (arXiv:2209.01075v1 [cs.CG])
    Planar graph drawings tend to be aesthetically pleasing. In this poster we explore a Neural Network's capability of learning various planar graph classes. Additionally, we also investigate the effectiveness of the model in generalizing beyond planarity. We find that the model can outperform conventional techniques for certain graph classes. The model, however, appears to be more susceptible to randomness in the data, and seems to be less robust than expected.
    Inference and dynamic decision-making for deteriorating systems with probabilistic dependencies through Bayesian networks and deep reinforcement learning. (arXiv:2209.01092v1 [cs.AI])
    In the context of modern environmental and societal concerns, there is an increasing demand for methods able to identify management strategies for civil engineering systems, minimizing structural failure risks while optimally planning inspection and maintenance (I&M) processes. Most available methods simplify the I&M decision problem to the component level due to the computational complexity associated with global optimization methodologies under joint system-level state descriptions. In this paper, we propose an efficient algorithmic framework for inference and decision-making under uncertainty for engineering systems exposed to deteriorating environments, providing optimal management strategies directly at the system level. In our approach, the decision problem is formulated as a factored partially observable Markov decision process, whose dynamics are encoded in Bayesian network conditional structures. The methodology can handle environments under equal or general, unequal deterioration correlations among components, through Gaussian hierarchical structures and dynamic Bayesian networks. In terms of policy optimization, we adopt a deep decentralized multi-agent actor-critic (DDMAC) reinforcement learning approach, in which the policies are approximated by actor neural networks guided by a critic network. By including deterioration dependence in the simulated environment, and by formulating the cost model at the system level, DDMAC policies intrinsically consider the underlying system-effects. This is demonstrated through numerical experiments conducted for both a 9-out-of-10 system and a steel frame under fatigue deterioration. Results demonstrate that DDMAC policies offer substantial benefits when compared to state-of-the-art heuristic approaches. The inherent consideration of system-effects by DDMAC strategies is also interpreted based on the learned policies.
    Black-box optimization for integer-variable problems using Ising machines and factorization machines. (arXiv:2209.01016v1 [cs.LG])
    Black-box optimization has potential in numerous applications such as hyperparameter optimization in machine learning and optimization in design of experiments. Ising machines are useful for binary optimization problems because variables can be represented by a single binary variable of Ising machines. However, conventional approaches using an Ising machine cannot handle black-box optimization problems with non-binary values. To overcome this limitation, we propose an approach for integer-variable black-box optimization problems by using Ising/annealing machines and factorization machines in cooperation with three different integer-encoding methods. The performance of our approach is numerically evaluated with different encoding methods using a simple problem of calculating the energy of the hydrogen molecule in the most stable state. The proposed approach can calculate the energy using any of the integer-encoding methods. However, one-hot encoding is useful for problems with a small size.
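    For intuition, two standard integer encodings of the kind the paper compares (a sketch; the paper's exact three encodings are not reproduced here):

        import numpy as np

        def one_hot_encode(value, n_values):
            """One-hot: one binary variable per integer value.

            Requires a one-hot constraint (exactly one bit set) in the
            Ising/QUBO formulation.
            """
            bits = np.zeros(n_values, dtype=int)
            bits[value] = 1
            return bits

        def binary_encode(value, n_bits):
            """Base-2: log2-many binary variables, x = sum_i 2^i * b_i."""
            return np.array([(value >> i) & 1 for i in range(n_bits)])

        print(one_hot_encode(3, 5))   # [0 0 0 1 0]
        print(binary_encode(5, 4))    # [1 0 1 0]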
    Classifying with Uncertain Data Envelopment Analysis. (arXiv:2209.01052v1 [math.OC])
    Classifications organize entities into categories that identify similarities within a category and discern dissimilarities among categories, and they powerfully classify information in support of analysis. We propose a new classification scheme premised on the reality of imperfect data. Our computational model uses uncertain data envelopment analysis to define a classification's proximity to equitable efficiency, which is an aggregate measure of intra-similarity within a classification's categories. Our classification process has two overriding computational challenges, those being a loss of convexity and a combinatorially explosive search space. We overcome the first by establishing lower and upper bounds on the proximity value, and then by searching this range with a first-order algorithm. We overcome the second by adapting the p-median problem to initiate our exploration, and by then employing an iterative neighborhood search to finalize a classification. We conclude by classifying the thirty stocks in the Dow Jones Industrial average into performant tiers and by classifying prostate treatments into clinically effectual categories.
    When Bioprocess Engineering Meets Machine Learning: A Survey from the Perspective of Automated Bioprocess Development. (arXiv:2209.01083v1 [cs.LG])
    Machine learning (ML) has significantly contributed to the development of bioprocess engineering, but its application is still limited, hampering the enormous potential for bioprocess automation. ML for model building automation can be seen as a way of introducing another level of abstraction, allowing expert humans to focus on the most cognitive tasks of bioprocess development. First, probabilistic programming is used for the autonomous building of predictive models. Second, machine learning automatically assesses alternative decisions by planning experiments to test hypotheses and conducting investigations to gather informative data, focusing on model selection based on the uncertainty of model predictions. This review provides a comprehensive overview of ML-based automation in bioprocess development. On the one hand, the biotech and bioengineering community should be aware of the potential and, most importantly, the limitations of existing ML solutions for their application in biotechnology and biopharma. On the other hand, it is essential to identify the missing links that would enable the easy implementation of ML and Artificial Intelligence (AI) solutions in valuable tools for the bio-community. We summarize recent ML implementations across several important subfields of bioprocess systems and highlight two crucial challenges that remain bottlenecks for bioprocess automation and for reducing uncertainty in biotechnology development. There is no one-size-fits-all procedure; however, this review should help identify the potential for automation combining biotechnology and ML domains.
    Neighborhood-aware Scalable Temporal Network Representation Learning. (arXiv:2209.01084v1 [cs.LG])
    Temporal networks have been widely used to model real-world complex systems such as financial systems and e-commerce systems. In a temporal network, the joint neighborhood of a set of nodes often provides crucial structural information on predicting whether they may interact at a certain time. However, recent representation learning methods for temporal networks often fail to extract such information or depend on extremely time-consuming feature construction approaches. To address the issue, this work proposes the Neighborhood-Aware Temporal network model (NAT). For each node in the network, NAT abandons the commonly-used single-vector representation and instead adopts a novel dictionary-type neighborhood representation. Such a dictionary representation records a down-sampled set of the neighboring nodes as keys, and allows fast construction of structural features for a joint neighborhood of multiple nodes. We also design a dedicated data structure, termed N-cache, to support parallel access and update of those dictionary representations on GPUs. NAT is evaluated over seven real-world large-scale temporal networks. NAT not only outperforms all cutting-edge baselines by an average of 5.9% and 6.0% in transductive and inductive link prediction accuracy, respectively, but also remains scalable, achieving a speed-up of 4.1-76.7x over the baselines that adopt joint structural features and of 1.6-4.0x over the baselines that cannot adopt those features. The link to the code: https://github.com/Graph-COM/Neighborhood-Aware-Temporal-Network.
    Subject Membership Inference Attacks in Federated Learning. (arXiv:2206.03317v2 [cs.LG] UPDATED)
    Privacy attacks on Machine Learning (ML) models often focus on inferring the existence of particular data points in the training data. However, what the adversary really wants to know is if a particular \emph{individual}'s (\emph{subject}'s) data was included during training. In such scenarios, the adversary is more likely to have access to the distribution of a particular subject, than actual records. Furthermore, in settings like cross-silo Federated Learning (FL), a subject's data can be embodied by multiple data records that are spread across multiple organizations. Nearly all of the existing private FL literature is dedicated to studying privacy at two granularities -- item-level (individual data records), and user-level (participating user in the federation), neither of which apply to data subjects in cross-silo FL. This insight motivates us to shift our attention from the privacy of data records to the privacy of \emph{data subjects}, also known as subject-level privacy. We propose two black-box attacks for \emph{subject membership inference}, of which one assumes access to a model after each training round. Using these attacks, we estimate subject membership inference risk on real-world data for single-party models as well as FL scenarios. We find our attacks to be extremely potent, even without access to exact training records, and using the knowledge of membership for a handful of subjects. To better understand the various factors that may influence subject privacy risk in cross-silo FL settings, we systematically generate several hundred synthetic federation configurations, varying properties of the data, model design and training, and the federation itself. Finally, we investigate the effectiveness of Differential Privacy in mitigating this threat.
    Physics-informed MTA-UNet: Prediction of Thermal Stress and Thermal Deformation of Satellites. (arXiv:2209.01009v1 [cs.LG])
    The rapid analysis of thermal stress and deformation plays a pivotal role in the thermal control measures and optimization of the structural design of satellites. For achieving real-time thermal stress and thermal deformation analysis of satellite motherboards, this paper proposes a novel Multi-Task Attention UNet (MTA-UNet) neural network which combines the advantages of both Multi-Task Learning (MTL) and U-Net with attention mechanism. Besides, a physics-informed strategy is used in the training process, where partial differential equations (PDEs) are integrated into the loss functions as residual terms. Finally, an uncertainty-based loss balancing approach is applied to weight different loss functions of multiple training tasks. Experimental results show that the proposed MTA-UNet effectively improves the prediction accuracy of multiple physics tasks compared with Single-Task Learning (STL) models. In addition, the physics-informed method brings less error in the prediction of each task, especially on small data sets. The code can be downloaded at: \url{https://github.com/KomorebiTso/MTA-UNet}.
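    The physics-informed pattern described, adding PDE residuals to the loss, can be sketched generically as below (editor's illustration with a toy 1-D Laplace residual; the paper's satellite thermal PDEs and uncertainty-based loss balancing are not reproduced):

        import torch

        def physics_informed_loss(model, x_data, y_data, x_colloc, lambda_pde=1.0):
            """Data loss plus PDE residual loss, here for the toy equation u'' = 0."""
            data_loss = torch.mean((model(x_data) - y_data) ** 2)

            x = x_colloc.clone().requires_grad_(True)
            u = model(x)
            du = torch.autograd.grad(u.sum(), x, create_graph=True)[0]
            d2u = torch.autograd.grad(du.sum(), x, create_graph=True)[0]
            pde_loss = torch.mean(d2u ** 2)        # residual of u'' = 0

            return data_loss + lambda_pde * pde_loss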
    Event-Driven Tactile Learning with Location Spiking Neurons. (arXiv:2209.01080v1 [cs.NE])
    The sense of touch is essential for a variety of daily tasks. New advances in event-based tactile sensors and Spiking Neural Networks (SNNs) spur the research in event-driven tactile learning. However, SNN-enabled event-driven tactile learning is still in its infancy due to the limited representative abilities of existing spiking neurons and high spatio-temporal complexity in the data. In this paper, to improve the representative capabilities of existing spiking neurons, we propose a novel neuron model called "location spiking neuron", which enables us to extract features of event-based data in a novel way. Moreover, based on the classical Time Spike Response Model (TSRM), we develop a specific location spiking neuron model - Location Spike Response Model (LSRM) that serves as a new building block of SNNs. Furthermore, we propose a hybrid model which combines an SNN with TSRM neurons and an SNN with LSRM neurons to capture the complex spatio-temporal dependencies in the data. Extensive experiments demonstrate the significant improvements of our models over other works on event-driven tactile learning and show the superior energy efficiency of our models and location spiking neurons, which may unlock their potential on neuromorphic hardware.
    Ranking-Based Physics-Informed Line Failure Detection in Power Grids. (arXiv:2209.01021v1 [eess.SP])
    Climate change increases the number of extreme weather events (wind and snowstorms, heavy rains, wildfires) that compromise power system reliability and lead to multiple equipment failures. Real-time, accurate detection of potential line failures is the first step to mitigating the extreme weather impact and activating emergency controls. The nonlinearity of the power balance equations, increased uncertainty in generation during extreme events, and lack of grid observability compromise the efficiency of traditional data-driven failure detection methods. At the same time, modern problem-oblivious machine learning methods based on neural networks require a large amount of data to detect an accident, especially in a time-changing environment. This paper proposes a Physics-InformEd Line failure Detector (FIELD) that leverages grid topology information to reduce sample and time complexities and improve localization accuracy. Finally, we illustrate the superior empirical performance of our approach compared to state-of-the-art methods over various test cases.
    Semi-Centralised Multi-Agent Reinforcement Learning with Policy-Embedded Training. (arXiv:2209.01054v1 [cs.MA])
    Centralised training (CT) is the basis for many popular multi-agent reinforcement learning (MARL) methods because it allows agents to quickly learn high-performing policies. However, CT relies on agents learning from one-off observations of other agents' actions at a given state. Because MARL agents explore and update their policies during training, these observations often provide poor predictions about other agents' behaviour and the expected return for a given action. CT methods therefore suffer from high variance and error-prone estimates, harming learning. CT methods also suffer from explosive growth in complexity due to the reliance on global observations, unless strong factorisation restrictions are imposed (e.g., monotonic reward functions for QMIX). We address these challenges with a new semi-centralised MARL framework that performs policy-embedded training and decentralised execution. Our method, the policy embedded reinforcement learning algorithm (PERLA), is an enhancement tool for Actor-Critic MARL algorithms that leverages a novel parameter sharing protocol and policy embedding method to maintain estimates that account for other agents' behaviour. Our theory proves that PERLA dramatically reduces the variance in value estimates. Unlike various CT methods, PERLA, which seamlessly adopts MARL algorithms, scales easily with the number of agents without the need for restrictive factorisation assumptions. We demonstrate PERLA's superior empirical performance and efficient scaling in benchmark environments including StarCraft Micromanagement II and Multi-Agent MuJoCo.
    Learning Stochastic Dynamics with Statistics-Informed Neural Network. (arXiv:2202.12278v2 [cs.LG] UPDATED)
    We introduce a machine-learning framework named statistics-informed neural network (SINN) for learning stochastic dynamics from data. This new architecture was theoretically inspired by a universal approximation theorem for stochastic systems, which we introduce in this paper, and the projection-operator formalism for stochastic modeling. We devise mechanisms for training the neural network model to reproduce the correct \emph{statistical} behavior of a target stochastic process. Numerical simulation results demonstrate that a well-trained SINN can reliably approximate both Markovian and non-Markovian stochastic dynamics. We demonstrate the applicability of SINN to coarse-graining problems and the modeling of transition dynamics. Furthermore, we show that the obtained reduced-order model can be trained on temporally coarse-grained data and hence is well suited for rare-event simulations.
    Intrinsic fluctuations of reinforcement learning promote cooperation. (arXiv:2209.01013v1 [cs.LG])
    In this work, we ask, and answer, what makes classical reinforcement learning cooperative. Cooperating in social dilemma situations is vital for animals, humans, and machines. While evolutionary theory revealed a range of mechanisms promoting cooperation, the conditions under which agents learn to cooperate are contested. Here, we demonstrate which individual elements of the multi-agent learning setting lead to cooperation, and how. Specifically, we consider the widely used temporal-difference reinforcement learning algorithm with epsilon-greedy exploration in the classic environment of an iterated Prisoner's dilemma with one-period memory. Each of the two learning agents learns a strategy that conditions the following action choices on both agents' action choices of the last round. We find that, in addition to a strong regard for future rewards, a low exploration rate, and a small learning rate, it is primarily the intrinsic stochastic fluctuations of the reinforcement learning process that double the final rate of cooperation to up to 80\%. Thus, inherent noise is not a necessary evil of the iterative learning process. It is a critical asset for the learning of cooperation. However, we also point out the trade-off between a high likelihood of cooperative behavior and achieving this in a reasonable amount of time. Our findings are relevant for purposefully designing cooperative algorithms and regulating undesired collusive effects.
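    The described setup is small enough to sketch end to end (an editor's illustration with placeholder parameter values, not the paper's code): two tabular epsilon-greedy Q-learners in an iterated Prisoner's dilemma whose state is both players' previous actions.

        import random
        from collections import defaultdict

        ACTIONS = ["C", "D"]
        PAYOFF = {("C", "C"): 3, ("C", "D"): 0, ("D", "C"): 5, ("D", "D"): 1}

        def play(q1, q2, steps=5000, alpha=0.05, gamma=0.95, eps=0.05):
            """Two epsilon-greedy Q-learners; state = both players' last actions."""
            s = ("C", "C")
            for _ in range(steps):
                a1 = random.choice(ACTIONS) if random.random() < eps \
                    else max(ACTIONS, key=lambda a: q1[(s, a)])
                a2 = random.choice(ACTIONS) if random.random() < eps \
                    else max(ACTIONS, key=lambda a: q2[(s, a)])
                r1, r2 = PAYOFF[(a1, a2)], PAYOFF[(a2, a1)]
                s2 = (a1, a2)
                q1[(s, a1)] += alpha * (r1 + gamma * max(q1[(s2, a)] for a in ACTIONS) - q1[(s, a1)])
                q2[(s, a2)] += alpha * (r2 + gamma * max(q2[(s2, a)] for a in ACTIONS) - q2[(s, a2)])
                s = s2

        q1, q2 = defaultdict(float), defaultdict(float)
        play(q1, q2)  # inspect q1/q2 to see whether (C, C) becomes self-sustaining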
    Normalization effects on deep neural networks. (arXiv:2209.01018v1 [cs.LG])
    We study the effect of normalization on the layers of deep neural networks of feed-forward type. A given layer $i$ with $N_{i}$ hidden units is allowed to be normalized by $1/N_{i}^{\gamma_{i}}$ with $\gamma_{i}\in[1/2,1]$ and we study the effect of the choice of the $\gamma_{i}$ on the statistical behavior of the neural network's output (such as variance) as well as on the test accuracy on the MNIST data set. We find that in terms of variance of the neural network's output and test accuracy the best choice is to choose the $\gamma_{i}$'s to be equal to one, which is the mean-field scaling. We also find that this is particularly true for the outer layer, in that the neural network's behavior is more sensitive in the scaling of the outer layer as opposed to the scaling of the inner layers. The mechanism for the mathematical analysis is an asymptotic expansion for the neural network's output. An important practical consequence of the analysis is that it provides a systematic and mathematically informed way to choose the learning rate hyperparameters. Such a choice guarantees that the neural network behaves in a statistically robust way as the $N_i$ grow to infinity.
    Self-Supervised Human Activity Recognition with Localized Time-Frequency Contrastive Representation Learning. (arXiv:2209.00990v1 [eess.SP])
    In this paper, we propose a self-supervised learning solution for human activity recognition with smartphone accelerometer data. We aim to develop a model that learns strong representations from accelerometer signals in order to perform robust human activity classification, while reducing the model's reliance on class labels. Specifically, we intend to enable cross-dataset transfer learning such that our network pre-trained on a particular dataset can perform effective activity classification on other datasets (following a small amount of fine-tuning). To tackle this problem, we design our solution with the intention of learning as much information from the accelerometer signals as possible. As a result, we design two separate pipelines: one that learns from the data in the time-frequency domain, and the other in the time domain alone. To address the issues mentioned above with regard to cross-dataset transfer learning, we use self-supervised contrastive learning to train each of these streams. Next, each stream is fine-tuned for final classification, and eventually the two are fused to provide the final results. We evaluate the performance of the proposed solution on three datasets, namely MotionSense, HAPT, and HHAR, and demonstrate that our solution outperforms prior works in this field. We further evaluate the method's ability to learn generalized features by using the MobiAct dataset for pre-training and the remaining three datasets for the downstream classification task, and show that the proposed solution achieves better performance in cross-dataset transfer learning than other self-supervised methods.
    Johnson-Lindenstrauss embeddings for noisy vectors -- taking advantage of the noise. (arXiv:2209.01006v1 [cs.DS])
    This paper investigates theoretical properties of subsampling and hashing as tools for approximate Euclidean norm-preserving embeddings of vectors with (unknown) additive Gaussian noise. Such embeddings are sometimes called Johnson-Lindenstrauss embeddings after the celebrated lemma. Previous work shows that, as sparse embeddings, the success of subsampling and hashing closely depends on the $l_\infty$ to $l_2$ ratio of the vector to be mapped. This paper shows that the presence of noise removes this constraint in high dimensions; in other words, sparse embeddings such as subsampling and hashing with embedding dimensions comparable to those of dense embeddings have similar approximate norm-preserving dimensionality-reduction properties. The key is that the noise should be treated as information to be exploited, not simply something to be removed. Theoretical bounds for subsampling and hashing to recover the approximate norm of a high-dimensional vector in the presence of noise are derived, with numerical illustrations showing that better performance is achieved in the presence of noise.
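    Both sparse maps are simple to state. The sketch below (illustrative, not the paper's code) implements subsampling and a signed-hashing embedding and prints how well each preserves the norm of a noisy vector; the dimensions and noise level are assumptions.
        import numpy as np

        rng = np.random.default_rng(1)
        n, m = 10_000, 500                     # ambient and embedding dimensions

        def subsample(x, m, rng):
            # Keep m random coordinates, rescaled so the squared norm is unbiased.
            idx = rng.choice(x.size, size=m, replace=False)
            return np.sqrt(x.size / m) * x[idx]

        def hashing(x, m, rng):
            # Each coordinate is hashed to one of m buckets with a random sign.
            buckets = rng.integers(m, size=x.size)
            signs = rng.choice([-1.0, 1.0], size=x.size)
            out = np.zeros(m)
            np.add.at(out, buckets, signs * x)
            return out

        x = np.ones(n)                         # flat vector: small l_inf/l_2 ratio
        noisy = x + 0.5 * rng.standard_normal(n)   # additive Gaussian noise
        for f in (subsample, hashing):
            ratio = np.linalg.norm(f(noisy, m, rng)) / np.linalg.norm(noisy)
            print(f.__name__, round(ratio, 3))     # close to 1 = norm preserved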
    Multimodal Information Fusion for Glaucoma and DR Classification. (arXiv:2209.00979v1 [eess.IV])
    Multimodal information is frequently available in medical tasks. By combining information from multiple sources, clinicians are able to make more accurate judgments. In recent years, multiple imaging techniques have been used in clinical practice for retinal analysis: 2D fundus photographs, 3D optical coherence tomography (OCT), 3D OCT angiography, etc. Our paper investigates three multimodal information fusion strategies based on deep learning to solve retinal analysis tasks: early fusion, intermediate fusion, and hierarchical fusion. The commonly used early and intermediate fusions are simple but do not fully exploit the complementary information between modalities. We developed a hierarchical fusion approach that focuses on combining features across multiple dimensions of the network, as well as exploring the correlation between modalities. These approaches were applied to glaucoma and diabetic retinopathy classification, using the public GAMMA dataset (fundus photographs and OCT) and a private dataset of PlexElite 9000 (Carl Zeiss Meditec Inc.) OCT angiography acquisitions, respectively. Our hierarchical fusion method performed the best in both cases and paved the way for better clinical diagnosis.
    Exploiting Pretrained Biochemical Language Models for Targeted Drug Design. (arXiv:2209.00981v1 [cs.LG])
    Motivation: The development of novel compounds targeting proteins of interest is one of the most important tasks in the pharmaceutical industry. Deep generative models have been applied to targeted molecular design and have shown promising results. Recently, target-specific molecule generation has been viewed as a translation between the protein language and the chemical language. However, such a model is limited by the availability of interacting protein-ligand pairs. On the other hand, large amounts of unlabeled protein sequences and chemical compounds are available and have been used to train language models that learn useful representations. In this study, we propose exploiting pretrained biochemical language models to initialize (i.e. warm start) targeted molecule generation models. We investigate two warm start strategies: (i) a one-stage strategy where the initialized model is trained on targeted molecule generation, and (ii) a two-stage strategy containing a pre-finetuning on molecular generation followed by target-specific training. We also compare two decoding strategies to generate compounds: beam search and sampling. Results: The results show that the warm-started models perform better than a baseline model trained from scratch. The two proposed warm-start strategies achieve similar results to each other with respect to widely used metrics from benchmarks. However, docking evaluation of the generated compounds for a number of novel proteins suggests that the one-stage strategy generalizes better than the two-stage strategy. Additionally, we observe that beam search outperforms sampling in both docking evaluation and benchmark metrics for assessing compound quality. Availability and implementation: The source code is available at https://github.com/boun-tabi/biochemical-lms-for-drug-design and the materials are archived in Zenodo at https://doi.org/10.5281/zenodo.6832145
    Deep Learning-based ECG Classification on Raspberry PI using a Tensorflow Lite Model based on PTB-XL Dataset. (arXiv:2209.00989v1 [eess.SP])
    The number of IoT devices in healthcare is expected to rise sharply due to increased demand since the COVID-19 pandemic. Deep learning and IoT devices are being employed to monitor body vitals and automate anomaly detection in clinical and non-clinical settings. Most current technology requires the transmission of raw data to a remote server, which is not efficient for resource-constrained IoT devices and embedded systems. Additionally, it is challenging to develop a machine learning model for ECG classification due to the lack of an extensive open public database. To overcome this challenge, the PTB-XL dataset has been used. In this work, we have developed machine learning models to be deployed on a Raspberry Pi. We present an evaluation of our TensorFlow model with two classification classes. We also present the evaluation of the corresponding TensorFlow Lite FlatBuffers to demonstrate their minimal run-time requirements while maintaining acceptable accuracy.
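    The deployment path described here follows the standard TensorFlow Lite workflow. A hedged sketch is given below; the model name and file paths are hypothetical, and the quantization flag is optional.
        import tensorflow as tf

        # Convert a trained Keras ECG classifier to a TensorFlow Lite FlatBuffer.
        model = tf.keras.models.load_model("ecg_model")        # hypothetical path
        converter = tf.lite.TFLiteConverter.from_keras_model(model)
        converter.optimizations = [tf.lite.Optimize.DEFAULT]   # optional quantization
        with open("ecg_model.tflite", "wb") as f:
            f.write(converter.convert())

        # On-device inference on the Raspberry Pi via the TFLite interpreter.
        interpreter = tf.lite.Interpreter(model_path="ecg_model.tflite")
        interpreter.allocate_tensors()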
    Inferring Tabular Analysis Metadata by Infusing Distribution and Knowledge Information. (arXiv:2209.00946v1 [cs.DB])
    Many data analysis tasks heavily rely on a deep understanding of tables (multi-dimensional data). Across these tasks, there exist commonly used metadata attributes of table fields/columns. In this paper, we identify four such analysis metadata: measure/dimension dichotomy, common field roles, semantic field type, and default aggregation function. Inferring these metadata is challenging due to insufficient supervision signals and the need to utilize existing knowledge and understand data distributions. To infer these metadata for a raw table, we propose a multi-tasking Metadata model which fuses field distribution and knowledge graph information into pre-trained tabular models. For model training and evaluation, we collect a large corpus (~582k tables from private spreadsheet and public tabular datasets) of analysis metadata by using diverse smart supervisions from downstream tasks. Our best model achieves accuracy = 98%, hit rate at top-1 > 67%, accuracy > 80%, and accuracy = 88% for the four analysis metadata inference tasks, respectively. It outperforms a series of baselines based on rules, traditional machine learning methods, and pre-trained tabular models. Analysis metadata models are deployed in a popular data analysis product, helping downstream intelligent features such as insights mining, chart/pivot table recommendation, and natural language QA.
    A Dataset and Baseline Approach for Identifying Usage States from Non-Intrusive Power Sensing With MiDAS IoT-based Sensors. (arXiv:2209.00987v1 [eess.SP])
    The state identification problem seeks to identify power usage patterns of any system of interest, such as buildings or factories. In this challenge paper, we make available a power usage dataset from 8 manufacturing, educational, and medical institutions in the US and India, together with an initial unsupervised machine learning based solution as a baseline, to accelerate research in this area by the community.
    Macroeconomic Predictions using Payments Data and Machine Learning. (arXiv:2209.00948v1 [econ.GN])
    Predicting the economy's short-term dynamics -- a vital input to economic agents' decision-making process -- often uses lagged indicators in linear models. This is typically sufficient during normal times but could prove inadequate during crisis periods. This paper aims to demonstrate that non-traditional and timely data such as retail and wholesale payments, with the aid of nonlinear machine learning approaches, can provide policymakers with sophisticated models to accurately estimate key macroeconomic indicators in near real-time. Moreover, we provide a set of econometric tools to mitigate overfitting and interpretability challenges in machine learning models to improve their effectiveness for policy use. Our models with payments data, nonlinear methods, and tailored cross-validation approaches help improve macroeconomic nowcasting accuracy by up to 40\% -- with higher gains during the COVID-19 period. We observe that the contribution of payments data for economic predictions is small and linear during low and normal growth periods. However, the payments data contribution is large, asymmetrical, and nonlinear during strong negative or positive growth periods.
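    As a hedged illustration of the modelling recipe, the sketch below fits a nonlinear model under a time-ordered cross-validation, one simple way to respect temporal structure and mitigate overfitting; the data are synthetic stand-ins, not the payments series used in the paper.
        import numpy as np
        from sklearn.ensemble import GradientBoostingRegressor
        from sklearn.model_selection import TimeSeriesSplit, cross_val_score

        rng = np.random.default_rng(0)
        X = rng.standard_normal((300, 5))      # stand-in for payments features
        y = np.tanh(X[:, 0]) + 0.1 * rng.standard_normal(300)  # stand-in target

        model = GradientBoostingRegressor(max_depth=2, n_estimators=200)
        scores = cross_val_score(model, X, y,
                                 cv=TimeSeriesSplit(n_splits=5),   # no look-ahead
                                 scoring="neg_mean_absolute_error")
        print(scores.mean())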
    Automated Assessment of Transthoracic Echocardiogram Image Quality Using Deep Neural Networks. (arXiv:2209.00976v1 [eess.IV])
    Standard views in two-dimensional echocardiography are well established, but the quality of acquired images is highly dependent on operator skill and is assessed subjectively. This study is aimed at providing an objective assessment pipeline for echocardiogram image quality by defining a new set of domain-specific quality indicators. Image quality assessment can thus be automated to enhance clinical measurements, interpretation, and real-time optimization. We have developed deep neural networks for the automated assessment of echocardiographic frames, which were randomly sampled from 11,262 adult patients. The private echocardiography dataset consists of 33,784 frames, previously acquired between 2010 and 2020. Deep learning approaches were used to extract the spatiotemporal features, and the image quality indicators were evaluated against the mean absolute error. Our quality indicators encapsulate both anatomical and pathological elements to provide multivariate assessment scores for anatomical visibility, clarity, depth-gain, and foreshortedness.
    Reversible Action Design for Combinatorial Optimization with Reinforcement Learning. (arXiv:2102.07210v2 [cs.LG] UPDATED)
    Combinatorial optimization problems (COPs) over graphs are a fundamental challenge in optimization. Reinforcement learning (RL) has recently emerged as a new framework to tackle these problems and has demonstrated promising results. However, most RL solutions construct the solution incrementally in a greedy manner, thus inevitably imposing an unnecessary dependency on action sequences and requiring many problem-specific designs. We propose a general RL framework that not only exhibits state-of-the-art empirical performance but also generalizes to a wide class of COPs. Specifically, we define the state as a solution to a problem instance and an action as a perturbation to this solution. We utilize graph neural networks (GNNs) to extract latent representations of given problem instances for state-action encoding, and then apply deep Q-learning to obtain a policy that gradually refines the solution by flipping or swapping vertex labels. Experiments are conducted on Maximum $k$-Cut and the Traveling Salesman Problem, and performance improvement is achieved against a set of learning-based and heuristic baselines.
    Long-term hail risk assessment with deep neural networks. (arXiv:2209.01191v1 [physics.ao-ph])
    Hail risk assessment is necessary to estimate and reduce damage to crops, orchards, and infrastructure. It also helps to estimate and reduce consequent losses for businesses and, particularly, insurance companies. But hail forecasting is challenging. Data used for designing models for this purpose are three-dimensional geospatial time series. Hail is a very local event with respect to the resolution of available datasets. Hail events are also rare - only 1% of targets in observations are marked as "hail". Models for nowcasting and short-term hail forecasts are improving, and introducing machine learning models to the meteorology field is not new. There are also various climate models reflecting possible scenarios of climate change in the future. But there are no machine learning models for data-driven forecasting of changes in hail frequency for a given area. The first possible approach for the latter task is to ignore spatial and temporal structure and develop a model capable of classifying a given vertical profile of meteorological variables as favorable to hail formation or not. Although such an approach certainly neglects important information, it is very lightweight and easily scalable because it treats observations as independent of each other. The more advanced approach is to design a neural network capable of processing geospatial data. Our idea here is to combine convolutional layers, responsible for processing spatial data, with recurrent neural network blocks capable of handling temporal structure. This study compares the two approaches and introduces a model suitable for the task of forecasting changes in hail frequency over the coming decades.
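    The convolution-plus-recurrence idea can be sketched as follows; input shapes, layer sizes, and the pooling choice are our assumptions, not the study's configuration.
        import torch
        import torch.nn as nn

        class ConvRNN(nn.Module):
            """CNN over space per time step, LSTM over time, scalar output."""
            def __init__(self, channels=8, hidden=64):
                super().__init__()
                self.conv = nn.Sequential(
                    nn.Conv2d(channels, 16, 3, padding=1), nn.ReLU(),
                    nn.AdaptiveAvgPool2d(4))        # -> (B*T, 16, 4, 4)
                self.rnn = nn.LSTM(16 * 4 * 4, hidden, batch_first=True)
                self.head = nn.Linear(hidden, 1)    # hail-frequency score

            def forward(self, x):                   # x: (B, T, C, H, W)
                b, t = x.shape[:2]
                feats = self.conv(x.flatten(0, 1)).flatten(1).view(b, t, -1)
                out, _ = self.rnn(feats)
                return self.head(out[:, -1])        # predict from the last step

        print(ConvRNN()(torch.randn(2, 12, 8, 32, 32)).shape)  # -> (2, 1)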
    Privacy-preserving Data Sharing on Vertically Partitioned Data. (arXiv:2010.09293v2 [cs.LG] UPDATED)
    In this work, we introduce a differentially private method for generating synthetic data from vertically partitioned data, \emph{i.e.}, where data of the same individuals is distributed across multiple data holders or parties. We present a differentially private stochastic gradient descent (DP-SGD) algorithm to train a mixture model over such partitioned data using variational inference. We modify a secure multiparty computation (MPC) framework to combine MPC with differential privacy (DP), in order to use differentially private MPC effectively to learn a probabilistic generative model under DP on such vertically partitioned data. Assuming the mixture components contain no dependencies across different parties, the objective function can be factorized into a sum of products of the contributions calculated by the parties. Finally, MPC is used to compute the aggregate between the different contributions. Moreover, we rigorously define the privacy guarantees with respect to the different players in the system. To demonstrate the accuracy of our method, we run our algorithm on the Adult dataset from the UCI machine learning repository, where we obtain results comparable to the non-partitioned case.
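    The DP-SGD building block can be shown schematically. The sketch below performs one per-example-clipped, noised gradient step without any of the MPC machinery; the clipping norm C and noise multiplier sigma are assumed parameters.
        import numpy as np

        def dp_sgd_step(params, grads, lr=0.1, C=1.0, sigma=1.0, rng=None):
            """One DP-SGD update; grads is an (n_examples, n_params) array."""
            rng = rng or np.random.default_rng()
            norms = np.linalg.norm(grads, axis=1, keepdims=True)
            clipped = grads / np.maximum(1.0, norms / C)   # per-example clipping
            noisy_sum = clipped.sum(axis=0) + sigma * C * rng.standard_normal(params.shape)
            return params - lr * noisy_sum / len(grads)

        params = np.zeros(3)
        grads = np.random.default_rng(0).standard_normal((32, 3))
        print(dp_sgd_step(params, grads))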
    Optimal bump functions for shallow ReLU networks: Weight decay, depth separation and the curse of dimensionality. (arXiv:2209.01173v1 [stat.ML])
    In this note, we study how neural networks with a single hidden layer and ReLU activation interpolate data drawn from a radially symmetric distribution with target labels 1 at the origin and 0 outside the unit ball, if no labels are known inside the unit ball. With weight decay regularization and in the infinite neuron, infinite data limit, we prove that a unique radially symmetric minimizer exists, whose weight decay regularizer and Lipschitz constant grow as $d$ and $\sqrt{d}$ respectively. We furthermore show that the weight decay regularizer grows exponentially in $d$ if the label $1$ is imposed on a ball of radius $\varepsilon$ rather than just at the origin. By comparison, a neural network with two hidden layers can approximate the target function without encountering the curse of dimensionality.  ( 2 min )
    Synthesizing Speech from Intracranial Depth Electrodes using an Encoder-Decoder Framework. (arXiv:2111.01457v3 [cs.SD] UPDATED)
    Speech Neuroprostheses have the potential to enable communication for people with dysarthria or anarthria. Recent advances have demonstrated high-quality text decoding and speech synthesis from electrocorticographic grids placed on the cortical surface. Here, we investigate a less invasive measurement modality in three participants, namely stereotactic EEG (sEEG) that provides sparse sampling from multiple brain regions, including subcortical regions. To evaluate whether sEEG can also be used to synthesize high-quality audio from neural recordings, we employ a recurrent encoder-decoder model based on modern deep learning methods. We find that speech can indeed be reconstructed with correlations up to 0.8 from these minimally invasive recordings, despite limited amounts of training data.  ( 2 min )
    Examining average and discounted reward optimality criteria in reinforcement learning. (arXiv:2107.01348v2 [cs.LG] UPDATED)
    In reinforcement learning (RL), the goal is to obtain an optimal policy, for which the optimality criterion is fundamentally important. Two major optimality criteria are average and discounted rewards. While the latter is more popular, it is problematic to apply in environments without an inherent notion of discounting. This motivates us to revisit a) the progression of optimality criteria in dynamic programming, b) justification for and complication of an artificial discount factor, and c) benefits of directly maximizing the average reward criterion, which is discounting-free. Our contributions include a thorough examination of the relationship between average and discounted rewards, as well as a discussion of their pros and cons in RL. We emphasize that average-reward RL methods possess the ingredient and mechanism for applying a family of discounting-free optimality criteria (Veinott, 1969) to RL.  ( 2 min )
    Poincare: Recommending Publication Venues via Treatment Effect Estimation. (arXiv:2010.09157v2 [cs.DL] UPDATED)
    Choosing a publication venue for an academic paper is a crucial step in the research process. However, in many cases, decisions are based solely on the experience of researchers, which often leads to suboptimal results. Although there exist venue recommender systems for academic papers, they recommend venues where the paper is expected to be published. In this study, we aim to recommend publication venues from a different perspective. We estimate the number of citations a paper will receive if the paper is published in each venue and recommend the venue where the paper has the most potential impact. However, there are two challenges to this task. First, a paper is published in only one venue, and thus, we cannot observe the number of citations the paper would receive if the paper were published in another venue. Second, the contents of a paper and the publication venue are not statistically independent; that is, there exist selection biases in choosing publication venues. In this paper, we formulate the venue recommendation problem as a treatment effect estimation problem. We use a bias correction method to estimate the potential impact of choosing a publication venue effectively and to recommend venues based on the potential impact of papers in each venue. We highlight the effectiveness of our method using paper data from computer science conferences.  ( 3 min )
    GraphTheta: A Distributed Graph Neural Network Learning System With Flexible Training Strategy. (arXiv:2104.10569v2 [cs.LG] UPDATED)
    Graph neural networks (GNNs) have been demonstrated as a powerful tool for analysing non-Euclidean graph data. However, the lack of efficient distributed graph learning (GL) systems severely hinders applications of GNNs, especially when graphs are big and GNNs are relatively deep. Herein, we present GraphTheta, a novel distributed and scalable GL system implemented in a vertex-centric graph programming model. GraphTheta is the first GL system built upon distributed graph processing, with neural network operators implemented as user-defined functions. This system supports multiple training strategies and enables efficient and scalable big-graph learning on distributed (virtual) machines with low memory each. To facilitate graph convolution implementations, GraphTheta puts forward a new GL abstraction named NN-TGAR to bridge the gap between graph processing and graph deep learning. A distributed graph engine is proposed to conduct the stochastic gradient descent optimization with hybrid-parallel execution. Moreover, we add support for a new cluster-batched training strategy besides the global-batch and mini-batch strategies. We evaluate GraphTheta using a number of datasets with network sizes ranging from small- and modest- to large-scale. Experimental results show that GraphTheta can scale well to 1,024 workers for training an in-house developed GNN on an industry-scale Alipay dataset of 1.4 billion nodes and 4.1 billion attributed edges, with a cluster of CPU virtual machines (dockers) with small memory each (5$\sim$12GB). Moreover, GraphTheta obtains comparable or better prediction results than the state-of-the-art GNN implementations, demonstrating that it learns GNNs as well as existing frameworks do, and it can outperform DistDGL by up to $2.02\times$ with better scalability. To the best of our knowledge, this work presents the largest edge-attributed GNN learning task conducted in the literature.  ( 3 min )
    Clustering and Structural Robustness in Causal Diagrams. (arXiv:2111.04513v2 [stat.ML] UPDATED)
    Graphs are commonly used to represent and visualize causal relations. For a small number of variables, this approach provides a succinct and clear view of the scenario at hand. As the number of variables under study increases, the graphical approach may become impractical, and the clarity of the representation is lost. Clustering of variables is a natural way to reduce the size of the causal diagram, but it may erroneously change the essential properties of the causal relations if implemented arbitrarily. We define a specific type of cluster, called transit cluster, that is guaranteed to preserve the identifiability properties of causal effects under certain conditions. We provide a sound and complete algorithm for finding all transit clusters in a given graph and demonstrate how clustering can simplify the identification of causal effects. We also study the inverse problem, where one starts with a clustered graph and looks for extended graphs where the identifiability properties of causal effects remain unchanged. We show that this kind of structural robustness is closely related to transit clusters.  ( 2 min )
    A Framework for Supervised Heterogeneous Transfer Learning using Dynamic Distribution Adaptation and Manifold Regularization. (arXiv:2108.12293v2 [cs.LG] UPDATED)
    Transfer learning aims to learn classifiers for a target domain by transferring knowledge from a source domain. However, due to two main issues, feature discrepancy and distribution divergence, transfer learning can be a very difficult problem in practice. In this paper, we present a framework called TLF that builds a classifier for the target domain, which has only a few labeled training records, by transferring knowledge from the source domain, which has many labeled records. While existing methods often focus on one issue and leave the other for future work, TLF is capable of handling both issues simultaneously. In TLF, we alleviate feature discrepancy by identifying shared label distributions that act as the pivots to bridge the domains. We handle distribution divergence by simultaneously optimizing the structural risk functional, joint distributions between domains, and the manifold consistency underlying marginal distributions. Moreover, for the manifold consistency we exploit its intrinsic properties by identifying the k nearest neighbors of a record, where the value of k is determined automatically in TLF. Furthermore, since negative transfer is not desired, we consider only the source records that belong to the source pivots during the knowledge transfer. We evaluate TLF on seven publicly available natural datasets and compare its performance against that of fourteen state-of-the-art techniques. We also evaluate the effectiveness of TLF in some challenging situations. Our experimental results, including statistical sign test and Nemenyi test analyses, indicate a clear superiority of the proposed framework over the state-of-the-art techniques.  ( 3 min )
    Petals: Collaborative Inference and Fine-tuning of Large Models. (arXiv:2209.01188v1 [cs.LG])
    Many NLP tasks benefit from using large language models (LLMs) that often have more than 100 billion parameters. With the release of BLOOM-176B and OPT-175B, everyone can download pretrained models of this scale. Still, using these models requires high-end hardware unavailable to many researchers. In some cases, LLMs can be used more affordably via RAM offloading or hosted APIs. However, these techniques have innate limitations: offloading is too slow for interactive inference, while APIs are not flexible enough for research. In this work, we propose Petals $-$ a system for inference and fine-tuning of large models collaboratively by joining the resources of multiple parties trusted to process clients' data. We demonstrate that this strategy significantly outperforms offloading for very large models, running inference of BLOOM-176B on consumer GPUs with $\approx$ 1 step per second. Unlike most inference APIs, Petals also natively exposes the hidden states of served models, allowing its users to train and share custom model extensions based on efficient fine-tuning methods.  ( 2 min )
    XFL: Naming Functions in Binaries with Extreme Multi-label Learning. (arXiv:2107.13404v3 [cs.CR] UPDATED)
    Reverse engineers benefit from the presence of identifiers such as function names in a binary, but usually these are removed for release. Training a machine learning model to predict function names automatically is promising but fundamentally hard: unlike words in natural language, most function names occur only once. In this paper, we address this problem by introducing eXtreme Function Labeling (XFL), an extreme multi-label learning approach to selecting appropriate labels for binary functions. XFL splits function names into tokens, treating each as an informative label akin to the problem of tagging texts in natural language. We relate the semantics of binary code to labels through DEXTER, a novel function embedding that combines static analysis-based features with local context from the call graph and global context from the entire binary. We demonstrate that XFL/DEXTER outperforms the state of the art in function labeling on a dataset of 10,047 binaries from the Debian project, achieving a precision of 83.5%. We also study combinations of XFL with alternative binary embeddings from the literature and show that DEXTER consistently performs best for this task. As a result, we demonstrate that binary function labeling can be effectively phrased in terms of multi-label learning, and that binary function embeddings benefit from including explicit semantic features.  ( 3 min )
    Pareto Navigation Gradient Descent: a First-Order Algorithm for Optimization in Pareto Set. (arXiv:2110.08713v2 [math.OC] UPDATED)
    Many modern machine learning applications, such as multi-task learning, require finding optimal model parameters to trade-off multiple objective functions that may conflict with each other. The notion of the Pareto set allows us to focus on the set of (often infinitely many) models that cannot be strictly improved. But it does not provide an actionable procedure for picking one or a few special models to return to practical users. In this paper, we consider \emph{optimization in Pareto set (OPT-in-Pareto)}, the problem of finding Pareto models that optimize an extra reference criterion function within the Pareto set. This function can either encode a specific preference from the users, or represent a generic diversity measure for obtaining a set of diversified Pareto models that are representative of the whole Pareto set. Unfortunately, despite being a highly useful framework, efficient algorithms for OPT-in-Pareto have been largely missing, especially for large-scale, non-convex, and non-linear objectives in deep learning. A naive approach is to apply Riemannian manifold gradient descent on the Pareto set, which yields a high computational cost due to the need for eigen-calculation of Hessian matrices. We propose a first-order algorithm that approximately solves OPT-in-Pareto using only gradient information, with both high practical efficiency and theoretically guaranteed convergence properties. Empirically, we demonstrate that our method works efficiently for a variety of challenging multi-task-related problems.  ( 3 min )
    Deep Learning-based Patient Re-identification Is able to Exploit the Biometric Nature of Medical Chest X-ray Data. (arXiv:2103.08562v4 [cs.CV] UPDATED)
    With the rise and ever-increasing potential of deep learning techniques in recent years, publicly available medical datasets became a key factor to enable reproducible development of diagnostic algorithms in the medical domain. Medical data contains sensitive patient-related information and is therefore usually anonymized by removing patient identifiers, e.g., patient names before publication. To the best of our knowledge, we are the first to show that a well-trained deep learning system is able to recover the patient identity from chest X-ray data. We demonstrate this using the publicly available large-scale ChestX-ray14 dataset, a collection of 112,120 frontal-view chest X-ray images from 30,805 unique patients. Our verification system is able to identify whether two frontal chest X-ray images are from the same person with an AUC of 0.9940 and a classification accuracy of 95.55%. We further highlight that the proposed system is able to reveal the same person even ten and more years after the initial scan. When pursuing a retrieval approach, we observe an mAP@R of 0.9748 and a precision@1 of 0.9963. Furthermore, we achieve an AUC of up to 0.9870 and a precision@1 of up to 0.9444 when evaluating our trained networks on external datasets such as CheXpert and the COVID-19 Image Data Collection. Based on this high identification rate, a potential attacker may leak patient-related information and additionally cross-reference images to obtain more information. Thus, there is a great risk of sensitive content falling into unauthorized hands or being disseminated against the will of the concerned patients. Especially during the COVID-19 pandemic, numerous chest X-ray datasets have been published to advance research. Therefore, such data may be vulnerable to potential attacks by deep learning-based re-identification algorithms.  ( 3 min )
    Support vector machines and Radon's theorem. (arXiv:2011.00617v3 [cs.LG] UPDATED)
    A support vector machine (SVM) is an algorithm that finds a hyperplane which optimally separates labeled data points in $\mathbb{R}^n$ into positive and negative classes. The data points on the margin of this separating hyperplane are called support vectors. We connect the possible configurations of support vectors to Radon's theorem, which provides guarantees for when a set of points can be divided into two classes (positive and negative) whose convex hulls intersect. If the convex hulls of the positive and negative support vectors are projected onto a separating hyperplane, then the projections intersect if and only if the hyperplane is optimal. Further, with a particular type of general position, we show that (a) the projected convex hulls of the support vectors intersect in exactly one point, (b) the support vectors are stable under perturbation, (c) there are at most $n+1$ support vectors, and (d) every number of support vectors from 2 up to $n+1$ is possible. Finally, we perform computer simulations studying the expected number of support vectors, and their configurations, for randomly generated data. We observe that as the distance between classes of points increases for this type of randomly generated data, configurations with fewer support vectors become more likely.  ( 3 min )
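    These objects are easy to inspect with a standard SVM implementation. The sketch below fits a (near) hard-margin linear SVM on random separable data in three dimensions and reports the support vectors; by the general-position result above, one expects at most $n+1 = 4$ of them.
        import numpy as np
        from sklearn.svm import SVC

        rng = np.random.default_rng(0)
        X = np.vstack([rng.normal(-2, 1, (50, 3)), rng.normal(2, 1, (50, 3))])
        y = np.array([0] * 50 + [1] * 50)

        svm = SVC(kernel="linear", C=1e6).fit(X, y)  # large C ~ hard margin
        print("support vectors per class:", svm.n_support_)
        print(svm.support_vectors_)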
    Co-Imitation: Learning Design and Behaviour by Imitation. (arXiv:2209.01207v1 [cs.LG])
    The co-adaptation of robots has been a long-standing research endeavour with the goal of adapting both body and behaviour of a system for a given task, inspired by the natural evolution of animals. Co-adaptation has the potential to eliminate costly manual hardware engineering as well as improve the performance of systems. The standard approach to co-adaptation is to use a reward function for optimizing behaviour and morphology. However, defining and constructing such reward functions is notoriously difficult and often a significant engineering effort. This paper introduces a new viewpoint on the co-adaptation problem, which we call co-imitation: finding a morphology and a policy that allow an imitator to closely match the behaviour of a demonstrator. To this end we propose a co-imitation methodology for adapting behaviour and morphology by matching state distributions of the demonstrator. Specifically, we focus on the challenging scenario with mismatched state- and action-spaces between both agents. We find that co-imitation increases behaviour similarity across a variety of tasks and settings, and demonstrate co-imitation by transferring human walking, jogging and kicking skills onto a simulated humanoid.  ( 2 min )
    First Hitting Diffusion Models. (arXiv:2209.01170v1 [cs.CV])
    We propose a family of First Hitting Diffusion Models (FHDM), deep generative models that generate data with a diffusion process that terminates at a random first hitting time. This yields an extension of the standard fixed-time diffusion models that terminate at a pre-specified deterministic time. Although standard diffusion models are designed for continuous unconstrained data, FHDM is naturally designed to learn distributions on continuous as well as a range of discrete and structured domains. Moreover, FHDM enables instance-dependent termination times and accelerates the diffusion process to sample higher quality data with fewer diffusion steps. Technically, we train FHDM by maximum likelihood estimation on diffusion trajectories augmented from observed data with conditional first hitting processes (i.e., bridges) derived based on Doob's $h$-transform, deviating from the commonly used time-reversal mechanism. We apply FHDM to generate data in various domains such as point clouds (general continuous distributions), climate and geographical events on earth (continuous distributions on the sphere), unweighted graphs (distributions of binary matrices), and segmentation maps of 2D images (high-dimensional categorical distributions). We observe considerable improvement compared with the state-of-the-art approaches in both quality and speed.  ( 2 min )
    Adaptive Graph Diffusion Networks. (arXiv:2012.15024v2 [cs.LG] UPDATED)
    Graph Neural Networks (GNNs) have received much attention in the graph deep learning domain. However, recent research shows, empirically and theoretically, that deep GNNs suffer from over-fitting and over-smoothing problems. The usual solutions either cannot overcome the extensive runtime of deep GNNs or restrict graph convolutions to the same feature space. We propose Adaptive Graph Diffusion Networks (AGDNs), which perform multi-layer generalized graph diffusion in different feature spaces with moderate complexity and runtime. Standard graph diffusion methods combine large and dense powers of the transition matrix with predefined weighting coefficients. Instead, AGDNs combine smaller multi-hop node representations with learnable and generalized weighting coefficients. We propose two scalable mechanisms for the weighting coefficients to capture multi-hop information: Hop-wise Attention (HA) and Hop-wise Convolution (HC). We evaluate AGDNs on diverse, challenging Open Graph Benchmark (OGB) datasets with semi-supervised node classification and link prediction tasks. As of the date of submission (Aug 26, 2022), AGDNs achieve top-1 performance on the ogbn-arxiv, ogbn-proteins and ogbl-ddi datasets, and top-3 performance on the ogbl-citation2 dataset. On similar Tesla V100 GPU cards, AGDNs outperform Reversible GNNs (RevGNNs) with 13% of the complexity and 1% of the training runtime of RevGNNs on the ogbn-proteins dataset. AGDNs also achieve performance comparable to SEAL with 36% of the training runtime and 0.2% of the inference runtime of SEAL on the ogbl-citation2 dataset.  ( 3 min )
    Revisiting Outer Optimization in Adversarial Training. (arXiv:2209.01199v1 [cs.LG])
    Despite the fundamental distinction between adversarial and natural training (AT and NT), AT methods generally adopt momentum SGD (MSGD) for the outer optimization. This paper aims to analyze this choice by investigating the overlooked role of outer optimization in AT. Our exploratory evaluations reveal that AT induces higher gradient norm and variance compared to NT. This phenomenon hinders the outer optimization in AT since the convergence rate of MSGD is highly dependent on the variance of the gradients. To this end, we propose an optimization method called ENGM which regularizes the contribution of each input example to the average mini-batch gradients. We prove that the convergence rate of ENGM is independent of the variance of the gradients, and thus, it is suitable for AT. We introduce a trick to reduce the computational cost of ENGM using empirical observations on the correlation between the norm of gradients w.r.t. the network parameters and input examples. Our extensive evaluations and ablation studies on CIFAR-10, CIFAR-100, and TinyImageNet demonstrate that ENGM and its variants consistently improve the performance of a wide range of AT methods. Furthermore, ENGM alleviates major shortcomings of AT including robust overfitting and high sensitivity to hyperparameter settings.  ( 2 min )
    Hierarchical Relational Learning for Few-Shot Knowledge Graph Completion. (arXiv:2209.01205v1 [cs.LG])
    Knowledge graphs (KGs) are known for their large scale and knowledge inference ability, but are also notorious for the incompleteness associated with them. Due to the long-tail distribution of the relations in KGs, few-shot KG completion has been proposed as a solution to alleviate incompleteness and expand the coverage of KGs. It aims to make predictions for triplets involving novel relations when only a few training triplets are provided as reference. Previous methods have mostly focused on designing local neighbor aggregators to learn entity-level information and/or imposing sequential dependency assumption at the triplet level to learn meta relation information. However, valuable pairwise triplet-level interactions and context-level relational information have been largely overlooked for learning meta representations of few-shot relations. In this paper, we propose a hierarchical relational learning method (HiRe) for few-shot KG completion. By jointly capturing three levels of relational information (entity-level, triplet-level and context-level), HiRe can effectively learn and refine the meta representation of few-shot relations, and consequently generalize very well to new unseen relations. Extensive experiments on two benchmark datasets validate the superiority of HiRe against other state-of-the-art methods.  ( 2 min )
    Estimation of Correlation Matrices from Limited time series Data using Machine Learning. (arXiv:2209.01198v1 [cs.LG])
    Prediction of correlation matrices from given time series data has several applications for a range of problems, such as inferring neuronal connections from spiking data, deducing causal dependencies between genes from expression data, and discovering long spatial range influences in climate variations. Traditional methods of predicting correlation matrices utilize time series data of all the nodes of the underlying networks. Here, we use a supervised machine learning technique to predict the correlation matrix of entire systems from finite time series information of a few randomly selected nodes. The accuracy of the prediction from the model confirms that only a limited time series of a subset of the entire system is enough to make good correlation matrix predictions. Furthermore, using an unsupervised learning algorithm, we provide insights into the success of the predictions from our model. Finally, we apply the machine learning model developed here to real-world data sets.  ( 2 min )
    No-Regret Learning Dynamics for Extensive-Form Correlated Equilibrium. (arXiv:2004.00603v5 [cs.GT] UPDATED)
    The existence of simple, uncoupled no-regret dynamics that converge to correlated equilibria in normal-form games is a celebrated result in the theory of multi-agent systems. Specifically, it has been known for more than 20 years that when all players seek to minimize their internal regret in a repeated normal-form game, the empirical frequency of play converges to a normal-form correlated equilibrium. Extensive-form (that is, tree-form) games generalize normal-form games by modeling both sequential and simultaneous moves, as well as private information. Because of the sequential nature and presence of partial information in the game, extensive-form correlation has significantly different properties than the normal-form counterpart, many of which are still open research directions. Extensive-form correlated equilibrium (EFCE) has been proposed as the natural extensive-form counterpart to normal-form correlated equilibrium. However, it was previously unknown whether EFCE emerges as the result of uncoupled agent dynamics. In this paper, we give the first uncoupled no-regret dynamics that converge to the set of EFCEs in $n$-player general-sum extensive-form games with perfect recall. First, we introduce a notion of trigger regret in extensive-form games, which extends that of internal regret in normal-form games. When each player has low trigger regret, the empirical frequency of play is close to an EFCE. Then, we give an efficient no-trigger-regret algorithm. Our algorithm decomposes trigger regret into local subproblems at each decision point for the player, and constructs a global strategy of the player from the local solutions at each decision point.  ( 3 min )
    "More Than Words": Linking Music Preferences and Moral Values Through Lyrics. (arXiv:2209.01169v1 [cs.CY])
    This study explores the association between music preferences and moral values by applying text analysis techniques to lyrics. Harvesting data from a Facebook-hosted application, we align psychometric scores of 1,386 users to lyrics from the top 5 songs of their preferred music artists as emerged from Facebook Page Likes. We extract a set of lyrical features related to each song's overarching narrative, moral valence, sentiment, and emotion. A machine learning framework was designed to exploit regression approaches and evaluate the predictive power of lyrical features for inferring moral values. Results suggest that lyrics from top songs of artists people like inform their morality. Virtues of hierarchy and tradition achieve higher prediction scores ($.20 \leq r \leq .30$) than values of empathy and equality ($.08 \leq r \leq .11$), while basic demographic variables only account for a small part in the models' explainability. This shows the importance of music listening behaviours, as assessed via lyrical preferences, alone in capturing moral values. We discuss the technological and musicological implications and possible future improvements.  ( 2 min )
    Multi-Step Prediction in Linearized Latent State Spaces for Representation Learning. (arXiv:2209.01127v1 [cs.LG])
    In this paper, we derive a novel method as a generalization of LCEs such as E2C. The method develops the idea of learning a locally linear state space by adding a multi-step prediction, thus allowing for more explicit control over the curvature. We show that the method outperforms E2C without the drastic model changes that come with other works, such as PCC and P3C. We discuss the relation between E2C and the presented method, and derive the corresponding update equations. We provide empirical evidence suggesting that, by considering the multi-step prediction, our method - ms-E2C - allows learning much better latent state spaces in terms of curvature and next-state predictability. Finally, we also discuss certain stability challenges we encounter with multi-step predictions and ways to mitigate them.
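    One plausible reading of the multi-step objective is sketched below: a locally linear latent model $z_{t+1} = A z_t + B a_t$ is rolled forward for K steps and penalized at every horizon. The encoder, matrices, and shapes are stand-ins, not the paper's architecture.
        import torch

        def multi_step_loss(encoder, A, B, states, actions, K=3):
            # states: (T, state_dim), actions: (T-1, action_dim)
            z = encoder(states)                      # (T, latent_dim)
            loss = 0.0
            for k in range(1, K + 1):
                pred = z[:-k]
                for j in range(k):                   # k-step latent rollout
                    pred = pred @ A.T + actions[j:j + pred.shape[0]] @ B.T
                loss = loss + torch.mean((pred - z[k:]) ** 2)
            return loss / K

        enc = torch.nn.Linear(6, 4)                  # stand-in encoder
        A, B = torch.eye(4), torch.zeros(4, 2)       # stand-in dynamics
        print(multi_step_loss(enc, A, B, torch.randn(20, 6), torch.randn(19, 2)))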
    A lightweight hybrid CNN-LSTM model for ECG-based arrhythmia detection. (arXiv:2209.00988v1 [eess.SP])
    Electrocardiogram (ECG) is the most frequent and routine diagnostic tool used for monitoring the heart's electrical signals and evaluating its functionality. The human heart can suffer from a variety of diseases, including cardiac arrhythmias. Arrhythmia is an irregular heart rhythm that in severe cases can lead to heart stroke and can be diagnosed via ECG recordings. Since early detection of cardiac arrhythmias is of great importance, computerized and automated classification and identification of these abnormal heart signals have received much attention over the past decades. Methods: This paper introduces a light deep learning approach for high-accuracy detection of 8 different cardiac arrhythmias and normal rhythm. To leverage the deep learning method, resampling and baseline wander removal techniques are applied to the ECG signals. In this study, 500-sample ECG segments were used as model inputs. The rhythm classification was done by an 11-layer network in an end-to-end manner without the need for hand-crafted manual feature extraction. Results: To evaluate the proposed technique, ECG signals were chosen from two PhysioNet databases: the MIT-BIH arrhythmia database and the long-term AF database. The proposed deep learning framework, based on the combination of a Convolutional Neural Network (CNN) and Long Short-Term Memory (LSTM), showed more promising results than most of the state-of-the-art methods. The proposed method reaches a mean diagnostic accuracy of 98.24%. Conclusion: A trained model for arrhythmia classification using diverse ECG signals was successfully developed and tested. Significance: Since the present work uses a light classification technique with high diagnostic accuracy compared to other notable methods, it could successfully be implemented in Holter monitor devices for arrhythmia detection.
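    A hedged sketch of such a hybrid is given below; the layer counts and sizes are our assumptions and do not reproduce the paper's 11-layer network.
        import tensorflow as tf

        # Convolutions extract local ECG morphology from a 500-sample segment,
        # an LSTM models rhythm over time, and a softmax head scores the
        # 9 classes (8 arrhythmias + normal rhythm).
        model = tf.keras.Sequential([
            tf.keras.layers.Input(shape=(500, 1)),
            tf.keras.layers.Conv1D(32, 7, activation="relu"),
            tf.keras.layers.MaxPooling1D(2),
            tf.keras.layers.Conv1D(64, 5, activation="relu"),
            tf.keras.layers.MaxPooling1D(2),
            tf.keras.layers.LSTM(64),
            tf.keras.layers.Dense(9, activation="softmax"),
        ])
        model.compile(optimizer="adam",
                      loss="sparse_categorical_crossentropy",
                      metrics=["accuracy"])
        model.summary()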
    Reducing The Amortization Gap of Entropy Bottleneck In End-to-End Image Compression. (arXiv:2209.00964v1 [eess.IV])
    End-to-end deep trainable models are about to exceed the performance of traditional handcrafted compression techniques on videos and images. The core idea is to learn a non-linear transformation, modeled as a deep neural network, mapping the input image into a latent space, jointly with an entropy model of the latent distribution. The decoder is also learned as a deep trainable network, and the distortion is measured on the reconstructed image. These methods enforce the latents to follow some prior distributions. Since these priors are learned by optimization over the entire training set, the performance is optimal on average. However, they cannot fit every single new instance exactly, hence damaging the compression performance by enlarging the bit-stream. In this paper, we propose a simple yet efficient instance-based parameterization method to reduce this amortization gap at a minor cost. The proposed method is applicable to any end-to-end compression method, improving the compression bitrate by 1% without any impact on the reconstruction quality.
    Classification of eye-state using EEG recordings: speed-up gains using signal epochs and mutual information measure. (arXiv:2209.01023v1 [eess.SP])
    The classification of electroencephalography (EEG) signals is useful in a wide range of applications such as seizure detection/prediction, motor imagery classification, emotion classification, and drug effect diagnosis, amongst others. With the large number of EEG channels acquired, it has become vital that efficient data-reduction methods are developed, with varying importance from one application to another. It is also important for many applications that online classification is achieved during EEG recording, to monitor changes as they happen. In this paper we introduce a method based on Mutual Information (MI) for channel selection. The obtained results show that whilst there is a penalty on classification accuracy scores, promising speed-up gains can be achieved using MI techniques. Using MI with signal epochs (3 s) containing signal transitions enhances these speed-up gains. This work is exploratory, and we suggest further research be carried out for validation and development. Benefits of improving classification speed include better applicability in clinical or educational settings.  ( 2 min )
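    The channel-selection step can be sketched with standard tools. Below, each EEG channel is scored by its mutual information with the eye-state label and only the top-scoring channels are kept before classification; the channel count and data are synthetic assumptions.
        import numpy as np
        from sklearn.feature_selection import mutual_info_classif

        rng = np.random.default_rng(0)
        X = rng.standard_normal((1000, 14))        # 14 channels, synthetic
        y = (X[:, 3] + 0.5 * rng.standard_normal(1000) > 0).astype(int)

        mi = mutual_info_classif(X, y, random_state=0)  # one score per channel
        top_k = np.argsort(mi)[::-1][:4]                # keep the 4 best channels
        print("selected channels:", top_k, "MI scores:", np.round(mi[top_k], 3))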
    SATformer: Transformers for SAT Solving. (arXiv:2209.00953v1 [cs.AI])
    In this paper, we propose SATformer, a novel Transformer-based solution for Boolean satisfiability (SAT) solving. Different from existing learning-based SAT solvers that learn at the problem instance level, SATformer learns the minimum unsatisfiable cores (MUC) of unsatisfiable problem instances, which provide rich information about why such problems are unsatisfiable. Specifically, we apply a graph neural network (GNN) to obtain the embeddings of the clauses in conjunctive normal form (CNF). A hierarchical Transformer architecture is applied to the clause embeddings to capture the relationships among clauses, and the self-attention weights are learned to be high when the clauses forming UNSAT cores are attended together, and low otherwise. By doing so, SATformer effectively learns the correlations among clauses for SAT prediction. Experimental results show that SATformer is more powerful than existing end-to-end learning-based SAT solvers.  ( 2 min )
    IMG2IMU: Applying Knowledge from Large-Scale Images to IMU Applications via Contrastive Learning. (arXiv:2209.00945v1 [cs.LG])
    Recent advances in machine learning have shown that representations pre-trained via self-supervised learning can achieve high accuracy on tasks with small training data. Unlike in the vision and natural language processing domains, such pre-training for IMU-based applications is challenging, as there are only a few publicly available datasets with sufficient size and diversity to learn generalizable representations. To overcome this problem, we propose IMG2IMU, a novel approach that adapts pre-trained representations from large-scale images to diverse few-shot IMU sensing tasks. We convert the sensor data into visually interpretable spectrograms for the model to utilize the knowledge gained from vision. Further, we apply contrastive learning on an augmentation set we designed to learn representations that are tailored to interpreting sensor data. Our extensive evaluations on five different IMU sensing tasks show that IMG2IMU consistently outperforms the baselines, illustrating that vision knowledge can be incorporated into few-shot learning environments for IMU sensing tasks.  ( 2 min )
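    The sensor-to-image conversion can be sketched as follows; the sampling rate and window parameters are illustrative assumptions.
        import numpy as np
        from scipy.signal import spectrogram

        fs = 50                                    # assumed 50 Hz IMU stream
        t = np.arange(0, 10, 1 / fs)
        rng = np.random.default_rng(0)
        accel = np.sin(2 * np.pi * 2 * t) + 0.3 * rng.standard_normal(t.size)

        # Image-like (freq x time) array a vision-pretrained encoder can consume.
        freqs, times, Sxx = spectrogram(accel, fs=fs, nperseg=64, noverlap=48)
        log_spec = np.log1p(Sxx)
        print(log_spec.shape)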
    A Class-Aware Representation Refinement Framework for Graph Classification. (arXiv:2209.00936v1 [cs.LG])
    Graph Neural Networks (GNNs) are widely used for graph representation learning. Despite their prevalence, GNNs suffer from two drawbacks in the graph classification task: the neglect of graph-level relationships and the generalization issue. Each graph is treated separately in GNN message passing/graph pooling, and existing methods for addressing overfitting operate on each individual graph. This makes the learnt graph representations less effective in the downstream classification. In this paper, we propose a Class-Aware Representation rEfinement (CARE) framework for the task of graph classification. CARE computes simple yet powerful class representations and injects them to steer the learning of graph representations towards better class separability. CARE is a plug-and-play framework that is highly flexible and able to incorporate arbitrary GNN backbones without significantly increasing the computational cost. We also theoretically prove that CARE has a better generalization upper bound than its GNN backbone through Vapnik-Chervonenkis (VC) dimension analysis. Our extensive experiments with 10 well-known GNN backbones on 9 benchmark datasets validate the superiority and effectiveness of CARE over its GNN counterparts.  ( 2 min )
    Tweaking Metasploit to Evade Encrypted C2 Traffic Detection. (arXiv:2209.00943v1 [cs.CR])
    Command and Control (C2) communication is a key component of any structured cyber-attack. As such, security operations actively try to detect this type of communication in their networks. This poses a problem for legitimate pentesters that try to remain undetected, since commonly used pentesting tools, such as Metasploit, generate constant traffic patterns that are easily distinguishable from regular web traffic. In this paper we start with these identifiable patterns in Metasploit's C2 traffic and show that a machine learning-based detector is able to detect the presence of such traffic with high accuracy, even when encrypted. We then outline and implement a set of modifications to the Metasploit framework in order to decrease the detection rates of such a classifier. To evaluate the performance of these modifications, we use two threat models with increasing awareness of these modifications. We look at the detection evasion performance and at the byte count and runtime overhead of the modifications. Our results show that for the second, increased-awareness threat model the framework-side traffic modifications yield a better detection avoidance rate (90%) than payload-side-only modifications (50%). We also show that although the modifications use up to 3 times more TLS payload bytes than the original, the runtime does not significantly change and the total number of bytes (including TLS payload) reduces.  ( 3 min )
    Mind the Gap! Injecting Commonsense Knowledge for Abstractive Dialogue Summarization. (arXiv:2209.00930v1 [cs.CL])
    In this paper, we propose to leverage the unique characteristics of dialogues sharing commonsense knowledge across participants, to resolve the difficulties in summarizing them. We present SICK, a framework that uses commonsense inferences as additional context. Compared to previous work that solely relies on the input dialogue, SICK uses an external knowledge model to generate a rich set of commonsense inferences and selects the most probable one with a similarity-based selection method. Built upon SICK, SICK++ utilizes commonsense as supervision, where the task of generating commonsense inferences is added upon summarizing the dialogue in a multi-task learning setting. Experimental results show that with injected commonsense knowledge, our framework generates more informative and consistent summaries than existing methods.  ( 2 min )
    TB or not TB? Acoustic cough analysis for tuberculosis classification. (arXiv:2209.00934v1 [eess.AS])
    In this work, we explore recurrent neural network architectures for tuberculosis (TB) cough classification. In contrast to previous unsuccessful attempts to implement deep architectures in this domain, we show that a basic bidirectional long short-term memory network (BiLSTM) can achieve improved performance. In addition, we show that by performing greedy feature selection in conjunction with a newly-proposed attention-based architecture that learns patient-invariant features, substantially better generalisation can be achieved compared to a baseline and other considered architectures. Furthermore, this attention mechanism allows inspection of the temporal regions of the audio signal that are considered important for classification. Finally, we develop a neural style transfer technique to infer idealised inputs which can subsequently be analysed. We find distinct differences between the idealised power spectra of TB and non-TB coughs, which provide clues about the origin of the features in the audio signal.  ( 2 min )
    An Introduction to Machine Unlearning. (arXiv:2209.00939v1 [cs.LG])
    Removing the influence of a specified subset of training data from a machine learning model may be required to address issues such as privacy, fairness, and data quality. Retraining the model from scratch on the remaining data after removal of the subset is an effective but often infeasible option, due to its computational expense. The past few years have therefore seen several novel approaches towards efficient removal, forming the field of "machine unlearning"; however, many aspects of the literature published thus far are disparate and lack consensus. In this paper, we summarise and compare seven state-of-the-art machine unlearning algorithms, consolidate definitions of core concepts used in the field, reconcile different approaches for evaluating algorithms, and discuss issues related to applying machine unlearning in practice.  ( 2 min )
    Multi-modal Contrastive Representation Learning for Entity Alignment. (arXiv:2209.00891v1 [cs.CL])
    Multi-modal entity alignment aims to identify equivalent entities between two different multi-modal knowledge graphs, which consist of structural triples and images associated with entities. Most previous works focus on how to utilize and encode information from different modalities; however, it is not trivial to leverage multi-modal knowledge in entity alignment because of the modality heterogeneity. In this paper, we propose MCLEA, a Multi-modal Contrastive Learning based Entity Alignment model, to obtain effective joint representations for multi-modal entity alignment. Different from previous works, MCLEA considers task-oriented modality and models the inter-modal relationships for each entity representation. In particular, MCLEA first learns multiple individual representations from multiple modalities, and then performs contrastive learning to jointly model intra-modal and inter-modal interactions. Extensive experimental results show that MCLEA outperforms state-of-the-art baselines on public datasets under both supervised and unsupervised settings.  ( 2 min )
    Introducing dynamical constraints into representation learning. (arXiv:2209.00905v1 [cs.LG])
    While representation learning has been central to the rise of machine learning and artificial intelligence, a key problem remains in making the learned representations meaningful. To this end, the typical approach is to regularize the learned representation through prior probability distributions. However, such priors are usually unavailable or ad hoc. To deal with this, we propose a dynamics-constrained representation learning framework. Instead of using predefined probabilities, we restrict the latent representation to follow specific dynamics, which is a more natural constraint for representation learning in dynamical systems. Our belief stems from a fundamental observation in physics that though different systems can have different marginalized probability distributions, they typically obey the same dynamics, such as Newton's and Schrödinger's equations. We validate our framework for different systems including a real-world fluorescent DNA movie dataset. We show that our algorithm can uniquely identify an uncorrelated, isometric and meaningful latent representation.  ( 3 min )
    Optimistic Optimization of Gaussian Process Samples. (arXiv:2209.00895v1 [cs.LG])
    Bayesian optimization is a popular formalism for global optimization, but its computational costs limit it to expensive-to-evaluate functions. A competing, computationally more efficient, global optimization framework is optimistic optimization, which exploits prior knowledge about the geometry of the search space in the form of a dissimilarity function. We investigate to which degree the conceptual advantages of Bayesian optimization can be combined with the computational efficiency of optimistic optimization. By mapping the kernel to a dissimilarity, we obtain an optimistic optimization algorithm for the Bayesian optimization setting with a run-time of up to $\mathcal{O}(N \log N)$. As a high-level take-away we find that, when using stationary kernels on objectives of relatively low evaluation cost, optimistic optimization can be strongly preferable over Bayesian optimization, while for strongly coupled and parametric models, good implementations of Bayesian optimization can perform much better, even at low evaluation cost. We argue that there is a new research domain between geometric and probabilistic search, i.e. methods that run drastically faster than traditional Bayesian optimization, while retaining some of the crucial functionality of Bayesian optimization.  ( 2 min )
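    To make the kernel-to-dissimilarity step concrete: a stationary kernel induces a canonical metric, and the sketch below (function names are ours) illustrates that mapping; it is a minimal illustration under that assumption, not the authors' exact construction.

        import numpy as np

        def rbf_kernel(r, lengthscale=1.0):
            # A stationary kernel, evaluated at the distance r = |x - y|.
            return np.exp(-0.5 * (r / lengthscale) ** 2)

        def induced_dissimilarity(x, y, kernel=rbf_kernel):
            # Canonical metric of a stationary kernel:
            # d(x, y)^2 = k(x, x) + k(y, y) - 2 k(x, y) = 2 (k(0) - k(|x - y|)).
            r = np.linalg.norm(np.atleast_1d(x) - np.atleast_1d(y))
            return np.sqrt(max(2.0 * (kernel(0.0) - kernel(r)), 0.0))

        print(induced_dissimilarity(0.0, 1.0))  # grows with |x - y| for an RBF kernel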
    Detection of diabetic retinopathy using longitudinal self-supervised learning. (arXiv:2209.00915v1 [cs.CV])
    Longitudinal imaging is able to capture both static anatomical structures and dynamic changes in disease progression towards earlier and better patient-specific pathology management. However, conventional approaches for detecting diabetic retinopathy (DR) rarely take advantage of longitudinal information to improve DR analysis. In this work, we investigate the benefit of exploiting self-supervised learning with a longitudinal nature for DR diagnosis purposes. We compare different longitudinal self-supervised learning (LSSL) methods to model the disease progression from longitudinal retinal color fundus photographs (CFP) to detect early DR severity changes using a pair of consecutive exams. The experiments were conducted on a longitudinal DR screening dataset, with and without encoders trained on the longitudinal pretext task (LSSL). Results achieve an AUC of 0.875 for the baseline (model trained from scratch) and an AUC of 0.96 (95% CI: 0.9593-0.9655, DeLong test) with a p-value < 2.2e-16 on early fusion using a simple ResNet-like architecture with frozen LSSL weights, suggesting that the LSSL latent space encodes the dynamics of DR progression.  ( 2 min )
    Instance-Dependent Noisy Label Learning via Graphical Modelling. (arXiv:2209.00906v1 [cs.CV])
    Noisy labels are unavoidable yet troublesome in the ecosystem of deep learning because models can easily overfit them. There are many types of label noise, such as symmetric, asymmetric and instance-dependent noise (IDN), with IDN being the only type that depends on image information. Such dependence on image information makes IDN a critical type of label noise to study, given that labelling mistakes are caused in large part by insufficient or ambiguous information about the visual classes present in images. Aiming to provide an effective technique to address IDN, we present a new graphical modelling approach called InstanceGM, which combines discriminative and generative models. The main contributions of InstanceGM are: i) the use of the continuous Bernoulli distribution to train the generative model, offering significant training advantages, and ii) the exploration of a state-of-the-art noisy-label discriminative classifier to generate clean labels from instance-dependent noisy-label samples. InstanceGM is competitive with current noisy-label learning approaches, particularly in IDN benchmarks using synthetic and real-world datasets, where our method shows better accuracy than the competitors in most experiments.  ( 2 min )
    A Discussion of Discrimination and Fairness in Insurance Pricing. (arXiv:2209.00858v1 [cs.LG])
    Indirect discrimination is an issue of major concern in algorithmic models. This is particularly the case in insurance pricing, where protected policyholder characteristics are not allowed to be used. Simply disregarding protected policyholder information is not an appropriate solution because this still allows for the possibility of inferring the protected characteristics from the non-protected ones. This leads to so-called proxy or indirect discrimination. Though proxy discrimination is qualitatively different from the group fairness concepts in machine learning, these group fairness concepts are proposed to 'smooth out' the impact of protected characteristics in the calculation of insurance prices. The purpose of this note is to share some thoughts about group fairness concepts in the light of insurance pricing and to discuss their implications. We present a statistical model that is free of proxy discrimination and thus unproblematic from an insurance pricing point of view. However, we find that the canonical price in this statistical model does not satisfy any of the three most popular group fairness axioms. This seems puzzling and we welcome feedback on our example and on the usefulness of these group fairness axioms for non-discriminatory insurance pricing.  ( 2 min )
    PulseDL-II: A System-on-Chip Neural Network Accelerator for Timing and Energy Extraction of Nuclear Detector Signals. (arXiv:2209.00884v1 [physics.ins-det])
    Front-end electronics equipped with high-speed digitizers are being used and proposed for future nuclear detectors. Recent literature reveals that deep learning models, especially one-dimensional convolutional neural networks, are promising when dealing with digital signals from nuclear detectors. Simulations and experiments demonstrate the satisfactory accuracy and additional benefits of neural networks in this area. However, specific hardware for accelerating such models in online operation still needs to be studied. In this work, we introduce PulseDL-II, a system-on-chip (SoC) specially designed for applications of event feature (time, energy, etc.) extraction from pulses with deep learning. Based on the previous version, PulseDL-II incorporates a RISC CPU into the system structure for better functional flexibility and integrity. The neural network accelerator in the SoC adopts a three-level (arithmetic unit, processing element, neural network) hierarchical architecture and facilitates parameter optimization of the digital design. Furthermore, we devise a quantization scheme and associated implementation methods (rescale & bit-shift) for full compatibility with deep learning frameworks (e.g., TensorFlow) within a selected subset of layer types. With the current scheme, the quantization-aware training of neural networks is supported, and network models are automatically transformed into software for the RISC CPU by dedicated scripts, with nearly no loss of accuracy. We validate PulseDL-II on field programmable gate arrays (FPGA). Finally, system validation is done with an experimental setup made up of a direct digital synthesis (DDS) signal generator and an FPGA development board with analog-to-digital converters (ADC). The proposed system achieved 60 ps time resolution and 0.40% energy resolution with online neural network inference at a signal-to-noise ratio (SNR) of 47.4 dB.  ( 3 min )
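    The rescale & bit-shift scheme referenced above is, in its generic form, the standard fixed-point trick used in integer-only inference: approximate a real-valued rescale factor by an integer multiplier and a right shift. A rough sketch of that generic trick (not PulseDL-II's exact implementation; the scale value is hypothetical):

        def to_fixed_point(scale, shift_bits=15):
            # Approximate a real-valued rescale factor by an integer multiplier,
            # so that x * scale ~= (x * multiplier) >> shift_bits.
            return int(round(scale * (1 << shift_bits)))

        def rescale_bitshift(acc, multiplier, shift_bits=15):
            # Integer-only rescaling of an int32 accumulator.
            return (acc * multiplier) >> shift_bits

        m = to_fixed_point(0.0042)          # hypothetical layer rescale factor
        print(rescale_bitshift(12345, m))   # ~= 12345 * 0.0042 = 51.8 (floored by the shift)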
    Regret Analysis of Dyadic Search. (arXiv:2209.00885v1 [cs.LG])
    We analyze the cumulative regret of the Dyadic Search algorithm of Bachoc et al. [2022].  ( 2 min )
    Three Learning Stages and Accuracy-Efficiency Tradeoff of Restricted Boltzmann Machines. (arXiv:2209.00873v1 [cs.LG])
    Restricted Boltzmann Machines (RBMs) offer a versatile architecture for unsupervised machine learning that can in principle approximate any target probability distribution with arbitrary accuracy. However, the RBM model is usually not directly accessible due to its computational complexity, and Markov-chain sampling is invoked to analyze the learned probability distribution. For training and eventual applications, it is thus desirable to have a sampler that is both accurate and efficient. We highlight that these two goals generally compete with each other and cannot be achieved simultaneously. More specifically, we identify and quantitatively characterize three regimes of RBM learning: independent learning, where the accuracy improves without losing efficiency; correlation learning, where higher accuracy entails lower efficiency; and degradation, where both accuracy and efficiency no longer improve or even deteriorate. These findings are based on numerical experiments and heuristic arguments.  ( 2 min )
    Diffusion-based Molecule Generation with Informative Prior Bridges. (arXiv:2209.00865v1 [cs.LG])
    AI-based molecule generation provides a promising approach to a large area of biomedical sciences and engineering, such as antibody design, hydrolase engineering, or vaccine development. Because the molecules are governed by physical laws, a key challenge is to incorporate prior information into the training procedure to generate high-quality and realistic molecules. We propose a simple and novel approach to steer the training of diffusion-based generative models with physical and statistical prior information. This is achieved by constructing physically informed diffusion bridges, stochastic processes guaranteed to yield a given observation at the fixed terminal time. We develop a Lyapunov function based method to construct and determine bridges, and propose several informative prior bridges for both high-quality molecule generation and uniformity-promoted 3D point cloud generation. With comprehensive experiments, we show that our method provides a powerful approach to the 3D generation task, yielding molecule structures with better quality and stability scores and more uniformly distributed, high-quality point clouds.  ( 2 min )
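    The simplest process that is guaranteed to hit a given observation at a fixed terminal time is the Brownian bridge, whose drift steers the path towards the endpoint. The Euler-Maruyama sketch below simulates one; it illustrates only the bridge idea, not the paper's physically informed constructions.

        import numpy as np

        def brownian_bridge(y, T=1.0, sigma=1.0, n_steps=1000, x0=0.0, seed=0):
            # Simulate dX_t = (y - X_t) / (T - t) dt + sigma dW_t,
            # whose drift pins the path to X_T = y.
            rng = np.random.default_rng(seed)
            dt = T / n_steps
            x = np.empty(n_steps + 1)
            x[0] = x0
            for i in range(n_steps):
                drift = (y - x[i]) / (T - i * dt)
                x[i + 1] = x[i] + drift * dt + sigma * np.sqrt(dt) * rng.standard_normal()
            return x

        print(brownian_bridge(y=2.0)[-1])  # close to 2.0 by construction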
    Diffusion Models: A Comprehensive Survey of Methods and Applications. (arXiv:2209.00796v1 [cs.LG])
    Diffusion models are a class of deep generative models that have shown impressive results on various tasks and have a solid theoretical foundation. Although diffusion models achieve higher quality and diversity of sample synthesis than other state-of-the-art models, they still suffer from a costly sampling procedure and sub-optimal likelihood estimation. Recent studies have shown great enthusiasm for improving the performance of diffusion models. In this article, we present the first comprehensive review of existing variants of diffusion models. Specifically, we provide the first taxonomy of diffusion models and categorize the variants into three types, namely sampling-acceleration enhancement, likelihood-maximization enhancement and data-generalization enhancement. We also introduce in detail five other generative models (i.e., variational autoencoders, generative adversarial networks, normalizing flows, autoregressive models, and energy-based models), and clarify the connections between diffusion models and these generative models. Then we make a thorough investigation into the applications of diffusion models, including computer vision, natural language processing, waveform signal processing, multi-modal modeling, molecular graph generation, time series modeling, and adversarial purification. Furthermore, we propose new perspectives pertaining to the development of this generative model.  ( 2 min )
    Human Activity Recognition on Microcontrollers with Quantized and Adaptive Deep Neural Networks. (arXiv:2209.00839v1 [cs.LG])
    Human Activity Recognition (HAR) based on inertial data is an increasingly diffused task on embedded devices, from smartphones to ultra low-power sensors. Due to the high computational complexity of deep learning models, most embedded HAR systems are based on simple and not-so-accurate classic machine learning algorithms. This work bridges the gap between on-device HAR and deep learning, proposing a set of efficient one-dimensional Convolutional Neural Networks (CNNs) deployable on general purpose microcontrollers (MCUs). Our CNNs are obtained combining hyper-parameters optimization with sub-byte and mixed-precision quantization, to find good trade-offs between classification results and memory occupation. Moreover, we also leverage adaptive inference as an orthogonal optimization to tune the inference complexity at runtime based on the processed input, hence producing a more flexible HAR system. With experiments on four datasets, and targeting an ultra-low-power RISC-V MCU, we show that (i) we are able to obtain a rich set of Pareto-optimal CNNs for HAR, spanning more than 1 order of magnitude in terms of memory, latency and energy consumption; (ii) thanks to adaptive inference, we can derive >20 runtime operating modes starting from a single CNN, differing by up to 10% in classification scores and by more than 3x in inference complexity, with a limited memory overhead; (iii) on three of the four benchmarks, we outperform all previous deep learning methods, reducing the memory occupation by more than 100x (the few methods that obtain better performance, both shallow and deep, are not compatible with MCU deployment); and (iv) all our CNNs are compatible with real-time on-device HAR with an inference latency <16 ms. Their memory occupation ranges from 0.05 to 23.17 kB, and their energy consumption from 0.005 to 61.59 uJ, allowing years of continuous operation on a small battery supply.  ( 3 min )
    Rethinking Efficiency and Redundancy in Training Large-scale Graphs. (arXiv:2209.00800v1 [cs.LG])
    Large-scale graphs are ubiquitous in real-world scenarios and can be trained by Graph Neural Networks (GNNs) to generate representations for downstream tasks. Given the abundant information and complex topology of a large-scale graph, we argue that redundancy exists in such graphs and will degrade the training efficiency. Unfortunately, the model scalability severely restricts the efficiency of training large-scale graphs via vanilla GNNs. Despite recent advances in sampling-based training methods, sampling-based GNNs generally overlook the redundancy issue. It still takes an intolerable amount of time to train these models on large-scale graphs. We therefore propose to drop redundancy and improve the efficiency of training large-scale graphs with GNNs by rethinking the inherent characteristics of a graph. In this paper, we propose a once-for-all method, termed DropReef, to drop the redundancy in large-scale graphs. Specifically, we first conduct preliminary experiments to explore potential redundancy in large-scale graphs. Next, we present a metric to quantify the neighbor heterophily of all nodes in a graph. Based on both experimental and theoretical analysis, we reveal the redundancy in a large-scale graph, i.e., nodes with high neighbor heterophily and a great number of neighbors. Then, we propose DropReef to detect and drop the redundancy in large-scale graphs once and for all, helping reduce the training time while ensuring no sacrifice in model accuracy. To demonstrate the effectiveness of DropReef, we apply it to recent state-of-the-art sampling-based GNNs for training large-scale graphs, owing to the high precision of such models. With DropReef leveraged, the training efficiency of models can be greatly improved. DropReef is highly compatible and is performed offline, benefiting state-of-the-art sampling-based GNNs, both present and future, to a significant extent.  ( 3 min )
    An Explainer for Temporal Graph Neural Networks. (arXiv:2209.00807v1 [cs.LG])
    Temporal graph neural networks (TGNNs) have been widely used for modeling time-evolving graph-related tasks due to their ability to capture both graph topology dependency and non-linear temporal dynamics. Explaining TGNNs is of vital importance for transparent and trustworthy models. However, the complex topology structure and temporal dependency make explaining TGNN models very challenging. In this paper, we propose a novel explainer framework for TGNN models. Given a time series on a graph to be explained, the framework can identify dominant explanations in the form of a probabilistic graphical model in a time period. Case studies on the transportation domain demonstrate that the proposed approach can discover dynamic dependency structures in a road network for a time period.  ( 2 min )
    TarGF: Learning Target Gradient Field for Object Rearrangement. (arXiv:2209.00853v1 [cs.LG])
    Object rearrangement is the task of moving objects from an initial state to a goal state. Here, we focus on a more practical setting in object rearrangement, i.e., rearranging objects from shuffled layouts to a normative target distribution without explicit goal specification. However, this task remains challenging for AI agents, as it is hard to describe the target distribution (goal specification) for reward engineering or to collect expert trajectories as demonstrations. Hence, it is infeasible to directly employ reinforcement learning or imitation learning algorithms to address the task. This paper aims to search for a policy only with a set of examples from a target distribution instead of a handcrafted reward function. We employ the score-matching objective to train a Target Gradient Field (TarGF), indicating a direction on each object to increase the likelihood of the target distribution. For object rearrangement, the TarGF can be used in two ways: 1) for model-based planning, we can cast the target gradient into a reference control and output actions with a distributed path planner; 2) for model-free reinforcement learning, the TarGF is not only used for estimating the likelihood-change as a reward but also provides suggested actions in residual policy learning. Experimental results in ball rearrangement and room rearrangement demonstrate that our method significantly outperforms the state-of-the-art methods in the quality of the terminal state, the efficiency of the control process, and scalability. The code and demo videos are on our project website.  ( 3 min )
    Optimal Diagonal Preconditioning: Theory and Practice. (arXiv:2209.00809v1 [math.OC])
    Preconditioning has been a staple technique in optimization and machine learning. It often reduces the condition number of the matrix it is applied to, thereby speeding up convergence of optimization algorithms. Although there are many popular preconditioning techniques in practice, most lack theoretical guarantees for reductions in condition number. In this paper, we study the problem of optimal diagonal preconditioning to achieve maximal reduction in the condition number of any full-rank matrix by scaling its rows or columns separately or simultaneously. We first reformulate the problem as a quasi-convex problem and provide a baseline bisection algorithm that is easy to implement in practice, where each iteration consists of an SDP feasibility problem. Then we propose a polynomial time potential reduction algorithm with $O(\log(\frac{1}{\epsilon}))$ iteration complexity, where each iteration consists of a Newton update based on the Nesterov-Todd direction. Our algorithm is based on a formulation of the problem which is a generalized version of the Von Neumann optimal growth problem. Next, we specialize to one-sided optimal diagonal preconditioning problems, and demonstrate that they can be formulated as standard dual SDP problems, to which we apply efficient customized solvers and study the empirical performance of our optimal diagonal preconditioners. Our extensive experiments on large matrices demonstrate the practical appeal of optimal diagonal preconditioners at reducing condition numbers compared to heuristics-based preconditioners.  ( 3 min )
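    To see why diagonal preconditioning matters in practice, the snippet below applies a simple heuristic column-norm scaling (a baseline, not the paper's SDP-based optimum) and compares condition numbers before and after.

        import numpy as np

        rng = np.random.default_rng(0)
        A = rng.standard_normal((200, 200)) * np.logspace(0, 4, 200)  # badly scaled columns

        # Heuristic one-sided diagonal preconditioner: rescale each column by its norm.
        D = np.diag(1.0 / np.linalg.norm(A, axis=0))
        print(np.linalg.cond(A), np.linalg.cond(A @ D))  # the second is much smaller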
    Domain Adaptation from Scratch. (arXiv:2209.00830v1 [cs.CL])
    Natural language processing (NLP) algorithms are rapidly improving but often struggle when applied to out-of-distribution examples. A prominent approach to mitigate the domain gap is domain adaptation, where a model trained on a source domain is adapted to a new target domain. We present a new learning setup, "domain adaptation from scratch", which we believe to be crucial for extending the reach of NLP to sensitive domains in a privacy-preserving manner. In this setup, we aim to efficiently annotate data from a set of source domains such that the trained model performs well on a sensitive target domain from which data is unavailable for annotation. Our study compares several approaches for this challenging setup, ranging from data selection and domain adaptation algorithms to active learning paradigms, on two NLP tasks: sentiment analysis and Named Entity Recognition. Our results suggest that using the abovementioned approaches eases the domain gap, and combining them further improves the results.  ( 2 min )
    TypoSwype: An Imaging Approach to Detect Typo-Squatting. (arXiv:2209.00783v1 [cs.CR])
    Typo-squatting is a common cyber-attack technique. It involves utilising domain names that exploit possible typographical errors of commonly visited domains to carry out malicious activities such as phishing, malware installation, etc. Current approaches typically revolve around string comparison algorithms like the Damerau-Levenshtein Distance (DLD) algorithm. Such techniques do not take into account keyboard distance, which researchers have found to correlate strongly with typical typographical errors. In this paper, we present the TypoSwype framework, which converts strings to images that innately take keyboard location into account. We also show how modern state-of-the-art image recognition techniques involving Convolutional Neural Networks, trained via either Triplet Loss or NT-Xent Loss, can be applied to learn a mapping to a lower-dimensional space where distances correspond to image and, equivalently, textual similarity. Finally, we also demonstrate our method's ability to improve typo-squatting detection over the widely used DLD algorithm, while maintaining the classification accuracy as to which domain the input domain was attempting to typo-squat.  ( 2 min )
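    For reference, the NT-Xent loss mentioned above has a compact generic form; the PyTorch sketch below implements the standard batch version over positive pairs and is not specific to TypoSwype's training setup.

        import torch
        import torch.nn.functional as F

        def nt_xent_loss(z1, z2, tau=0.5):
            # z1[i] and z2[i] are embeddings of two views of the same item;
            # every other embedding in the batch serves as a negative.
            z = F.normalize(torch.cat([z1, z2], dim=0), dim=1)   # (2N, d)
            sim = z @ z.t() / tau                                # scaled cosine similarities
            sim.fill_diagonal_(float('-inf'))                    # exclude self-similarity
            n = z1.size(0)
            targets = torch.cat([torch.arange(n, 2 * n), torch.arange(n)])
            return F.cross_entropy(sim, targets)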
    Exact Decomposition of Quantum Channels for Non-IID Quantum Federated Learning. (arXiv:2209.00768v1 [quant-ph])
    Federated learning refers to the task of performing machine learning with decentralized data from multiple clients while protecting data security and privacy. Efforts have been made to incorporate quantum advantages into such scenarios. However, when the clients' data are not independent and identically distributed (IID), the performance of conventional federated algorithms deteriorates. In this work, we explore this phenomenon in the quantum regime with both theoretical and numerical analysis. We further prove that a global quantum channel can be exactly decomposed into channels trained by each client with the help of local density estimators. This leads to a general framework for quantum federated learning on non-IID data with one-shot communication complexity. We demonstrate it on classification tasks with numerical simulations.  ( 2 min )
    BinImg2Vec: Augmenting Malware Binary Image Classification with Data2Vec. (arXiv:2209.00782v1 [cs.CR])
    Rapid digitalisation spurred by the Covid-19 pandemic has resulted in more cyber crime. Malware-as-a-service is now a booming business for cyber criminals. With the surge in malware activities, it is vital for cyber defenders to understand more about the malware samples they have at hand, as such information can greatly influence their next course of action during a breach. Recently, researchers have shown how malware family classification can be done by first converting malware binaries into grayscale images and then passing them through neural networks for classification. However, most works focus on studying the impact of different neural network architectures on classification performance. In the last year, researchers have shown that augmenting supervised learning with self-supervised learning can improve performance. Even more recently, Data2Vec was proposed as a modality-agnostic self-supervised framework to train neural networks. In this paper, we present BinImg2Vec, a framework for training malware binary image classifiers that incorporates both self-supervised learning and supervised learning to produce a model that consistently outperforms one trained only via supervised learning. We were able to achieve a 4% improvement in classification performance and a 0.5% reduction in performance variance over multiple runs. We also show how our framework produces embeddings that can be well clustered, facilitating model explainability.  ( 3 min )
    Structure-Preserving Graph Representation Learning. (arXiv:2209.00793v1 [cs.LG])
    Though graph representation learning (GRL) has made significant progress, it is still a challenge to extract and embed the rich topological structure and feature information in an adequate way. Most existing methods focus on local structure and fail to fully incorporate the global topological structure. To this end, we propose a novel Structure-Preserving Graph Representation Learning (SPGRL) method to fully capture the structure information of graphs. Specifically, to reduce the uncertainty and misinformation of the original graph, we construct a feature graph as a complementary view via the k-Nearest Neighbor method. The feature graph can be used for node-level contrast to capture local relations. Besides, we retain global topological structure information by maximizing the mutual information (MI) between the whole graph and the feature embeddings, which is theoretically reduced to exchanging the feature embeddings of the feature and the original graphs to reconstruct themselves. Extensive experiments show that our method has quite superior performance on the semi-supervised node classification task and excellent robustness under noise perturbations of the graph structure or node features.  ( 2 min )
    Index Tracking via Learning to Predict Market Sensitivities. (arXiv:2209.00780v1 [q-fin.PM])
    A significant number of equity funds are preferred by index funds nowadays, and market sensitivities are instrumental in managing them. Index funds might replicate the index identically, which is, however, cost-ineffective and impractical. Moreover, to utilize market sensitivities to replicate the index partially, they must be predicted or estimated accurately. Accordingly, we first examine deep learning models to predict market sensitivities. We also present pragmatic applications of data processing methods to aid training and to generate target data for the prediction. Then, we propose a partial-index-tracking optimization model that constrains the net predicted market sensitivities of the portfolio and the index to be the same. The efficacy of these processes is corroborated on the Korea Stock Price Index 200. Our experiments show a significant reduction in prediction errors compared with historical estimations, and competitive tracking errors when replicating the index using fewer than half of all constituents. Therefore, we show that applying deep learning to predict market sensitivities is promising and that our portfolio construction methods are practically effective. Additionally, to our knowledge, this is the first study that addresses market sensitivities with a focus on deep learning.  ( 2 min )
    Deep reinforcement learning for quantum multiparameter estimation. (arXiv:2209.00671v1 [quant-ph])
    Estimation of physical quantities is at the core of most scientific research, and the use of quantum devices promises to enhance its performance. In real scenarios, it is fundamental to consider that the resources are limited, and Bayesian adaptive estimation represents a powerful approach to efficiently allocate, during the estimation process, all the available resources. However, this framework relies on precise knowledge of the system model, retrieved with a fine calibration that often proves computationally and experimentally demanding. Here, we introduce a model-free and deep learning-based approach to efficiently implement realistic Bayesian quantum metrology tasks, addressing all the relevant challenges without relying on any a priori knowledge of the system. Instead, a neural network is trained directly on experimental data to learn the multiparameter Bayesian update. Then, the system is set at its optimal working point through feedback provided by a reinforcement learning algorithm trained to reconstruct and enhance experiment heuristics of the investigated quantum sensor. Notably, we prove experimentally the achievement of higher estimation performances than standard methods, demonstrating the strength of the combination of these two black-box algorithms on an integrated photonic circuit. This work represents an important step towards fully artificial intelligence-based quantum metrology.  ( 2 min )
    Self-supervised Representation Learning on Electronic Health Records with Graph Kernel Infomax. (arXiv:2209.00655v1 [cs.LG])
    Learning Electronic Health Record (EHR) representations is a prominent yet underexplored research topic. It benefits various clinical decision support applications, e.g., medication outcome prediction or patient similarity search. Current approaches focus on task-specific label supervision on vectorized sequential EHR, which is not applicable to large-scale unsupervised scenarios. Recently, contrastive learning has shown great success on self-supervised representation learning problems. However, complex temporality often degrades the performance. We propose Graph Kernel Infomax, a self-supervised graph kernel learning approach on the graphical representation of EHR, to overcome the previous problems. Unlike the state-of-the-art, we do not change the graph structure to construct augmented views. Instead, we use Kernel Subspace Augmentation to embed nodes into two geometrically different manifold views. The entire framework is trained by contrasting nodes and graph representations on those two manifold views through the commonly used contrastive objectives. Empirically, using publicly available benchmark EHR datasets, our approach yields performance on clinical downstream tasks that exceeds the state-of-the-art. Theoretically, the variation on distance metrics naturally creates different views as data augmentation without changing graph structures.  ( 2 min )
    MIME: Minority Inclusion for Majority Group Enhancement of AI Performance. (arXiv:2209.00746v1 [cs.LG])
    Several papers have rightly included minority groups in artificial intelligence (AI) training data to improve test inference for minority groups and/or society-at-large. A society-at-large consists of both minority and majority stakeholders. A common misconception is that minority inclusion does not increase performance for majority groups alone. In this paper, we make the surprising finding that including minority samples can improve test error for the majority group. In other words, minority group inclusion leads to majority group enhancements (MIME) in performance. A theoretical existence proof of the MIME effect is presented and found to be consistent with experimental results on six different datasets. Project webpage: https://visual.ee.ucla.edu/mime.htm/  ( 2 min )
    Optimizing the Performative Risk under Weak Convexity Assumptions. (arXiv:2209.00771v1 [cs.LG])
    In performative prediction, a predictive model impacts the distribution that generates future data, a phenomenon that is ignored in classical supervised learning. In this closed-loop setting, the natural measure of performance, denoted the performative risk, captures the expected loss incurred by a predictive model after deployment. The core difficulty of minimizing the performative risk is that the data distribution itself depends on the model parameters. This dependence is governed by the environment and not under the control of the learner. As a consequence, even the choice of a convex loss function can result in a highly non-convex performative risk minimization problem. Prior work has identified a pair of general conditions on the loss and the mapping from model parameters to distributions that implies convexity of the performative risk. In this paper, we relax these assumptions and focus on obtaining weaker notions of convexity, without sacrificing the amenability of the performative risk minimization problem for iterative optimization methods.  ( 2 min )
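    For concreteness, the performative risk takes the standard form $\mathrm{PR}(\theta) = \mathbb{E}_{z \sim \mathcal{D}(\theta)}[\ell(z; \theta)]$ (notation ours, following the performative prediction literature), where $\mathcal{D}(\theta)$ is the data distribution induced by deploying the model with parameters $\theta$ and $\ell$ is the loss; the dependence of $\mathcal{D}$ on $\theta$ is exactly what can make $\mathrm{PR}$ non-convex even for a convex $\ell$.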
    Towards Optimization and Model Selection for Domain Generalization: A Mixup-guided Solution. (arXiv:2209.00652v1 [cs.LG])
    The distribution shifts between training and test data typically undermine the performance of deep learning models. In recent years, much work has paid attention to domain generalization (DG), where distribution shift exists and target data are unseen. Despite the progress in algorithm design, two foundational factors have long been ignored: 1) the optimization of regularization-based objectives (e.g., distribution alignment), and 2) model selection for DG, since no knowledge about the target domain can be utilized. In this paper, we propose Mixup-guided optimization and selection techniques for domain generalization. For optimization, we utilize an adapted Mixup to generate an out-of-distribution dataset that can guide the preference direction, and optimize with Pareto optimization. For model selection, we generate a validation dataset with a closer distance to the target distribution, so that it better represents the target data. We also present some theoretical insights behind our proposals. Comprehensive experiments on one visual classification benchmark and three time-series benchmarks demonstrate that our model optimization and selection techniques can largely improve the performance of existing domain generalization algorithms and even achieve new state-of-the-art results.  ( 2 min )
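    The Mixup operation that the method adapts is, in its vanilla form, a convex combination of two examples and their labels; a minimal sketch follows (the paper's adaptation for generating out-of-distribution data differs in how pairs and weights are chosen).

        import numpy as np

        def mixup(x1, y1, x2, y2, alpha=0.2, seed=0):
            # Vanilla Mixup: draw lam ~ Beta(alpha, alpha), then convex-combine
            # both the inputs and the (one-hot) labels.
            lam = np.random.default_rng(seed).beta(alpha, alpha)
            return lam * x1 + (1 - lam) * x2, lam * y1 + (1 - lam) * y2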
    Zero-Shot Multi-Modal Artist-Controlled Retrieval and Exploration of 3D Object Sets. (arXiv:2209.00682v1 [cs.CV])
    When creating 3D content, highly specialized skills are generally needed to design and generate models of objects and other assets by hand. We address this problem through high-quality 3D asset retrieval from multi-modal inputs, including 2D sketches, images and text. We use CLIP as it provides a bridge to higher-level latent features. We use these features to perform a multi-modality fusion to address the lack of artistic control that affects common data-driven approaches. Our approach allows for multi-modal conditional feature-driven retrieval through a 3D asset database, by utilizing a combination of input latent embeddings. We explore the effects of different combinations of feature embeddings across different input types and weighting methods.  ( 2 min )
    Making Intelligence: Ethics, IQ, and ML Benchmarks. (arXiv:2209.00692v1 [cs.LG])
    The ML community recognizes the importance of anticipating and mitigating the potential negative impacts of benchmark research. In this position paper, we argue that more attention needs to be paid to areas of ethical risk that lie at the technical and scientific core of ML benchmarks. We identify overlooked structural similarities between human IQ and ML benchmarks. Human intelligence and ML benchmarks share similarities in setting standards for describing, evaluating and comparing performance on tasks relevant to intelligence. This enables us to unlock lessons from feminist philosophy of science scholarship that need to be considered by the ML benchmark community. Finally, we outline practical recommendations for benchmark research ethics and ethics review.  ( 2 min )
    Exploring traditional machine learning for identification of pathological auscultations. (arXiv:2209.00672v1 [cs.LG])
    Today, data collection has improved in various areas, and the medical domain is no exception. Auscultation, an important diagnostic technique for physicians, lends itself well to machine learning applications thanks to the progress and availability of digital stethoscopes. Due to the large number of auscultations performed, the availability of data opens up an opportunity for more effective analysis of sounds where prognostic accuracy even among experts remains low. In this study, digital 6-channel auscultations of 45 patients were used in various machine learning scenarios, with the aim of distinguishing between normal and anomalous pulmonary sounds. Audio features (such as fundamental frequencies F0-4, loudness, HNR, DFA, as well as descriptive statistics of log energy, RMS and MFCC) were extracted using the Python library Surfboard. Windowing, feature aggregation and concatenation strategies were used to prepare data for tree-based ensemble models in unsupervised (fair-cut forest) and supervised (random forest) machine learning settings. The evaluation was carried out using 9-fold stratified cross-validation repeated 30 times. Decision fusion by averaging outputs for a subject was tested and found to be useful. Supervised models showed a consistent advantage over unsupervised ones, achieving a mean AUC ROC of 0.691 (accuracy 71.11%, Kappa 0.416, F1-score 0.771) in side-based detection and a mean AUC ROC of 0.721 (accuracy 68.89%, Kappa 0.371, F1-score 0.650) in patient-based detection.  ( 3 min )
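    The evaluation protocol described above maps directly onto standard scikit-learn utilities; the sketch below uses placeholder features and labels in place of the Surfboard descriptors and real patient data.

        import numpy as np
        from sklearn.ensemble import RandomForestClassifier
        from sklearn.model_selection import RepeatedStratifiedKFold, cross_val_score

        rng = np.random.default_rng(0)
        X = rng.random((45, 64))        # placeholder for the extracted audio features
        y = np.arange(45) % 2           # placeholder normal/anomalous labels

        cv = RepeatedStratifiedKFold(n_splits=9, n_repeats=30, random_state=0)
        clf = RandomForestClassifier(n_estimators=500, random_state=0)
        print(cross_val_score(clf, X, y, cv=cv, scoring='roc_auc').mean())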
    Recurrent Convolutional Neural Networks Learn Succinct Learning Algorithms. (arXiv:2209.00735v1 [cs.LG])
    Neural Networks (NNs) struggle to efficiently learn certain problems, such as parity problems, even when there are simple learning algorithms for those problems. Can NNs discover learning algorithms on their own? We exhibit an NN architecture that, in polynomial time, learns as well as any efficient learning algorithm describable by a constant-sized learning algorithm. For example, on parity problems, the NN learns as well as row reduction, an efficient algorithm that can be succinctly described. Our architecture combines both recurrent weight-sharing between layers and convolutional weight-sharing to reduce the number of parameters down to a constant, even though the network itself may have trillions of nodes. While in practice the constants in our analysis are too large to be directly meaningful, our work suggests that the synergy of Recurrent and Convolutional NNs (RCNNs) may be more powerful than either alone.  ( 2 min )
    Effective Class-Imbalance learning based on SMOTE and Convolutional Neural Networks. (arXiv:2209.00653v1 [cs.LG])
    Imbalanced Data (ID) is a problem that deters Machine Learning (ML) models from achieving satisfactory results. ID arises when the number of samples belonging to one class outnumbers that of the other by a wide margin, biasing the learning process of such models towards the majority class. In recent years, to address this issue, several solutions have been put forward, which opt for either synthetically generating new data for the minority class or reducing the number of majority-class samples to balance the data. Hence, in this paper, we investigate the effectiveness of methods based on Deep Neural Networks (DNNs) and Convolutional Neural Networks (CNNs), combined with a variety of well-known imbalanced-data solutions, namely oversampling and undersampling. To evaluate our methods, we have used the KEEL, breast cancer, and Z-Alizadeh Sani datasets. In order to achieve reliable results, we conducted our experiments 100 times with randomly shuffled data distributions. The classification results demonstrate that the mixed Synthetic Minority Oversampling Technique (SMOTE)-Normalization-CNN outperforms the other methodologies, achieving 99.08% accuracy on the 24 imbalanced datasets. Therefore, the proposed mixed model can be applied to imbalanced binary classification problems on other real datasets.  ( 2 min )
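    The SMOTE stage of the proposed pipeline is available off the shelf in imbalanced-learn; a minimal sketch of the oversampling step on synthetic data (normalization and the CNN classifier are omitted):

        from collections import Counter
        from imblearn.over_sampling import SMOTE
        from sklearn.datasets import make_classification

        X, y = make_classification(n_samples=1000, weights=[0.95, 0.05], random_state=0)
        print(Counter(y))                                  # heavily imbalanced
        X_res, y_res = SMOTE(random_state=0).fit_resample(X, y)
        print(Counter(y_res))                              # balanced after oversampling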
    Temporal Conditional VAE for Distributional Drift Adaptation in Multivariate Time Series. (arXiv:2209.00654v1 [cs.LG])
    Due to its nonstationary nature, the distribution of a real-world multivariate time series (MTS) changes over time, which is known as distribution drift. Most existing MTS forecasting models greatly suffer from distribution drift, and their forecasting performance degrades over time. Existing methods address distribution drift by adapting to the most recently arrived data or self-correcting per the meta knowledge derived from future data. Despite their great success in MTS forecasting, these methods hardly capture the intrinsic distribution changes, especially from a distributional perspective. Accordingly, we propose a novel framework, the temporal conditional variational autoencoder (TCVAE), to model the dynamic distributional dependencies over time between historical observations and future data in MTS and infer the dependencies as a temporal conditional distribution to leverage latent variables. Specifically, a novel temporal Hawkes attention mechanism represents temporal factors that are subsequently fed into feed-forward networks to estimate the prior Gaussian distribution of latent variables. The representation of temporal factors further dynamically adjusts the structures of the Transformer-based encoder and decoder to distribution changes by leveraging a gated attention mechanism. Moreover, we introduce conditional continuous normalization flow to transform the prior Gaussian into a complex and form-free distribution to facilitate flexible inference of the temporal conditional distribution. Extensive experiments conducted on six real-world MTS datasets demonstrate the TCVAE's superior robustness and effectiveness over state-of-the-art MTS forecasting baselines. We further illustrate the TCVAE's applicability through multifaceted case studies and visualization in real-world scenarios.  ( 3 min )
  • Open

    Refining neural network predictions using background knowledge. (arXiv:2206.04976v2 [cs.AI] UPDATED)
    Recent work has shown that logical background knowledge can be used in learning systems to compensate for a lack of labeled training data. Many methods work by creating a loss function that encodes this knowledge. However, often the logic is discarded after training, even if it is still useful at test time. Instead, we ensure neural network predictions satisfy the knowledge by refining the predictions with an extra computation step. We introduce differentiable refinement functions that find a corrected prediction close to the original prediction. We study how to effectively and efficiently compute these refinement functions. Using a new algorithm called Iterative Local Refinement (ILR), we combine refinement functions to find refined predictions for logical formulas of any complexity. ILR finds refinements on complex SAT formulas in significantly fewer iterations and frequently finds solutions where gradient descent cannot. Finally, ILR produces competitive results in the MNIST addition task.  ( 2 min )
    Cost-based feature selection for network model choice. (arXiv:2101.07766v3 [stat.ME] UPDATED)
    Selecting a small set of informative features from a large number of possibly noisy candidates is a challenging problem with many applications in machine learning and approximate Bayesian computation. In practice, the cost of computing informative features also needs to be considered. This is particularly important for networks because the computational costs of individual features can span several orders of magnitude. We addressed this issue for the network model selection problem using two approaches. First, we adapted nine feature selection methods to account for the cost of features. We show for two classes of network models that the cost can be reduced by two orders of magnitude without considerably affecting classification accuracy (proportion of correctly identified models). Second, we selected features using pilot simulations with smaller networks. This approach reduced the computational cost by a factor of 50 without affecting classification accuracy. To demonstrate the utility of our approach, we applied it to three different yeast protein interaction networks and identified the best-fitting duplication divergence model.  ( 2 min )
    Clustering and Structural Robustness in Causal Diagrams. (arXiv:2111.04513v2 [stat.ML] UPDATED)
    Graphs are commonly used to represent and visualize causal relations. For a small number of variables, this approach provides a succinct and clear view of the scenario at hand. As the number of variables under study increases, the graphical approach may become impractical, and the clarity of the representation is lost. Clustering of variables is a natural way to reduce the size of the causal diagram, but it may erroneously change the essential properties of the causal relations if implemented arbitrarily. We define a specific type of cluster, called transit cluster, that is guaranteed to preserve the identifiability properties of causal effects under certain conditions. We provide a sound and complete algorithm for finding all transit clusters in a given graph and demonstrate how clustering can simplify the identification of causal effects. We also study the inverse problem, where one starts with a clustered graph and looks for extended graphs where the identifiability properties of causal effects remain unchanged. We show that this kind of structural robustness is closely related to transit clusters.  ( 2 min )
    Remember and Forget Experience Replay for Multi-Agent Reinforcement Learning. (arXiv:2203.13319v3 [cs.LG] UPDATED)
    We present the extension of the Remember and Forget for Experience Replay (ReF-ER) algorithm to Multi-Agent Reinforcement Learning (MARL). ReF-ER was shown to outperform state-of-the-art algorithms for continuous control in problems ranging from the OpenAI Gym to complex fluid flows. In MARL, the dependencies between the agents are included in the state-value estimator and the environment dynamics are modeled via the importance weights used by ReF-ER. In collaborative environments, we find the best performance when the value is estimated using individual rewards and we ignore the effects of other actions on the transition map. We benchmark the performance of ReF-ER MARL on the Stanford Intelligent Systems Laboratory (SISL) environments. We find that employing a single feed-forward neural network for the policy and the value function in ReF-ER MARL outperforms state-of-the-art algorithms that rely on complex neural network architectures.  ( 2 min )
    Scalable Model-based Policy Optimization for Decentralized Networked Systems. (arXiv:2207.06559v2 [cs.LG] UPDATED)
    Reinforcement learning algorithms require a large number of samples; this often limits their real-world application even on simple tasks. Such a challenge is more pronounced in multi-agent tasks, as each step of operation is more costly, requiring communication or the shifting of resources. This work aims to improve the data efficiency of multi-agent control by model-based learning. We consider networked systems where agents are cooperative and communicate only locally with their neighbors, and propose the decentralized model-based policy optimization framework (DMPO). In our method, each agent learns a dynamics model to predict future states and broadcasts its predictions via communication, and then the policies are trained under the model rollouts. To alleviate the bias of model-generated data, we restrict the model usage to generating myopic rollouts, thus reducing the compounding error of model generation. To preserve the independence of policy updates, we introduce an extended value function and theoretically prove that the resulting policy gradient is a close approximation to true policy gradients. We evaluate our algorithm on several benchmarks for intelligent transportation systems, which are connected autonomous vehicle control tasks (Flow and CACC) and adaptive traffic signal control (ATSC). Empirical results show that our method achieves superior data efficiency and matches the performance of model-free methods using true models.  ( 3 min )
    Improving Sequential Query Recommendation with Immediate User Feedback. (arXiv:2205.06297v2 [cs.IR] UPDATED)
    We propose an algorithm for next query recommendation in interactive data exploration settings, like knowledge discovery for information gathering. The state-of-the-art query recommendation algorithms are based on sequence-to-sequence learning approaches that exploit historical interaction data. Due to the supervision involved in the learning process, such approaches fail to adapt to immediate user feedback. We propose to augment transformer-based causal language models for query recommendations to adapt to immediate user feedback using a multi-armed bandit (MAB) framework. We conduct a large-scale experimental study using log files from a popular online literature discovery service and demonstrate that our algorithm improves the per-round regret substantially with respect to the state-of-the-art transformer-based query recommendation models, which do not make use of immediate user feedback. Our data model and source code are available at https://github.com/shampp/exp3_ss  ( 2 min )
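    Judging by the linked repository name, the bandit component is in the EXP3 family; the sketch below is the textbook EXP3 update, offered as background only (the paper's variant integrates it with a transformer-based recommender).

        import numpy as np

        def exp3(n_arms, n_rounds, reward_fn, gamma=0.1, seed=0):
            # Textbook EXP3: exponential weights with importance-weighted
            # reward estimates; reward_fn(arm) must return a reward in [0, 1].
            rng = np.random.default_rng(seed)
            w = np.ones(n_arms)
            for _ in range(n_rounds):
                p = (1 - gamma) * w / w.sum() + gamma / n_arms
                arm = rng.choice(n_arms, p=p)
                w[arm] *= np.exp(gamma * reward_fn(arm) / (n_arms * p[arm]))
            return w / w.sum()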
    Privacy-preserving Data Sharing on Vertically Partitioned Data. (arXiv:2010.09293v2 [cs.LG] UPDATED)
    In this work, we introduce a differentially private method for generating synthetic data from vertically partitioned data, i.e., where data of the same individuals is distributed across multiple data holders or parties. We present a differentially private stochastic gradient descent (DP-SGD) algorithm to train a mixture model over such partitioned data using variational inference. We modify a secure multiparty computation (MPC) framework to combine MPC with differential privacy (DP), in order to use differentially private MPC effectively to learn a probabilistic generative model under DP on such vertically partitioned data. Assuming the mixture components contain no dependencies across different parties, the objective function can be factorized into a sum of products of the contributions calculated by the parties. Finally, MPC is used to compute the aggregate between the different contributions. Moreover, we rigorously define the privacy guarantees with respect to the different players in the system. To demonstrate the accuracy of our method, we run our algorithm on the Adult dataset from the UCI machine learning repository, where we obtain comparable results to the non-partitioned case.  ( 2 min )
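    In its generic form, the DP-SGD building block referenced above reduces to per-example gradient clipping plus calibrated Gaussian noise. The numpy sketch below shows that aggregation step alone; the paper additionally runs it inside an MPC framework over vertically partitioned data.

        import numpy as np

        def dp_sgd_aggregate(per_example_grads, clip_norm=1.0, noise_mult=1.1, seed=0):
            # Clip each per-example gradient to clip_norm, sum, add Gaussian noise
            # scaled to the clipping bound, and average.
            rng = np.random.default_rng(seed)
            clipped = [g * min(1.0, clip_norm / (np.linalg.norm(g) + 1e-12))
                       for g in per_example_grads]
            noise = rng.normal(0.0, noise_mult * clip_norm, size=clipped[0].shape)
            return (np.sum(clipped, axis=0) + noise) / len(per_example_grads)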
    Optimal bump functions for shallow ReLU networks: Weight decay, depth separation and the curse of dimensionality. (arXiv:2209.01173v1 [stat.ML])
    In this note, we study how neural networks with a single hidden layer and ReLU activation interpolate data drawn from a radially symmetric distribution with target labels 1 at the origin and 0 outside the unit ball, if no labels are known inside the unit ball. With weight decay regularization and in the infinite neuron, infinite data limit, we prove that a unique radially symmetric minimizer exists, whose weight decay regularizer and Lipschitz constant grow as $d$ and $\sqrt{d}$ respectively. We furthermore show that the weight decay regularizer grows exponentially in $d$ if the label $1$ is imposed on a ball of radius $\varepsilon$ rather than just at the origin. By comparison, a neural network with two hidden layers can approximate the target function without encountering the curse of dimensionality.  ( 2 min )
    Identifying Transients in the Dark Energy Survey using Convolutional Neural Networks. (arXiv:2203.09908v2 [astro-ph.IM] UPDATED)
    The ability to discover new transients via image differencing without direct human intervention is an important task in observational astronomy. For these kinds of image classification problems, machine learning techniques such as Convolutional Neural Networks (CNNs) have shown remarkable success. In this work, we present the results of automated transient identification on images with CNNs for an extant dataset from the Dark Energy Survey Supernova program (DES-SN), whose main focus was on using Type Ia supernovae for cosmology. By performing an architecture search of CNNs, we identify networks that efficiently select non-artifacts (e.g. supernovae, variable stars, AGN, etc.) from artifacts (image defects, mis-subtractions, etc.), achieving the efficiency of previous work performed with random forests, without the need to expend any effort in feature identification. The CNNs also help us identify a subset of mislabeled images. After relabeling the images in this subset, the resulting classification with CNNs is significantly better than previous results.  ( 2 min )
    An Interpretable and Efficient Infinite-Order Vector Autoregressive Model for High-Dimensional Time Series. (arXiv:2209.01172v1 [stat.ME])
    As a special infinite-order vector autoregressive (VAR) model, the vector autoregressive moving average (VARMA) model can capture much richer temporal patterns than the widely used finite-order VAR model. However, its practicality has long been hindered by its non-identifiability, computational intractability, and relative difficulty of interpretation. This paper introduces a novel infinite-order VAR model that not only avoids the drawbacks of the VARMA model but inherits its favorable temporal patterns. As another attractive feature, the temporal and cross-sectional dependence structures of this model can be interpreted separately, since they are characterized by different sets of parameters. For high-dimensional time series, this separation motivates us to impose sparsity on the parameters determining the cross-sectional dependence. As a result, greater statistical efficiency and interpretability can be achieved without sacrificing any temporal information. We introduce an $\ell_1$-regularized estimator for the proposed model and derive the corresponding non-asymptotic error bounds. An efficient block coordinate descent algorithm and a consistent model order selection method are developed. The merit of the proposed approach is supported by simulation studies and a real-world macroeconomic data analysis.  ( 2 min )
    Poincare: Recommending Publication Venues via Treatment Effect Estimation. (arXiv:2010.09157v2 [cs.DL] UPDATED)
Choosing a publication venue for an academic paper is a crucial step in the research process. However, in many cases, decisions are based solely on the experience of researchers, which often leads to suboptimal results. Although there exist venue recommender systems for academic papers, they recommend venues where the paper is expected to be published. In this study, we aim to recommend publication venues from a different perspective. We estimate the number of citations a paper will receive if the paper is published in each venue and recommend the venue where the paper has the most potential impact. However, there are two challenges to this task. First, a paper is published in only one venue, and thus, we cannot observe the number of citations the paper would receive if the paper were published in another venue. Second, the contents of a paper and the publication venue are not statistically independent; that is, there exist selection biases in choosing publication venues. In this paper, we formulate the venue recommendation problem as a treatment effect estimation problem. We use a bias correction method to estimate the potential impact of choosing a publication venue effectively and to recommend venues based on the potential impact of papers in each venue. We highlight the effectiveness of our method using paper data from computer science conferences.  ( 3 min )
    Tree density estimation. (arXiv:2111.11971v4 [math.ST] UPDATED)
    We study the problem of estimating the density $f(\boldsymbol x)$ of a random vector ${\boldsymbol X}$ in $\mathbb R^d$. For a spanning tree $T$ defined on the vertex set $\{1,\dots ,d\}$, the tree density $f_{T}$ is a product of bivariate conditional densities. An optimal spanning tree minimizes the Kullback-Leibler divergence between $f$ and $f_{T}$. From i.i.d. data we identify an optimal tree $T^*$ and efficiently construct a tree density estimate $f_n$ such that, without any regularity conditions on the density $f$, one has $\lim_{n\to \infty} \int |f_n(\boldsymbol x)-f_{T^*}(\boldsymbol x)|d\boldsymbol x=0$ a.s. For Lipschitz $f$ with bounded support, $\mathbb E \left\{ \int |f_n(\boldsymbol x)-f_{T^*}(\boldsymbol x)|d\boldsymbol x\right\}=O\big(n^{-1/4}\big)$, a dimension-free rate.  ( 2 min )
    Algorithms for Discrepancy, Matchings, and Approximations: Fast, Simple, and Practical. (arXiv:2209.01147v1 [cs.DS])
    We study one of the key tools in data approximation and optimization: low-discrepancy colorings. Formally, given a finite set system $(X,\mathcal S)$, the \emph{discrepancy} of a two-coloring $\chi:X\to\{-1,1\}$ is defined as $\max_{S \in \mathcal S}|{\chi(S)}|$, where $\chi(S)=\sum\limits_{x \in S}\chi(x)$. We propose a randomized algorithm which, for any $d>0$ and $(X,\mathcal S)$ with dual shatter function $\pi^*(k)=O(k^d)$, returns a coloring with expected discrepancy $O\left({\sqrt{|X|^{1-1/d}\log|\mathcal S|}}\right)$ (this bound is tight) in time $\tilde O\left({|\mathcal S|\cdot|X|^{1/d}+|X|^{2+1/d}}\right)$, improving upon the previous-best time of $O\left(|\mathcal S|\cdot|X|^3\right)$ by at least a factor of $|X|^{2-1/d}$ when $|\mathcal S|\geq|X|$. This setup includes many geometric classes, families of bounded dual VC-dimension, and others. As an immediate consequence, we obtain an improved algorithm to construct $\varepsilon$-approximations of sub-quadratic size. Our method uses primal-dual reweighing with an improved analysis of randomly updated weights and exploits the structural properties of the set system via matchings with low crossing number -- a fundamental structure in computational geometry. In particular, we get the same $|X|^{2-1/d}$ factor speed-up on the construction time of matchings with crossing number $O\left({|X|^{1-1/d}}\right)$, which is the first improvement since the 1980s. The proposed algorithms are very simple, which makes it possible, for the first time, to compute colorings with near-optimal discrepancies and near-optimal sized approximations for abstract and geometric set systems in dimensions higher than $2$.  ( 3 min )
    A taxonomy of surprise definitions. (arXiv:2209.01034v1 [q-bio.NC])
    Surprising events trigger measurable brain activity and influence human behavior by affecting learning, memory, and decision-making. Currently there is, however, no consensus on the definition of surprise. Here we identify 18 mathematical definitions of surprise in a unifying framework. We first propose a technical classification of these definitions into three groups based on their dependence on an agent's belief, show how they relate to each other, and prove under what conditions they are indistinguishable. Going beyond this technical analysis, we propose a taxonomy of surprise definitions and classify them into four conceptual categories based on the quantity they measure: (i) 'prediction surprise' measures a mismatch between a prediction and an observation; (ii) 'change-point detection surprise' measures the probability of a change in the environment; (iii) 'confidence-corrected surprise' explicitly accounts for the effect of confidence; and (iv) 'information gain surprise' measures the belief-update upon a new observation. The taxonomy poses the foundation for principled studies of the functional roles and physiological signatures of surprise in the brain.  ( 2 min )
    Normalization effects on deep neural networks. (arXiv:2209.01018v1 [cs.LG])
    We study the effect of normalization on the layers of deep neural networks of feed-forward type. A given layer $i$ with $N_{i}$ hidden units is allowed to be normalized by $1/N_{i}^{\gamma_{i}}$ with $\gamma_{i}\in[1/2,1]$ and we study the effect of the choice of the $\gamma_{i}$ on the statistical behavior of the neural network's output (such as variance) as well as on the test accuracy on the MNIST data set. We find that in terms of variance of the neural network's output and test accuracy the best choice is to choose the $\gamma_{i}$'s to be equal to one, which is the mean-field scaling. We also find that this is particularly true for the outer layer, in that the neural network's behavior is more sensitive in the scaling of the outer layer as opposed to the scaling of the inner layers. The mechanism for the mathematical analysis is an asymptotic expansion for the neural network's output. An important practical consequence of the analysis is that it provides a systematic and mathematically informed way to choose the learning rate hyperparameters. Such a choice guarantees that the neural network behaves in a statistically robust way as the $N_i$ grow to infinity.  ( 2 min )
    Inference and dynamic decision-making for deteriorating systems with probabilistic dependencies through Bayesian networks and deep reinforcement learning. (arXiv:2209.01092v1 [cs.AI])
    In the context of modern environmental and societal concerns, there is an increasing demand for methods able to identify management strategies for civil engineering systems, minimizing structural failure risks while optimally planning inspection and maintenance (I&M) processes. Most available methods simplify the I&M decision problem to the component level due to the computational complexity associated with global optimization methodologies under joint system-level state descriptions. In this paper, we propose an efficient algorithmic framework for inference and decision-making under uncertainty for engineering systems exposed to deteriorating environments, providing optimal management strategies directly at the system level. In our approach, the decision problem is formulated as a factored partially observable Markov decision process, whose dynamics are encoded in Bayesian network conditional structures. The methodology can handle environments under equal or general, unequal deterioration correlations among components, through Gaussian hierarchical structures and dynamic Bayesian networks. In terms of policy optimization, we adopt a deep decentralized multi-agent actor-critic (DDMAC) reinforcement learning approach, in which the policies are approximated by actor neural networks guided by a critic network. By including deterioration dependence in the simulated environment, and by formulating the cost model at the system level, DDMAC policies intrinsically consider the underlying system-effects. This is demonstrated through numerical experiments conducted for both a 9-out-of-10 system and a steel frame under fatigue deterioration. Results demonstrate that DDMAC policies offer substantial benefits when compared to state-of-the-art heuristic approaches. The inherent consideration of system-effects by DDMAC strategies is also interpreted based on the learned policies.  ( 3 min )
    Exploiting Pretrained Biochemical Language Models for Targeted Drug Design. (arXiv:2209.00981v1 [cs.LG])
Motivation: The development of novel compounds targeting proteins of interest is one of the most important tasks in the pharmaceutical industry. Deep generative models have been applied to targeted molecular design and have shown promising results. Recently, target-specific molecule generation has been viewed as a translation between the protein language and the chemical language. However, such a model is limited by the availability of interacting protein-ligand pairs. On the other hand, large amounts of unlabeled protein sequences and chemical compounds are available and have been used to train language models that learn useful representations. In this study, we propose exploiting pretrained biochemical language models to initialize (i.e. warm start) targeted molecule generation models. We investigate two warm start strategies: (i) a one-stage strategy where the initialized model is trained on targeted molecule generation, and (ii) a two-stage strategy containing a pre-finetuning on molecular generation followed by target-specific training. We also compare two decoding strategies to generate compounds: beam search and sampling. Results: The results show that the warm-started models perform better than a baseline model trained from scratch. The two proposed warm-start strategies achieve similar results to each other with respect to widely used metrics from benchmarks. However, docking evaluation of the generated compounds for a number of novel proteins suggests that the one-stage strategy generalizes better than the two-stage strategy. Additionally, we observe that beam search outperforms sampling in both docking evaluation and benchmark metrics for assessing compound quality. Availability and implementation: The source code is available at https://github.com/boun-tabi/biochemical-lms-for-drug-design and the materials are archived in Zenodo at https://doi.org/10.5281/zenodo.6832145  ( 3 min )
    Log-Gaussian processes for AI-assisted TAS experiments. (arXiv:2209.00980v1 [physics.data-an])
To understand the origins of materials properties, neutron scattering experiments at three-axes spectrometers (TAS) investigate magnetic and lattice excitations in a sample by measuring intensity distributions in its momentum (Q) and energy (E) space. The high demand for and limited availability of beam time for TAS experiments, however, raise the natural question of whether we can improve their efficiency or make better use of the experimenter's time. In fact, using TAS, a number of scientific questions require searching for signals of interest in a particular region of Q-E space, but when done manually, this is time-consuming and inefficient, since the measurement points may be placed in uninformative regions such as the background. Active learning is a promising general machine learning approach that makes it possible to iteratively detect informative regions of signal autonomously, i.e., without human interference, thus avoiding unnecessary measurements and speeding up the experiment. In addition, the autonomous mode allows experimenters to focus on other relevant tasks in the meantime. The approach that we describe in this article exploits log-Gaussian processes which, due to the log transformation, have the largest approximation uncertainties in regions of signal. Maximizing uncertainty as an acquisition function hence directly yields locations for informative measurements. We demonstrate the benefits of our approach on outcomes of a real neutron experiment at the thermal TAS EIGER (PSI) as well as on results of a benchmark in a synthetic setting including numerous different excitations.  ( 3 min )
    Macroeconomic Predictions using Payments Data and Machine Learning. (arXiv:2209.00948v1 [econ.GN])
Predicting the economy's short-term dynamics -- a vital input to economic agents' decision-making process -- often uses lagged indicators in linear models. This is typically sufficient during normal times but could prove inadequate during crisis periods. This paper aims to demonstrate that non-traditional and timely data such as retail and wholesale payments, with the aid of nonlinear machine learning approaches, can provide policymakers with sophisticated models to accurately estimate key macroeconomic indicators in near real-time. Moreover, we provide a set of econometric tools to mitigate overfitting and interpretability challenges in machine learning models to improve their effectiveness for policy use. Our models with payments data, nonlinear methods, and tailored cross-validation approaches help improve macroeconomic nowcasting accuracy by up to 40% -- with higher gains during the COVID-19 period. We observe that the contribution of payments data for economic predictions is small and linear during low and normal growth periods. However, the payments data contribution is large, asymmetrical, and nonlinear during strong negative or positive growth periods.  ( 2 min )
    A Discussion of Discrimination and Fairness in Insurance Pricing. (arXiv:2209.00858v1 [cs.LG])
    Indirect discrimination is an issue of major concern in algorithmic models. This is particularly the case in insurance pricing where protected policyholder characteristics are not allowed to be used for insurance pricing. Simply disregarding protected policyholder information is not an appropriate solution because this still allows for the possibility of inferring the protected characteristics from the non-protected ones. This leads to so-called proxy or indirect discrimination. Though proxy discrimination is qualitatively different from the group fairness concepts in machine learning, these group fairness concepts are proposed to 'smooth out' the impact of protected characteristics in the calculation of insurance prices. The purpose of this note is to share some thoughts about group fairness concepts in the light of insurance pricing and to discuss their implications. We present a statistical model that is free of proxy discrimination, thus, unproblematic from an insurance pricing point of view. However, we find that the canonical price in this statistical model does not satisfy any of the three most popular group fairness axioms. This seems puzzling and we welcome feedback on our example and on the usefulness of these group fairness axioms for non-discriminatory insurance pricing.  ( 2 min )
    Exact Decomposition of Quantum Channels for Non-IID Quantum Federated Learning. (arXiv:2209.00768v1 [quant-ph])
Federated learning refers to the task of performing machine learning with decentralized data from multiple clients while protecting data security and privacy. Work has been done to incorporate quantum advantage in such scenarios. However, when the clients' data are not independent and identically distributed (IID), the performance of conventional federated algorithms deteriorates. In this work, we explore this phenomenon in the quantum regime with both theoretical and numerical analysis. We further prove that a global quantum channel can be exactly decomposed into channels trained by each client with the help of local density estimators. This leads to a general framework for quantum federated learning on non-IID data with one-shot communication complexity. We demonstrate it on classification tasks with numerical simulations.  ( 2 min )
    Optimizing the Performative Risk under Weak Convexity Assumptions. (arXiv:2209.00771v1 [cs.LG])
In performative prediction, a predictive model impacts the distribution that generates future data, a phenomenon that is ignored in classical supervised learning. In this closed-loop setting, the natural measure of performance, denoted the performative risk, captures the expected loss incurred by a predictive model after deployment. The core difficulty of minimizing the performative risk is that the data distribution itself depends on the model parameters. This dependence is governed by the environment and not under the control of the learner. As a consequence, even the choice of a convex loss function can result in a highly non-convex performative risk minimization problem. Prior work has identified a pair of general conditions on the loss and the mapping from model parameters to distributions that implies convexity of the performative risk. In this paper, we relax these assumptions and focus on obtaining weaker notions of convexity, without sacrificing the amenability of the performative risk minimization problem for iterative optimization methods.  ( 2 min )
    Optimal Diagonal Preconditioning: Theory and Practice. (arXiv:2209.00809v1 [math.OC])
    Preconditioning has been a staple technique in optimization and machine learning. It often reduces the condition number of the matrix it is applied to, thereby speeding up convergence of optimization algorithms. Although there are many popular preconditioning techniques in practice, most lack theoretical guarantees for reductions in condition number. In this paper, we study the problem of optimal diagonal preconditioning to achieve maximal reduction in the condition number of any full-rank matrix by scaling its rows or columns separately or simultaneously. We first reformulate the problem as a quasi-convex problem and provide a baseline bisection algorithm that is easy to implement in practice, where each iteration consists of an SDP feasibility problem. Then we propose a polynomial time potential reduction algorithm with $O(\log(\frac{1}{\epsilon}))$ iteration complexity, where each iteration consists of a Newton update based on the Nesterov-Todd direction. Our algorithm is based on a formulation of the problem which is a generalized version of the Von Neumann optimal growth problem. Next, we specialize to one-sided optimal diagonal preconditioning problems, and demonstrate that they can be formulated as standard dual SDP problems, to which we apply efficient customized solvers and study the empirical performance of our optimal diagonal preconditioners. Our extensive experiments on large matrices demonstrate the practical appeal of optimal diagonal preconditioners at reducing condition numbers compared to heuristics-based preconditioners.  ( 3 min )
    Recurrent Convolutional Neural Networks Learn Succinct Learning Algorithms. (arXiv:2209.00735v1 [cs.LG])
    Neural Networks (NNs) struggle to efficiently learn certain problems, such as parity problems, even when there are simple learning algorithms for those problems. Can NNs discover learning algorithms on their own? We exhibit a NN architecture that, in polynomial time, learns as well as any efficient learning algorithm describable by a constant-sized learning algorithm. For example, on parity problems, the NN learns as well as row reduction, an efficient algorithm that can be succinctly described. Our architecture combines both recurrent weight-sharing between layers and convolutional weight-sharing to reduce the number of parameters down to a constant, even though the network itself may have trillions of nodes. While in practice the constants in our analysis are too large to be directly meaningful, our work suggests that the synergy of Recurrent and Convolutional NNs (RCNNs) may be more powerful than either alone.  ( 2 min )

  • Open

    "The Unsurprising Effectiveness of Pre-Trained Vision Models for Control", Parisi et al 2022 {FB} (CLIP)
    submitted by /u/gwern [link] [comments]  ( 87 min )
    DQN does not work, but Distributional DQN (C51) does
I am struggling to understand why the DQN algorithm fails in my environment - after a short rise at the beginning, the reward falls and remains low for more than 1e6 steps - but its distributional version (C51) achieves excellent results. I'm using RLlib implementations, so it's not a bug in my code. I would understand why C51 would learn faster, but what could be the reasons that prevent DQN from learning anything at all? One explanation I gave myself is that in my environment the value of a given state can be quite noisy, and DQN learns only the expected value of the outcome, while C51 learns the distribution of possible outcomes. However, C51 still uses the expected value to choose the argmax action, so wouldn't the result be the same? Another possible explanation is that in my environment good performance corresponds to a region of small negative rewards, so gradient updates are much larger for low-reward batches, while this doesn't happen in C51. However, gradient clipping doesn't seem to be very effective. Do you know any other possible explanations? submitted by /u/fedetask [link] [comments]  ( 88 min )
    "Semantic Exploration from Language Abstractions and Pretrained Representations", Tam et al 2022 (plugging BERT/CLIP LMs into Impala/R2D2's NGU/RND exploration methods)
    submitted by /u/gwern [link] [comments]  ( 87 min )
    "LID: Pre-Trained Language Models for Interactive Decision-Making", Li et al 2022
    submitted by /u/gwern [link] [comments]  ( 87 min )
    "Housekeep: Tidying Virtual Households using Commonsense Reasoning", Kant et al 2022
    submitted by /u/gwern [link] [comments]  ( 87 min )
    "Awesome-LLM-Robotics": A comprehensive list of papers using large language/multi-modal models for Robotics/RL, including papers, codes, and related websites
    submitted by /u/gwern [link] [comments]  ( 87 min )
    Help with Policy Gradient agent that learns actions resulting in negative rewards
I have a toy RL project implementing the REINFORCE algorithm with a policy gradient agent that seems to consistently learn to choose actions that reliably generate negative rewards. Are there any likely reasons this could be so? Additional details below. The rules of the environment are of my own design. Since that makes it obviously hard to diagnose, I'll take whatever high- or low-granularity suggestions I can get. --------- My environment is a 25x25 grid upon which a bot can move and attempt to acquire a target. Bots can choose from the following 17 actions:
- Do nothing
- Move 1 space (in any of the 8 directions surrounding it)
- Attempt to acquire a target 1 space away (in any of the 8 directions surrounding it)
The reward system is such that: +1 for moving closer to any targe…  ( 92 min )
    "Improved Policy Optimization for Online Imitation Learning", Lavington et al 2022
    submitted by /u/gwern [link] [comments]  ( 87 min )
    "Learning with Differentiable Algorithms", Petersen 2022 (sorting, top-k/ranking, rendering, logic gates, distances)
    submitted by /u/gwern [link] [comments]  ( 87 min )
Help with literature recommendation. Variable input state in a multi-agent reinforcement learning environment?
Hi community. I am very new to the field and I find myself in need of some guidance to begin a project applied to UAVs. My context is the following: I am experimenting with decision making of autonomous multi-agent UAVs in a simulated environment. My problem arises when attempting to represent the n other agents seen by a particular one in order to make a decision. As far as I have seen, the input size of the state vector is of fixed length, so how could one represent an unknown number of agents (each with its vector representing its relative position in space)? I am aware of padding, but I was looking for a more elegant and, most importantly, scalable solution. Thanks in advance, and please forgive my ignorance on the topic. submitted by /u/Bensimon_Joules [link] [comments]  ( 88 min )
    Question about modifying gym environment
Hi everyone, thank you so much in advance. Relative newbie here to OpenAI Gym. I was wondering how I can change an environment like the taxi environment, such as adding more destination points, passengers, a larger grid size, etc.? Do you have any pointers or recommendations? I greatly appreciate it! submitted by /u/No_Opportunity575 [link] [comments]  ( 102 min )
stable_baselines3 problems
Hello all, I am building a custom reinforcement learning trading environment using gym.Env for my master's dissertation. I finished developing it and passed it through the stable_baselines3 check_env function. The environment did not raise any issues. Now I am trying to implement a quick RL algorithm to check if everything is set up correctly. I would really appreciate it if anyone could give me some clarity on what I am doing wrong. Here are my action space and observation space:
self._data = df.copy()
self.action_space = gym.spaces.Box(low=np.array([0, 0]), high=np.array([3, 1], dtype=np.float32))
# Store the columns to be shown to the agent
self._columns = list(self._data.columns)[1:21]
self._low = min(list(self._data[self._columns].min()))
self._high = max(list(self._data[self._columns].max…  ( 99 min )
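A minimal smoke test for a custom env like this might look like the sketch below. TradingEnv is a hypothetical name for the poster's class; check_env, PPO, and learn are real stable_baselines3 APIs:

```python
from stable_baselines3 import PPO
from stable_baselines3.common.env_checker import check_env

env = TradingEnv(df)  # hypothetical constructor for the poster's custom gym.Env
check_env(env)        # raises if the spaces/reset/step contract is broken

# A short PPO run is usually enough to surface dtype, shape, and reward bugs.
model = PPO("MlpPolicy", env, verbose=1)
model.learn(total_timesteps=10_000)
```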
  • Open

    [D] Model fit of training data
I am trying to learn machine learning. I have one hang-up that I cannot find the answer to: how does a model fit inputs from a training data set? For example, I am working with the MNIST training set. When I fit the data using a particular model, how does it do it? All I am finding are articles with the syntax for code to do this. I am trying to understand what is happening one layer deeper. Does this make sense? I am not even sure what keywords I should use to search for this on Google. submitted by /u/NetworkPoker [link] [comments]  ( 90 min )
    [P] Peacasso - A Web UI for Stable Diffusion Models
Peacasso - a UI for interacting with stable diffusion models. As text-to-image models become smaller, with available weights, and generate competitive images (e.g. stable diffusion models), there have been efforts to build interfaces for interacting with these models. Code and instructions can be found on GitHub - https://github.com/victordibia/peacasso. (Demo GIF: https://i.redd.it/t65hvd8tdwl91.gif) There are many excellent UI interfaces that have been built for latent diffusion models recently; however, some of the things Peacasso brings to the table include:
- Easy installation. Instead of cobbling together command line scripts, Peacasso provides a pip install flow and a UI that supports a set of carefully selected default operations.
- UI with good defaults. The current implementation of Peacasso provides a UI for basic operations - text- and image-based prompting, remixing generated images as prompts, and model parameter selection. Also the little things, like light and dark mode.
- Python API. While the UI is the focus here, there is an underlying Python API which will bake in experimentation features (e.g. saving intermediate images in the sampling loop, exploring model explanations, etc.; see roadmap).
(Screenshots: light mode, dark mode) submitted by /u/vykthur [link] [comments]  ( 90 min )
    [Discussion] Twice Differentiable Neural Networks
Hi, I have a question regarding taking the gradient twice, once wrt the input and once wrt the parameters, and what one might have to pay attention to when optimizing the whole thing. Context: I have a neural network f(x, \theta) with a target y, and its output and its gradient wrt x occur in the loss, which is calculated as L{ f(x, \theta), \nabla_x f(x, \theta), y }. The use case is differential equations. This means that to obtain the loss, I have to take the gradient of the prediction f(x, \theta) once with respect to the input x. Thereafter I take the gradient wrt the parameters to do gradient descent as usual. I use twice-differentiable activation functions (tanh, gelu, sigmoid, etc.). Problem: The model works in low dimensions but scales poorly to higher-dimensional data. Question: Does anybody have recommendations, resources, or links with regard to twice-differentiable neural networks and how stochastic gradient descent is done with them, respectively what one has to pay attention to while optimizing them (learning rates, momenta, weight decay, etc.)? A resource with a similar setup is the Hamiltonian Neural Networks paper by Greydanus et al (https://arxiv.org/pdf/1906.01563.pdf), which also computes the gradients of the parameters after taking the gradient wrt the input. Thanks in advance, dear fellow gradient descenders :) submitted by /u/ludixiv [link] [comments]  ( 92 min )
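For reference, a minimal PyTorch sketch of this setup (the toy architecture, data, and loss are assumptions, not the poster's code); the key detail is create_graph=True, so the input-gradient itself stays differentiable wrt the parameters:

```python
import torch

net = torch.nn.Sequential(
    torch.nn.Linear(2, 64), torch.nn.Tanh(),  # tanh is twice differentiable
    torch.nn.Linear(64, 1),
)
opt = torch.optim.Adam(net.parameters(), lr=1e-3)

x = torch.rand(128, 2, requires_grad=True)  # inputs need a grad path
y = torch.zeros(128, 1)                     # placeholder target

f = net(x)
# create_graph=True keeps the graph of this gradient, so the loss, which
# contains grad_x, can itself be differentiated wrt the parameters.
grad_x, = torch.autograd.grad(f.sum(), x, create_graph=True)

loss = ((f - y) ** 2).mean() + (grad_x ** 2).mean()  # toy L(f, grad_x f, y)
opt.zero_grad()
loss.backward()  # second differentiation, now wrt theta
opt.step()
```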
    [D] Intuitive understanding of generative models
Intuitively, can generative models be understood as learning to "interpolate" between the training data? For example, in the case of score-based models, my understanding is that the minimizer of the loss is any distribution whose score is 0 at each of the noise-perturbed training data points - i.e., a distribution with (at least) *n* modes (*n* is the training data size). Then, we do gradient ascent on the distribution and add some Langevin noise so that the final sample (for example, an image) is not just an image from the training data. What is the relation of the objects as we see them in this image to the objects in images from the training data? submitted by /u/a1_jakesauce_ [link] [comments]  ( 89 min )
    [D] Stable diffusion and similar platforms are the best compression models the world currently has!
I think it is important to understand the capabilities of these technologies beyond just making art. What these text-to-image models are capable of is massive compression of visual data that is orders of magnitude better than what we currently have. Right now, if I wanted to share any image I created with Stable Diffusion, I would not have to share the image with you. I could just give you the metadata for the settings used to generate said image and save a tremendous amount of bandwidth and hard drive space. Now you might be thinking, that's great and all, but what's my point? Well, imagine you are running a massive website that hosts generated images. All of those images are taking up a lot of space on server hard drives which you have to pay to host. Also there is massive amount of ban…  ( 112 min )
    [P] - VkFFT now supports Rader's algorithm - A100 and MI250 benchmarks: Part 2
Hello, I am the creator of VkFFT - the GPU Fast Fourier Transform library for Vulkan/CUDA/HIP/OpenCL and Level Zero. Two weeks ago I made a post about the Rader's algorithm implementation in VkFFT, which improved the performance of VkFFT for sequences not decomposable as products of small primes. The previous version of VkFFT was doing direct multiplication convolutions of length N-1 to create an FFT kernel of an arbitrary prime length to be used in a regular Stockham FFT algorithm. Direct multiplication convolutions scale as O(N^2) and do not work well for primes above 100. This update brings support for the convolution theorem version of Rader's algorithm, which no other GPU FFT library currently has. The new version does the Rader algorithm by inlining an FFT convolution in the FFT code - with FF…  ( 113 min )
    [P] Serving ML models at scale using mlflow
As you know, mlflow is widely used today in the machine learning community to manage experimentation and serve models. In this series, published on Medium, I address the problem of scalability that I faced at my company while deploying multiple models in production using mlflow. In this series, I wrote about:
- Deploying an mlflow tracking instance for experimentation
- Serving ML models as API endpoints on Kubernetes
- Understanding how k8s handles load through load testing
The last article explains how you can make the deployment scalable and anticipate the compute power needed to handle multiple simultaneous requests in a real-world context. submitted by /u/Spirited-Singer-6150 [link] [comments]  ( 89 min )
    [R] Model compression reference request
I am looking to get into the model compression field, especially the post-training/fine-tuning phases (i.e., quantization, pruning). What are some canonical papers in the field? Is there any must-read book or survey paper? What are the most glaring problems and the most promising research avenues (in your opinion)? submitted by /u/Disastrous-War-9675 [link] [comments]  ( 88 min )
    [R] mmwave based multi user 3D posture tracking for AR / VR / smart home / fitness tracking
    submitted by /u/SpatialComputing [link] [comments]  ( 90 min )
    [D] What is the SOTA explanation for why deep learning works? I understand backprop and gradient descent, but why should over-parametrized networks actually converge to anything useful?
    Gradient descent is straightforward, you can see it obviously in 2d, getting stuck in local minima, etc. Backprop likewise makes sense in the case of a couple of neurons and a couple of layers, toy problems, etc. But why is it obvious that this process should work in ultra high dimensional space as well? I mean it's obvious there is structure in visual/audio/language data, and it makes sense you would be able to learn a representation of that structure by some method. But it's not at all obvious to me that backprop should "work" in that it should converge to anything generically useful for such data. It bothers me to have that missing intuition. How do you guys mentally resolve this? Do you just say "if it works it works" and get on with it? What is the latest SOTA thinking around a theoretical foundation for deep learning? Not a researcher myself - just a curious data scientist who works with them and occasionally reads papers on weekends. Would be happy to read any papers/blogs/posts on the topic as it stands now in mid 2022. submitted by /u/thunderdome [link] [comments]  ( 119 min )
    [D] Solving TSP Using Diffusion Model
Could we use a diffusion model to solve the traveling salesman problem in polynomial time? I don't know, but I think we could try by randomizing the tour of an exactly solved TSP instance and then training the model to reverse the process. If my guess is correct, we could solve the TSP exactly in polynomial time. It seems like the recently published Riemannian diffusion model could do that. submitted by /u/flat_rain [link] [comments]  ( 89 min )
    [P] Apple pencil with the power of Local Stable Diffusion using Gradio Web UI running off a 3090
    submitted by /u/Illustrious_Row_9971 [link] [comments]  ( 92 min )
    [D] Should I focus on breadth or depth during my studies as an undergrad aiming to enter ML?
Basically, I'm majoring in computer science and will very likely double major in math (as I'm more interested in it than CS, but majoring in math only would bar me from participating in ML or AI classes or research due to my university's rules). So my question is: should I focus solely on machine learning, or try to build some breadth and care about ML later on? I know this might sound like a silly question, but it's hard to decide since I'm afraid of pigeonholing myself into some niche field of ML or something like that. I'll add more points to my post later on once I come back from work, but I hope the question is understandable and doesn't break any sub rules. Thank you. submitted by /u/djssoapappskdid [link] [comments]  ( 92 min )
    [D] Asking about the difference beetwen Automatic Tagging and Tag Recommendation System
Hello, everyone. I have a question regarding the title. I have a thesis on the automatic tagging topic and I have to build the project for it. My lecturer gave me 2 journal references: one about automatic tagging and one about tag recommendation systems. And I am confused about them. Do they have the same goal of auto-tagging objects? Why did my lecturer give me these references? Does a tag recommendation system have to be part of automatic tagging? Anyway, I am still in the middle of the methodology step. Thank you. submitted by /u/TohingWahing [link] [comments]  ( 105 min )
    [D] OpenAI SWE Residency?
    Hey there! Has anyone here gone through the program before? Or are there any OpenAI people who can offer insight into the program? Maybe someone who is willing to share their experience and how to best prep for the program? I've seen various discussions on r/MachineLearning for the research track, but none about this program submitted by /u/ThrowawayTartan [link] [comments]  ( 90 min )
  • Open

    Best free AI generator I can use to create simple cartoon characters?
Every AI out there somehow does a phenomenal job at creating landscapes and scenery and even extremely realistic portraits of humans, yet Google doesn't give me a clear answer on any AI generator that can create cartoon characters based on the style of X or Y. I kinda wanted to see what I could create using AI and whether it'd be possible to create channel art or cartoon avatars that look like me for my YouTube banner and videos. Is there anything out there like this, or would I have to find someone to draw something for me or use an online avatar creator that just uses a template? submitted by /u/blxoom [link] [comments]  ( 88 min )
    Stable Diffusion Weekly AI Art Slideshow 4K 9.4.22
    submitted by /u/prfitofthesngularity [link] [comments]  ( 86 min )
    Sunday reading suggestion. This is a good resource to learn and do Data Cleaning tasks quickly.
    submitted by /u/alimhabidi [link] [comments]  ( 87 min )
    MATRIX AI VERSION
    submitted by /u/nalr00n [link] [comments]  ( 86 min )
A New Deep Learning Approach Developed at MIT Identifies Undiagnosable Cancers by Taking a Closer Look at the Gene Expression Programs Related to Early Cell Development and Differentiation
Cancer is partially a developmental illness, with malignancies named for the cell or tissue from which they originate. However, there is no systematic atlas of tumor origins. Identifying a patient's precise type of cancer and its primary site is the first step in deciding on the best course of treatment. Despite extensive testing, the source of cancer cannot be determined in many situations. Oncologists must then employ non-targeted medicines with severe side effects and poor survival rates. Researchers at Massachusetts General Hospital (MGH) and the Koch Institute for Integrative Cancer Research at MIT have developed an approach that may help classify cancers of unknown primary. Their work introduces a new deep learning method that closely examines the gene expression patterns related to early cell development and differentiation. Continue reading | Check out the paper submitted by /u/ai-lover [link] [comments]  ( 87 min )
    Critique of a stupid video - Top 10 scariest things that will happen before 2050
    submitted by /u/HumanSeeing [link] [comments]  ( 87 min )
    Best introductory books on artificial intelligence?
I have not had much luck finding decent work with my philosophy degree and am considering an MA in AI. As it stands, I don't know much at all and am in the very early stages of getting to grips with the topic and deciding whether it's something I could really do. With that in mind, do you guys have any recommendations for introductory books on the topic? There's plenty of content online, but I'd prefer a proper book that's a little more in-depth. submitted by /u/Snedwardthe18th [link] [comments]  ( 88 min )
    Ai art starryai
    submitted by /u/AstroFish69 [link] [comments]  ( 86 min )
    "Floraison d'hiver" (Winter Bloom), creating dancing animations with AI
    submitted by /u/Kkrch [link] [comments]  ( 87 min )
    French tax officials use AI to spot 20,000 undeclared pools | France
    submitted by /u/firig1965 [link] [comments]  ( 87 min )
    Please help me train my chatbot.
    I built a simple chatbot with NextJS for my final year project, but I'm having trouble training it. Xalen is a self-learning conversational chatbot designed to hold funny and witty conversations with users through artificial intelligence methods that enable the program to communicate like an actual human being. It can talk about anything (eventually), tell jokes, engage in witty banter, etc. It's generally for entertainment purposes. Xalen learns from each and every conversation, becoming better and more intelligent, improving its ability to hold longer and more meaningful conversations. Due to a lack of sufficient data, its responses aren't perfect right now, but they can definitely improve! I need a lot of people to chat with the bot. The longer the conversation the better. Your help is greatly appreciated... Here's the URL: http://xalen.netlify.app Thank you all in advance! submitted by /u/GameTide [link] [comments]  ( 88 min )
Has Anyone Considered Making an AI Colab Generator for Hand-Drawn Animation?
I honestly would love a Colab notebook for a hand-drawn animation generator where you import the storyboards, animatics, backgrounds, model sheets, time sheets/exposure sheets, etc. It wouldn't be a substitute for animating yourself, but it could be useful when you're unmotivated. I would also love it to have the styles of various studios that have done hand-drawn animation before (kinda like Jukebox AI, which has various styles of composers), especially defunct ones. No one has ever done an animation generator like this, so it could be a unique one. A time sheet, aka exposure sheet, is a wide document with directions for the animators on what to draw for each frame. Here are some examples by me and a former sheet timer of mine named Emery. They sadly left due to personal stuff. https://drive.google.com/file/d/1wbhT3gmCgWQGJYLtYSfgcdf6Z-A6YOJk/view?usp=drivesdk https://drive.google.com/file/d/1n3D4CHJXx6VEBrKWmcmvUBFUMz4Reff6/view?usp=drivesdk https://drive.google.com/file/d/1ckZPfKHQ0gHvKrSGfH-F8PI5HKbEQDlZ/view?usp=drivesdk https://drive.google.com/file/d/1cTj83apBrFJst4sfISwG_tJFZMP-HJtf/view?usp=drivesdk submitted by /u/Ooglyeye [link] [comments]  ( 87 min )
    Stable Diffusion Animation | Jupiter's Stability | Planetary Path | AI Manifest
    submitted by /u/Available_Tadpole829 [link] [comments]  ( 87 min )
    6 Best Artificial Intelligence courses for Healthcare You should learn 2022 -
    submitted by /u/Lakshmireddys [link] [comments]  ( 86 min )
    How to create AI Art Using Starting Images with Stable Diffusion
    submitted by /u/prfitofthesngularity [link] [comments]  ( 87 min )
  • Open

    The Luong Attention Mechanism
    The Luong attention sought to introduce several improvements over the Bahdanau model for neural machine translation, particularly by introducing two new classes of attentional mechanisms: a global approach that attends to all source words, and a local approach that only attends to a selected subset of words in predicting the target sentence.  In this tutorial, […] The post The Luong Attention Mechanism appeared first on Machine Learning Mastery.
  • Open

    Computing VIN checksums
    I’ve had to work a little with VIN numbers lately, and so I looked back at a post I wrote on the subject three years ago. That post goes into the details of Vehicle Identification Numbers and the quirky algorithm used to compute the check sum. This post captures the algorithm in Python code. See […] Computing VIN checksums first appeared on John D. Cook.  ( 5 min )
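The Python code lives in the linked post; as a hedged sketch of the standard check-digit scheme the post describes (transliteration values and weights from the usual ISO 3779-style rule, not copied from the post itself), the computation is roughly:

```python
# Letter values for the VIN check digit; I, O, and Q never appear in a VIN.
VALUES = {c: v for v, c in enumerate("ABCDEFGH", start=1)}
VALUES.update(zip("JKLMNPR", [1, 2, 3, 4, 5, 7, 9]))
VALUES.update({c: v for v, c in enumerate("STUVWXYZ", start=2)})
VALUES.update({str(d): d for d in range(10)})

# Position weights; position 9 (weight 0) holds the check digit itself.
WEIGHTS = [8, 7, 6, 5, 4, 3, 2, 10, 0, 9, 8, 7, 6, 5, 4, 3, 2]

def check_digit(vin: str) -> str:
    total = sum(VALUES[c] * w for c, w in zip(vin, WEIGHTS))
    r = total % 11
    return "X" if r == 10 else str(r)

def is_valid(vin: str) -> bool:
    return len(vin) == 17 and vin[8] == check_digit(vin)

print(is_valid("1M8GDM9AXKP042788"))  # classic test VIN -> True
```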
  • Open

    Lack of Trust Continues to Erode Blockchain Adoption
Blockchain technology has grown from a distributed ledger for financial applications into peer-to-peer networks that hold tremendous value in any industry and sector. Bewildering as the growth has been, organizations are still engineering their own blockchains. The post Lack of Trust Continues to Erode Blockchain Adoption appeared first on Data Science Central.  ( 20 min )
    What is Digital Experience Monitoring (DEM)?
    End-user experience monitoring (EUEM) enables IT professionals to understand issues from the viewpoint of end users, deliver a better customer experience, and fix issues more quickly by constantly capturing failures, breakdowns, page load data, network requests, and other metrics. The post What is Digital Experience Monitoring (DEM)? appeared first on Data Science Central.  ( 19 min )
    Five Steps to Data Profiling for Successful Discovery
    Data profiling focuses on examining and analyzing data, followed by creating a useful summary of that data. The post Five Steps to Data Profiling for Successful Discovery appeared first on Data Science Central.  ( 21 min )
    Will Technology Help Eliminate BS Jobs?
The definition of a BS job is that the person doing it feels that their contribution to society is meaningless and in a way even harms it. The post Will Technology Help Eliminate BS Jobs? appeared first on Data Science Central.  ( 20 min )
    CDN: Does Our Need For Internet Speed Put Sensitive Data at Risk?
    What is CDN, does it endanger the sensitive data of internet users, and how can companies prevent hackers from exploiting its flaws? The post CDN: Does Our Need For Internet Speed Put Sensitive Data at Risk? appeared first on Data Science Central.  ( 21 min )
    Data Science Lessons from Top Gun
    The movie of the summer of 2022 is no doubt “Top Gun: Maverick.”  Fast moving, tons of aerial combat, lots of excitement, and the youngest-looking Tom Cruise I’ve ever seen (not one stitch of grey hair on his head).  There’s gotta be a Data Science lesson in there somewhere… The post Data Science Lessons from Top Gun appeared first on Data Science Central.  ( 25 min )
    How to Use Python to Loop Through HTML Tables and Scrape Tabular Data
    Iterating through HTML tables can be tricky, so we've created this simple guide to help you understand how to use Python to extract tabular data from public HTML tables. The post How to Use Python to Loop Through HTML Tables and Scrape Tabular Data appeared first on Data Science Central.  ( 26 min )
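As a minimal sketch of the pattern the article's title describes (the URL is a placeholder; requests and beautifulsoup4 are assumed installed):

```python
import requests
from bs4 import BeautifulSoup

html = requests.get("https://example.com/page-with-table").text  # placeholder URL
table = BeautifulSoup(html, "html.parser").find("table")

headers = [th.get_text(strip=True) for th in table.find_all("th")]
rows = []
for tr in table.find_all("tr")[1:]:  # skip the header row
    cells = [td.get_text(strip=True) for td in tr.find_all("td")]
    if cells:
        rows.append(dict(zip(headers, cells)))
print(rows[:3])
```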

  • Open

    What’s Data-Centric Architecture?
    Data-centric architecture revisits architecture and turns that architecture on its head. Ever since the dawn of client-server computing, applications have been the focus of enterprise IT buyers. The post What’s Data-Centric Architecture? appeared first on Data Science Central.  ( 18 min )
    The AI Vegan – A real use case for NFT/ Blockchain?
    Blockchain has a chequered history. The post The AI Vegan – A real use case for NFT/ Blockchain? appeared first on Data Science Central.  ( 19 min )
    Reaping the Benefits of Having a Data Backup and Recovery Plan
In today’s data-driven economy, any business needs to make sure that its data is easily recoverable and secured in an emergency. The National Archives and Records Administration suggests that close to 93% of organizations that experience downtime and data loss for ten or more days file for bankruptcy within a year. No wonder… Read More »Reaping the Benefits of Having a Data Backup and Recovery Plan The post Reaping the Benefits of Having a Data Backup and Recovery Plan appeared first on Data Science Central.  ( 19 min )
    Data Storytelling: Meshing Narrative Techniques with Data Science Smarts
    Alongside the explosion in enterprise data analytics is the growing realisation that insights, without action, are not enough. The post Data Storytelling: Meshing Narrative Techniques with Data Science Smarts appeared first on Data Science Central.  ( 21 min )
    Reengineer Business Decisions Granularly with Automated Data Collection
    Businesses, whether big or small, know that understanding data is essential to making informed decisions that impact the organization’s bottom line. The post Reengineer Business Decisions Granularly with Automated Data Collection appeared first on Data Science Central.  ( 21 min )
  • Open

    [D] Survey: Opinions on text-to-image AI
    submitted by /u/Ragdoll_X_Furry [link] [comments]  ( 88 min )
    [D] Is there research about sensitivity of models with respect to their hyperparameters?
Dear students, researchers and practitioners, I don't know about you guys, but in terms of ML, hyperparameter tuning is what "keeps me up at night". Even if your model is convex and closed-form, finding the (near-)optimal hyperparameters is still always a non-convex problem. Yes, you can do a random search with a large n, CV on each point, and then grid around the n best results, but I don't always have the time, compute, or simply the "mental energy" to do this. This is why I actively avoid neural networks for non-text, non-vision problems and even try to wing it with OpenCV if the problem is easy enough. Everything in neural networks is a hyperparameter, and for enough datasets there's no clear point where you can say "mmkay, I'm sure I got everything out of my model" unless you hit 100% on val and test. Some models on tabular data, e.g. GBDTs, seem not to be sensitive to hyperparameters unless you have unbalanced data or so. Even better, if I can wing it with LDA/QDA and use 0 hyperparameters, I'm happy. Have there been any large-scale studies that look at algorithms from this point of view? If not, is this not an interesting domain to research? submitted by /u/Aggressive_Ad_3178 [link] [comments]  ( 91 min )
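For concreteness, the "random search with a large n, CV on each point" workflow the poster describes can be written in a few lines of scikit-learn; the model, data, and ranges below are illustrative assumptions, not recommendations:

```python
from scipy.stats import randint, uniform
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import RandomizedSearchCV

X, y = make_classification(n_samples=1000, random_state=0)  # stand-in data
search = RandomizedSearchCV(
    GradientBoostingClassifier(),
    param_distributions={
        "n_estimators": randint(50, 500),
        "learning_rate": uniform(0.01, 0.3),
        "max_depth": randint(2, 8),
    },
    n_iter=50,   # the "large n" of random points
    cv=5,        # cross-validation at each point
    n_jobs=-1,
)
search.fit(X, y)
print(search.best_params_, search.best_score_)
```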
    [D] Do I need a dedicated graphics card of rtx series for machine learning in long run?
I'm going to buy a new laptop, and the only thing that concerns me is whether I am overspending on the GPU. I am doing my BTech in Artificial Intelligence, and my tasks include running heavy software like Visual Studio, Android Studio, MATLAB, etc. simultaneously. So the question is: do I need to buy a laptop with a dedicated GPU for the long run? If not, which Intel or Ryzen processors will do the job? submitted by /u/Relative_Winner_4588 [link] [comments]  ( 92 min )
    [R] Using Emerson AI, augmented by GPT-J (Eleuther) to train Blenderbot 3 about video game level design (virtual assistant) and to illustrate Emerson AI's ideas for a level with Dall-E Mini. The results initially varied but with a little calibration over time the quality of the images improved.
    submitted by /u/swagonflyyyy [link] [comments]  ( 89 min )
    [D] Where does VQ-GAN get its randomness from?
In VQ-GAN, when sampling new images for unconditional image generation, you run a Transformer to predict the sequence of codebook vectors that defines the latent representation of the image, and then you run that latent representation through your decoder to generate an actual image. But does the only randomness come from the sampling of the predicted sequence? If so, I get that the number of possible images to generate is mind-shatteringly huge (1024^256), but it still seems limited. Like, if you generate a particular image, it seems like you wouldn't be able to generate very similar images that are close enough that they'd still (imperfectly) map to the same codebook entries when run through the first-stage encoder. Is there any other noise applied (e.g. slight perturbations to the chosen codebook vectors, Adaptive Instance Normalization in the decoder a la StyleGAN), or does the discrete set of vector sequences really define the full, discrete set of outputs? submitted by /u/say_wot_again [link] [comments]  ( 89 min )
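For what it's worth, in the stock pipeline the sampling of discrete codebook indices is indeed the only stochastic step; the decoder is deterministic. A self-contained toy sketch of where that randomness enters (the transformer below is a dummy returning random logits, purely to make the loop runnable):

```python
import torch

vocab, seq_len = 1024, 256

def transformer(codes):  # dummy stand-in for the real autoregressive model
    return torch.randn(codes.shape[0], codes.shape[1] + 1, vocab)

codes = torch.zeros(1, 1, dtype=torch.long)        # start token
for _ in range(seq_len):
    logits = transformer(codes)[:, -1, :]          # next-index logits
    probs = torch.softmax(logits / 1.0, dim=-1)    # temperature = 1.0
    nxt = torch.multinomial(probs, num_samples=1)  # all randomness lives here
    codes = torch.cat([codes, nxt], dim=1)
# decoder(codebook[codes]) would then map indices to pixels deterministically.
```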
    [D] Accelerate, PyTorch, and Big Model Inference: How Does It All Work?
    submitted by /u/muellerzr12 [link] [comments]  ( 88 min )
    [P] Dual numbers in Python for Automatic Differentiation.
Hi there, I have set up a basic implementation of dual numbers in Python that can be used for automatic differentiation. An example that is also part of the repository (figure: gradient descent with dual numbers). The implementation is pretty simple and therefore easy to understand. https://github.com/kaifishr/PyDualNumber submitted by /u/neurontwister [link] [comments]  ( 105 min )
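As a hedged sketch of the idea (not the repo's actual code): a dual number a + b*eps with eps^2 = 0 carries a value and a derivative through arithmetic, which is exactly what forward-mode AD needs:

```python
class Dual:
    """Dual number a + b*eps with eps**2 == 0."""
    def __init__(self, val, eps=0.0):
        self.val, self.eps = val, eps

    def _wrap(self, other):
        return other if isinstance(other, Dual) else Dual(other)

    def __add__(self, other):
        other = self._wrap(other)
        return Dual(self.val + other.val, self.eps + other.eps)
    __radd__ = __add__

    def __mul__(self, other):
        other = self._wrap(other)
        # The product rule falls out of (a + b*eps)(c + d*eps) with eps**2 = 0.
        return Dual(self.val * other.val,
                    self.val * other.eps + self.eps * other.val)
    __rmul__ = __mul__

def f(x):
    return 3 * x * x + 2 * x + 1  # f'(x) = 6x + 2

y = f(Dual(2.0, 1.0))             # seed dx/dx = 1
print(y.val, y.eps)               # 17.0 14.0 -> f(2) and f'(2)
```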
    [D] would you consider yourself an artist for using AI arts tools like Dall-e and midjourney?
I was reading this article. For those of you who got paywalled, a short summary of the controversy: someone entered one of his Midjourney results in the Colorado State Fair's art competition. He played around with Midjourney and submitted one of the outcomes to the fair with "adequate" credit - Jason Allen via Midjourney. (This is also controversial, as some judges didn't know anything about Midjourney.) He won his division, which was digital art. Obviously, there are some pretty heated arguments on Twitter on this matter among digital artists. Some argue that he's not an artist, that he did the bare minimum of work: there's no creativity or legwork in creating it; he didn't write the code; he simply came up with a prompt. Some argue that this is valid, because AI tools are merely tools, and all artists use tools. They think it's no different from Photoshop. submitted by /u/KimiKimKimKitty [link] [comments]  ( 113 min )
    [N] Introduction to streaming for data scientists
    submitted by /u/fchung [link] [comments]  ( 101 min )
    [D] Most Popular AI Research Aug 2022 - Ranked Based On GitHub Stars
    submitted by /u/cloud_weather [link] [comments]  ( 88 min )
    [P] I made a social media app for generating paintings from your photos using glid-3
    submitted by /u/persianprez [link] [comments]  ( 90 min )
  • Open

    Professional AI whisperers have launched a marketplace for DALL-E prompts
    submitted by /u/firig1965 [link] [comments]  ( 86 min )
    New DeepMind AI Learns Soccer Skills | Boston Dynamics Robotics News
    submitted by /u/kenickh [link] [comments]  ( 90 min )
    AI is getting better at generating porn. We might not be prepared for the consequences
    submitted by /u/magenta_placenta [link] [comments]  ( 87 min )
    Severe case of overfitting in my research
I'm an MSc student in bioinformatics. What I do is gather transcriptomic data from many cancer datasets, conduct some analysis over each dataset separately, get important cells and genes as features, and use them in a machine learning model to predict a target variable. The analysis from which I get the cell scores is pretty solid. It is based on the transcriptomic data, and it basically tells me how much of each cell type is present in each sample. In total, I have 38 cell types that I can use as predictive features. For example, CellA gets overall higher scores in responder samples and lower scores in non-responders. It is informative, so I would use it in the model. The aim is to define differences between samples that respond to a therapy (labeled Response) and samples that do not (NoResponse). I tried random forests, gradient boosting machines, XGBoost, logistic regression (with Lasso and Ridge penalties), kernel SVM, and more. Tree-based algorithms are producing AUC = 0.9 on the train set and AUC = 0.63 on the test set, something like that. Linear models (logistic regression) are very bad, with AUC = 0.51 on the test set; I guess they just don't fit my data, so I'll use tree-based models. I'm using cross-validation, I tuned the parameters of each algorithm (like the number of trees, number of nodes...), I tried feature selection, and nothing is working. I'm facing overfitting and it is hurting my brain. What can cause such overfitting? Why are parameter tuning and feature selection not helping at all? Could it be that the cells are just not very good predictive features? What do you think? Please share your thoughts. submitted by /u/ComanConer [link] [comments]  ( 106 min )
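One common culprit with small-n, high-p data like this is doing feature selection or tuning outside the CV loop, which leaks test information into training. A hedged sketch of nested CV that keeps both inside each fold (the data below is a synthetic placeholder for the 38 cell-score matrix):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import SelectKBest
from sklearn.model_selection import GridSearchCV, cross_val_score
from sklearn.pipeline import Pipeline

# Placeholder: substitute the 38 cell-type scores and Response/NoResponse labels.
X, y = make_classification(n_samples=120, n_features=38, random_state=0)

pipe = Pipeline([
    ("select", SelectKBest(k=10)),       # feature selection refit per fold
    ("rf", RandomForestClassifier()),
])
inner = GridSearchCV(
    pipe,
    {"rf__n_estimators": [200, 500], "rf__max_depth": [3, 5, None]},
    cv=5, scoring="roc_auc",
)
scores = cross_val_score(inner, X, y, cv=5, scoring="roc_auc")  # outer loop
print(scores.mean(), scores.std())
```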
    Time Session - Stable Prompts: hourglass with clocks and bright colors inside, 8k hyperdetailed
    submitted by /u/widgia [link] [comments]  ( 87 min )
    got NovelAI
    submitted by /u/roblox22y [link] [comments]  ( 90 min )
  • Open

    Question about baselines
I am going through Sergey Levine's course on Deep RL and he has the slide below on baselines. As I understand it, his derivation wouldn't work if we use the average reward as in the slide, because this quantity is itself random, so it is not OK to just treat it like a constant, and therefore there would still be bias (though perhaps not a very large one). Does this make sense? (Slide: the policy-gradient baseline derivation.) submitted by /u/randomkolmogorov [link] [comments]  ( 99 min )
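For reference, the standard unbiasedness argument only needs the baseline b to be independent of the sampled trajectories; for a fixed constant b:

```latex
\mathbb{E}_{\tau \sim \pi_\theta}\!\left[ b\, \nabla_\theta \log \pi_\theta(\tau) \right]
  = b \int \pi_\theta(\tau)\, \nabla_\theta \log \pi_\theta(\tau)\, d\tau
  = b \int \nabla_\theta \pi_\theta(\tau)\, d\tau
  = b\, \nabla_\theta \!\int \pi_\theta(\tau)\, d\tau
  = b\, \nabla_\theta 1 = 0
```

So the poster's reading is right in spirit: if b is the average reward computed from the same batch, it is correlated with the samples and the argument above doesn't apply verbatim, so the estimator can pick up a (usually small) bias; the bias vanishes if b is held fixed or estimated from independent data.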
    Any rl papers on simply maximizing value error during the learning phase for a separate test policy?
    submitted by /u/jms4607 [link] [comments]  ( 95 min )
  • Open

    New DeepMind AI Is Learning Soccer
    submitted by /u/kenickh [link] [comments]  ( 87 min )
    Weird result while training NN
I am getting weird results while training a neural network in Jupyter. I train the network and save the model to a file whenever we get better rewards. The thing is, when I train the NN for the first time, I do not get a good result. But when I re-run the code the next time in the same Jupyter session, it performs extremely well. However, when I re-run it the next time, the performance falls. Then, re-running one or two more times in the same session, it again performs extremely well. Also, every time I run the code after restarting the Jupyter session, I do not get the desired output. So I wonder, what is going on? submitted by /u/Icy_Improvement_5527 [link] [comments]  ( 88 min )

  • Open

    The Jupyter+git problem is now solved
Jupyter notebooks don’t work with git by default. With nbdev2, the Jupyter+git problem has been totally solved. It provides a set of hooks which provide clean git diffs, solve most git conflicts automatically, and ensure that any remaining conflicts can be resolved entirely within the standard Jupyter notebook environment. To get started, follow the directions on Git-friendly Jupyter.
Contents:
- The Jupyter+git problem
- The solution
  - The nbdev2 git merge driver
  - The nbdev2 Jupyter save hook
- Background
- The result
- Postscript: other Jupyter+git tools
  - ReviewNB
  - An alternative solution: Jupytext
  - nbdime
The Jupyter+git problem: Jupyter notebooks are a powerful tool for scientists, engineers, technical writers, students, teachers, and more. They provide an ideal notebook environment for interact…  ( 7 min )
2022-10-03T01:12:25.444Z osmosfeed 1.15.1